HoneyClient-Agent


lib/HoneyClient/Agent/Driver/Browser.pm  view on Meta::CPAN

  use HoneyClient::Agent::Driver::Browser;

  # Library used exclusively for debugging complex objects.
  use Data::Dumper;

  # Create a new Browser object, initialized with a collection
  # of URLs to visit.
  my $browser = HoneyClient::Agent::Driver::Browser->new(
      links_to_visit => {
          'http://www.google.com'  => 1,
          'http://www.cnn.com'     => 1,
      },
  );

  # If you want to see what type of "state information" is physically
  # inside $browser, try this command at any time.
  print Dumper($browser);

  # Continue to "drive" the driver, until it is finished.
  while (!$browser->isFinished()) {

      # Before driving the application to a new set of resources,
      # find out which resources it will contact next.
      print "About to contact the following resources:\n";
      print Dumper($browser->next());

      # Now, drive browser for one iteration.
      $browser->drive();

      # Get the driver's progress.
      print "Status:\n";
      print Dumper($browser->status());

  }

  # At this stage, the driver has exhausted its collection of links
  # to visit.  Let's say we want to add the URL "http://www.mitre.org"
  # to the driver's list.
  $browser->{links_to_visit}->{'http://www.mitre.org'} = 1;

  # Now, drive the browser for one iteration.
  $browser->drive();

  # Or, we can specify the URL as an argument.
  $browser->drive(url => "http://www.mitre.org");

=head1 DESCRIPTION

This library allows the Agent module to drive an instance of any browser,
running inside the HoneyClient VM.  The purpose of this module is to
programmatically navigate the browser to different websites, in order to
become purposefully infected with new malware.

This module is object-oriented in design, retaining all state information
within itself for easy access.  A specific browser implementation, such as
'IE' or 'FF', must inherit from this package.

Fundamentally, the Browser driver is initialized with a set of absolute URLs
for the browser to drive to.  Upon visiting each URL, the driver collects
any B<new> links found and will attempt to drive the browser to each
valid URL upon subsequent iterations of work.

For each top-level URL given, the driver will attempt to process all
corresponding links that are hosted on the same server, in order to
simulate a complete 'spider' of each server.  
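
The same-server rule above can be illustrated with a minimal sketch. The
helpers C<host_of> and C<partition_links> are hypothetical names that do not
appear in this module, and the regular expression is a simplification (it
does not handle userinfo such as C<user@host>); the real driver may well
rely on a proper URL parser instead.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of this module): extract the lowercase
# host from an absolute http(s) URL, or return undef for anything else.
sub host_of {
    my ($url) = @_;
    return $url =~ m{^https?://([^/:?#]+)}i ? lc($1) : undef;
}

# Hypothetical helper: split discovered links into those hosted on the
# same server as the top-level URL and those hosted elsewhere.
sub partition_links {
    my ($top_url, @links) = @_;
    my $top_host = host_of($top_url);
    my (%relative, %external);
    for my $link (@links) {
        my $host = host_of($link);
        next unless defined $host;    # Skip non-http(s) links entirely.
        if (defined $top_host and $host eq $top_host) {
            $relative{$link} = 1;
        }
        else {
            $external{$link} = 1;
        }
    }
    return (\%relative, \%external);
}

my ($relative, $external) = partition_links(
    'http://www.mitre.org/',
    'http://www.mitre.org/about',
    'http://WWW.MITRE.ORG/news',
    'http://www.cnn.com/',
    'mailto:someone@example.com',
);
print "same server: $_\n" for sort keys %$relative;
print "external:    $_\n" for sort keys %$external;
```

Comparing lowercased hosts keeps C<WWW.MITRE.ORG> and C<www.mitre.org> in the
same group, which is the behavior one would expect from a per-server spider.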

URLs are added to and removed from hashtables, as keys.  For each URL, a
calculated "priority" (a positive integer) is assigned as the value.  When
the Browser is ready to go to a new link, it will always go to the link
that has the highest priority.  If two URLs have the same priority, then
the Browser will choose between them at random.
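
As a rough sketch of this selection rule (not the module's actual
implementation), the following picks the highest-priority key from a
hashtable and breaks ties at random.  The helper name C<pop_next_link> is
hypothetical, and the sketch assumes that a larger integer means a higher
priority.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper (not part of this module): remove and return the
# URL with the highest priority, choosing at random among ties.
# Returns undef when the hashtable is empty.
sub pop_next_link {
    my ($links) = @_;
    return unless %$links;

    # Assumption: a larger value means a higher priority.
    my $top        = max(values %$links);
    my @candidates = grep { $links->{$_} == $top } keys %$links;

    # rand(@candidates) evaluates the array in scalar context (its size).
    my $choice = $candidates[ int(rand(@candidates)) ];
    delete $links->{$choice};
    return $choice;
}

my %links_to_visit = (
    'http://www.google.com' => 1,
    'http://www.cnn.com'    => 1,
    'http://www.mitre.org'  => 5,
);

while (defined(my $url = pop_next_link(\%links_to_visit))) {
    print "Next: $url\n";
}
```

Here C<http://www.mitre.org> is always returned first; the order of the two
remaining URLs varies from run to run because they share a priority.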

Furthermore, the Browser driver will try to visit all links shared by a
common server, in order, before moving on to other, external links.
B<However>, the Browser may end up revisiting old sites, if new links
were found that the Browser has not visited yet.

As the Browser driver navigates the browser to each link, it
maintains a set of hashtables that record when valid links were
visited (see L<links_visited>); when invalid links were found
(see L<links_ignored>); and when the browser attempted to visit
a link but the operation timed out (see L<links_timed_out>).
By maintaining this internal history, the driver will B<never>
navigate the browser to the same link twice.
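
The bookkeeping described above could look roughly like the sketch below.
The hashtable names mirror those in this description, but the helpers
C<enqueue_link> and C<record_outcome> are hypothetical, and storing an epoch
timestamp as the value is an assumption rather than something the source
confirms.

```perl
use strict;
use warnings;

# Hypothetical state, mirroring the hashtable names used above.
my %state = (
    links_to_visit  => {},
    links_visited   => {},
    links_ignored   => {},
    links_timed_out => {},
);

# Hypothetical helper: queue a URL only if it has no recorded history
# and is not already queued.  Returns 1 if queued, 0 if skipped.
sub enqueue_link {
    my ($state, $url, $priority) = @_;
    for my $table (qw(links_visited links_ignored links_timed_out)) {
        return 0 if exists $state->{$table}{$url};
    }
    return 0 if exists $state->{links_to_visit}{$url};
    $state->{links_to_visit}{$url} = $priority;
    return 1;
}

# Hypothetical helper: move a URL out of the queue and into one of the
# history tables.  Assumption: the value records when the event occurred.
sub record_outcome {
    my ($state, $url, $table) = @_;
    delete $state->{links_to_visit}{$url};
    $state->{$table}{$url} = time();
    return;
}

enqueue_link(\%state, 'http://www.mitre.org', 1);
record_outcome(\%state, 'http://www.mitre.org', 'links_visited');

# A second attempt to queue the same URL is refused.
print enqueue_link(\%state, 'http://www.mitre.org', 1)
    ? "queued\n"
    : "skipped\n";
```

Checking all three history tables before queuing is what makes the
"never twice" guarantee hold in this sketch.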

Lastly, it is highly recommended that for each driver B<$object>,
one should call $object->isFinished() prior to making a subsequent
call to $object->drive(), in order to verify that the driver has
not exhausted its set of links to visit.  Otherwise, if
$object->drive() is called with an empty set of links to visit,
the corresponding operation will B<croak>.

=cut

package HoneyClient::Agent::Driver::Browser;

# XXX: Disabled version check, Honeywall does not have Perl v5.8 installed.
#use 5.008006;
use strict;
use warnings;
use Config;
use Carp ();

# Traps signals, allowing END blocks to perform cleanup.
use sigtrap qw(die untrapped normal-signals error-signals);

#######################################################################
# Module Initialization                                               #
#######################################################################

BEGIN {
    # Defines which functions can be called externally.
    require Exporter;
    our (@ISA, @EXPORT, @EXPORT_OK, %EXPORT_TAGS, $VERSION);

    # Set our package version.
    $VERSION = 0.98;

    # Define inherited modules.
    use HoneyClient::Agent::Driver;

lib/HoneyClient/Agent/Driver/Browser.pm  view on Meta::CPAN


Indicates if the Browser driver B<$object> has driven the browser
process to all possible links it has found within its hashtables
and is unable to navigate the browser further without additional, external
input.

I<Output>: True if the Browser driver B<$object> is finished, false otherwise.

B<Note>: Additional links can be fed to this Browser driver at any time, by
simply adding new hashtable entries to the B<links_to_visit> hashtable
within the B<$object>.

For example, if you wanted to add the URL "http://www.mitre.org"
to the Browser driver B<$object>, simply use the following code:

  $object->{links_to_visit}->{'http://www.mitre.org'} = 1;

=back

=begin testing

# XXX: Test this.
1;

=end testing

=cut

sub isFinished {

    # Get the object state.
    my $self = shift;

    # Sanity check: Make sure we've been fed an object.
    unless (ref($self)) {
        $LOG->error("Error: Function must be called in reference to a " .
                    __PACKAGE__ . "->new() object!");
        Carp::croak "Error: Function must be called in reference to a " .
                    __PACKAGE__ . "->new() object!\n";
    }

    # Return whether or not all '*_to_visit' variables/hashtables are
    # empty.
    return (!(defined($self->next_link_to_visit) or
              scalar(%{$self->relative_links_to_visit}) or
              scalar(%{$self->links_to_visit})))

}

=pod

=head2 $object->status()

=over 4

Returns the current status of the Browser driver B<$object>, as its state
exists, between subsequent calls to $object->drive().

Specifically, the data returned is a reference to a hashtable,
containing specific statistical information about the status
of the Browser driver's progress, between iterations of driving the
browser process.

The following is an example hashtable, containing all the
(key => value) pairs that would exist in the output.

  $hashref = {
      'relative_links_remaining' =>       10, # Number of URLs left to
                                              # process, at a given site.
      'links_remaining'          =>       56, # Number of URLs left to
                                              # process, for all sites.
      'links_processed'          =>       44, # Number of URLs processed.
      'links_total'              =>      100, # Total number of URLs given.
      'percent_complete'         => '44.00%', # Percent complete,
                                              #  (processed / total).
  };

I<Output>: A corresponding B<$hashref>, containing statistical information
about the Browser driver's progress, as previously mentioned.
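
To make the arithmetic behind these fields concrete, here is a minimal
sketch that reproduces the example values above.  The helper
C<build_status> and its named arguments are hypothetical, and the sketch
assumes that C<links_remaining> includes the same-site links counted in
C<relative_links_remaining>; the module's own calculation may differ, for
instance in how it accounts for the next link to visit.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of this module): derive the status
# fields from raw counts of each hashtable.
sub build_status {
    my (%count) = @_;

    # Processed links are those visited, timed out, or ignored.
    my $processed = $count{visited} + $count{timed_out} + $count{ignored};

    # Assumption: remaining links cover both same-site and other links.
    my $remaining = $count{to_visit} + $count{relative_to_visit};
    my $total     = $processed + $remaining;

    # Guard against division by zero when no links are known at all.
    my $percent = $total > 0 ? 100 * $processed / $total : 0;

    return {
        relative_links_remaining => $count{relative_to_visit},
        links_remaining          => $remaining,
        links_processed          => $processed,
        links_total              => $total,
        percent_complete         => sprintf('%.2f%%', $percent),
    };
}

my $status = build_status(
    visited           => 40,
    timed_out         => 3,
    ignored           => 1,
    to_visit          => 46,
    relative_to_visit => 10,
);
printf "%-26s %s\n", $_, $status->{$_} for sort keys %$status;
```

With these inputs the sketch yields 44 processed, 56 remaining, 100 total,
and a C<percent_complete> of C<44.00%>, matching the example hashtable.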

# XXX: Resolve this, per parent Driver description.

=back

=begin testing

# XXX: Test this.
1;

=end testing

=cut

sub status {

    # Get the object state.
    my $self = shift;

    # Sanity check: Make sure we've been fed an object.
    unless (ref($self)) {
        $LOG->error("Error: Function must be called in reference to a " .
                    __PACKAGE__ . "->new() object!");
        Carp::croak "Error: Function must be called in reference to a " .
                    __PACKAGE__ . "->new() object!\n";
    }

    # Construct a new status hashtable.
    my $status = { };

    # Set the total number of links processed.
    $status->{links_processed} = scalar(keys(%{$self->links_visited})) +
                                 scalar(keys(%{$self->links_timed_out})) +
                                 scalar(keys(%{$self->links_ignored}));

    # Figure out if the next_link_to_visit is set.
    my $next_link_is_set = 0;
    if (defined($self->next_link_to_visit)) {
        $next_link_is_set = 1;
    }

    # Set the number of relative links to process.


