HTTP-LoadGen


README  view on Meta::CPAN


     # start main processing and wait for them to finish
     %result=%{HTTP::LoadGen::start_proc $handle};

     # thread management

     # create a collection of threads
     $sem=HTTP::LoadGen::ramp_up
         $procnr, $nproc, $start, $max, $duration, $handler;

     # wait for them to finish
     $sem->down;

     # idle a bit
     HTTP::LoadGen::delay $prefix, \%param;

     # get current thread number
     $nr=HTTP::LoadGen::threadnr;

     # get the configuration hash
     $config=HTTP::LoadGen::options;

     # get/set thread-specific user data
     $data=HTTP::LoadGen::userdata;
     HTTP::LoadGen::userdata=$data;

     # get/set thread specific random number generator
     $rng=HTTP::LoadGen::rng;
     HTTP::LoadGen::rng=$rng;

     # next random number
     $random=HTTP::LoadGen::rnd $max;

INSTALLATION
     perl Makefile.PL
     make
     make test
     make install

DEPENDENCIES
    *   perl 5.8.8

    *   IPC::ScoreBoard

    *   Coro

    *   AnyEvent

    *   Async::Interrupt

    *   Net::SSLeay

DESCRIPTION
    This module implements a multi-process and multi-thread load generator
    for HTTP. The threads are Coro threads, so in reality it does not use
    operating-system threads but cooperative, event-based I/O.

  Features
    *   limited support for SSL connections

    *   keep-alive connections

    *   configurable delay before and after each request

    *   run a list of URLs many times

    *   compute next URL based on the current request

    *   DNS cache can be preinitialized

    *   slow ramp up

    *   request bodies

    *   custom request headers

  Overview
    Note, this POD is best viewed via Apache2::PodBrowser.

   Parallelism
    The load generator follows a 2-level supervisor-worker pattern. The
    central function, "loadgen", creates a certain number of child
    processes. Each child process then creates in a slow ramp up phase
    worker threads up to a configurable total upper thread limit.

    The thread limit is configured independently of the number of worker
    processes. Configure a number of processes that is about 1.5-5 times
    the number of available CPUs. The number of threads can then be, say,
    50 or 500 or even 5996 or so. Processes and threads are numbered
    starting from 0.

    So, assuming there are 3 processes and 10 threads configured the
    following table shows how the threads are spread among the processes:

     Process | Threads
     --------+------------
           0 | 0 3 6 9
           1 | 1 4 7
           2 | 2 5 8

    Process 0 will run 4 threads, the other 2 processes 3 threads each. The
    number of threads per process can be calculated as:

     int($TotalThreadCount / $NProc) + ($ProcNr < $TotalThreadCount % $NProc)

    where $NProc is the number of processes used, $ProcNr the number of the
    current process and $TotalThreadCount the system-wide thread number.
    $ProcNr ranges from 0 to "$NProc - 1".
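
    The distribution described above can be sketched in a few lines of
    plain Perl. The function names "threads_in_proc" and "thread_numbers"
    are made up for this illustration and are not part of the module; the
    sketch only assumes the round-robin layout shown in the table.

```perl
use strict;
use warnings;

# Number of threads process $procnr runs when $total threads are
# spread round-robin over $nproc processes (integer division).
sub threads_in_proc {
    my ($procnr, $nproc, $total) = @_;
    return int($total / $nproc) + ($procnr < $total % $nproc ? 1 : 0);
}

# Thread numbers owned by process $procnr: $procnr, $procnr+$nproc, ...
sub thread_numbers {
    my ($procnr, $nproc, $total) = @_;
    return grep { $_ % $nproc == $procnr } 0 .. $total - 1;
}

# Reproduce the table for 3 processes and 10 threads.
for my $p (0 .. 2) {
    printf "%d | %s\n", $p, join ' ', thread_numbers($p, 3, 10);
}
```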

    At the beginning of the ramp-up phase each process starts up a certain
    number of threads (maybe 0) to reach the configured start-up thread
    number. The configured ramp-up duration then determines in which
    intervals new threads are added. So assuming the threads run long enough
    you start up with a certain level of parallelism which increases
    linearly over a certain time interval up to the configured maximum.
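
    As a rough mental model of the linear ramp-up, the following sketch
    computes how many threads should have been started a given number of
    seconds into the phase. The function "threads_at" and its rounding are
    assumptions made for illustration; the module's actual scheduling of
    individual thread starts may differ.

```perl
use strict;
use warnings;

# Hypothetical model: number of threads that should have been started
# $t seconds into the ramp-up phase, growing linearly from $start to $max.
sub threads_at {
    my ($t, $start, $max, $duration) = @_;
    return $max   if $duration <= 0 || $t >= $duration;
    return $start if $t <= 0;
    return $start + int(($max - $start) * $t / $duration);
}

# 10 threads at start, 50 at the end of a 300 second ramp-up.
printf "t=%3d: %d threads\n", $_, threads_at($_, 10, 50, 300)
    for 0, 75, 150, 300;
```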

   The Scoreboard
    The multi-process model of "HTTP::LoadGen" means that each process knows
    only about its own threads. Sometimes you may want to log for example
    the overall number of active requests when a new request is started. Or
    you may want to increment a shared variable for each request to see the



    For the lack of a better place the $rc element is described here.

    RC_STATUS (0)
        the HTTP status code. If the request failed because the connection
        couldn't be established a code 599 is set here. "RC_STATUSLINE"
        describes the problem in more detail in that case.

    RC_STATUSLINE (1)
        the HTTP status message. If the server responds with the following
        first line for example:

         HTTP/1.1 501 Method Not Implemented

        "RC_STATUS" is 501 while "RC_STATUSLINE" is "Method Not
        Implemented".

    RC_HTTPVERSION (2)
        the server HTTP protocol version. Normally 1.1 or 1.0.

    RC_STARTTIME (3)
        when the request has been started, fractional number.

    RC_CONNTIME (4)
        when the connection has been established, fractional number.

    RC_FIRSTTIME (5)
        when the first line of output has been received, fractional number.

    RC_HEADERTIME (6)
        when the response HTTP header has been completely received,
        fractional number.

    RC_BODYTIME (7)
        when the response body has been completely received, fractional
        number.

    RC_HEADERS (8)
        a hash containing the response HTTP headers. The values of this hash
        are arrays since HTTP header fields can be given multiple times.
        Keys (header names) are converted to lower case.

        Example:

         {
          'content-type' => ['text/html; charset=iso-8859-1'],
          'connection' => ['close'],
          'date' => ['Sun, 04 Jul 2010 18:21:12 GMT'],
          'content-length' => ['217'],
          'allow' => ['GET,HEAD,POST,OPTIONS,TRACE'],
          'server' => ['Apache'],
         }

    RC_BODY (9)
        the response body

    RC_DNSCACHED (10)
        boolean: has the DNS cache lookup resulted in a hit (1) or miss (0)?

    RC_CONNCACHED (11)
        boolean: has a kept-alive connection been used?
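
    To make the layout of $rc concrete, here is a small sketch that derives
    the phase durations and reads a response header. The constants are
    redefined locally with the indices listed above so that the code runs
    without the module installed; "phase_durations", "header_values" and
    the sample data are invented for this illustration.

```perl
use strict;
use warnings;

# Local copies of the indices listed above, so the sketch runs without
# the module installed.
use constant {
    RC_STATUS    => 0, RC_STATUSLINE => 1, RC_STARTTIME => 3,
    RC_CONNTIME  => 4, RC_FIRSTTIME  => 5, RC_HEADERTIME => 6,
    RC_BODYTIME  => 7, RC_HEADERS    => 8,
};

# Durations of each phase relative to the start of the request.
sub phase_durations {
    my ($rc) = @_;
    my $t0 = $rc->[RC_STARTTIME];
    return map { $rc->[$_] - $t0 }
        (RC_CONNTIME, RC_FIRSTTIME, RC_HEADERTIME, RC_BODYTIME);
}

# All values of one response header; names are stored in lower case.
sub header_values {
    my ($rc, $name) = @_;
    return @{ $rc->[RC_HEADERS]{ lc $name } || [] };
}

# Made-up sample result, for illustration only.
my $rc = [501, 'Method Not Implemented', '1.1',
          100.00, 100.25, 100.50, 100.75, 101.00,
          { 'content-type' => ['text/html; charset=iso-8859-1'] }, ''];

printf "connect=%.2f first=%.2f header=%.2f body=%.2f\n",
    phase_durations($rc);
print "type: $_\n" for header_values($rc, 'Content-Type');
```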

   The %data hash
    So, what can be specified in %data? Note, all keys here are case
    sensitive.

    NWorker (optional)
        specifies the number of worker processes to be used. Default is 1.

    RampUpStart (optional)
        the number of threads to be started up immediately (after the
        "ProcInit" phase is over). Default is 1 thread per worker process,
        that is "NWorker".

    RampUpMax (optional)
        the total number of threads that have been started by the time the
        ramp-up phase is over. That means all processes together will start
        this number of threads. If a thread finishes before the ramp-up
        phase is over, this maximum level of parallelism will never be
        reached.

        Default is the same as "RampUpStart".

    RampUpDuration (optional)
        the duration of the ramp-up phase in seconds (may be fractional).

        Default is 300 (5 minutes).

    ParentInit (optional)
        the "ParentInit" handler called as

         $data->{ParentInit}->();

        One thing to do here is to create a scoreboard for interprocess
        communication, see HTTP::LoadGen::ScoreBoard or IPC::ScoreBoard.

        Example:

         ParentInit=>sub {
           # no parameters

           # create scoreboard
           # options() returns the config hash itself. The NWorker parameter
           # is known. SbSlotsz and SbExtra are new. This is to demonstrate
           # that the hook routines can access the configuration and evaluate
           # and even add custom parameters.
           HTTP::LoadGen::ScoreBoard::init_once
               @{HTTP::LoadGen::options()}{qw/NWorker SbSlotsz SbExtra/};
         }

    ParentExit (optional)
        the "ParentExit" handler called as

         $data->{ParentExit}->();

        If a scoreboard is used remember to disconnect.

        Example:

         ParentExit=>sub {
           # no parameters
           undef HTTP::LoadGen::ScoreBoard::scoreboard;


                     HTTP::LoadGen::ScoreBoard::header_count,
                     HTTP::LoadGen::ScoreBoard::header_bytes,
                     HTTP::LoadGen::ScoreBoard::body_bytes,
                     $rc->[RC_STARTTIME],
                     $rc->[RC_CONNTIME]-$rc->[RC_STARTTIME],
                     $rc->[RC_FIRSTTIME]-$rc->[RC_STARTTIME],
                     $rc->[RC_HEADERTIME]-$rc->[RC_STARTTIME],
                     $rc->[RC_BODYTIME]-$rc->[RC_STARTTIME],
                     $rc->[RC_STATUS], $rc->[RC_STATUSLINE],
                     length($rc->[RC_BODY]),
                     sprintf('%s(%s://%s:%s%s)',
                             @{$rq}[RQ_METHOD, RQ_SCHEME, RQ_HOST, RQ_PORT, RQ_URI]));
         }

    times (optional)
        the number of times the URL iterator is charged, that is, how many
        times the URL list is fetched.

        If omitted or "<=0" the test runs forever.

    dnscache (optional)
        "loadgen" caches DNS query results. One can prevent DNS queries
        completely in 2 ways. One of them is to provide a hash here that
        maps names to IP addresses. The other is to have the URL iterator
        generate IP addresses instead of host names and optionally "Host"
        request header fields.

        Another use of this item is to cheat host name resolution. One can
        for example test a newly installed or development server while the
        real server continues to work unaffected.

        Example:

         dnscache=>{
                    'foertsch.name'=>'127.0.0.1',
                   },
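
        A hash like this can also be filled programmatically before the
        test starts. The sketch below is one possible way to do that, not
        something the module provides: "build_dnscache" and its optional
        $resolve hook are hypothetical names, and only the core "Socket"
        functions "inet_aton" and "inet_ntoa" are used for the default
        lookup.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

# Build a name => IP hash suitable for the "dnscache" item. $resolve is a
# hypothetical hook returning a dotted-quad string or undef; by default
# the system resolver is asked once per name.
sub build_dnscache {
    my ($names, $resolve) = @_;
    $resolve ||= sub {
        my $packed = inet_aton($_[0]);
        return defined $packed ? inet_ntoa($packed) : undef;
    };
    my %cache;
    for my $name (@$names) {
        my $ip = $resolve->($name);
        $cache{$name} = $ip if defined $ip;    # skip unresolvable names
    }
    return \%cache;
}

# A fixed table stands in for real DNS here so the example is repeatable.
my %fake = ('foertsch.name' => '127.0.0.1');
my $dnscache = build_dnscache(['foertsch.name', 'no.such.host.invalid'],
                              sub { $fake{ $_[0] } });
print "$_ => $dnscache->{$_}\n" for sort keys %$dnscache;
```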

    InitURLs (either InitURLs or URLList or both must be present)
        "InitURLs" initializes the URL iterator. It may be a string
        describing one of the predefined iterators or a "CODE" reference.

        In the latter case it is called without parameters as

         $it=$data->{InitURLs}->();

        It is expected to return a function that when called as

         $new_rq=$it->($rc, $rq);

        returns the next request item or "undef" when it runs out of items.
        The parameters $rc and $rq describe the previous request ($rq) and
        its result ($rc).

        For a description of the $rq and $new_rq format see URLList below.

        Example:

         InitURLs=>sub {
           my $url=[qw!GET http foertsch.name 80 /-redir!,
                    {
                     keepalive=>KEEPALIVE,
                     headers=>[
                               'X-auth'=>1, # necessary to trigger 401 for that URL
                              ],            # it also shows a custom request header
                    }];
           return sub {
             my ($rc, $rq)=@_;
             if( $rc->[RC_STATUS]==401 ) {
               # redo with Authorization header
               push @{$rq->[RQ_PARAM]->{headers}}, Authorization=>'Basic YmxhOmJsdWI=';
               return $rq;
             }
             my $new_rq=$url;
             undef $url;                # next time return undef (out-of-requests)
             return $new_rq;
           };
         }

        The iterator generator initializes the variable $url and then
        returns a closure. Hence, $url is a static variable with respect to
        the returned iterator.

        The iterator itself checks the HTTP code of the previous request. In
        case of a 401 (Authorization Required) it adds an "Authorization"
        header to the request header list and retries the operation.

        If the previous operation has ended with another HTTP code it
        copies $url to an auxiliary variable, undefines it and returns the
        auxiliary variable. Thus, only the first time the iterator is called
        it returns $url. After that it is always "undef" which signals
        *Out-of-Requests*.
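
        The calling convention can be tried out without any network
        traffic. The following sketch drives a simple list iterator the
        way the description above suggests. "make_list_iterator" and
        "fake_request" are invented for this illustration, and passing
        "undef" for $rc and $rq on the very first call is an assumption.

```perl
use strict;
use warnings;

# Iterator generator in the style described above: walks a list once.
sub make_list_iterator {
    my @queue = @_;
    return sub {
        my ($rc, $rq) = @_;     # previous result and request (unused here)
        return shift @queue;    # undef once the list is exhausted
    };
}

# Stand-in for the request loop: fake_request() is hypothetical and only
# fabricates a status code instead of doing network I/O.
sub fake_request { return [200, 'OK'] }

my $it = make_list_iterator(
    [qw!GET http foertsch.name 80 /!, {}],
    [qw!GET http foertsch.name 80 /impressum.html!, {}],
);

my ($rc, $rq, $count) = (undef, undef, 0);
while (defined(my $next = $it->($rc, $rq))) {
    $rq = $next;
    $rc = fake_request($rq);
    $count++;
    print "$rq->[0] $rq->[4] -> $rc->[0]\n";
}
print "requests: $count\n";
```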

        If "InitURLs" is a string it is the name of a predefined iterator
        generator.

        Example:

         InitURLs=>'follow'

        There are currently 4 such generators. All of them expect an
        "URLList" (see below) to be provided.

        default
            simply walks the "URLList" from start to end.

            This one is also used if "InitURLs" is omitted.

        random_start
            similar to "default" but starts at a random offset in "URLList".
            At the end of the list it continues at the beginning until all
            "URLList" elements are done once.

        follow
            similar to "default" but if a request results in a "3xx" HTTP
            code and a "Location" header is provided by the server it tries
            to follow it recursively.

            If the request starting a series of redirections contains a
            "postdelay" statement (see below) the delay is postponed until
            after the last request of the series. Subsequent requests are
            issued without delay.


            In fact, the built-in "random_start_follow" iterator is
            implemented, for example, as

             register_iterator random_start_follow=>sub {
                @_=get_iterator('random_start')->();
                goto &{get_iterator 'follow'};
             };

            To turn your own iterator into a following one you could write:

             InitURLs=>sub {
               return get_iterator('follow')->($my_own_iterator);
             }

            where $my_own_iterator is an iterator function.

        random_start_follow
            a combination of the 2 above.

        You can register your own named iterators by calling
        register_iterator below.
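
        To illustrate the traversal order of "random_start", here is a
        stand-alone sketch. It is not the module's implementation:
        "make_random_start" and the $rand hook are hypothetical, and the
        list elements are plain strings instead of request descriptions.

```perl
use strict;
use warnings;

# Sketch of a "random_start"-like generator: begin at a random offset and
# wrap around until every element has been returned exactly once.
# $rand is a hypothetical hook returning a number in [0, $n) so that a
# seeded or fixed generator can be plugged in.
sub make_random_start {
    my ($list, $rand) = @_;
    $rand ||= sub { rand $_[0] };
    my $n      = @$list;
    my $offset = $n ? int($rand->($n)) : 0;
    my $done   = 0;
    return sub {
        return if $done >= $n;                       # out of requests
        return $list->[ ($offset + $done++) % $n ];
    };
}

my $it = make_random_start([qw/a b c d/], sub { 2 });    # fixed offset 2
my @order;
while (defined(my $el = $it->())) { push @order, $el }
print "@order\n";
```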

    URLList (either InitURLs or URLList or both must be present)
        See also InitURLs above.

        An "URLList" is an array of arrays. Each of these sub-arrays
        describes one request. It consists of 6 elements:

         [$method, $scheme, $host, $port, $uri, $param]

        $method is the HTTP request method, e.g. "GET", "POST", ...

        $scheme is either "http" or "https".

        $host is the hostname or IP address of the server, e.g.
        "foertsch.name" or 109.73.51.50.

        $port is the server port to connect to. Usually port 80 is used for
        "http" and port 443 for "https".

        $uri is the request URI normally starting with a slash ("/"), e.g.
        "/impressum.html".

        $param is a hash with further options.

        To access the elements of a request description HTTP::LoadGen::Run
        exports a few constants. They may be used to increase readability.

         RQ_METHOD == 0
         RQ_SCHEME == 1
         RQ_HOST   == 2
         RQ_PORT   == 3
         RQ_URI    == 4
         RQ_PARAM  == 5
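
        A short sketch of how these indices read in practice: it renders a
        request description in the "METHOD(scheme://host:port/uri)" form
        used by the logging example in this document. The constants are
        redefined locally so the code runs without the module, and
        "describe_request" is a made-up helper name.

```perl
use strict;
use warnings;

# Local copies of the indices listed above, so the sketch runs without
# the module installed.
use constant {
    RQ_METHOD => 0, RQ_SCHEME => 1, RQ_HOST  => 2,
    RQ_PORT   => 3, RQ_URI    => 4, RQ_PARAM => 5,
};

# Render a request description as METHOD(scheme://host:port/uri).
sub describe_request {
    my ($rq) = @_;
    return sprintf '%s(%s://%s:%s%s)',
        @{$rq}[RQ_METHOD, RQ_SCHEME, RQ_HOST, RQ_PORT, RQ_URI];
}

my $rq = [qw!GET http foertsch.name 80 /impressum.html!, {}];
print describe_request($rq), "\n";
```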

        Example:

         URLList=>[
                   [qw!GET http 109.73.51.50 80 /-redir!,
                    {
                     keepalive=>KEEPALIVE,
                     headers=>[
                               Authorization=>'Basic YmxhOmJsdWI=',
                               Host=>'foertsch.name',
                              ],
                    }],
                   [qw!HUGO https www.kabatinte.net 443 /!,
                    {
                     keepalive=>KEEPALIVE,
                     predelay=>0.5,
                     prejitter=>1,
                     postdelay=>3,
                     postjitter=>1.5,
                     body=>'blablub',
                    }]
                  ]

        This "URLList" contains 2 requests, one for a server with the IP
        address 109.73.51.50 and one for the host "www.kabatinte.net".

        The first one will send the following HTTP request to the server (IP
        109.73.51.50, port 80):

         GET /-redir HTTP/1.1
         Authorization: Basic YmxhOmJsdWI=
         Host: foertsch.name

        If you need more header fields, "User-Agent" for example, add them
        to the "headers" array of the options hash.

        The second request is converted into the following HTTP message sent
        over SSL to "84.38.75.176:443" assuming that "www.kabatinte.net"
        resolves to 84.38.75.176:

         HUGO / HTTP/1.1
         Host: www.kabatinte.net
         Content-Length: 7

         blablub

        Although no "Host" header is specified in the request element one is
        sent. If the request element does not contain a "Host" header one is
        added automatically based on $host and $port.

        You may also notice the "Content-Length" header. It is sent because
        a request body is specified (the "body" item in $param).
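
        The way a request element turns into the messages shown above can
        be sketched as follows. This is an illustration only, not the
        code the module uses: "request_text" is a hypothetical helper,
        and leaving the default port (80 or 443) out of the generated
        "Host" header is an assumption based on the example above.

```perl
use strict;
use warnings;

# Illustrative only: assembles the request text from a request element
# [$method, $scheme, $host, $port, $uri, $param].
sub request_text {
    my ($method, $scheme, $host, $port, $uri, $param) = @{ $_[0] };
    my @headers = @{ $param->{headers} || [] };    # flat name/value list
    my $body    = $param->{body};

    # Add "Host" only if the caller did not supply one. Omitting the
    # default port (80/443) is an assumption of this sketch.
    my %seen;
    for (my $i = 0; $i < @headers; $i += 2) { $seen{ lc $headers[$i] } = 1 }
    unless ($seen{host}) {
        my $default = $scheme eq 'https' ? 443 : 80;
        push @headers, Host => ($port == $default ? $host : "$host:$port");
    }
    push @headers, 'Content-Length' => length($body)
        if defined $body && !$seen{'content-length'};

    my $text = "$method $uri HTTP/1.1\r\n";
    while (my ($name, $value) = splice @headers, 0, 2) {
        $text .= "$name: $value\r\n";
    }
    return $text . "\r\n" . (defined $body ? $body : '');
}

print request_text([qw!HUGO https www.kabatinte.net 443 /!,
                    { body => 'blablub' }]), "\n";
```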

        So, what can be specified in the $param part?

        keepalive
            HTTP::LoadGen::Run exports 3 constants to be used as values.
            "KEEPALIVE_USE" permits the use of a previously kept-alive
            connection. "KEEPALIVE_STORE" allows the connection to be kept
            alive after the request. "KEEPALIVE" combines both of the above.

            If you hate readability you can also use the numerical values:

             KEEPALIVE_USE==1
             KEEPALIVE_STORE==2
             KEEPALIVE==3
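
            The numerical values behave like bit flags: 3 is the bitwise
            OR of 1 and 2. The sketch below decodes a value under that
            reading. The constants are redefined locally, and
            "keepalive_flags" is a hypothetical helper; whether the module
            tests the bits in exactly this way is an assumption.

```perl
use strict;
use warnings;

# Local copies of the numerical values listed above.
use constant {
    KEEPALIVE_USE   => 1,
    KEEPALIVE_STORE => 2,
    KEEPALIVE       => 3,
};

# Decode a keepalive value into its two permissions.
sub keepalive_flags {
    my ($value) = @_;
    return +{
        reuse => ($value & KEEPALIVE_USE)   ? 1 : 0,
        store => ($value & KEEPALIVE_STORE) ? 1 : 0,
    };
}

my $f = keepalive_flags(KEEPALIVE_USE | KEEPALIVE_STORE);
print "reuse=$f->{reuse} store=$f->{store}\n";
```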

        predelay and prejitter
            These statements define a period to wait before sending the
            request. The wait is done after the request description has been
            pulled off the iterator but before the "ReqStart" handler is
            run.

            Both numbers can be fractions. Read them as

             predelay ± prejitter

            The actual waiting time is calculated as

             interval = predelay - prejitter + rand( 2 * prejitter )

            If "prejitter >= predelay" the interval can become negative. In
            this case you won't jump back in time but simply not wait.

            To achieve repeatable results a thread-specific random number
            generator must be used. See the "rng" function below.
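
            The calculation can be sketched as below. "wait_interval" and
            its $rand hook are hypothetical names used only here; the hook
            stands in for a thread-specific random number generator, and
            Perl's built-in "rand" is the fallback.

```perl
use strict;
use warnings;

# Waiting time for a delay/jitter pair; negative results mean "do not
# wait". $rand is a hypothetical hook standing in for a thread-specific
# RNG; it is called with an upper bound and returns a number in
# [0, bound).
sub wait_interval {
    my ($delay, $jitter, $rand) = @_;
    $rand ||= sub { rand $_[0] };
    my $interval = $delay - $jitter + $rand->(2 * $jitter);
    return $interval > 0 ? $interval : 0;
}

srand 42;    # fixed seed so repeated runs on the same perl agree
printf "%.3f\n", wait_interval(0.5, 1) for 1 .. 3;
```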

        postdelay and postjitter
            The same as "predelay" but waiting occurs after the request is
            done or more precisely after the "ReqDone" handler returns.

        headers
            an array (not a hash!) of header fields to be appended to the
            HTTP request.

        body
            a request body

        conn_timeout
            here you can specify the return value of the prepare-callback
            function passed to "AnyEvent::Socket::tcp_connect" when
            establishing a connection.

            See AnyEvent::Socket for more information.

        timeout
            the "timeout" parameter used when a connection is converted into
            an AnyEvent::Handle object.

            See AnyEvent::Handle for more information.

        tls_ctx
            the "tls_ctx" parameter used when a connection is converted into
            an AnyEvent::Handle object.

            See AnyEvent::Handle for more information.

            By now AnyEvent::Handle supports SSL features like client
            certificates and server certificate verification. However, some
            things are still missing like SSL session caching. How about



         # thread accounting
         thread_start;

         # set a thread specific RNG
         rng=Math::Random::MT->new(threadnr);

         return [];                 # initializes thread specific user data
       },
       ThreadExit=>sub {
         # no parameters
         thread_done;
       },

       ReqStart=>sub {
         my ($el)=@_;

         # request accounting
         req_start;

         # started - succeeded - failed = currently pending number of requests
         @{userdata()}=(thread_count, req_started-req_success-req_failed);
       },

       ReqDone=>sub {
         my ($rc, $el)=@_;

         # request accounting: HTTP status 2xx and 3xx are successful
         #                     other requests are counted as failures.
         req_done +($rc->[RC_STATUS]=~/^[23]/), $rc->[RC_HEADERS], $rc->[RC_BODY];

         $logger->(threadnr,
                   @{$rc}[RC_DNSCACHED, RC_CONNCACHED],
                   @{userdata()},
                   req_success,
                   req_failed,
                   $rc->[RC_STARTTIME],
                   $rc->[RC_CONNTIME]-$rc->[RC_STARTTIME],
                   $rc->[RC_FIRSTTIME]-$rc->[RC_STARTTIME],
                   $rc->[RC_HEADERTIME]-$rc->[RC_STARTTIME],
                   $rc->[RC_BODYTIME]-$rc->[RC_STARTTIME],
                   $rc->[RC_STATUS],
                   length($rc->[RC_BODY]),
                   @{$el}[RQ_METHOD, RQ_SCHEME, RQ_HOST, RQ_PORT, RQ_URI],
                   $rc->[RC_STATUSLINE]);
       },

       dnscache=>{
                  localhost=>'127.0.0.1',
                  'kabatinte.net'=>'84.38.75.176',
                  'www.kabatinte.net'=>'84.38.75.176',
                  'foertsch.name'=>'109.73.51.50',
                 },

       times=>3,                    # run the URL list 3 times

       InitURLs=>'random_start',

       URLList=>do {
         my $o={
                keepalive=>KEEPALIVE,
                qw!predelay 0.05 prejitter 0.1 postdelay 0.5 postjitter 1!,
               };
         [[qw!GET http foertsch.name 80 /-redir!, $o],
          [qw!HUGO https www.kabatinte.net 443 /!, $o]
         ];
       },
      }

SEE ALSO
    *   HTTP::LoadGen::Run

    *   HTTP::LoadGen::ScoreBoard

    *   HTTP::LoadGen::Logger

    *   loadgen

AUTHOR
    Torsten Förtsch, <torsten.foertsch@gmx.net>

COPYRIGHT AND LICENSE
    Copyright (C) 2010 by Torsten Förtsch

    This library is free software; you can redistribute it and/or modify it
    under the same terms as Perl itself, either Perl version 5.10.0 or, at
    your option, any later version of Perl 5 you may have available.


