HTTP-LoadGen
lib/HTTP/LoadGen.pod view on Meta::CPAN
######################
# the load generator #
######################
HTTP::LoadGen::loadgen \%config;
#######################
# auxiliary functions #
#######################
# process management
# create a collection of worker processes
$handle=HTTP::LoadGen::create_proc $nproc, $inithnd, $handler, $exithnd;
# start main processing and wait for them to finish
%result=%{HTTP::LoadGen::start_proc $handle};
# thread management
# create a collection of threads
$sem=HTTP::LoadGen::ramp_up
$procnr, $nproc, $start, $max, $duration, $handler;
# wait for them to finish
$sem->down;
# idle a bit
HTTP::LoadGen::delay $prefix, \%param;
# get current thread number
$nr=HTTP::LoadGen::threadnr;
# get the configuration hash
$config=HTTP::LoadGen::options;
# get/set thread-specific user data
$data=HTTP::LoadGen::userdata;
HTTP::LoadGen::userdata=$data;
# get/set thread specific random number generator
$rng=HTTP::LoadGen::rng;
HTTP::LoadGen::rng=$rng;
# next random number
$random=HTTP::LoadGen::rnd $max;
=head1 DESCRIPTION
This module implements a multi-process and multi-thread load generator
for HTTP. It uses L<Coro> threads. So, in reality it does not
use threads but event-based IO.
=head2 Features
=over 4
=item * limited support for SSL connections
=item * keep-alive connections
=item * configurable delay before and after each request
=item * run a list of URLs many times
=item * compute next URL based on the current request
=item * DNS cache can be preinitialized
=item * slow ramp up
=item * request bodies
=item * custom request headers
=back
=head2 Overview
Note, this POD is best viewed via L<Apache2::PodBrowser>.
=head3 Parallelism
The load generator follows a 2-level supervisor-worker pattern. The central
function, C<loadgen>, creates a certain number of child processes. During a
slow ramp-up phase each child process then creates worker threads until a
configurable total thread limit is reached.
The thread limit is configured independently of the number of worker processes.
Typically you configure a number of processes that is about 1.5-5 times the
number of available CPUs. The number of threads can then be, say, 50 or 500
or even 5996 or so. Processes and threads are numbered starting from 0.
So, assuming there are 3 processes and 10 threads configured the following
table shows how the threads are spread among the processes:
  Process | Threads
  --------+------------
     0    | 0 3 6 9
     1    | 1 4 7
     2    | 2 5 8
Process 0 will run 4 threads, the other 2 processes 3 threads each.
The number of threads per process can be calculated as:
int($TotalThreadCount / $NProc) + ($ProcNr < $TotalThreadCount % $NProc)
where C<$NProc> is the number of processes used, C<$ProcNr> the number
of the current process and C<$TotalThreadCount> the total number of threads
across all processes. The comparison contributes C<1> if it is true and C<0>
otherwise. C<$ProcNr> ranges from C<0> to C<$NProc - 1>.
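The formula and the table above can be checked with a few lines of plain Perl. This is a minimal sketch; the function names C<threads_for_proc> and C<thread_numbers> are made up for illustration and are not part of the module.

```perl
use strict;
use warnings;

# Number of threads a given process runs, following the formula above.
# int() makes the integer division explicit.
sub threads_for_proc {
    my ($total, $nproc, $procnr) = @_;
    return int($total / $nproc) + ($procnr < $total % $nproc ? 1 : 0);
}

# System-wide thread numbers handled by a given process, using the
# round-robin layout shown in the table.
sub thread_numbers {
    my ($total, $nproc, $procnr) = @_;
    return grep { $_ % $nproc == $procnr } 0 .. $total - 1;
}

for my $procnr (0 .. 2) {
    printf "%d | %s (%d threads)\n",
        $procnr,
        join(' ', thread_numbers(10, 3, $procnr)),
        threads_for_proc(10, 3, $procnr);
}
```

For 3 processes and 10 threads this reproduces the rows of the table: 4 threads for process 0 and 3 threads each for processes 1 and 2.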
At the beginning of the ramp-up phase each process starts up a certain number
of threads (maybe C<0>) to reach the configured start-up thread number.
The configured ramp-up duration then determines in which intervals new
threads are added. So assuming the threads run long enough you start up
with a certain level of parallelism which increases linearly over a
certain time interval up to the configured maximum.
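To get a feeling for the linear increase, the following sketch computes when each thread would be started if the additional threads were spread evenly over the ramp-up duration. The even spacing and the function name C<ramp_up_offsets> are assumptions made for this illustration; the exact schedule used by the module is not spelled out here.

```perl
use strict;
use warnings;

# Start offset in seconds for each system-wide thread number.
# Assumption of this sketch: threads beyond the start-up number are
# added at equal intervals until the maximum is reached.
sub ramp_up_offsets {
    my ($start, $max, $duration) = @_;
    my @offsets = (0) x $start;          # these start immediately
    my $extra   = $max - $start;
    for my $i (1 .. $extra) {
        push @offsets, $duration * $i / $extra;
    }
    return @offsets;
}

my @offsets = ramp_up_offsets(2, 6, 300);
printf "thread %d starts after %5.1fs\n", $_, $offsets[$_] for 0 .. $#offsets;
```

With a start-up number of 2, a maximum of 6 and a duration of 300 seconds, the sketch starts 2 threads at once and adds one more every 75 seconds.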
=head3 The Scoreboard
=item RC_STATUSLINE (1)
the HTTP status message. If the server responds with the following first line
for example:
HTTP/1.1 501 Method Not Implemented
C<RC_STATUS> is C<501> while C<RC_STATUSLINE> is C<Method Not Implemented>.
=item RC_HTTPVERSION (2)
the server HTTP protocol version. Normally C<1.1> or C<1.0>.
=item RC_STARTTIME (3)
when the request has been started, fractional number.
=item RC_CONNTIME (4)
when the connection has been established, fractional number.
=item RC_FIRSTTIME (5)
when the first line of output has been received, fractional number.
=item RC_HEADERTIME (6)
when the response HTTP header has been completely received, fractional number.
=item RC_BODYTIME (7)
when the response body has been completely received, fractional number.
=item RC_HEADERS (8)
a hash containing the response HTTP headers. The values of this hash are
arrays since HTTP header fields can be given multiple times. Keys
(header names) are converted to lower case.
Example:
{
'content-type' => ['text/html; charset=iso-8859-1'],
'connection' => ['close'],
'date' => ['Sun, 04 Jul 2010 18:21:12 GMT'],
'content-length' => ['217'],
'allow' => ['GET,HEAD,POST,OPTIONS,TRACE'],
'server' => ['Apache'],
}
=item RC_BODY (9)
the response body
=item RC_DNSCACHED (10)
boolean: has the DNS cache lookup resulted in a hit (C<1>) or miss (C<0>)?
=item RC_CONNCACHED (11)
boolean: has a kept-alive connection been used?
=back
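The timestamps and the header hash are usually consumed together, for example in a logging hook. The sketch below shows one way to do that. It defines local copies of the index numbers listed above so that it runs without the module installed; the helper names C<phase_times> and C<header> and the sample values are invented for the example.

```perl
use strict;
use warnings;

# Local copies of the documented indices so the sketch runs stand-alone.
use constant {
    RC_STARTTIME  => 3,
    RC_CONNTIME   => 4,
    RC_FIRSTTIME  => 5,
    RC_HEADERTIME => 6,
    RC_BODYTIME   => 7,
    RC_HEADERS    => 8,
    RC_BODY       => 9,
};

# Durations of the request phases relative to the request start.
sub phase_times {
    my ($rc) = @_;
    my $t0 = $rc->[RC_STARTTIME];
    return map { $rc->[$_] - $t0 }
        RC_CONNTIME, RC_FIRSTTIME, RC_HEADERTIME, RC_BODYTIME;
}

# First value of a response header. Header names are stored in lower case.
sub header {
    my ($rc, $name) = @_;
    my $values = $rc->[RC_HEADERS]{lc $name} or return;
    return $values->[0];
}

# A made-up result array; the slots below RC_STARTTIME stay undefined here.
my $rc = [];
@{$rc}[RC_STARTTIME .. RC_BODYTIME] = (100.0, 100.25, 100.5, 100.75, 101.0);
$rc->[RC_HEADERS] = { 'content-type' => ['text/html; charset=iso-8859-1'] };
$rc->[RC_BODY]    = 'x' x 217;

printf "connect %.2fs, first line %.2fs, header %.2fs, body %.2fs\n",
    phase_times($rc);
printf "%s, %d bytes\n", header($rc, 'Content-Type'), length $rc->[RC_BODY];
```

Because header values are arrays, a header that the server sends several times keeps all its values; C<header> above deliberately returns only the first one.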
=head3 The %data hash
So, what can be specified in C<%data>? Note, all keys here are case
sensitive.
=over 4
=item NWorker (optional)
specifies the number of worker processes to be used. Default is C<1>.
=item RampUpStart (optional)
the number of threads to be started up immediately (after the C<ProcInit>
phase is over). Default is 1 thread per worker process, that is C<NWorker>.
=item RampUpMax (optional)
the total number of threads that should have been started by the time the
ramp-up phase is over. That means all processes together will start this
number of threads. If a thread finishes before the ramp-up phase is over this
maximum level of parallelism will never be reached.
Default is the same as C<RampUpStart>.
=item RampUpDuration (optional)
the duration of the ramp-up phase in seconds (may be fractional).
Default is C<300> (5 minutes).
=item ParentInit (optional)
the C<ParentInit> handler called as
$data->{ParentInit}->();
One thing to do here is to create a scoreboard for interprocess
communication, see L<HTTP::LoadGen::ScoreBoard> or
L<IPC::ScoreBoard>.
Example:
ParentInit=>sub {
# no parameters
# create scoreboard
# options() returns the config hash itself. The NWorker parameter
# is known. SbSlotsz and SbExtra are new. This is to demonstrate
# that the hook routines can access the configuration and evaluate
# and even add custom parameters.
HTTP::LoadGen::ScoreBoard::init_once
@{HTTP::LoadGen::options()}{qw/NWorker SbSlotsz SbExtra/};
}
=item ParentExit (optional)
$rc->[RC_CONNTIME]-$rc->[RC_STARTTIME],
$rc->[RC_FIRSTTIME]-$rc->[RC_STARTTIME],
$rc->[RC_HEADERTIME]-$rc->[RC_STARTTIME],
$rc->[RC_BODYTIME]-$rc->[RC_STARTTIME],
$rc->[RC_STATUS], $rc->[RC_STATUSLINE],
length($rc->[RC_BODY]),
sprintf('%s(%s://%s:%s%s)',
@{$rq}[RQ_METHOD, RQ_SCHEME, RQ_HOST, RQ_PORT, RQ_URI]));
}
=item times (optional)
the number of times the URL iterator is set up anew, that is, the number of
times the URL list is walked.
If omitted or C<< <=0 >> the test runs forever.
=item dnscache (optional)
C<loadgen> caches DNS query results. One can prevent DNS queries completely in
2 ways. One of them is to provide a hash here that maps names to IP addresses.
The other is to have the URL iterator generate IP addresses instead of host
names and optionally C<Host> request header fields.
Another use of this item is to override host name resolution. One can, for
example, test a newly installed or development server while the real server
continues to work unaffected.
Example:
dnscache=>{
'foertsch.name'=>'127.0.0.1',
},
=item InitURLs (either InitURLs or URLList or both must be present)
C<InitURLs> initializes the URL iterator. It may be a string describing
one of the predefined iterators or a C<CODE> reference.
In the latter case it is called without parameters as
$it=$data->{InitURLs}->();
It is expected to return a function that when called as
$new_rq=$it->($rc, $rq);
returns the next request item or C<undef> when it runs out of items.
The parameters C<$rc> and C<$rq> describe the previous request (C<$rq>)
and its result (C<$rc>).
For a description of the C<$rq> and C<$new_rq> format see
L<URLList|/URLList (either InitURLs or URLList or both must be present)> below.
Example:
InitURLs=>sub {
my $url=[qw!GET http foertsch.name 80 /-redir!,
{
keepalive=>KEEPALIVE,
headers=>[
'X-auth'=>1, # necessary to trigger 401 for that URL
], # it also shows a custom request header
}];
return sub {
my ($rc, $rq)=@_;
if( $rc->[RC_STATUS]==401 ) {
# redo with Authorization header
push @{$rq->[RQ_PARAM]->{headers}}, Authorization=>'Basic YmxhOmJsdWI=';
return $rq;
}
my $new_rq=$url;
undef $url; # next time return undef (out-of-requests)
return $new_rq;
};
}
The iterator generator initializes the variable C<$url> and then returns
a closure. Hence, C<$url> is a static variable with respect to the returned
iterator.
The iterator itself checks the HTTP code of the previous request. In case of
a C<401> (Authorization Required) it adds an C<Authorization> header to
the request header list and retries the operation.
If the previous operation has ended with another HTTP code it copies
C<$url> to an auxiliary variable, undefines it and returns the auxiliary
variable. Thus, only the first time the iterator is called it returns C<$url>.
After that it is always C<undef> which signals I<Out-of-Requests>.
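The calling convention can be tried out without a server. The following sketch pairs a very simple iterator generator with a stand-in for the request loop. Both C<make_iterator> and C<drive> are hypothetical helpers written for this example; C<drive> only imitates how an iterator is called and is not the module's own loop.

```perl
use strict;
use warnings;

# Local copies of two documented request indices.
use constant { RQ_METHOD => 0, RQ_URI => 4 };

# An iterator generator in the InitURLs style: it returns a closure that
# hands out one request per call and undef when it runs out of items.
sub make_iterator {
    my @queue = @_;
    return sub {
        my ($rc, $rq) = @_;      # previous result and request, unused here
        return shift @queue;     # undef once the queue is empty
    };
}

# Hypothetical stand-in for the request loop: ask for the next request,
# "perform" it, feed result and request back into the iterator.
sub drive {
    my ($it, $do_request) = @_;
    my ($rc, $rq, @seen);
    while (defined(my $new_rq = $it->($rc, $rq))) {
        $rq = $new_rq;
        $rc = $do_request->($rq);
        push @seen, join ' ', @{$rq}[RQ_METHOD, RQ_URI];
    }
    return @seen;
}

my $it = make_iterator(
    [qw!GET http foertsch.name 80 /-redir!, {}],
    [qw!GET http foertsch.name 80 /impressum.html!, {}],
);
# the fake request handler returns an empty result array
print "$_\n" for drive($it, sub { [] });
```

Keeping the state in lexical variables of the generator, as C<@queue> here or C<$url> in the example above, gives every iterator its own private state.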
If C<InitURLs> is a string it is the name of a predefined iterator generator.
Example:
InitURLs=>'follow'
There are currently 4 such generators. All of them expect an C<URLList>
(see below) to be provided.
=over 4
=item default
simply walks the C<URLList> from start to end.
This one is also used if C<InitURLs> is omitted.
=item random_start
similar to C<default> but starts at a random offset in C<URLList>. At the
end of the list it continues at the beginning until all C<URLList> elements
are done once.
=item follow
similar to C<default> but if a request results in a C<3xx> HTTP code and a
C<Location> header is provided by the server it tries to follow it
recursively.
If the request starting a series of redirections contains a C<postdelay>
};
To turn your own iterator into a following one you could write:
InitURLs=>sub {
return get_iterator->('follow')->($my_own_iterator);
}
where C<$my_own_iterator> is an iterator function.
=item random_start_follow
a combination of the 2 above.
=back
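The wrap-around behaviour described for C<random_start> can be sketched as follows. This is an illustration only, not the module's code: C<make_wraparound_iterator> is a made-up name, and the start offset is passed in as a parameter so that the run is repeatable, whereas C<random_start> picks it at random.

```perl
use strict;
use warnings;

# Walk @$list exactly once, starting at $offset and continuing at the
# beginning after the last element.
sub make_wraparound_iterator {
    my ($list, $offset) = @_;
    my $n    = @$list;
    my $done = 0;
    return sub {
        return if $done >= $n;                      # every element served once
        return $list->[($offset + $done++) % $n];
    };
}

my $it = make_wraparound_iterator([qw(a b c d)], 2);
my @order;
while (defined(my $el = $it->())) {
    push @order, $el;
}
print "@order\n";    # c d a b
```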
You can register your own named iterators by calling
L<register_iterator|/register_iterator> below.
=item URLList (either InitURLs or URLList or both must be present)
See also
L<InitURLs|/InitURLs (either InitURLs or URLList or both must be present)>
above.
An C<URLList> is an array of arrays. Each of these sub-arrays describes
one request. It consists of 6 elements:
[$method, $scheme, $host, $port, $uri, $param]
C<$method> is the HTTP request method, e.g. C<GET>, C<POST>, ...
C<$scheme> is either C<http> or C<https>.
C<$host> is the hostname or IP address of the server, e.g.
C<foertsch.name> or C<109.73.51.50>.
C<$port> is the server port to connect to. Usually port C<80> is used for
C<http> and port C<443> for C<https>.
C<$uri> is the request URI normally starting with a slash (C</>), e.g.
C</impressum.html>.
C<$param> is a hash with further options.
To access the elements of a request description L<HTTP::LoadGen::Run>
exports a few constants. They may be used to increase readability.
RQ_METHOD == 0
RQ_SCHEME == 1
RQ_HOST == 2
RQ_PORT == 3
RQ_URI == 4
RQ_PARAM == 5
Example:
URLList=>[
[qw!GET http 109.73.51.50 80 /-redir!,
{
keepalive=>KEEPALIVE,
headers=>[
Authorization=>'Basic YmxhOmJsdWI=',
Host=>'foertsch.name',
],
}],
[qw!HUGO https www.kabatinte.net 443 /!,
{
keepalive=>KEEPALIVE,
predelay=>0.5,
prejitter=>1,
postdelay=>3,
postjitter=>1.5,
body=>'blablub',
}]
]
This C<URLList> contains 2 requests, one for a server with the IP address
C<109.73.51.50> and one for the host C<www.kabatinte.net>.
The first one will send the following HTTP request to the server
(IP C<109.73.51.50>, port C<80>):
GET /-redir HTTP/1.1
Authorization: Basic YmxhOmJsdWI=
Host: foertsch.name
If you need more header fields, C<User-Agent> for example, add them to the
C<headers> array of the options hash.
The second request is converted into the following HTTP message sent
over SSL to C<84.38.75.176:443> assuming that C<www.kabatinte.net>
resolves to C<84.38.75.176>:
HUGO / HTTP/1.1
Host: www.kabatinte.net
Content-Length: 7
blablub
Although no C<Host> header is specified in the request element one is sent.
If the request element does not contain a C<Host> header one is added
automatically based on C<$host> and C<$port>.
You may also notice the C<Content-Length> header. It is sent because a
request body is specified (the C<body> item in C<$param>).
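The mapping from a request element to the message on the wire can be made concrete with a small sketch. It is not the module's implementation: C<render_request> is a hypothetical helper, and the order of the generated headers as well as the rule of appending a non-default port to the C<Host> value are assumptions of this example.

```perl
use strict;
use warnings;

# Local copies of the documented indices so the sketch runs stand-alone.
use constant {
    RQ_METHOD => 0, RQ_SCHEME => 1, RQ_HOST  => 2,
    RQ_PORT   => 3, RQ_URI    => 4, RQ_PARAM => 5,
};

# Render a request element as HTTP/1.1 text (illustration only).
sub render_request {
    my ($rq) = @_;
    my $param   = $rq->[RQ_PARAM] || {};
    my @headers = @{ $param->{headers} || [] };    # flat name/value list
    my @lines   = (join ' ', @{$rq}[RQ_METHOD, RQ_URI], 'HTTP/1.1');

    my $has_host = 0;
    for (my $i = 0; $i < $#headers; $i += 2) {
        my ($name, $value) = @headers[$i, $i + 1];
        $has_host = 1 if lc($name) eq 'host';
        push @lines, "$name: $value";
    }
    unless ($has_host) {
        # assumption: the port is only mentioned if it is not the default
        my $default = $rq->[RQ_SCHEME] eq 'https' ? 443 : 80;
        my $host    = $rq->[RQ_HOST];
        $host .= ':' . $rq->[RQ_PORT] if $rq->[RQ_PORT] != $default;
        push @lines, "Host: $host";
    }
    my $body = $param->{body};
    push @lines, 'Content-Length: ' . length($body) if defined $body;
    return join "\r\n", @lines, '', defined $body ? $body : '';
}

my $rq = [qw!HUGO https www.kabatinte.net 443 /!, { body => 'blablub' }];
(my $text = render_request($rq)) =~ s/\r\n/\n/g;    # plain newlines for display
print $text, "\n";
```

For the second request of the C<URLList> example the sketch prints the same 4 lines as shown above: the request line, the generated C<Host> and C<Content-Length> headers, an empty line and the body.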
So, what can be specified in the C<$param> part?
=over 4
=item keepalive
L<HTTP::LoadGen::Run> exports 3 constants to be used as values.
C<KEEPALIVE_USE> permits reusing a previously kept-alive connection.
C<KEEPALIVE_STORE> allows keeping the connection alive after the request.
C<KEEPALIVE> combines both of the above.
If you hate readability you can also use the numerical values:
KEEPALIVE_USE==1
KEEPALIVE_STORE==2
KEEPALIVE==3
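The numerical values suggest that the 3 constants form a bit mask. Under that assumption the 2 permissions can be tested independently, as in the following sketch. The constants are redefined locally so that it runs without the module, and C<may_reuse> and C<may_store> are made-up helper names.

```perl
use strict;
use warnings;

# Local copies of the documented values.
use constant {
    KEEPALIVE_USE   => 1,
    KEEPALIVE_STORE => 2,
    KEEPALIVE       => 3,
};

# Assumption: the values are bit flags, so each permission is one bit.
sub may_reuse { my ($flags) = @_; return ($flags & KEEPALIVE_USE)   ? 1 : 0 }
sub may_store { my ($flags) = @_; return ($flags & KEEPALIVE_STORE) ? 1 : 0 }

for my $flags (KEEPALIVE_USE, KEEPALIVE_STORE, KEEPALIVE) {
    printf "%d: reuse=%d store=%d\n", $flags, may_reuse($flags), may_store($flags);
}
```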
=item predelay and prejitter
These statements define a period to wait before sending the request.
The wait is done after the request description has been pulled off
the iterator but before the C<ReqStart> handler is run.
Both numbers can be fractions. Read them as
predelay ± prejitter
The actual waiting time is calculated as
interval = predelay - prejitter + rand( 2 * prejitter )
If C<< prejitter > predelay >> the interval can become negative. In this case
you won't jump back in time but simply not wait.
To achieve repeatable results a thread-specific random number generator
must be used. See the C<rng> function below.
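The calculation, including the clamping of negative intervals, fits into a few lines. In this sketch the random number source is passed in as a code reference, which is a choice made for the example so that it can be run with a fixed value; C<wait_interval> is not a function of the module.

```perl
use strict;
use warnings;

# Waiting time for a delay/jitter pair. $rnd is a code reference that
# returns a number in [0, $max) when called with $max.
sub wait_interval {
    my ($delay, $jitter, $rnd) = @_;
    my $interval = $delay - $jitter + $rnd->(2 * $jitter);
    return $interval > 0 ? $interval : 0;    # negative means: do not wait
}

# predelay=0.5, prejitter=1: anything between 0 and 1.5 seconds
my $interval = wait_interval(0.5, 1, sub { rand $_[0] });
printf "waiting %.3fs\n", $interval;
```

Passing a thread-specific generator instead of the built-in C<rand> is what makes the sequence of waiting times repeatable from run to run.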
=item postdelay and postjitter
The same as C<predelay> but waiting occurs after the request is done or
more precisely after the C<ReqDone> handler returns.
=item headers
an array (not a hash!) of header fields to be appended to the HTTP
request.
=item body
a request body
=item conn_timeout
here you can specify the return value of the prepare-callback function
passed to C<AnyEvent::Socket::tcp_connect> when establishing a connection.
See L<AnyEvent::Socket> for more information.
=item timeout
the C<timeout> parameter used when a connection is converted into a
L<AnyEvent::Handle> object.
See L<AnyEvent::Handle> for more information.
=item tls_ctx
the C<tls_ctx> parameter used when a connection is converted into a
L<AnyEvent::Handle> object.
# thread accounting
thread_start;
# set a thread specific RNG
rng=Math::Random::MT->new(threadnr);
return []; # initializes thread specific user data
},
ThreadExit=>sub {
# no parameters
thread_done;
},
ReqStart=>sub {
my ($el)=@_;
# request accounting
req_start;
# started - succeeded - failed = currently pending number of requests
@{userdata()}=(thread_count, req_started-req_success-req_failed);
},
ReqDone=>sub {
my ($rc, $el)=@_;
# request accounting: HTTP status 2xx and 3xx are successful
# other requests are counted as failures.
req_done +($rc->[RC_STATUS]=~/^[23]/), $rc->[RC_HEADERS], $rc->[RC_BODY];
$logger->(threadnr,
@{$rc}[RC_DNSCACHED, RC_CONNCACHED],
@{userdata()},
req_success,
req_failed,
$rc->[RC_STARTTIME],
$rc->[RC_CONNTIME]-$rc->[RC_STARTTIME],
$rc->[RC_FIRSTTIME]-$rc->[RC_STARTTIME],
$rc->[RC_HEADERTIME]-$rc->[RC_STARTTIME],
$rc->[RC_BODYTIME]-$rc->[RC_STARTTIME],
$rc->[RC_STATUS],
length($rc->[RC_BODY]),
@{$el}[RQ_METHOD, RQ_SCHEME, RQ_HOST, RQ_PORT, RQ_URI],
$rc->[RC_STATUSLINE]);
},
dnscache=>{
localhost=>'127.0.0.1',
'kabatinte.net'=>'84.38.75.176',
'www.kabatinte.net'=>'84.38.75.176',
'foertsch.name'=>'109.73.51.50',
},
times=>3, # run the URL list 3 times
InitURLs=>'random_start',
URLList=>do {
my $o={
keepalive=>KEEPALIVE,
qw!predelay 0.05 prejitter 0.1 postdelay 0.5 postjitter 1!,
};
[[qw!GET http foertsch.name 80 /-redir!, $o],
[qw!HUGO https www.kabatinte.net 443 /!, $o]
];
},
}
=head1 SEE ALSO
=over 4
=item * L<HTTP::LoadGen::Run>
=item * L<HTTP::LoadGen::ScoreBoard>
=item * L<HTTP::LoadGen::Logger>
=item * L<loadgen>
=back
=head1 AUTHOR
Torsten Förtsch, E<lt>torsten.foertsch@gmx.netE<gt>
=head1 COPYRIGHT AND LICENSE
Copyright (C) 2010 by Torsten Förtsch
This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself, either Perl version 5.10.0 or,
at your option, any later version of Perl 5 you may have available.
=cut