AnyEvent-Fork-Pool
The maximum number of processes in the pool, in addition to the template
process. C<AnyEvent::Fork::Pool> will never have more than this number of
worker processes, although there can be more temporarily when a worker is
shut down and hasn't exited yet.
=item load => $count (default: 2)
The maximum number of concurrent jobs sent to a single worker process.
Jobs that cannot be sent to a worker immediately (because all workers are
busy) will be queued until a worker is available.
Setting this low improves latency. At C<1>, every job is sent to a
completely idle worker that runs no other jobs. The downside is reduced
throughput - a worker that finishes a job has to wait for the parent to
send it a new one.
The default of C<2> is usually a good compromise.
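The least-loaded-worker selection behind C<load> can be sketched in Python
(the C<Worker> class, C<dispatch> function, and all names below are
illustrative, not part of this module):

```python
import heapq

class Worker:
    """Toy stand-in for a pooled worker process (illustrative only)."""
    def __init__(self, wid):
        self.wid = wid
        self.load = 0              # jobs currently assigned to this worker

    def __lt__(self, other):       # the heap orders workers by current load
        return self.load < other.load

def dispatch(pool, queue, max_load=2):
    """Hand queued jobs to the least-loaded worker.

    Once every worker already runs `max_load` jobs (the pool's "load"
    parameter), the remaining jobs stay queued.
    """
    heapq.heapify(pool)
    assigned = []
    while queue:
        worker = pool[0]                 # least-loaded worker is at the root
        if worker.load >= max_load:
            break                        # all workers saturated - keep queueing
        worker.load += 1
        heapq.heapreplace(pool, worker)  # re-sift after the load change
        assigned.append((worker.wid, queue.pop(0)))
    return assigned

pool = [Worker(0), Worker(1)]
jobs = ["a", "b", "c", "d", "e"]
done = dispatch(pool, jobs, max_load=2)
print(len(done), jobs)   # 4 ['e'] - four jobs dispatched, "e" stays queued
```

A binary heap keyed on current load keeps the least-loaded worker at the
root, which is also how the module's own scheduler picks a worker via
C<Array::Heap>.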
=item start => $seconds (default: 0.1)
When there are fewer than C<idle> workers (or all workers are completely
busy), then a timer is started. If the timer elapses and there are still
jobs that cannot be queued to a worker, a new worker is started.
This sets the minimum time that all workers must be busy before a new
worker is started. Or, put differently, the minimum delay between starting
new workers.
The delay is small by default, which means new workers will be started
relatively quickly. A delay of C<0> is possible, and ensures that the pool
will grow as quickly as possible under load.
Non-zero values are useful to avoid "exploding" a pool because a lot of
jobs are queued in an instant.
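The timer rule can be modeled in a few lines of Python (a sketch with
invented names; the real module drives this from an AnyEvent timer):

```python
class PoolGrower:
    """Toy model of the start-delay rule: a new worker is started only
    when jobs have been waiting, unserved, for at least `start` seconds.
    All names here are invented for illustration."""

    def __init__(self, start=0.1):
        self.start = start
        self.timer_began = None        # when unmet demand first appeared

    def on_tick(self, queued_jobs, now):
        """Return True when a new worker should be started."""
        if not queued_jobs:
            self.timer_began = None    # demand vanished - cancel the timer
            return False
        if self.timer_began is None:
            self.timer_began = now     # demand appeared - arm the timer
            return False
        if now - self.timer_began >= self.start:
            self.timer_began = now     # re-arm: delay again before the next worker
            return True
        return False

g = PoolGrower(start=0.1)
print(g.on_tick(3, now=0.00))  # False - timer just armed
print(g.on_tick(3, now=0.05))  # False - delay not yet elapsed
print(g.on_tick(3, now=0.12))  # True  - demand persisted, start a worker
print(g.on_tick(0, now=0.20))  # False - queue drained, timer cancelled
```

With a C<start> of C<0>, the elapsed check succeeds on the very next tick
with unmet demand, which is why the pool then grows as fast as possible.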
   $scheduler = sub {
      if (@queue) {
         while (@queue) {
            @pool or $start_worker->();

            my $proc = $pool[0];

            if ($proc->[0] < $load) {
               # found free worker, increase load
               unless ($proc->[0]++) {
                  # worker became busy
                  --$nidle
                     or undef $stop_w;

                  $want_start->()
                     if $nidle < $idle && @pool < $max;
               }

               Array::Heap::adjust_heap_idx @pool, 0;

               my $job = shift @queue;
               my $ocb = pop @$job;

               $proc->[2]->(@$job, sub {
                  # reduce load
                  --$proc->[0]           # worker still busy?
                     or ++$nidle > $idle # not too many idle processes?
                     or $want_stop->();

                  Array::Heap::adjust_heap_idx @pool, $proc->[1]
                     if defined $proc->[1];

                  &$ocb;

                  $scheduler->();
               });
            } else {
               # all workers at maximum load - leave remaining jobs queued
               last;
            }
         }
      }
   };
You can disable dynamic resizing of the pool by setting C<idle> to the same value as C<max>.
=item high throughput, cpu bound jobs - set load >= 2, max = #cpus
To get high throughput with cpu-bound jobs, you should set the maximum
pool size to the number of cpus in your system, and C<load> to at least
C<2>, to make sure there can be another job waiting for the worker when it
has finished one.
The value of C<2> for C<load> is the minimum value that I<can> achieve
100% throughput, but if your parent process itself is sometimes busy, you
might need higher values. Also there is a limit on the amount of data that
can be "in flight" to the worker, so if you send big blobs of data to your
worker, C<load> might have much less of an effect.
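As a rough analogy in Python (using the standard library rather than this
module; the job function and sizes are made up), this recipe corresponds to
a process pool sized to the cpu count, with enough jobs submitted up front
that a finishing worker never has to wait:

```python
import os
from concurrent.futures import ProcessPoolExecutor

def cpu_bound(n):
    """Stand-in for a CPU-heavy job."""
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    ncpus = os.cpu_count() or 1   # max = #cpus, as the recipe suggests
    # Submitting 2 * ncpus jobs plays the role of load >= 2: every
    # worker already has a next job waiting when it finishes one.
    with ProcessPoolExecutor(max_workers=ncpus) as pool:
        results = list(pool.map(cpu_bound, [10_000] * (2 * ncpus)))
    print(len(results) == 2 * ncpus)  # True
```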
=item high throughput, I/O bound jobs - set load >= 2, max = 1, or very high
When your jobs are I/O bound, using more workers usually translates into
higher throughput, but this depends very much on your actual workload -
sometimes having only one worker is best, for example when you read or
write big files at maximum speed, as a second worker would only increase
seek times.
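A Python sketch of the latency-dominated case (the sleep stands in for
network or disk waits; the worker and job counts are arbitrary): many
concurrent workers overlap their waits, which is where the "very high"
setting pays off, whereas a purely sequential disk workload gains nothing
from them.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def io_bound(delay):
    """Stand-in for an I/O-heavy job, e.g. one network round trip."""
    time.sleep(delay)
    return delay

# 32 workers overlap 64 waits of 10 ms each, finishing in roughly
# two "rounds" instead of 64 sequential ones.
with ThreadPoolExecutor(max_workers=32) as pool:
    t0 = time.monotonic()
    results = list(pool.map(io_bound, [0.01] * 64))
elapsed = time.monotonic() - t0
print(len(results), elapsed < 0.64)  # 64 True - far under the 0.64 s sequential time
```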