AnyEvent-Fork-Pool


Pool.pm

The maximum number of processes in the pool, in addition to the template
process. C<AnyEvent::Fork::Pool> will never have more than this number of
worker processes, although there can be more temporarily when a worker is
shut down and hasn't exited yet.

=item load => $count (default: 2)

The maximum number of concurrent jobs sent to a single worker process.

Jobs that cannot be sent to a worker immediately (because all workers are
busy) will be queued until a worker is available.

Setting this low improves latency. For example, at C<1>, every job that
is sent to a worker is sent to a completely idle worker that doesn't run
any other jobs. The downside is that throughput is reduced - a worker that
finishes a job needs to wait for a new job from the parent.

The default of C<2> is usually a good compromise.

=item start => $seconds (default: 0.1)

When there are fewer than C<idle> idle workers (or all workers are completely
busy), then a timer is started. If the timer elapses and there are still
jobs that cannot be queued to a worker, a new worker is started.

This sets the minimum time that all workers must be busy before a new
worker is started. Or, put differently, the minimum delay between starting
new workers.

The delay is small by default, which means new workers will be started
relatively quickly. A delay of C<0> is possible, and ensures that the pool
will grow as quickly as possible under load.

Non-zero values are useful to avoid "exploding" a pool because a lot of
jobs are queued in an instant.
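
As an illustration of the options above, here is a minimal sketch of a
latency-oriented pool. C<MyWorker> and C<MyWorker::run> are hypothetical
names for your own worker module and function, and the values chosen for
C<max>, C<load> and C<start> are only examples, not recommendations.

```perl
use strict;
use warnings;

use AnyEvent;
use AnyEvent::Fork;
use AnyEvent::Fork::Pool;

my $pool = AnyEvent::Fork
   ->new
   ->require ("MyWorker")            # hypothetical worker module
   ->AnyEvent::Fork::Pool::run (
        "MyWorker::run",             # hypothetical worker function

        max   => 4,   # never more than 4 workers (plus the template process)
        load  => 1,   # every job goes to a completely idle worker
        start => 0,   # grow the pool as quickly as possible under load

        on_destroy => (my $finished = AE::cv),
     );

# queue a few jobs; the last argument is the result callback
for my $n (1 .. 10) {
   $pool->($n, sub {
      print "job $n returned @_\n";
   });
}

undef $pool;      # no more jobs will be queued
$finished->recv;  # wait until the pool has been destroyed
```

With C<< load => 1 >> the extra jobs wait in the parent's queue until a
worker is free. Raising C<load> to the default of C<2> trades some latency
for throughput, as described above.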

Pool.pm

   $scheduler = sub {
      if (@queue) {
         while (@queue) {
            @pool or $start_worker->();

            my $proc = $pool[0];

            if ($proc->[0] < $load) {
               # found free worker, increase load
               unless ($proc->[0]++) {
                  # worker became busy
                  --$nidle
                     or undef $stop_w;

                  $want_start->()
                     if $nidle < $idle && @pool < $max;
               }

               Array::Heap::adjust_heap_idx @pool, 0;

               my $job = shift @queue;
               my $ocb = pop @$job;

               $proc->[2]->(@$job, sub {
                  # reduce load
                  --$proc->[0] # worker still busy?
                     or ++$nidle > $idle # not too many idle processes?
                     or $want_stop->();

                  Array::Heap::adjust_heap_idx @pool, $proc->[1]
                     if defined $proc->[1];

                  &$ocb;

                  $scheduler->();
               });
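
The excerpt above is cut off, so the following is not the module's code but
a simplified, self-contained model of the same idea: always pick the
least-loaded worker, hand it a job only while it has fewer than C<$load>
jobs outstanding, and reschedule whenever a job finishes. It replaces the
heap with a linear scan and leaves out the idle accounting and the
start/stop timers. All names in it (C<@workers>, C<$schedule>, C<$job_done>)
are hypothetical.

```perl
use strict;
use warnings;

my $load    = 2;                                  # per-worker limit, as in "load => 2"
my @workers = map +{ id => $_, load => 0 }, 1 .. 2;
my @queue   = map "job$_", 1 .. 5;
my @dispatched;                                   # [worker id, job] in dispatch order

# Hand queued jobs to workers until every worker has reached the limit.
my $schedule = sub {
   while (@queue) {
      # least-loaded worker first; ties are broken by worker id
      my ($w) = sort { $a->{load} <=> $b->{load} || $a->{id} <=> $b->{id} } @workers;

      last if $w->{load} >= $load;                # everybody is full, keep the rest queued

      $w->{load}++;
      push @dispatched, [$w->{id}, shift @queue];
   }
};

# Called when a worker reports a result: free one slot and reschedule.
my $job_done = sub {
   my ($id) = @_;
   my ($w) = grep $_->{id} == $id, @workers;

   $w->{load}--;
   $schedule->();
};

$schedule->();
printf "dispatched %d, queued %d\n", scalar @dispatched, scalar @queue;

$job_done->(1);
printf "dispatched %d, queued %d\n", scalar @dispatched, scalar @queue;
```

With two workers and a limit of two, the first pass dispatches four of the
five jobs, and the fifth is dispatched only once a worker reports a result.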

Pool.pm

resizing of the pool by setting C<idle> to the same value as C<max>.

=item high throughput, cpu bound jobs - set load >= 2, max = #cpus

To get high throughput with cpu-bound jobs, you should set the maximum
pool size to the number of cpus in your system, and C<load> to at least
C<2>, to make sure there can be another job waiting for the worker when it
has finished one.

The value of C<2> for C<load> is the minimum value that I<can> achieve
100% throughput, but if your parent process itself is sometimes busy, you
might need higher values. Also there is a limit on the amount of data that
can be "in flight" to the worker, so if you send big blobs of data to your
worker, C<load> might have much less of an effect.

=item high throughput, I/O bound jobs - set load >= 2, max = 1, or very high

When your jobs are I/O bound, using more workers usually results in
higher throughput, but this depends very much on your actual workload.
Sometimes a single worker is best, for example when you read or write big
files at maximum speed, because a second worker will increase seek times.
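
As a rough sketch of the cpu-bound recipe above, the pool size can be
derived from the detected number of cpus. This assumes that your installed
version provides C<AnyEvent::Fork::Pool::ncpu>. C<MyWorker> and
C<MyWorker::run> are hypothetical names for your own worker code.

```perl
use strict;
use warnings;

use AnyEvent;
use AnyEvent::Fork;
use AnyEvent::Fork::Pool;

# ncpu is assumed to be available in the installed version; the argument
# is the fallback used when the number of cpus cannot be detected.
my $cpus = AnyEvent::Fork::Pool::ncpu (1);

my $pool = AnyEvent::Fork
   ->new
   ->require ("MyWorker")            # hypothetical worker module
   ->AnyEvent::Fork::Pool::run (
        "MyWorker::run",             # hypothetical worker function

        max  => $cpus,  # one worker per cpu
        load => 2,      # keep one job waiting while another one runs
     );
```

For the I/O bound recipe, the same call would use C<< max => 1 >> or a much
larger value instead of C<$cpus>, depending on what your workload favours.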

README

                the template process. "AnyEvent::Fork::Pool" will never have
                more than this number of worker processes, although there
                can be more temporarily when a worker is shut down and
                hasn't exited yet.

            load => $count (default: 2)
                The maximum number of concurrent jobs sent to a single
                worker process.

                Jobs that cannot be sent to a worker immediately (because
                all workers are busy) will be queued until a worker is
                available.

                Setting this low improves latency. For example, at 1, every
                job that is sent to a worker is sent to a completely idle
                worker that doesn't run any other jobs. The downside is that
                throughput is reduced - a worker that finishes a job needs
                to wait for a new job from the parent.

                The default of 2 is usually a good compromise.

            start => $seconds (default: 0.1)
                When there are fewer than "idle" idle workers (or all
                workers are completely busy), then a timer is started. If
                the timer elapses and there are still jobs that cannot be
                queued to a worker, a new worker is started.

                This sets the minimum time that all workers must be busy
                before a new worker is started. Or, put differently, the
                minimum delay between starting new workers.

                The delay is small by default, which means new workers will
                be started relatively quickly. A delay of 0 is possible, and
                ensures that the pool will grow as quickly as possible under
                load.

                Non-zero values are useful to avoid "exploding" a pool
                because a lot of jobs are queued in an instant.

README

        "max".

    high throughput, cpu bound jobs - set load >= 2, max = #cpus
        To get high throughput with cpu-bound jobs, you should set the
        maximum pool size to the number of cpus in your system, and "load"
        to at least 2, to make sure there can be another job waiting for the
        worker when it has finished one.

        The value of 2 for "load" is the minimum value that *can* achieve
        100% throughput, but if your parent process itself is sometimes
        busy, you might need higher values. Also there is a limit on the
        amount of data that can be "in flight" to the worker, so if you send
        big blobs of data to your worker, "load" might have much less of an
        effect.

    high throughput, I/O bound jobs - set load >= 2, max = 1, or very high
        When your jobs are I/O bound, using more workers usually results in
        higher throughput, but this depends very much on your actual
        workload. Sometimes a single worker is best, for example when you
        read or write big files at maximum speed, because a second worker
        will increase seek times.


