AnyEvent-Fork-Pool


                minimum delay between starting new workers.

                The delay is small by default, which means new workers will
                be started relatively quickly. A delay of 0 is possible, and
                ensures that the pool will grow as quickly as possible under
                load.

                Non-zero values are useful to avoid "exploding" a pool
                because a lot of jobs are queued in an instant.

                Higher values are often useful to improve efficiency at the
                cost of latency - when fewer processes can do the job over
                time, starting more and more is not necessarily going to
                help.

            stop => $seconds (default: 10)
                When a worker has no jobs to execute it becomes idle. An
                idle worker that hasn't executed a job within this amount of
                time will be stopped, unless the other parameters say
                otherwise.

                Setting this to a very high value means that workers stay
                around longer, even when they have nothing to do, which can
                be good as they don't have to be started again on the next
                load spike.

                Setting this to a lower value can be useful to avoid memory
                or simply process table wastage.

                Usually, setting this to a time longer than the time between
                load spikes is best - if you expect a lot of requests every
                minute and little work in between, setting this to longer
                than a minute avoids having to stop and start workers. On
                the other hand, you have to ask yourself if letting workers
                run idle is a good use of your resources. Try to find a good
                balance between resource usage of your workers and the time
                to start new workers - AnyEvent::Fork itself is fast at
                creating workers while not using much memory for them, so
                most of the overhead is likely from your own code.
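
                As a rough sketch of the advice above, assuming a
                hypothetical worker module "MyWorker" and a workload that
                spikes about once a minute, the idle timeout could be set
                somewhat above 60 seconds. The value 90 is only an
                illustration and should be tuned for the actual workload:

                   use AnyEvent::Fork;
                   use AnyEvent::Fork::Pool;

                   # MyWorker and MyWorker::run are hypothetical names
                   my $pool = AnyEvent::Fork
                      ->new
                      ->require ("MyWorker")
                      ->AnyEvent::Fork::Pool::run (
                           "MyWorker::run",
                           stop => 90, # keep idle workers across spikes
                        );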

            on_destroy => $callback->() (default: none)
                When a pool object goes out of scope, the outstanding
                requests are still handled till completion. Only after
                handling all jobs will the workers be destroyed (and also
                the template process if it isn't referenced otherwise).

                To find out when a pool *really* has finished its work, you
                can set this callback, which will be called when the pool
                has been destroyed.
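
                A minimal sketch of this pattern, using an AnyEvent
                condition variable as the callback. The worker function
                "MyWorker::run" is a made-up example that is defined via
                "eval" only to keep the sketch self-contained:

                   use AnyEvent;
                   use AnyEvent::Fork;
                   use AnyEvent::Fork::Pool;

                   my $finish = AE::cv;
                   my @results;

                   my $pool = AnyEvent::Fork
                      ->new
                      # hypothetical worker: doubles its argument
                      ->eval ('sub MyWorker::run { $_[0] * 2 }')
                      ->AnyEvent::Fork::Pool::run (
                           "MyWorker::run",
                           max        => 2,
                           on_destroy => $finish, # condvars are callable
                        );

                   for my $n (1 .. 5) {
                      $pool->($n, sub { push @results, $_[0] });
                   }

                   undef $pool;   # queued jobs are still completed
                   $finish->recv; # returns once the pool is destroyed

                   # results may arrive in any order, so sort them
                   print join (",", sort { $a <=> $b } @results), "\n";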

        AnyEvent::Fork::RPC Parameters
            These parameters are all passed more or less directly to
            AnyEvent::Fork::RPC. They are only briefly mentioned here, for
            their full documentation please refer to the AnyEvent::Fork::RPC
            documentation. Also, the default values mentioned here are only
            documented as a best effort - the AnyEvent::Fork::RPC
            documentation is binding.

            async => $boolean (default: 0)
                Whether to use the synchronous or asynchronous RPC backend.

            on_error => $callback->($message) (default: die with message)
                The callback to call on any (fatal) errors.

            on_event => $callback->(...) (default: "sub { }", unlike
            AnyEvent::Fork::RPC)
                The callback to invoke on events.

            init => $initfunction (default: none)
                The function to call in the child, once before handling
                requests.

            serialiser => $serialiser (default:
            $AnyEvent::Fork::RPC::STRING_SERIALISER)
                The serialiser to use.
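
            For illustration only, a sketch of how these parameters might
            be passed alongside the pool parameters. The module "MyWorker"
            and its functions are hypothetical, and the handlers shown are
            arbitrary choices - the AnyEvent::Fork::RPC documentation
            remains the authority on their exact semantics:

               use AnyEvent::Fork;
               use AnyEvent::Fork::Pool;

               my $pool = AnyEvent::Fork
                  ->new
                  ->require ("MyWorker")
                  ->AnyEvent::Fork::Pool::run (
                       "MyWorker::run",
                       async    => 0,                # synchronous backend
                       init     => "MyWorker::init", # called once per child
                       on_error => sub { die "FATAL: $_[0]\n" },
                       on_event => sub { warn "worker event: @_\n" },
                    );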

    $pool->(..., $cb->(...))
        Call the RPC function of a worker with the given arguments, and when
        the worker is done, call the $cb with the results, just like calling
        the RPC object directly - see the AnyEvent::Fork::RPC documentation
        for details on the RPC API.

        If there is no free worker, the call will be queued until a worker
        becomes available.

        Note that there can be considerable time between calling this method
        and the call actually being executed. During this time, the
        parameters passed to this function are effectively read-only -
        modifying them after the call and before the callback is invoked
        causes undefined behaviour.
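
        One way to stay clear of this problem is to hand the pool a private
        copy of the arguments. The helper "submit_copy" below is a
        hypothetical convenience function, not part of this module, and it
        only makes a shallow copy, so it protects plain strings and
        numbers but not nested data structures:

           # hypothetical helper: submit a private copy of the arguments
           sub submit_copy {
              my ($pool, $args, $cb) = @_;

              # later changes to @$args no longer affect the queued job
              my @copy = @$args;

              $pool->(@copy, $cb);
           }

        It would be called as "submit_copy ($pool, \@args, sub { ... })",
        after which @args can be reused for the next job.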

    $cpus = AnyEvent::Fork::Pool::ncpu [$default_cpus]
    ($cpus, $eus) = AnyEvent::Fork::Pool::ncpu [$default_cpus]
        Tries to detect the number of CPUs ($cpus, often called CPU cores
        nowadays) and execution units ($eus, which include e.g. extra
        hyperthreaded units). When $cpus cannot be determined reliably,
        $default_cpus is returned for both values, or 1 if it is missing.

        For normal CPU bound uses, it is wise to have as many worker
        processes as CPUs in the system ($cpus), if nothing else uses the
        CPU. Using hyperthreading is usually detrimental to performance, but
        in those rare cases where that really helps it might be beneficial
        to use more workers ($eus).

        Currently, /proc/cpuinfo is parsed on GNU/Linux systems for both
        $cpus and $eus, and on {Free,Net,Open}BSD, sysctl -n hw.ncpu is used
        for $cpus.

        Example: create a worker pool with as many workers as CPU cores, or
        2, if the actual number could not be determined.

           $fork->AnyEvent::Fork::Pool::run ("myworker::function",
              max => (scalar AnyEvent::Fork::Pool::ncpu 2),
           );
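
        A small sketch of the list-context form, which also returns the
        number of execution units. The fallback value of 1 is an arbitrary
        choice here:

           use AnyEvent::Fork::Pool;

           # list context: both CPU count and execution unit count
           my ($cpus, $eus) = AnyEvent::Fork::Pool::ncpu 1;

           print "CPUs: $cpus, execution units: $eus\n";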

CHILD USAGE
    In addition to the AnyEvent::Fork::RPC API, this module implements one
    more child-side function:

    AnyEvent::Fork::Pool::retire ()
        This function sends an event to the parent process to request
        retirement: the worker is removed from the pool and no new jobs will
        be sent to it, but it still has to handle the jobs that are already
        queued.

        The parentheses are part of the syntax: the function usually isn't
        defined when you compile your code (because that happens *before*
        handing the template process over to "AnyEvent::Fork::Pool::run"),
        so you need the empty parentheses to tell Perl that the function
        is indeed a function.

        Retiring a worker can be useful to gracefully shut it down when the
        worker deems this useful. For example, after executing a job, it
        could check the process size or the number of jobs handled so far,
        and if either is too high, the worker could request to be retired,
        to keep memory leaks from accumulating.

        Example: retire a worker after it has handled roughly 100 requests.
        It doesn't matter whether you retire at the beginning or end of your
        request, as the worker will continue to handle some outstanding
        requests. Likewise, it's ok to call retire multiple times.

           my $count = 0;

           sub my::worker {

              ++$count == 100
                 and AnyEvent::Fork::Pool::retire ();

              # ... normal code goes here
           }

POOL PARAMETERS RECIPES
    This section describes some recipes for pool parameters. These are
    mostly meant for the synchronous RPC backend, as the asynchronous RPC
    backend changes the rules considerably, making workers themselves
    responsible for their scheduling.

    low latency - set load = 1
        If you need a deterministic low latency, you should set the "load"
        parameter to 1. This ensures that no more than one job is ever
        sent to each worker, which avoids having to wait for a previous
        job to finish.

        This makes most sense with the synchronous (default) backend, as the
        asynchronous backend can handle multiple requests concurrently.

    lowest latency - set load = 1 and idle = max
        To achieve the lowest latency, you additionally should disable any
        dynamic resizing of the pool by setting "idle" to the same value as
        "max".
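
        Both latency recipes could be written as parameter lists like
        this. The pool size of 4 is an arbitrary illustrative value:

           my $max = 4; # assumed fixed pool size

           # low latency: never queue a second job on a busy worker
           my @low_latency = (max => $max, load => 1);

           # lowest latency: additionally disable dynamic resizing
           my @lowest_latency = (max => $max, load => 1, idle => $max);

        Either list can then be appended to the arguments of
        "AnyEvent::Fork::Pool::run", after the worker function name.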

    high throughput, cpu bound jobs - set load >= 2, max = #cpus
        To get high throughput with cpu-bound jobs, you should set the
        maximum pool size to the number of cpus in your system, and "load"
        to at least 2, to make sure there can be another job waiting for the
        worker when it has finished one.

        The value of 2 for "load" is the minimum value that *can* achieve
        100% throughput, but if your parent process itself is sometimes
        busy, you might need higher values. Also there is a limit on the
        amount of data that can be "in flight" to the worker, so if you send
        big blobs of data to your worker, "load" might have much less of an
        effect.
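
        A sketch of this recipe that sizes the pool with "ncpu". The
        fallback of 2 CPUs is an assumption for systems where detection
        fails:

           use AnyEvent::Fork::Pool;

           # scalar context: number of CPUs, or 2 if unknown
           my $cpus = AnyEvent::Fork::Pool::ncpu 2;

           my @cpu_bound = (
              max  => $cpus, # one worker per detected CPU
              load => 2,     # one job running, one waiting per worker
           );

        Higher "load" values can be tried when the parent process is often
        busy, as described above.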

    high throughput, I/O bound jobs - set load >= 2, max = 1, or very high
        When your jobs are I/O bound, using more workers usually boils down
        to higher throughput, depending very much on your actual workload -
        sometimes having only one worker is best, for example, when you read
        or write big files at maximum speed, as a second worker will
        increase seek times.
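
        The two extremes mentioned above might look like this. The value
        64 is purely illustrative and would need to be found by
        measurement:

           # a single worker, e.g. for streaming big files sequentially
           my @io_single = (max => 1, load => 2);

           # many workers, for workloads that benefit from parallel I/O
           my @io_many = (max => 64, load => 2);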

EXCEPTIONS
    The same "policy" as with AnyEvent::Fork::RPC applies - exceptions will
    not be caught, and exceptions in both workers and callbacks cause
    undesirable or undefined behaviour.

SEE ALSO
    AnyEvent::Fork, to create the processes in the first place.

    AnyEvent::Fork::Remote, likewise, but helpful for remote processes.

    AnyEvent::Fork::RPC, which implements the RPC protocol and API.

AUTHOR AND CONTACT INFORMATION
     Marc Lehmann <schmorp@schmorp.de>
     http://software.schmorp.de/pkg/AnyEvent-Fork-Pool


