AnyEvent-Fork-RPC

    and asynchronous backends.

    Note that the parentheses in the call to "AnyEvent::Fork::RPC::event"
    are not optional. That is because the function isn't defined when the
    code is compiled. You can make sure it is visible by pre-loading the
    correct backend module in the call to "require":

          ->require ("AnyEvent::Fork::RPC::Sync", "MyWorker")

    Since the backend module declares the "event" function, loading it first
    ensures that perl will correctly interpret calls to it.
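
    The parsing rule behind this can be tried in plain perl, without any
    child process. The following is only a sketch: "Undeclared::event",
    "Declared::event" and the "compiles" helper are hypothetical names made
    up for the demonstration, and the forward declaration stands in for
    what loading the backend module achieves for the real function.

```perl
use strict;
use warnings;

# Compile a snippet inside an anonymous sub and report whether perl
# accepted it (1) or rejected it with a compile error (0).
sub compiles {
   my ($code) = @_;

   local $SIG{__WARN__} = sub { };   # keep parser diagnostics quiet
   local $@;

   return defined (eval "sub { $code }") ? 1 : 0;
}

# A forward declaration: the body may be defined much later.
sub Declared::event;

# Nothing declares Undeclared::event, so the call without parentheses
# is a syntax error.
my $undeclared = compiles ('Undeclared::event "hello\n"');

# With the declaration in place, the same form parses as a function call.
my $declared   = compiles ('Declared::event "hello\n"');

# With parentheses, the call parses even without any declaration.
my $parens     = compiles ('Undeclared::event ("hello\n")');

print "undeclared, no parentheses: $undeclared\n";
print "declared, no parentheses:   $declared\n";
print "undeclared, parentheses:    $parens\n";
```

    The snippets are only compiled, never called, so it does not matter
    that neither function has a body.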

    And as a final remark, there is a fine module on CPAN that can
    asynchronously "rmdir" and "unlink" and a lot more, and more efficiently
    than this example, namely IO::AIO.

   Example 1a: the same with the asynchronous backend
    This example only shows what needs to be changed to use the async
    backend instead. Doing this is not very useful; the purpose of this
    example is to show the minimum amount of change that is required to go
    from the synchronous to the asynchronous backend.

    To use the async backend in the previous example, you need to add the
    "async" parameter to the "AnyEvent::Fork::RPC::run" call:

          ->AnyEvent::Fork::RPC::run ("MyWorker::run",
             async      => 1,
             ...

    And since the function call protocol is now changed, you need to adapt
    "MyWorker::run" to the async API.

    First, you need to accept the extra initial $done callback:

       sub run {
          my ($done, $cmd, $path) = @_;

    And since a response is now generated when $done is called, as opposed
    to when the function returns, we need to call the $done function with
    the status:

          $done->($status or (0, "$!"));

    A few remarks are in order. First, it's quite pointless to use the async
    backend for this example - but it *is* possible. Second, you can call
    $done before or after returning from the function. Third, once you have
    both returned from the function and called the $done callback, the
    child process may exit at any time, so you should call $done only when
    you really *are* done.
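
    Put together, an async-style "MyWorker::run" might look like the
    following sketch. The "rmdir"/"unlink" dispatch is an assumption about
    what the worker from the previous example does; only the $done handling
    is taken from the text above. The part after "package main" is a local
    dry run with an ordinary closure as $done, so no child process is
    involved.

```perl
package MyWorker;

use strict;
use warnings;

# Async-style worker: the first argument is the $done callback.
# The command dispatch below is an assumption made for illustration.
sub run {
   my ($done, $cmd, $path) = @_;

   my $status = $cmd eq "rmdir"  ? rmdir  ($path)
              : $cmd eq "unlink" ? unlink ($path)
              : die "unknown command '$cmd'\n";

   # On success pass the true status, on failure pass (0, error message).
   $done->($status or (0, "$!"));
}

package main;

# Dry run: collect whatever the worker passes to $done.
my @reply;
MyWorker::run (sub { @reply = @_ }, unlink => "/nonexistent/path/for/demo");

print "status=$reply[0] error=$reply[1]\n";
```

    Because the response is whatever is passed to $done, the return value
    of "run" itself no longer matters.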

  Example 2: Asynchronous Backend
    This example implements multiple count-downs in the child, using
    AnyEvent timers. While this is a bit silly (one could use timers in the
    parent just as well), it illustrates the ability to use AnyEvent in the
    child and the fact that responses can arrive in a different order than
    the requests.

    It also shows how to embed the actual child code into a "__DATA__"
    section, so it doesn't need any external files at all.

    And when your parent process is often busy, and you have stricter timing
    requirements, then running timers in a child process suddenly doesn't
    look so silly anymore.

    Without further ado, here is the code:

       use AnyEvent;
       use AnyEvent::Fork;
       use AnyEvent::Fork::RPC;

       my $done = AE::cv;

       my $rpc = AnyEvent::Fork
          ->new
          ->require ("AnyEvent::Fork::RPC::Async")
          ->eval (do { local $/; <DATA> })
          ->AnyEvent::Fork::RPC::run ("run",
             async      => 1,
             on_error   => sub { warn "ERROR: $_[0]"; exit 1 },
             on_event   => sub { print $_[0] },
             on_destroy => $done,
          );

       for my $count (3, 2, 1) {
          $rpc->($count, sub {
             warn "job $count finished\n";
          });
       }

       undef $rpc;

       $done->recv;

       __DATA__

       # this ends up in main, as we don't use a package declaration

       use AnyEvent;

       sub run {
          my ($done, $count) = @_;

          my $n;

          AnyEvent::Fork::RPC::event "starting to count up to $count\n";

          my $w; $w = AE::timer 1, 1, sub {
             ++$n;

             AnyEvent::Fork::RPC::event "count $n of $count\n";

             if ($n == $count) {
                undef $w;
                $done->();
             }
          };
       }

    The parent part (the one before the "__DATA__" section) isn't very
    different from the earlier examples. It sets async mode, preloads the
    backend module (so the "AnyEvent::Fork::RPC::event" function is
    declared), uses a slightly different "on_event" handler (which we use
    simply for logging purposes) and then, instead of loading a module with
    the actual worker code, it "eval"'s the code from the data section in
    the child process.

    It then starts three countdowns, from 3 to 1 seconds downwards, destroys
    the rpc object so the example finishes eventually, and then just waits
    for the stuff to trickle in.

    The worker code uses the event function to log some progress messages,
    but mostly just creates a recurring one-second timer.

    The timer callback increments a counter, logs a message, and eventually,
    when the count has been reached, calls the finish callback.

    On my system, this results in the following output. Since all timers
    fire at roughly the same time, the actual order isn't guaranteed, but
    the order shown is very likely what you would get, too.

       starting to count up to 3
       starting to count up to 2
       starting to count up to 1
       count 1 of 3
       count 1 of 2
       count 1 of 1
       job 1 finished
       count 2 of 2
       job 2 finished
       count 2 of 3
       count 3 of 3
       job 3 finished

    While the overall ordering isn't guaranteed, the async backend still
    guarantees that events and responses are delivered to the parent process
    in the exact same ordering as they were generated in the child process.

    And unless your system is *very* busy, it should clearly show that the
    job started last will finish first, as it has the lowest count.

    This concludes the async example. Since AnyEvent::Fork does not actually
    fork, you are free to use about any module in the child, not just
    AnyEvent, but also IO::AIO, or Tk for example.

  Example 3: Asynchronous backend with Coro
    With Coro you can create a nice asynchronous backend implementation by
    defining an rpc server function that creates a new Coro thread for every
    request that calls a function "normally", i.e. the parameters from the
    parent process are passed to it, and any return values are returned to
    the parent process, e.g.:

       package My::Arith;

       sub add {
          return $_[0] + $_[1];
       }

       sub mul {
          return $_[0] * $_[1];
       }

       sub run {
          my ($done, $func, @arg) = @_;

          Coro::async_pool {
             $done->($func->(@arg));
          };
       }

    The "run" function creates a new thread for every invocation, using the
    first argument as function name, and calls the $done callback on its
    return values. This makes it quite natural to define the "add" and "mul"
    functions to add or multiply two numbers and return the result.

    Since this is the asynchronous backend, it's quite possible to define
    RPC functions that do I/O or wait for external events - their execution
    will overlap as needed.

    The above could be used like this:

       my $rpc = AnyEvent::Fork
          ->new
          ->require ("MyWorker")
          ->AnyEvent::Fork::RPC::run ("My::Arith::run",
             async    => 1,
             on_error => ..., on_event => ..., on_destroy => ...,
          );

       $rpc->(add => 1, 3, Coro::rouse_cb); say Coro::rouse_wait;
       $rpc->(mul => 3, 2, Coro::rouse_cb); say Coro::rouse_wait;

    The "say"'s will print 4 and 6.
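
    The name-based dispatch in "run" can also be tried locally, without
    Coro and without a child process. The following is only a sketch under
    a few assumptions: the package name "My::Arith::Sketch" is made up to
    keep it apart from the example above, and the lookup uses "can" instead
    of calling the name directly, so that it works under "use strict" and
    dies on unknown function names.

```perl
package My::Arith::Sketch;   # hypothetical name, not part of the example above

use strict;
use warnings;

sub add { return $_[0] + $_[1] }
sub mul { return $_[0] * $_[1] }

# Same calling convention as the async "run": look the function up by
# name and pass its return values to the $done callback.
sub run {
   my ($done, $func, @arg) = @_;

   my $code = __PACKAGE__->can ($func)
      or die "unknown function '$func'\n";

   $done->($code->(@arg));
}

package main;

# Dry run with a plain closure standing in for the real $done callback.
my @results;
My::Arith::Sketch::run (sub { push @results, @_ }, add => 1, 3);
My::Arith::Sketch::run (sub { push @results, @_ }, mul => 3, 2);

print "@results\n";   # prints "4 6"
```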

  Example 4: Forward AnyEvent::Log messages using "on_event"
    This partial example shows how to use the "event" function to forward
    AnyEvent::Log messages to the parent.

    For this, the parent needs to provide a suitable "on_event":

        The synchronous backend is very simple: when the process waits for
        another request to arrive and the writing side (usually in the
        parent) is closed, it will exit normally, i.e. as if your main
        program reached the end of the file.

        That means that if your parent process exits, the RPC process will
        usually exit as well, either because it is idle anyway, or because
        it executes a request. In the latter case, you will likely get an
        error when the RPC process tries to send the results to the parent
        (because arguably, you shouldn't exit your parent while there are
        still outstanding requests).

        The process is usually quiescent when it happens, so it should
        rarely be a problem, and "END" handlers can be used to clean up.

    Asynchronous Backend
        For the asynchronous backend, things are more complicated: Whenever
        it listens for another request by the parent, it might detect that
        the socket was closed (e.g. because the parent exited). It will stop
        listening for new requests and instead try to write out any
        remaining data (if any) or simply check whether the socket can be
        written to. After this, the RPC process is effectively done - no new
        requests are incoming, no outstanding request data can be written
        back.

        Since chances are high that there are event watchers that the RPC
        server knows nothing about (why else would one use the async backend
        if not for the ability to register watchers?), the event loop would
        often happily continue.

        This is why the asynchronous backend explicitly calls "CORE::exit"
        when it is done (under other circumstances, such as when there is an
        I/O error and there is outstanding data to write, it will log a
        fatal message via AnyEvent::Log, also causing the program to exit).

        You can override this by specifying a function name to call via the
        "done" parameter instead.

ADVANCED TOPICS
  Choosing a backend
    So how do you decide which backend to use? Well, that's your problem to
    solve, but here are some thoughts on the matter:

    Synchronous
        The synchronous backend does not rely on any external modules (well,
        except common::sense, which works around a bug in how perl's warning
        system works). This keeps the process very small, for example, on my
        system, an empty perl interpreter uses 1492kB RSS, which becomes
        2020kB after "use warnings; use strict" (for people who grew up with
        C64s around them this is probably shocking every single time they
        see it). The worker process in the first example in this document
        uses 1792kB.

        Since the calls are done synchronously, slow jobs will keep newer
        jobs from executing.

        The synchronous backend also has no overhead due to running an event
        loop - reading requests is therefore very efficient, while writing
        responses is less so, as every response results in a write syscall.

        If the parent process is busy and a bit slow reading responses, the
        child waits instead of processing further requests. This also limits
        the amount of memory needed for buffering, as never more than one
        response has to be buffered.

        The API in the child is simple - you just have to define a function
        that does something and returns something.

        It's hard to use modules or code that relies on an event loop, as
        the child cannot execute anything while it waits for more input.

    Asynchronous
        The asynchronous backend relies on AnyEvent, which tries to be
        small, but still comes at a price: On my system, the worker from
        example 1a uses 3420kB RSS (for AnyEvent, which loads EV, which
        needs XSLoader which in turn loads a lot of other modules such as
        warnings, strict, vars, Exporter...).

        It batches requests and responses reasonably efficiently, doing only
        as few reads and writes as needed, but needs to poll for events via
        the event loop.

        Responses are queued when the parent process is busy. This means the
        child can continue to execute any queued requests. It also means
        that a child might queue a lot of responses in memory when it
        generates them and the parent process is slow accepting them.

        The API is not a straightforward RPC pattern - you have to call a
        "done" callback to pass return values and signal completion. Also,
        more importantly, the API starts jobs as fast as possible - when
        1000 jobs are queued and the jobs are slow, they will all run
        concurrently. The child must implement some queueing/limiting
        mechanism if this causes problems. Alternatively, the parent could
        limit the number of RPC calls that are outstanding.

        Blocking use of condvars is not supported (in the main thread,
        outside of e.g. Coro threads).

        Using event-based modules such as IO::AIO, Gtk2, Tk and so on is
        easy.
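
    The parent-side limit mentioned for the asynchronous backend could be
    sketched as a small wrapper around the rpc object. This is an
    illustration only: "limit_outstanding" is a hypothetical helper, not
    part of this module, and it assumes nothing more than the calling
    convention used throughout this document (arguments followed by a
    result callback). The dry run uses a stand-in code reference instead
    of a real rpc object, so replies can be triggered by hand.

```perl
use strict;
use warnings;

# Wrap an RPC-style code ref ($rpc->(@args, $callback)) so that at most
# $max requests are outstanding; further requests wait in a local queue.
# limit_outstanding is a hypothetical helper made up for this sketch.
sub limit_outstanding {
   my ($rpc, $max) = @_;

   my @queue;         # requests not yet sent, each an array ref of arguments
   my $active = 0;    # requests sent but not yet answered

   my $pump; $pump = sub {
      while ($active < $max && @queue) {
         my @args = @{ shift @queue };
         my $cb   = pop @args;         # the caller's result callback

         ++$active;
         $rpc->(@args, sub {
            --$active;
            $cb->(@_);
            $pump->();                 # a slot is free, send the next request
         });
      }
   };

   return sub { push @queue, [@_]; $pump->() };
}

# Stand-in for the real rpc object: it only records the callbacks.
my @pending;
my $fake_rpc = sub { push @pending, pop @_ };

my $limited = limit_outstanding ($fake_rpc, 2);

my @finished;
for my $job (1 .. 5) {
   $limited->($job, sub { push @finished, $job });
}

print scalar @pending, " requests sent\n";          # prints "2 requests sent"

(shift @pending)->();                               # first reply arrives

print scalar @pending, " requests outstanding\n";   # prints "2 requests outstanding"
print "finished: @finished\n";                      # prints "finished: 1"
```

    The $pump closure refers to itself, so the wrapper is never freed; for
    a long-running program that creates many wrappers this would need to
    be broken up explicitly.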

  Passing file descriptors
    Unlike AnyEvent::Fork, this module has no in-built file handle or file
    descriptor passing abilities.

    The reason is that passing file descriptors is extraordinarily tricky
    business, and conflicts with efficient batching of messages.

    There still is a method you can use: Create a
    "AnyEvent::Util::portable_socketpair" and "send_fh" one half of it to
    the process before you pass control to "AnyEvent::Fork::RPC::run".

    Whenever you want to pass a file descriptor, send an rpc request to the
    child process (so it expects the descriptor), then send it over the
    other half of the socketpair. The child should fetch the descriptor from
    the half it has passed earlier.

    Here is some (untested) pseudocode to that effect:

       use AnyEvent::Util;
       use AnyEvent::Fork;
       use AnyEvent::Fork::RPC;
       use IO::FDPass;

       my ($s1, $s2) = AnyEvent::Util::portable_socketpair;

       my $rpc = AnyEvent::Fork
          ->new
          ->send_fh ($s2)
          ->require ("MyWorker")
          ->AnyEvent::Fork::RPC::run ("MyWorker::run",
               init => "MyWorker::init",
            );

       undef $s2; # no need to keep it around

       # pass an fd
       $rpc->("i'll send some fd now, please expect it!", my $cv = AE::cv);

       # $handle_to_pass stands for whatever handle you want to send
       IO::FDPass::send fileno $s1, fileno $handle_to_pass;

       $cv->recv;
