AnyEvent-Net-Curl-Queued


        package YourSubclassingModule;
        use Mouse;
        use MouseX::NonMoose;
        extends 'AnyEvent::Net::Curl::Queued::Easy';
        ...

    Or the Any::Moose equivalent:

        package YourSubclassingModule;
        use Any::Moose;
        use Any::Moose qw(X::NonMoose);
        extends 'AnyEvent::Net::Curl::Queued::Easy';
        ...

    However, the recommended approach is to switch your subclassing module
    to Moo altogether (you can use MooX::late to smooth the transition):

        package YourSubclassingModule;
        use Moo;
        use MooX::late;
        extends 'AnyEvent::Net::Curl::Queued::Easy';
        ...

DESCRIPTION

    AnyEvent::Net::Curl::Queued (a.k.a. YADA, Yet Another Download
    Accelerator) is an efficient and flexible batch downloader with a
    straightforward interface capable of:

      * creating a queue;

      * appending/prepending URLs;

      * waiting for downloads to end (retrying on errors).

    Download init/finish/error handling is defined through Moose's method
    modifiers.
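    For instance, a subclass might hook the init and finish phases like
    this (a sketch, assuming Moo; the MyDownloader name and the User-Agent
    string are made up, while init, finish, has_error, final_url and data
    are documented in AnyEvent::Net::Curl::Queued::Easy):

```perl
package MyDownloader;
use strict;
use warnings;

use Moo;
use Net::Curl::Easy qw(/^CURLOPT_/);
extends 'AnyEvent::Net::Curl::Queued::Easy';

# Runs after the default initialization of each download
after init => sub {
    my ($self) = @_;
    # hypothetical User-Agent string, set via the underlying libcurl handle
    $self->setopt(CURLOPT_USERAGENT, 'MyDownloader/0.01');
};

# Runs after each download completes, successfully or not
after finish => sub {
    my ($self, $result) = @_;
    if ($self->has_error) {
        warn 'error downloading ', $self->final_url, ": $result\n";
    } else {
        printf "finished %s (%d bytes)\n",
            $self->final_url, length ${$self->data};
    }
};

1;
```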

 MOTIVATION

    I am very unhappy with the performance of LWP. It's almost perfect for
    properly handling HTTP headers, cookies & stuff, but that comes at the
    cost of speed. While this doesn't matter when you make single
    downloads, batch downloading becomes a real pain.

    When I download a large batch of documents, I don't care about cookies
    or headers; only content and proper redirection matter. And, as it is
    clearly an I/O-bound operation, I want to make as many parallel
    requests as possible.

    So, this is what CPAN offers to fulfill my needs:

      * Net::Curl: Perl interface to the all-mighty libcurl
      <http://curl.haxx.se/libcurl/>, and well-documented (unlike
      WWW::Curl);

      * AnyEvent: the DBI of event loops. Net::Curl also provides a nice
      and well-documented example of AnyEvent usage (03-multi-event.pl).

    AnyEvent::Net::Curl::Queued is a glue module to wrap it all together.
    It offers no callbacks and (almost) no default handlers. It's up to you
    to extend the base class AnyEvent::Net::Curl::Queued::Easy so it will
    actually download something and store it somewhere.
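    Putting it together, a driver script could look like this (a sketch;
    MyDownloader stands for whatever AnyEvent::Net::Curl::Queued::Easy
    subclass you wrote, and the URLs are placeholders):

```perl
use strict;
use warnings;

use AnyEvent::Net::Curl::Queued;

# max caps the number of parallel connections (the default is 4)
my $q = AnyEvent::Net::Curl::Queued->new(max => 8);

# append() takes a closure, so the queue stays lazy: each worker
# object is only constructed when a download slot becomes available
for my $n (1 .. 100) {
    $q->append(sub { MyDownloader->new("http://example.com/page/$n") });
}

$q->wait;  # run the event loop until the queue is drained
```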

 ALTERNATIVES

    As there's more than one way to do it, I'll list the alternatives which
    can be used to implement batch downloads:

      * WWW::Mechanize: no (builtin) parallelism, no (builtin) queueing.
      Slow, but very powerful for site traversal;

      * LWP::UserAgent: no parallelism, no queueing. WWW::Mechanize is
      built on top of LWP, by the way;

      * LWP::Protocol::Net::Curl: drop-in replacement for LWP::UserAgent,
      WWW::Mechanize and their derivatives to use Net::Curl as a backend;

      * LWP::Curl: LWP::UserAgent-alike interface for WWW::Curl. Not a
      drop-in, no parallelism, no queueing. Fast and simple to use;

      * HTTP::Tiny: no parallelism, no queueing. Fast and part of CORE
      since Perl v5.13.9;

      * HTTP::Lite: no parallelism, no queueing. Also fast;

      * Furl: no parallelism, no queueing. Very fast, despite being
      pure-Perl;

      * Mojo::UserAgent: capable of non-blocking parallel requests, no
      queueing;

      * AnyEvent::Curl::Multi: queued parallel downloads via WWW::Curl.
      Queues are non-lazy, thus large ones can consume a lot of RAM;

      * Parallel::Downloader: queued parallel downloads via AnyEvent::HTTP.
      Very fast and pure-Perl (a compiled event driver is optional). No
      queue modification is possible while the batch is being processed.

 BENCHMARK

    (see also: CPAN modules for making HTTP requests
    <http://neilb.org/reviews/http-requesters.html>)

    Obviously, every download agent is (or, ideally, should be) I/O bound.
    However, it is not uncommon for large concurrent batch downloads to hog
    the processor cycles before consuming the full network bandwidth. The
    proposed benchmark measures the request rate of several concurrent
    download agents, trying hard to make all of them CPU bound (by removing
    the I/O constraint). In practice, these benchmark results mean that
    download agents with a lower request rate are less appropriate for
    parallelized batch downloads. On the other hand, download agents with
    a higher request rate are more likely to reach the full capacity of a
    network link while still leaving spare resources for data
    parsing/filtering.

    The script eg/benchmark.pl compares AnyEvent::Net::Curl::Queued (a.k.a.
    YADA) against several other download agents. Only
    AnyEvent::Net::Curl::Queued itself, AnyEvent::Curl::Multi,
    Parallel::Downloader, Mojo::UserAgent and lftp <http://lftp.yar.ru/>
    support concurrent downloads natively; thus, Parallel::ForkManager is
    used to parallelize the remaining agents.
