AnyEvent-Net-Curl-Queued

 view release on metacpan or  search on metacpan

lib/AnyEvent/Net/Curl/Queued.pm  view on Meta::CPAN

    package YourSubclassingModule;
    use Any::Moose;
    use Any::Moose qw(X::NonMoose);
    extends 'AnyEvent::Net::Curl::Queued::Easy';
    ...

However, the recommended approach is to switch your subclassing module to L<Moo> altogether (you can use L<MooX::late> to smoothen the transition):

    package YourSubclassingModule;
    use Moo;
    use MooX::late;
    extends 'AnyEvent::Net::Curl::Queued::Easy';
    ...

=head1 DESCRIPTION

B<AnyEvent::Net::Curl::Queued> (a.k.a. L<YADA>, I<Yet Another Download Accelerator>) is an efficient and flexible batch downloader with a straight-forward interface capable of:

=over 4

=item *

create a queue;

=item *

append/prepend URLs;

=item *

wait for downloads to end (retry on errors).

=back

Download init/finish/error handling is defined through L<Moose's method modifiers|Moose::Manual::MethodModifiers>.

=head2 MOTIVATION

I am very unhappy with the performance of L<LWP>.
It's almost perfect for properly handling HTTP headers, cookies & stuff, but it comes at the cost of I<speed>.
While this doesn't matter when you make single downloads, batch downloading becomes a real pain.

When I download large batch of documents, I don't care about cookies or headers, only content and proper redirection matters.
And, as it is clearly an I/O bottleneck operation, I want to make as many parallel requests as possible.

So, this is what L<CPAN> offers to fulfill my needs:

=over 4

=item *

L<Net::Curl>: Perl interface to the all-mighty L<libcurl|http://curl.haxx.se/libcurl/>, is well-documented (opposite to L<WWW::Curl>);

=item *

L<AnyEvent>: the L<DBI> of event loops. L<Net::Curl> also provides a nice and well-documented example of L<AnyEvent> usage (L<03-multi-event.pl|Net::Curl::examples/Multi::Event>).

=back

L<AnyEvent::Net::Curl::Queued> is a glue module to wrap it all together.
It offers no callbacks and (almost) no default handlers.
It's up to you to extend the base class L<AnyEvent::Net::Curl::Queued::Easy> so it will actually download something and store it somewhere.

=head2 ALTERNATIVES

As there's more than one way to do it, I'll list the alternatives which can be used to implement batch downloads:

=over 4

=item *

L<WWW::Mechanize>: no (builtin) parallelism, no (builtin) queueing. Slow, but very powerful for site traversal;

=item *

L<LWP::UserAgent>: no parallelism, no queueing. L<WWW::Mechanize> is built on top of LWP, by the way;

=item *

L<LWP::Protocol::Net::Curl>: I<drop-in> replacement for L<LWP::UserAgent>, L<WWW::Mechanize> and their derivatives to use L<Net::Curl> as a backend;

=item *

L<LWP::Curl>: L<LWP::UserAgent>-alike interface for L<WWW::Curl>. Not a I<drop-in>, no parallelism, no queueing. Fast and simple to use;

=item *

L<HTTP::Tiny>: no parallelism, no queueing. Fast and part of CORE since Perl v5.13.9;

=item *

L<HTTP::Lite>: no parallelism, no queueing. Also fast;

=item *

L<Furl>: no parallelism, no queueing. B<Very> fast, despite being pure-Perl;

=item *

L<Mojo::UserAgent>: capable of non-blocking parallel requests, no queueing;

=item *

L<AnyEvent::Curl::Multi>: queued parallel downloads via L<WWW::Curl>. Queues are non-lazy, thus large ones can use many RAM;

=item *

L<Parallel::Downloader>: queued parallel downloads via L<AnyEvent::HTTP>. Very fast and is pure-Perl (compiling event driver is optional). No queue modification possible while batch is being processed.

=back

=head2 BENCHMARK

(see also: L<CPAN modules for making HTTP requests|http://neilb.org/reviews/http-requesters.html>)

Obviously, every download agent is (or, ideally, should be) I<I/O bound>.
However, it is not uncommon for large concurrent batch downloads to hog the processor cycles B<before> consuming the full network bandwidth.
The proposed benchmark measures the request rate of several concurrent download agents, trying hard to make all of them I<CPU bound> (by removing the I/O constraint).
On practice, this benchmark results mean that download agents with lower request rate are less appropriate for parallelized batch downloads.
On the other hand, download agents with higher request rate are more likely to reach the full capacity of a network link while still leaving spare resources for data parsing/filtering.



( run in 2.141 seconds using v1.01-cache-2.11-cpan-cdf2f3d4e48 )