AnyEvent-HTTP

HTTP.pm  view on Meta::CPAN


The pseudo-header C<Redirect> only exists when the request was a result
of an internal redirect. In that case it is an array reference with
the C<($data, $headers)> from the redirect response. Note that this
response could in turn be the result of a redirect itself, and C<<
$headers->{Redirect}[1]{Redirect} >> will then contain the original
response, and so on.
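As a minimal sketch of how such a chain could be inspected, the following
walks the C<Redirect> pseudo-headers and collects the status of each
response, newest first. The helper name C<redirect_chain> and the hand-built
sample hash are illustrative assumptions, not part of this module:

```perl
use strict;
use warnings;

# Hypothetical helper: follow the Redirect pseudo-header from the final
# response back to the original one and collect every Status on the way.
sub redirect_chain {
   my ($hdr) = @_;
   my @chain;

   while ($hdr) {
      push @chain, $hdr->{Status};
      # Redirect, when present, is [$data, $headers] of the previous response
      $hdr = $hdr->{Redirect} ? $hdr->{Redirect}[1] : undef;
   }

   return @chain;
}

# Hand-built sample, standing in for the $headers a callback would receive.
my $hdr = {
   Status   => 200,
   Redirect => [undef, { Status => 302, Redirect => [undef, { Status => 301 }] }],
};

my @chain = redirect_chain($hdr);
print join(" <- ", @chain), "\n";
```

In a real callback, the same helper would simply be called on the second
callback argument.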

If the server sends a header multiple times, then its values will be
joined together with a comma (C<,>), as per the HTTP spec.

If an internal error occurs, such as not being able to resolve a hostname,
then C<$data> will be C<undef>, C<< $headers->{Status} >> will be
C<590>-C<599> and the C<Reason> pseudo-header will contain an error
message. Currently the following status codes are used:

=over 4

=item 595 - errors during connection establishment, proxy handshake.

=item 596 - errors during TLS negotiation, request sending and header processing.

=item 597 - errors during body receiving or processing.

=item 598 - user aborted request via C<on_header> or C<on_body>.

=item 599 - other, usually nonretryable, errors (garbled URL etc.).

=back

A typical callback might look like this:

   sub {
      my ($body, $hdr) = @_;

      if ($hdr->{Status} =~ /^2/) {
         # ... everything should be ok
      } else {
         print "error, $hdr->{Status} $hdr->{Reason}\n";
      }
   }
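Building on the status codes listed above, a caller might want to decide
whether a failed request is worth repeating. The following is only a sketch
of one possible policy: the helper name C<is_retryable> and the choice to
treat C<595> to C<597> as transient are assumptions made for illustration,
not something this module prescribes:

```perl
use strict;
use warnings;

# Hypothetical policy helper, not part of AnyEvent::HTTP: decide whether a
# failed request is worth retrying, based on the Status pseudo-header.
sub is_retryable {
   my ($hdr) = @_;
   my $status = $hdr->{Status};

   return 0 if $status == 598;   # aborted by our own on_header/on_body
   return 0 if $status == 599;   # documented as usually nonretryable
   return 1 if $status >= 595 && $status <= 597;   # assumption: transient
   return 0;                     # real HTTP statuses are left to the caller
}

# Hand-built headers, standing in for what a callback would receive.
my $failed = { Status => 595, Reason => "Connection refused" };
print is_retryable($failed) ? "retry\n" : "give up\n";
```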

Additional parameters are key-value pairs, and are fully optional. They
include:

=over 4

=item recurse => $count (default: $MAX_RECURSE)

Whether to re-issue requests internally (for example on redirects,
authentication and other retries), and how many times to do so.

Only redirects to http and https URLs are supported. While most common
redirection forms are handled entirely within this module, some require
the use of the optional L<URI> module. If it is required but missing, then
the request will fail with an error.

=item headers => hashref

The request headers to use. Currently, C<http_request> may provide its own
C<Host:>, C<Content-Length:>, C<Connection:> and C<Cookie:> headers and
will provide defaults at least for C<TE:>, C<Referer:> and C<User-Agent:>
(this can be suppressed by using C<undef> for these headers in which case
they won't be sent at all).

You really should provide your own C<User-Agent:> header value that is
appropriate for your program - I wouldn't be surprised if the default
AnyEvent string gets blocked by webservers sooner or later.

Also, make sure that your header names and values do not contain any
embedded newlines.
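A small sketch of how a headers hash might be prepared and checked before
it is passed as C<< headers => $headers >>. The helper name
C<check_headers> and the agent string are hypothetical; the C<undef> value
uses the suppression mechanism described above:

```perl
use strict;
use warnings;

# Hypothetical helper, not part of AnyEvent::HTTP: reject header names or
# values that contain CR or LF before they are handed to http_request.
sub check_headers {
   my ($headers) = @_;

   for my $name (keys %$headers) {
      my $value = $headers->{$name};

      die "header name contains a newline\n"
         if $name =~ /[\r\n]/;

      # undef is allowed: it suppresses a header the module would add
      die "value of header $name contains a newline\n"
         if defined $value && $value =~ /[\r\n]/;
   }

   return $headers;
}

my $headers = check_headers({
   "user-agent" => "MyCrawler/1.0",   # placeholder agent string
   "referer"    => undef,             # do not send a Referer: header at all
});

print join(", ", sort keys %$headers), "\n";
```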

=item timeout => $seconds

The timeout to use for the various stages of a request. Each connect
attempt resets the timeout, as does any read or write activity, i.e. this
is not an overall timeout.

The default timeout is 5 minutes.

=item proxy => [$host, $port[, $scheme]] or undef

Use the given http proxy for all requests, or no proxy if C<undef> is
used.

C<$scheme> must be either missing or must be C<http> for HTTP.

If not specified, then the default proxy is used (see
C<AnyEvent::HTTP::set_proxy>).

Currently, if your proxy requires authorization, you have to specify an
appropriate "Proxy-Authorization" header in every request.

Note that this module will prefer an existing persistent connection,
even if that connection was made using another proxy. If you need to
ensure that a new connection is made in this case, you can either force
C<persistent> to false or e.g. use the proxy address in your C<sessionid>.
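The second approach could be sketched as follows. The helper name
C<proxy_params>, the proxy host and the format of the session ID are all
assumptions chosen for illustration; any string that differs per proxy
would serve:

```perl
use strict;
use warnings;

# Hypothetical helper: build the proxy-related request parameters so that
# connections made through different proxies are never shared.
sub proxy_params {
   my ($host, $port) = @_;

   return (
      proxy     => [$host, $port],        # scheme omitted
      sessionid => "proxy:$host:$port",   # only reuse connections with this ID
   );
}

my %params = proxy_params("proxy.example.com", 3128);

# These pairs would then be passed along with a request, roughly like:
#    http_request GET => $url, %params, $callback;
print "$params{sessionid}\n";
```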

=item body => $string

The request body, usually empty. Will be sent as-is (future versions of
this module might offer more options).

=item cookie_jar => $hash_ref

Passing this parameter enables (simplified) cookie-processing, loosely
based on the original Netscape specification.

The C<$hash_ref> must be an (initially empty) hash reference which
will get updated automatically. It is possible to save the cookie jar
to persistent storage with something like JSON or Storable - see the
C<AnyEvent::HTTP::cookie_jar_expire> function if you wish to remove
expired or session-only cookies, and also for documentation on the format
of the cookie jar.

Note that this cookie implementation is not meant to be complete. If
you want complete cookie management you have to do that on your
own. C<cookie_jar> is meant as a quick fix to get most cookie-using sites
working. Cookies are a privacy disaster, do not use them unless required
to.

When cookie processing is enabled, the C<Cookie:> and C<Set-Cookie:>
headers will be set and handled by this module, otherwise they will be
left untouched.
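Because the jar is a plain hash reference, saving it can be as simple as
serialising it. The sketch below uses the core L<JSON::PP> module; the
helper names C<jar_to_text> and C<jar_from_text> are hypothetical, and in a
real program C<AnyEvent::HTTP::cookie_jar_expire> would be called on the
jar before saving and after loading:

```perl
use strict;
use warnings;
use JSON::PP ();

# canonical output keeps the key order stable between runs
my $json = JSON::PP->new->canonical;

# Hypothetical helpers for storing a cookie jar as JSON text.
sub jar_to_text   { return $json->encode($_[0]) }
sub jar_from_text { return $json->decode($_[0]) }

# A jar starts out as an empty hash reference; the module fills it in once
# it is passed as cookie_jar => $jar to a request.
my $jar = {};

my $text     = jar_to_text($jar);      # this string would be written to a file
my $restored = jar_from_text($text);   # and read back on the next run

print "$text\n";
```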

=item tls_ctx => $scheme | $tls_ctx

Specifies the AnyEvent::TLS context to be used for https connections. This
parameter follows the same rules as the C<tls_ctx> parameter to
L<AnyEvent::Handle>, but additionally, the two strings C<low> or
C<high> can be specified, which give you a predefined low-security (no
verification, highest compatibility) and high-security (CA and common-name
verification) TLS context.

The default for this option is C<low>, which could be interpreted as "give
me the page, no matter what".

See also the C<sessionid> parameter.

=item sessionid => $string

The module might reuse connections to the same host internally (regardless
of other settings, such as C<tcp_connect> or C<proxy>). Sometimes (e.g.
when using TLS or a specific proxy), you do not want to reuse connections
from other sessions. This can be achieved by setting this parameter to
some unique ID (such as the address of an object storing your state data
or the TLS context, or the proxy IP) - only connections using the same
unique ID will be reused.

=item on_prepare => $callback->($fh)

In rare cases you need to "tune" the socket before it is used to
connect (for example, to bind it to a given IP address). This parameter
overrides the prepare callback passed to C<AnyEvent::Socket::tcp_connect>
and behaves exactly the same way (e.g. it has to provide a
timeout). See the description for the C<$prepare_cb> argument of
C<AnyEvent::Socket::tcp_connect> for details.

=item tcp_connect => $callback->($host, $service, $connect_cb, $prepare_cb)

In even rarer cases you want total control over how AnyEvent::HTTP
establishes connections. Normally it uses L<AnyEvent::Socket::tcp_connect>
to do this, but you can provide your own C<tcp_connect> function -
obviously, it has to follow the same calling conventions, except that it
may always return a connection guard object.

The connections made by this hook will be treated as equivalent to
connections made the built-in way, specifically, they will be put into
and taken from the persistent connection cache. If your C<$tcp_connect>
function is incompatible with this kind of re-use, consider switching off
C<persistent> connections and/or providing a C<sessionid> identifier.

There are probably lots of weird uses for this function, starting from
tracing the hosts C<http_request> actually tries to connect to, to (inexact
but fast) host => IP address caching or even SOCKS protocol support.

=item on_header => $callback->($headers)

When specified, this callback will be called with the header hash as soon
as headers have been successfully received from the remote server (not on
locally-generated errors).

=over 4

=item AnyEvent::HTTP::set_proxy "proxy-url"

Sets the default proxy server to use. The proxy-url must begin with a
string of the form C<http://host:port>, croaks otherwise.

To clear an already-set proxy, use C<undef>.

When AnyEvent::HTTP is loaded for the first time it will query the
default proxy from the operating system, currently by looking at
C<$ENV{http_proxy}>.

=item AnyEvent::HTTP::cookie_jar_expire $jar[, $session_end]

Remove all cookies from the cookie jar that have expired. If
C<$session_end> is given and true, then additionally remove all session
cookies.

You should call this function (with a true C<$session_end>) before you
save cookies to disk, and you should call this function after loading them
again. If you have a long-running program you can additionally call this
function from time to time.

A cookie jar is initially an empty hash-reference that is managed by this
module. Its format is subject to change, but currently it is as follows:

The key C<version> has to contain C<2>, otherwise the hash gets
cleared. All other keys are hostnames or IP addresses pointing to
hash-references. The key for these inner hash references is the
server path for which this cookie is meant, and the values are again
hash-references. Each key of those hash-references is a cookie name, and
the value, you guessed it, is another hash-reference, this time with the
key-value pairs from the cookie, except for C<expires> and C<max-age>,
which have been replaced by a C<_expires> key that contains the cookie
expiry timestamp. Session cookies are indicated by not having an
C<_expires> key.

Here is an example of a cookie jar with a single cookie, so you have a
chance of understanding the above paragraph:

   {
      version    => 2,
      "10.0.0.1" => {
         "/" => {
            "mythweb_id" => {
              _expires => 1293917923,
              value    => "ooRung9dThee3ooyXooM1Ohm",
            },
         },
      },
   }
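To make the nesting concrete, here is a sketch that flattens such a jar
into one record per cookie. It relies only on the layout described above
(which is subject to change); the helper name C<list_cookies> and the
record layout are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: flatten a version-2 cookie jar into a list of
# [host, path, name, value, is_session] records.
sub list_cookies {
   my ($jar) = @_;
   my @cookies;

   for my $host (sort keys %$jar) {
      next if $host eq "version";   # bookkeeping key, not a host
      my $paths = $jar->{$host};

      for my $path (sort keys %$paths) {
         my $names = $paths->{$path};

         for my $name (sort keys %$names) {
            my $cookie = $names->{$name};

            # session cookies are the ones without an _expires key
            my $is_session = exists $cookie->{_expires} ? 0 : 1;
            push @cookies, [$host, $path, $name, $cookie->{value}, $is_session];
         }
      }
   }

   return @cookies;
}

# The example jar from above.
my $jar = {
   version    => 2,
   "10.0.0.1" => {
      "/" => {
         "mythweb_id" => {
            _expires => 1293917923,
            value    => "ooRung9dThee3ooyXooM1Ohm",
         },
      },
   },
};

printf "%s%s %s=%s\n", @$_[0 .. 3] for list_cookies($jar);
```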

=item $date = AnyEvent::HTTP::format_date $timestamp

Takes a POSIX timestamp (seconds since the epoch) and formats it as a HTTP
Date (RFC 2616).

=item $timestamp = AnyEvent::HTTP::parse_date $date

Takes an HTTP Date (RFC 2616), a cookie date (Netscape cookie spec) or any
of a number of minor variations of those, and returns the corresponding
POSIX timestamp, or C<undef> if the date cannot be parsed.

=item $AnyEvent::HTTP::MAX_RECURSE

The default value for the C<recurse> request parameter (default: C<10>).

=item $AnyEvent::HTTP::TIMEOUT

The default timeout for connection operations, in seconds (default: C<300>).

=item $AnyEvent::HTTP::USERAGENT

The default value for the C<User-Agent> header (the default is
C<Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)>).

=item $AnyEvent::HTTP::MAX_PER_HOST

The maximum number of concurrent connections to the same host (identified
by the hostname). If the limit is exceeded, then additional requests
are queued until previous connections are closed. Both persistent and
non-persistent connections are counted in this limit.

The default value for this is C<4>, and it is highly advisable to not
increase it much.

For comparison: the RFCs recommend 4 non-persistent or 2 persistent
connections, older browsers used 2, newer ones (such as Firefox 3)
typically use 6, and Opera uses 8 because like, they have the fastest
browser and give a shit for everybody else on the planet.

=item $AnyEvent::HTTP::PERSISTENT_TIMEOUT

The time, in seconds, after which idle persistent connections get closed
by AnyEvent::HTTP (default: C<3>).

=item $AnyEvent::HTTP::ACTIVE

The number of active connections. This is not the number of currently
running requests, but the number of currently open and non-idle TCP
connections. This number can be useful for load-leveling.

=back

=cut

our @month   = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
our @weekday = qw(Sun Mon Tue Wed Thu Fri Sat);

sub format_date($) {
   my ($time) = @_;

   # RFC 822/1123 format
   my ($S, $M, $H, $mday, $mon, $year, $wday, $yday, undef) = gmtime $time;

   sprintf "%s, %02d %s %04d %02d:%02d:%02d GMT",
      $weekday[$wday], $mday, $month[$mon], $year + 1900,
      $H, $M, $S;
}


