AnyEvent-HTTP
original "Status" and "Reason" values from the header are available
as "OrigStatus" and "OrigReason".
The pseudo-header "URL" contains the actual URL, which can differ
from the requested URL when redirects are followed. For example, you
might get an error that your URL scheme is not supported even though
you requested a valid http URL, because it redirected to an ftp URL;
in that case, look at the "URL" pseudo-header to see where you ended up.
The pseudo-header "Redirect" only exists when the request was a
result of an internal redirect. In that case it is an array
reference with the "($data, $headers)" from the redirect response.
Note that this response could in turn be the result of a redirect
itself, and "$headers->{Redirect}[1]{Redirect}" will then contain
the original response, and so on.
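Following the "Redirect" chain is a simple loop. The sketch below is a minimal illustration, not part of AnyEvent::HTTP: the helper name "redirect_chain" is hypothetical, and the nested headers are made-up sample data shaped like the pseudo-headers described above, so it runs without the module or a network connection.

```perl
use strict;
use warnings;

# Walk the "Redirect" pseudo-header chain from the final response back to
# the first one, returning the URLs in the order they were fetched.
# Assumes every headers hash carries the "URL" pseudo-header.
sub redirect_chain {
   my ($hdr) = @_;
   my @urls;

   while ($hdr) {
      unshift @urls, $hdr->{URL};
      # "Redirect" is [$data, $headers] of the previous response, if any
      $hdr = $hdr->{Redirect} ? $hdr->{Redirect}[1] : undef;
   }

   return @urls;
}

# Hypothetical headers as they might look after two redirects
my $final = {
   Status   => 200,
   URL      => "https://example.org/c",
   Redirect => [undef, {
      Status   => 302,
      URL      => "https://example.org/b",
      Redirect => [undef, { Status => 301, URL => "http://example.org/a" }],
   }],
};

print join(" -> ", redirect_chain($final)), "\n";
```

With this sample data it prints "http://example.org/a -> https://example.org/b -> https://example.org/c".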
If the server sends a header multiple times, its values will be
joined together with a comma (","), as per the HTTP spec.
If an internal error occurs, such as not being able to resolve a
hostname, then $data will be "undef", "$headers->{Status}" will be
590-599 and the "Reason" pseudo-header will contain an error
message. Currently the following status codes are used:
595 - errors during connection establishment, proxy handshake.
596 - errors during TLS negotiation, request sending and header
processing.
597 - errors during body receiving or processing.
598 - user aborted request via "on_header" or "on_body".
599 - other, usually nonretryable, errors (garbled URL etc.).
A typical callback might look like this:
sub {
   my ($body, $hdr) = @_;

   if ($hdr->{Status} =~ /^2/) {
      # ... everything should be ok
   } else {
      print "error, $hdr->{Status} $hdr->{Reason}\n";
   }
}
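If the callback needs to tell the internal 59x codes apart, a lookup table is enough. This is a hedged sketch: "classify_status" and its category strings are invented for illustration; only the code ranges and their meanings come from the list above.

```perl
use strict;
use warnings;

# Meanings of the internal status codes, as listed above.
# The short labels are made up for this sketch.
my %INTERNAL = (
   595 => "connect",     # connection establishment, proxy handshake
   596 => "tls/request", # TLS negotiation, request sending, header processing
   597 => "body",        # body receiving or processing
   598 => "aborted",     # user aborted via on_header/on_body
   599 => "other",       # usually nonretryable (garbled URL etc.)
);

sub classify_status {
   my ($status) = @_;

   return "success" if $status =~ /^2/;

   # 590-599 are reserved for internal errors; unlisted ones map to "unknown"
   return "internal:" . ($INTERNAL{$status} // "unknown")
      if $status >= 590 && $status <= 599;

   return "non-2xx";   # a real HTTP status from the server
}
```

Inside a callback this could be used as "my $kind = classify_status($hdr->{Status});".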
Additional parameters are key-value pairs, and are fully optional.
They include:
recurse => $count (default: $MAX_RECURSE)
Whether to follow up on requests automatically (e.g. on
redirects, authentication and other retries) and, if so, how
often to do so.
Only redirects to http and https URLs are supported. While most
common redirection forms are handled entirely within this
module, some require the use of the optional URI module. If it
is required but missing, then the request will fail with an
error.
headers => hashref
The request headers to use. Currently, "http_request" may
provide its own "Host:", "Content-Length:", "Connection:" and
"Cookie:" headers and will provide defaults at least for "TE:",
"Referer:" and "User-Agent:" (these defaults can be suppressed
by passing "undef" for the respective header, in which case it
won't be sent at all).
You really should provide your own "User-Agent:" header value
that is appropriate for your program - I wouldn't be surprised
if the default AnyEvent string gets blocked by webservers sooner
or later.
Also, make sure that your header names and values do not
contain any embedded newlines.
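A sketch of building such a header hash, assuming a hypothetical helper "make_headers" and a placeholder User-Agent string: it suppresses the default "Referer:" by passing "undef" and refuses names or values that contain CR or LF characters.

```perl
use strict;
use warnings;

# Build a request header hash with a program-specific User-Agent.
# "MyFetcher/0.1" and the contact URL are placeholders for this sketch.
sub make_headers {
   my (%extra) = @_;

   my %hdr = (
      "User-Agent" => "MyFetcher/0.1 (+https://example.org/contact)",
      "Referer"    => undef,    # undef: this header is not sent at all
      %extra,
   );

   # Reject embedded newlines in names and values (header injection)
   while (my ($k, $v) = each %hdr) {
      for my $s ($k, $v // "") {
         die "embedded newline in header '$k'\n" if $s =~ /[\r\n]/;
      }
   }

   return \%hdr;
}
```

The result would be passed as "headers => make_headers(Accept => "text/html")".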
timeout => $seconds
The timeout to use for the various stages: each connect attempt
resets the timeout, as does any read or write activity, so this
is an inactivity timeout, not an overall timeout.
Default timeout is 5 minutes.
proxy => [$host, $port[, $scheme]] or undef
Use the given http proxy for all requests, or no proxy if
"undef" is used.
$scheme must be either omitted or "http".
If not specified, then the default proxy is used (see
"AnyEvent::HTTP::set_proxy").
Currently, if your proxy requires authorization, you have to
specify an appropriate "Proxy-Authorization" header in every
request.
Note that this module will prefer an existing persistent
connection, even if that connection was made using another
proxy. If you need to ensure that a new connection is made in
this case, you can either force "persistent" to false or e.g.
use the proxy address in your "sessionid".
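One way to follow that advice is to derive the "sessionid" from the proxy address. The helper below ("proxy_opts" and the "proxy:" prefix are made up for this sketch) simply returns both request options together.

```perl
use strict;
use warnings;

# Return the "proxy" and "sessionid" options for one proxy, so that
# persistent connections made via this proxy are only reused for it.
sub proxy_opts {
   my ($host, $port) = @_;

   return (
      proxy     => [$host, $port, "http"],
      sessionid => "proxy:$host:$port",
   );
}
```

A request would then look like "http_request GET => $url, proxy_opts("10.0.0.2", 3128), sub { ... };".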
body => $string
The request body, usually empty. Will be sent as-is (future
versions of this module might offer more options).
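Since the body is sent as-is, encoding it and setting a matching "Content-Type:" header is up to you. As an illustration, this hypothetical "form_body" helper builds an application/x-www-form-urlencoded body, percent-encoding everything except unreserved characters (spaces become "%20" rather than "+").

```perl
use strict;
use warnings;

# Encode a flat hash as application/x-www-form-urlencoded.
# Keys are sorted only to make the output deterministic.
sub form_body {
   my (%fields) = @_;

   my @pairs;
   for my $k (sort keys %fields) {
      my @kv = ($k, $fields{$k});
      for (@kv) {
         utf8::encode($_);    # turn characters into UTF-8 bytes first
         s/([^A-Za-z0-9\-._~])/sprintf "%%%02X", ord $1/ge;
      }
      push @pairs, join "=", @kv;
   }

   return join "&", @pairs;
}
```

It would be combined with "headers => { "Content-Type" => "application/x-www-form-urlencoded" }" and "body => form_body(q => "a b")" in a POST request.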
cookie_jar => $hash_ref
Passing this parameter enables (simplified) cookie processing,
loosely based on the original Netscape specification.
The $hash_ref must be an (initially empty) hash reference which
will get updated automatically. It is possible to save the
cookie jar to persistent storage with something like JSON or
Storable - see the "AnyEvent::HTTP::cookie_jar_expire" function
if you wish to remove expired or session-only cookies, and also
for documentation on the format of the cookie jar.
Note that this cookie implementation is not meant to be
complete. If you want complete cookie management, you have to do
that on your own. "cookie_jar" is meant as a quick fix to get
most cookie-using sites working. Cookies are a privacy disaster;
do not use them unless required to.
When cookie processing is enabled, the "Cookie:" and
"Set-Cookie:" headers will be set and handled by this module,
otherwise they will be left untouched.
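A minimal save/restore sketch using the core JSON::PP module. "save_jar" and "load_jar" are hypothetical names; the recommended calls to "AnyEvent::HTTP::cookie_jar_expire" are only mentioned in comments so the code runs without AnyEvent::HTTP installed.

```perl
use strict;
use warnings;
use JSON::PP ();

# Write the jar as UTF-8 JSON. A real program would first call
# AnyEvent::HTTP::cookie_jar_expire($jar, 1) to drop expired and
# session cookies.
sub save_jar {
   my ($jar, $path) = @_;

   open my $fh, ">", $path or die "$path: $!";
   print {$fh} JSON::PP->new->utf8->canonical->encode($jar);
   close $fh or die "$path: $!";
}

# Read the jar back; a missing file yields a fresh, empty jar.
# A real program would call AnyEvent::HTTP::cookie_jar_expire($jar) afterwards.
sub load_jar {
   my ($path) = @_;

   open my $fh, "<", $path or return {};
   local $/;    # slurp the whole file
   return JSON::PP->new->utf8->decode(scalar <$fh>);
}
```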
tls_ctx => $scheme | $tls_ctx
Specifies the AnyEvent::TLS context to be used for https
connections. This parameter follows the same rules as the
"tls_ctx" parameter to AnyEvent::Handle, but additionally, the
two strings "low" or "high" can be specified, which give you a
predefined low-security (no verification, highest compatibility)
and high-security (CA and common-name verification) TLS context.
The default for this option is "low", which could be interpreted
as "give me the page, no matter what".
See also the "sessionid" parameter.
sessionid => $string
The module might reuse connections to the same host internally
(regardless of other settings, such as "tcp_connect" or
"proxy"). Sometimes (e.g. when using TLS or a specific proxy),
you do not want to reuse connections from other sessions. This
can be achieved by setting this parameter to some unique ID
(such as the address of an object storing your state data or the
TLS context, or the proxy IP) - only connections using the same
unique ID will be reused.
on_prepare => $callback->($fh)
In rare cases you need to "tune" the socket before it is used to
connect (for example, to bind it on a given IP address). This
parameter overrides the prepare callback passed to
"AnyEvent::Socket::tcp_connect" and behaves exactly the same way
(e.g. it has to provide a timeout). See the description for the
$prepare_cb argument of "AnyEvent::Socket::tcp_connect" for
details.
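For example, binding to a specific local IPv4 address might look like the following. This is a sketch under assumptions: "bind_to", the address and the 30-second timeout are placeholders, and it only works when the target resolves to IPv4, since an IPv4 address cannot be bound to an IPv6 socket. As with the prepare callback of tcp_connect, the return value is the connect timeout in seconds.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa pack_sockaddr_in unpack_sockaddr_in);

# Return an on_prepare callback that binds the socket to $local_ip
# (any free local port) and returns the connect timeout.
sub bind_to {
   my ($local_ip, $timeout) = @_;

   return sub {
      my ($fh) = @_;
      bind $fh, pack_sockaddr_in(0, inet_aton($local_ip))   # port 0: any free port
         or die "bind $local_ip: $!";
      return $timeout;
   };
}
```

It would be used as "on_prepare => bind_to("192.0.2.10", 30)".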
tcp_connect => $callback->($host, $service, $connect_cb,
$prepare_cb)
In even rarer cases you want total control over how
AnyEvent::HTTP establishes connections. Normally it uses
AnyEvent::Socket::tcp_connect to do this, but you can provide
your own "tcp_connect" function - obviously, it has to follow
the same calling conventions, except that it may always return a
connection guard object.
The connections made by this hook will be treated as equivalent
to connections made the built-in way, specifically, they will be
put into and taken from the persistent connection cache. If your
$tcp_connect function is incompatible with this kind of re-use,
consider switching off "persistent" connections and/or providing
a "sessionid" identifier.
There are probably lots of weird uses for this function,
ranging from tracing which hosts "http_request" actually tries
to connect to, to (inexact but fast) host => IP address caching,
or even SOCKS protocol support.
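As an illustration of the tracing use, here is a wrapper that records each host and service before delegating to the real function. "tracing_connect" and the log array are hypothetical; in real use $inner would be \&AnyEvent::Socket::tcp_connect, and the wrapper passes its return value (the guard) through unchanged.

```perl
use strict;
use warnings;

# Wrap a tcp_connect-compatible function so that every connection
# attempt is logged as "host:service" before being delegated.
sub tracing_connect {
   my ($inner, $log) = @_;

   return sub {
      my ($host, $service, $connect_cb, $prepare_cb) = @_;
      push @$log, "$host:$service";
      return $inner->($host, $service, $connect_cb, $prepare_cb);   # guard
   };
}
```

It would be passed as "tcp_connect => tracing_connect(\&AnyEvent::Socket::tcp_connect, \@seen)".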
on_header => $callback->($headers)
When specified, this callback will be called with the header
hash as soon as headers have been successfully received from the
remote server (not on locally-generated errors).
its own. If you want DNS caching, you currently have to provide your own
default resolver (by storing a suitable resolver object in
$AnyEvent::DNS::RESOLVER) or your own "tcp_connect" callback.
GLOBAL FUNCTIONS AND VARIABLES
AnyEvent::HTTP::set_proxy "proxy-url"
Sets the default proxy server to use. The proxy-url must begin with
a string of the form "http://host:port"; the function croaks otherwise.
To clear an already-set proxy, use "undef".
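The same "http://host:port" form can also be turned into the per-request "proxy" array. The parser below is a hypothetical sketch that accepts only that documented shape (no credentials, no IPv6 literals); it is not the parser AnyEvent::HTTP uses internally.

```perl
use strict;
use warnings;

# Turn "http://host:port[/...]" into [$host, $port, "http"];
# an undefined or empty URL yields undef, meaning "no proxy".
sub parse_proxy_url {
   my ($url) = @_;

   return undef unless defined $url && length $url;

   $url =~ m{^(http)://([^:/]+):(\d+)}i
      or die "unsupported proxy URL: $url\n";

   return [$2, $3, lc $1];
}
```

For example, "proxy => parse_proxy_url($ENV{http_proxy})" selects the environment's proxy for one request, or no proxy if the variable is unset.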
When AnyEvent::HTTP is loaded for the first time, it will query the
default proxy from the operating system, currently by looking at
"$ENV{http_proxy}".
AnyEvent::HTTP::cookie_jar_expire $jar[, $session_end]
Remove all cookies from the cookie jar that have expired. If
$session_end is given and true, then additionally remove all session
cookies.
You should call this function (with a true $session_end) before you
save cookies to disk, and you should call this function after
loading them again. If you have a long-running program you can
additionally call this function from time to time.
A cookie jar is initially an empty hash-reference that is managed by
this module. Its format is subject to change, but currently it is as
follows:
The key "version" has to contain 2, otherwise the hash gets cleared.
All other keys are hostnames or IP addresses pointing to
hash-references. The key for these inner hash references is the
server path for which this cookie is meant, and the values are again
hash-references. Each key of those hash-references is a cookie name,
and the value, you guessed it, is another hash-reference, this time
with the key-value pairs from the cookie, except for "expires" and
"max-age", which have been replaced by a "_expires" key that
contains the cookie expiry timestamp. Session cookies are indicated
by not having an "_expires" key.
Here is an example of a cookie jar with a single cookie, so you have
a chance of understanding the above paragraph:
{
   version    => 2,
   "10.0.0.1" => {
      "/" => {
         "mythweb_id" => {
            _expires => 1293917923,
            value    => "ooRung9dThee3ooyXooM1Ohm",
         },
      },
   },
}
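To make the layout concrete, the following sketch walks a jar in this format and drops expired cookies (and, optionally, session cookies). It only illustrates the data structure: "expire_jar" is a hypothetical name, it skips the "version" check, and a real program should call AnyEvent::HTTP::cookie_jar_expire instead, whose exact behaviour may differ.

```perl
use strict;
use warnings;

# Remove cookies whose "_expires" lies before $now; if $session_end is
# true, also remove session cookies (those without "_expires").
# Empty path and host entries are pruned; "version" is left alone.
sub expire_jar {
   my ($jar, $session_end, $now) = @_;
   $now //= time;

   for my $host (grep $_ ne "version", keys %$jar) {
      for my $path (keys %{ $jar->{$host} }) {
         my $cookies = $jar->{$host}{$path};

         for my $name (keys %$cookies) {
            my $exp = $cookies->{$name}{_expires};
            delete $cookies->{$name}
               if defined $exp ? $exp < $now : $session_end;
         }

         delete $jar->{$host}{$path} unless %$cookies;
      }

      delete $jar->{$host} unless %{ $jar->{$host} };
   }
}
```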
$date = AnyEvent::HTTP::format_date $timestamp
Takes a POSIX timestamp (seconds since the epoch) and formats it as
an HTTP date (RFC 2616).
$timestamp = AnyEvent::HTTP::parse_date $date
Takes an HTTP date (RFC 2616) or a cookie date (Netscape cookie spec)
or a number of minor variations of those, and returns the
corresponding POSIX timestamp, or "undef" if the date cannot be
parsed.
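For reference, a sketch of the RFC 1123 form ("Sun, 06 Nov 1994 08:49:37 GMT") using gmtime and the core Time::Local module. The names "http_date" and "parse_http_date" are hypothetical, and unlike parse_date this parser accepts only that single form, not cookie dates or other variants.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my @DAY = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MON = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
my %MON = map { $MON[$_] => $_ } 0 .. 11;

# POSIX timestamp -> "Www, DD Mon YYYY HH:MM:SS GMT"
sub http_date {
   my ($ts) = @_;
   my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime $ts;

   return sprintf "%s, %02d %s %04d %02d:%02d:%02d GMT",
      $DAY[$wday], $mday, $MON[$mon], $year + 1900, $hour, $min, $sec;
}

# "Www, DD Mon YYYY HH:MM:SS GMT" -> POSIX timestamp, or undef
sub parse_http_date {
   my ($date) = @_;

   my ($mday, $mon, $year, $hour, $min, $sec) =
      $date =~ /^\w{3}, (\d\d) (\w{3}) (\d{4}) (\d\d):(\d\d):(\d\d) GMT$/
      or return undef;
   return undef unless exists $MON{$mon};

   return timegm($sec, $min, $hour, $mday, $MON{$mon}, $year);
}
```

For example, "http_date(0)" returns "Thu, 01 Jan 1970 00:00:00 GMT".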
$AnyEvent::HTTP::MAX_RECURSE
The default value for the "recurse" request parameter (default: 10).
$AnyEvent::HTTP::TIMEOUT
The default timeout for connection operations, in seconds (default: 300, i.e. 5 minutes).
$AnyEvent::HTTP::USERAGENT
The default value for the "User-Agent" header (the default is
"Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION;
+http://software.schmorp.de/pkg/AnyEvent)").
$AnyEvent::HTTP::MAX_PER_HOST
The maximum number of concurrent connections to the same host
(identified by the hostname). If the limit is exceeded, then
additional requests are queued until previous connections are
closed. Both persistent and non-persistent connections are counted
in this limit.
The default value for this is 4, and it is highly advisable to not
increase it much.
For comparison: the RFCs recommend 4 non-persistent or 2 persistent
connections, older browsers used 2, newer ones (such as Firefox 3)
typically use 6, and Opera uses 8 because like, they have the
fastest browser and give a shit for everybody else on the planet.
$AnyEvent::HTTP::PERSISTENT_TIMEOUT
The time after which idle persistent connections get closed by
AnyEvent::HTTP, in seconds (default: 3).
$AnyEvent::HTTP::ACTIVE
The number of active connections. This is not the number of
currently running requests, but the number of currently open and
non-idle TCP connections. This number can be useful for
load-leveling.
SHOWCASE
This section contains some more elaborate "real-world" examples or code
snippets.
HTTP/1.1 FILE DOWNLOAD
Downloading files with HTTP can be quite tricky, especially when
something goes wrong and you want to resume.
Here is a function that initiates and resumes a download. It uses the
last modified time to check for file content changes, and works with
many HTTP/1.0 servers as well, and usually falls back to a complete
re-download on older servers.
It calls the completion callback with either "undef", which means a
nonretryable error occurred, 0 when the download was partial and should
be retried, or 1 if it was successful.
use AnyEvent::HTTP;
sub download($$$) {