NAME
    AnyEvent::HTTP - simple but non-blocking HTTP/HTTPS client

SYNOPSIS
       use AnyEvent::HTTP;

       http_get "http://www.nethype.de/", sub { print $_[1] };

       # ... do something else here

DESCRIPTION
    This module is an AnyEvent user; you need to make sure that you use and
    run a supported event loop.

    This module implements a simple, stateless and non-blocking HTTP client.
    It supports GET, POST and other request methods, cookies and more, all
    on a very low level. It can follow redirects, supports proxies, and
    automatically limits the number of connections to the values specified
    in the RFC.

    It should generally be a "good client" that is enough for most HTTP
    tasks. Simple tasks should be simple, but complex tasks should still be
    possible as the user retains control over request and response headers.

    The caller is responsible for authentication management, cookies (if the
    simplistic implementation in this module doesn't suffice), referer and
    other high-level protocol details for which this module offers only
    limited support.

  METHODS
    http_get $url, key => value..., $cb->($data, $headers)
        Executes an HTTP-GET request. See the http_request function for
        details on additional parameters and the return value.

    http_head $url, key => value..., $cb->($data, $headers)
        Executes an HTTP-HEAD request. See the http_request function for
        details on additional parameters and the return value.

    http_post $url, $body, key => value..., $cb->($data, $headers)
        Executes an HTTP-POST request with a request body of $body. See the
        http_request function for details on additional parameters and the
        return value.

    http_request $method => $url, key => value..., $cb->($data, $headers)
        Executes a HTTP request of type $method (e.g. "GET", "POST"). The
        URL must be an absolute http or https URL.

        When called in void context, nothing is returned. In other contexts,
        "http_request" returns a "cancellation guard" - you have to keep the
        object alive at least until the callback gets called. If the object
        gets destroyed before the callback is called, the request will be
        cancelled.

        The callback will be called with the response body data as first
        argument (or "undef" if an error occurred), and a hash-ref with
        response headers (and trailers) as second argument.

        All the headers in that hash are lowercased. In addition to the
        response headers, the "pseudo-headers" (uppercase to avoid clashing
        with possible response headers) "HTTPVersion", "Status" and "Reason"
        contain the three parts of the HTTP Status-Line of the same name. If
        an error occurs during the body phase of a request, then the
        original "Status" and "Reason" values from the header are available
        as "OrigStatus" and "OrigReason".

        The pseudo-header "URL" contains the actual URL (which can differ
        from the requested URL when following redirects - for example, you
        might get an error that your URL scheme is not supported even though
        your URL is a valid http URL because it redirected to an ftp URL, in
        which case you can look at the URL pseudo header).

        The pseudo-header "Redirect" only exists when the request was a
        result of an internal redirect. In that case it is an array
        reference with the "($data, $headers)" from the redirect response.
        Note that this response could in turn be the result of a redirect
        itself, and "$headers->{Redirect}[1]{Redirect}" will then contain
        the original response, and so on.
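
        Example: the nesting described above can be unwound with a small
        loop. This is only a sketch - "redirect_chain" is a made-up helper,
        the header hash is built by hand with invented URLs, and it assumes
        that every redirect response carries its own "URL" pseudo-header.

```perl
use strict;
use warnings;

# Hypothetical helper: list the URLs that were visited, oldest first, by
# following the nested "Redirect" pseudo-headers.
sub redirect_chain {
   my ($hdr) = @_;

   my @urls;

   while ($hdr) {
      unshift @urls, $hdr->{URL};
      # Redirect is [$data, $headers] of the previous response, if any
      $hdr = $hdr->{Redirect} ? $hdr->{Redirect}[1] : undef;
   }

   @urls
}

# hand-built header hash that imitates two redirects (made-up URLs)
my $final = {
   URL      => "http://example.com/c",
   Status   => 200,
   Redirect => [undef, {
      URL      => "http://example.com/b",
      Status   => 302,
      Redirect => [undef, { URL => "http://example.com/a", Status => 301 }],
   }],
};

print join (" -> ", redirect_chain ($final)), "\n";
```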

        If the server sends a header multiple times, then their contents
        will be joined together with a comma (","), as per the HTTP spec.

        If an internal error occurs, such as not being able to resolve a
        hostname, then $data will be "undef", "$headers->{Status}" will be
        590-599 and the "Reason" pseudo-header will contain an error
        message. Currently the following status codes are used:

        595 - errors during connection establishment, proxy handshake.
        596 - errors during TLS negotiation, request sending and header
        processing.
        597 - errors during body receiving or processing.
        598 - user aborted request via "on_header" or "on_body".
        599 - other, usually nonretryable, errors (garbled URL etc.).

        A typical callback might look like this:

           sub {
              my ($body, $hdr) = @_;

              if ($hdr->{Status} =~ /^2/) {
                 # ... everything should be ok
              } else {
                 print "error, $hdr->{Status} $hdr->{Reason}\n";
              }
           }
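
        Example: a callback often wants to tell the internal 59x codes apart
        from real server responses. The following sketch shows one way to
        do that. "classify_status" and its outcome names are made up for
        illustration, and treating the remaining 59x codes as retryable is
        an assumption of this sketch, not something the module promises.

```perl
use strict;
use warnings;

# Hypothetical helper: map a "Status" pseudo-header to a coarse outcome.
sub classify_status {
   my ($status) = @_;

   return "ok"      if $status =~ /^2/;
   return "aborted" if $status == 598;       # on_header/on_body returned false
   return "fatal"   if $status == 599;       # usually nonretryable
   return "retry"   if $status =~ /^59\d$/;  # other internal errors (595-597 today)
   return "http-error";                      # a real response from the server
}

print "$_: ", classify_status ($_), "\n" for 200, 404, 595, 598, 599;
```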

        Additional parameters are key-value pairs, and are fully optional.
        They include:

        recurse => $count (default: $MAX_RECURSE)
            Whether to recurse requests or not, e.g. on redirects,
            authentication and other retries and so on, and how often to do
            so.

            Only redirects to http and https URLs are supported. While most
            common redirection forms are handled entirely within this
            module, some require the use of the optional URI module. If it
            is required but missing, then the request will fail with an
            error.

        headers => hashref
            The request headers to use. Currently, "http_request" may
            provide its own "Host:", "Content-Length:", "Connection:" and
            "Cookie:" headers and will provide defaults at least for "TE:",
            "Referer:" and "User-Agent:" (this can be suppressed by using
            "undef" for these headers in which case they won't be sent at
            all).

            You really should provide your own "User-Agent:" header value
            that is appropriate for your program - I wouldn't be surprised
            if the default AnyEvent string gets blocked by webservers sooner
            or later.

            Also, make sure that your header names and values do not
            contain any embedded newlines.

        timeout => $seconds
            The time-out to use for various stages - each connect attempt
            will reset the timeout, as will read or write activity, i.e.
            this is not an overall timeout.

            The default timeout is 5 minutes.
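
            Example: since "timeout" is not an overall timeout, a total
            deadline has to be built by the caller, for instance by
            combining a timer with the cancellation guard. This is a
            sketch only - "http_get_deadline" is a made-up name, and
            reporting the deadline as status 599 with this reason text is
            a choice of the sketch.

```perl
use strict;
use warnings;
use AnyEvent;
use AnyEvent::HTTP;

# Hypothetical helper: like http_get, but gives up after $deadline seconds
# in total, no matter how much read or write activity there was.
sub http_get_deadline {
   my ($url, $deadline, $cb) = @_;

   my ($guard, $timer, $done);

   my $finish = sub {
      return if $done++;   # report only the first outcome
      undef $timer;        # disarm the deadline timer
      undef $guard;        # destroying the guard cancels a pending request
      $cb->(@_);
   };

   $guard = http_get $url, $finish;

   if ($done) {
      undef $guard;        # the callback already ran (e.g. garbled URL)
   } else {
      $timer = AnyEvent->timer (after => $deadline, cb => sub {
         # 599 and this reason text are choices of this sketch
         $finish->(undef, {
            Status => 599,
            Reason => "overall deadline exceeded",
            URL    => $url,
         });
      });
   }

   ()
}
```

            The helper would be called like "http_get", with the deadline
            as an extra argument, e.g. "http_get_deadline $url, 60, sub {
            my ($body, $hdr) = @_; ... }".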

        proxy => [$host, $port[, $scheme]] or undef
            Use the given http proxy for all requests, or no proxy if
            "undef" is used.

            $scheme must be either missing or must be "http" for HTTP.

            If not specified, then the default proxy is used (see
            "AnyEvent::HTTP::set_proxy").

            Currently, if your proxy requires authorization, you have to
            specify an appropriate "Proxy-Authorization" header in every
            request.

        body => $string
            The request body, usually empty. Will be sent as-is (future
            versions of this module might offer more options).

        cookie_jar => $hash_ref
            Passing this parameter enables (simplified) cookie-processing,
            loosely based on the original netscape specification.

            The $hash_ref must be an (initially empty) hash reference which
            will get updated automatically. It is possible to save the
            cookie jar to persistent storage with something like JSON or
            Storable - see the "AnyEvent::HTTP::cookie_jar_expire" function
            if you wish to remove expired or session-only cookies, and also
            for documentation on the format of the cookie jar.

            Note that this cookie implementation is not meant to be
            complete. If you want complete cookie management you have to do
            that on your own. "cookie_jar" is meant as a quick fix to get
            most cookie-using sites working. Cookies are a privacy disaster,
            do not use them unless required to.

            When cookie processing is enabled, the "Cookie:" and
            "Set-Cookie:" headers will be set and handled by this module,
            otherwise they will be left untouched.
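
            Example: one possible way to keep a cookie jar across program
            runs, using the JSON::PP module that ships with perl. The
            helper names "load_cookie_jar" and "save_cookie_jar" are made
            up, and the file location is left to the caller - treat this
            as a sketch, not as the recommended storage format.

```perl
use strict;
use warnings;
use JSON::PP ();
use AnyEvent::HTTP;

# Hypothetical helper: read a jar from $path, or start with an empty one.
sub load_cookie_jar {
   my ($path) = @_;

   my $jar;

   if (open my $fh, "<", $path) {
      local $/;   # slurp the whole file
      $jar = eval { JSON::PP->new->utf8->decode (scalar <$fh>) };
   }

   $jar = {} unless ref $jar eq "HASH";

   # drop cookies that expired while the jar sat on disk
   AnyEvent::HTTP::cookie_jar_expire ($jar);

   $jar
}

# Hypothetical helper: write the jar to $path without session cookies.
sub save_cookie_jar {
   my ($jar, $path) = @_;

   # a true second argument also removes session-only cookies
   AnyEvent::HTTP::cookie_jar_expire ($jar, 1);

   open my $fh, ">", $path
      or die "$path: $!";
   print {$fh} JSON::PP->new->utf8->canonical->encode ($jar);
   close $fh
      or die "$path: $!";
}
```

            The loaded jar is then passed to each request as "cookie_jar =>
            $jar" and saved again before the program exits.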

        tls_ctx => $scheme | $tls_ctx
            Specifies the AnyEvent::TLS context to be used for https
            connections. This parameter follows the same rules as the
            "tls_ctx" parameter to AnyEvent::Handle, but additionally, the
            two strings "low" or "high" can be specified, which give you a
            predefined low-security (no verification, highest compatibility)
            and high-security (CA and common-name verification) TLS context.

            The default for this option is "low", which could be interpreted
            as "give me the page, no matter what".

            See also the "sessionid" parameter.

        sessionid => $string
            The module might reuse connections to the same host internally.
            Sometimes (e.g. when using TLS), you do not want to reuse
            connections from other sessions. This can be achieved by setting
            this parameter to some unique ID (such as the address of an
            object storing your state data, or the TLS context) - only
            connections using the same unique ID will be reused.

        on_prepare => $callback->($fh)
            In rare cases you need to "tune" the socket before it is used to
            connect (for example, to bind it on a given IP address). This
            parameter overrides the prepare callback passed to
            "AnyEvent::Socket::tcp_connect" and behaves exactly the same way
            (e.g. it has to provide a timeout). See the description for the
            $prepare_cb argument of "AnyEvent::Socket::tcp_connect" for
            details.

        tcp_connect => $callback->($host, $service, $connect_cb,
        $prepare_cb)
            In even rarer cases you want total control over how
            AnyEvent::HTTP establishes connections. Normally it uses
            AnyEvent::Socket::tcp_connect to do this, but you can provide
            your own "tcp_connect" function - obviously, it has to follow
            the same calling conventions, except that it may always return a
            connection guard object.

            There are probably lots of weird uses for this function,
            starting from tracing the hosts "http_request" actually tries to
            connect to, to (inexact but fast) host => IP address caching or
            even socks protocol support.

        on_header => $callback->($headers)
            When specified, this callback will be called with the header
            hash as soon as headers have been successfully received from the
            remote server (not on locally-generated errors).

            It has to return either true (in which case AnyEvent::HTTP will
            continue), or false, in which case AnyEvent::HTTP will cancel
            the download (and call the finish callback with an error code of
            598).

            This callback is useful, among other things, to quickly reject
            unwanted content, which, if it is supposed to be rare, can be
            faster than first doing a "HEAD" request.

            The downside is that cancelling the request makes it impossible
            to re-use the connection. Also, the "on_header" callback will
            not receive any trailer (headers sent after the response body).

            Example: cancel the request unless the content-type is
            "text/html".

               on_header => sub {
                  $_[0]{"content-type"} =~ /^text\/html\s*(?:;|$)/
               },

        on_body => $callback->($partial_body, $headers)
            When specified, all body data will be passed to this callback
            instead of to the completion callback. The completion callback
            will get the empty string instead of the body data.

            It has to return either true (in which case AnyEvent::HTTP will
            continue), or false, in which case AnyEvent::HTTP will cancel
            the download (and call the completion callback with an error
            code of 598).

            The downside to cancelling the request is that it makes it
            impossible to re-use the connection.

            This callback is useful when the data is too large to be held in
            memory (so the callback writes it to a file) or when only some
            information should be extracted, or when the body should be
            processed incrementally.

            It is usually preferred over doing your own body handling via
            "want_body_handle", but in case of streaming APIs, where HTTP is
            only used to create a connection, "want_body_handle" is the
            better alternative, as it allows you to install your own event
            handler, reducing resource usage.

        want_body_handle => $enable
            When enabled (default is disabled), the behaviour of
            AnyEvent::HTTP changes considerably: after parsing the headers,
            and instead of downloading the body (if any), the completion
            callback will be called. Instead of the $body argument
            containing the body data, the callback will receive the
            AnyEvent::Handle object associated with the connection. In error
            cases, "undef" will be passed. When there is no body (e.g.
            status 304), the empty string will be passed.

            The handle object might or might not be in TLS mode, might be
            connected to a proxy, be a persistent connection, use chunked
            transfer encoding etc., and configured in unspecified ways. The
            user is responsible for this handle (it will not be used by this
            module anymore).

            This is useful with some push-type services, where, after the
            initial headers, an interactive protocol is used (typical
            example would be the push-style twitter API which starts a
            JSON/XML stream).

            If you think you need this, first have a look at "on_body", to
            see if that doesn't solve your problem in a better way.
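
            Example: a rough sketch of taking over the handle to read a
            line-based stream. "stream_lines" and its two callbacks are
            made-up names. The sketch assumes that the server does not use
            chunked transfer encoding (otherwise the chunk framing would
            show up in the lines), and it keeps its own reference to the
            handle, as the module no longer does so.

```perl
use strict;
use warnings;
use AnyEvent;
use AnyEvent::HTTP;

my %active;   # keeps the handles of running streams alive

# Hypothetical helper: call $on_line for every line of a streamed response
# and $on_end (with an optional error message) once the stream stops.
sub stream_lines {
   my ($url, $on_line, $on_end) = @_;

   http_get $url, want_body_handle => 1, sub {
      my ($hdl, $hdr) = @_;

      # undef signals an error, the empty string a response without body
      return $on_end->("$hdr->{Status} $hdr->{Reason}")
         unless ref $hdl;

      $active{$hdl} = $hdl;

      my $stop = sub {
         my ($hdl, $error) = @_;
         delete $active{$hdl};
         $hdl->destroy;
         $on_end->($error);
      };

      $hdl->on_error (sub { $stop->($_[0], $_[2]) });
      $hdl->on_eof   (sub { $stop->($_[0]) });

      # whenever data arrives and nothing is queued, ask for one more line
      $hdl->on_read (sub {
         $_[0]->push_read (line => sub { $on_line->($_[1]) });
      });
   };

   ()
}
```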

        persistent => $boolean
            Try to create/reuse a persistent connection. When this flag is
            set (default: true for idempotent requests, false for all
            others), then "http_request" tries to re-use an existing
            (previously-created) persistent connection to the host and,
            failing that, tries to create a new one.

            Requests failing in certain ways will be automatically retried
            once, which is dangerous for non-idempotent requests, which is
            why it defaults to off for them. The reason for this is because
            the bozos who designed HTTP/1.1 made it impossible to
            distinguish between a fatal error and a normal connection
            timeout, so you never know whether there was a problem with your
            request or not.

            When reusing an existing connection, many parameters (such as
            TLS context) will be ignored. See the "sessionid" parameter for
            a workaround.

        keepalive => $boolean
            Only used when "persistent" is also true. This parameter decides
            whether "http_request" tries to handshake a HTTP/1.0-style
            keep-alive connection (as opposed to only a HTTP/1.1 persistent
            connection).

            The default is true, except when using a proxy, in which case it
            defaults to false, as HTTP/1.0 proxies cannot support this in a
            meaningful way.

        handle_params => { key => value ... }
            The key-value pairs in this hash will be passed to any
            AnyEvent::Handle constructor that is called - not all requests
            will create a handle, and sometimes more than one is created, so
            this parameter is only good for setting hints.

            Example: set the maximum read size to 4096, to potentially
            conserve memory at the cost of speed.

               handle_params => {
                  max_read_size => 4096,
               },

        Example: do a simple HTTP GET request for http://www.nethype.de/ and
        print the response body.

           http_request GET => "http://www.nethype.de/", sub {
              my ($body, $hdr) = @_;
              print "$body\n";
           };

        Example: do a HTTP HEAD request on https://www.google.com/, use a
        timeout of 30 seconds.

           http_request
              HEAD    => "https://www.google.com",
              headers => { "user-agent" => "MySearchClient 1.0" },
              timeout => 30,
              sub {
                 my ($body, $hdr) = @_;
                 use Data::Dumper;
                 print Dumper $hdr;
              }
           ;

        Example: do another simple HTTP GET request, but immediately try to
        cancel it.

           my $request = http_request GET => "http://www.nethype.de/", sub {
              my ($body, $hdr) = @_;
              print "$body\n";
           };

           undef $request;

  DNS CACHING
    AnyEvent::HTTP uses the AnyEvent::Socket::tcp_connect function for the
    actual connection, which in turn uses AnyEvent::DNS to resolve
    hostnames. The latter is a simple stub resolver and does no caching on
    its own. If you want DNS caching, you currently have to provide your own
    default resolver (by storing a suitable resolver object in
    $AnyEvent::DNS::RESOLVER) or your own "tcp_connect" callback.
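
    Example: a minimal sketch of the "tcp_connect" route. It remembers the
    IP address of the last successful connection for each hostname and
    connects to it directly the next time. "caching_connect" and %ip_cache
    are made-up names, and the cache is deliberately crude: entries never
    expire, and only one address per host is tried once it is cached.

```perl
use strict;
use warnings;
use AnyEvent::Socket ();
use AnyEvent::HTTP;

my %ip_cache;   # hostname => textual IP address of the last successful connect

# Hypothetical tcp_connect replacement with an inexact host => IP cache.
sub caching_connect {
   my ($host, $port, $connect_cb, $prepare_cb) = @_;

   # connect to the cached address if there is one, else resolve as usual
   my $target = $ip_cache{$host} // $host;

   # returns the connection guard of the real tcp_connect to the caller
   AnyEvent::Socket::tcp_connect ($target, $port, sub {
      if (@_) {
         $ip_cache{$host} = $_[1];   # success: ($fh, $peer_ip, $peer_port)
      } else {
         delete $ip_cache{$host};    # failure: forget a possibly stale address
      }

      $connect_cb->(@_);
   }, $prepare_cb);
}

# usage sketch:
#    http_get "http://www.nethype.de/", tcp_connect => \&caching_connect, sub {
#       my ($body, $hdr) = @_;
#       ...
#    };
```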

  GLOBAL FUNCTIONS AND VARIABLES
    AnyEvent::HTTP::set_proxy "proxy-url"
        Sets the default proxy server to use. The proxy-url must begin with
        a string of the form "http://host:port", croaks otherwise.

        To clear an already-set proxy, use "undef".

        When AnyEvent::HTTP is loaded for the first time it will query the
        default proxy from the operating system, currently by looking at
        "$ENV{http_proxy}".

    AnyEvent::HTTP::cookie_jar_expire $jar[, $session_end]
        Remove all cookies from the cookie jar that have expired. If
        $session_end is given and true, then additionally remove all session
        cookies.

        You should call this function (with a true $session_end) before you
        save cookies to disk, and you should call this function after
        loading them again. If you have a long-running program you can
        additionally call this function from time to time.

        A cookie jar is initially an empty hash-reference that is managed by
        this module. Its format is subject to change, but currently it is as
        follows:

        The key "version" has to contain 1, otherwise the hash gets emptied.
        All other keys are hostnames or IP addresses pointing to
        hash-references. The key for these inner hash references is the
        server path for which this cookie is meant, and the values are again
        hash-references. Each key of those hash-references is a cookie name,
        and the value, you guessed it, is another hash-reference, this time
        with the key-value pairs from the cookie, except for "expires" and
        "max-age", which have been replaced by a "_expires" key that
        contains the cookie expiry timestamp. Session cookies are indicated
        by not having an "_expires" key.

        Here is an example of a cookie jar with a single cookie, so you have
        a chance of understanding the above paragraph:

           {
              version    => 1,
              "10.0.0.1" => {
                 "/" => {
                    "mythweb_id" => {
                      _expires => 1293917923,
                      value    => "ooRung9dThee3ooyXooM1Ohm",
                    },
                 },
              },
           }
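
        Example: the three levels of nesting (host, path, cookie name) can
        be walked with three loops. This sketch relies on the version 1
        format described above, which is subject to change, and
        "list_cookies" is a made-up helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: flatten a version 1 jar into one line per cookie.
sub list_cookies {
   my ($jar) = @_;

   my @lines;

   # every key except "version" is a hostname or IP address
   for my $host (sort grep { $_ ne "version" } keys %$jar) {
      my $paths = $jar->{$host};

      for my $path (sort keys %$paths) {
         my $cookies = $paths->{$path};

         for my $name (sort keys %$cookies) {
            my $kv   = $cookies->{$name};
            # session cookies are the ones without an _expires key
            my $kind = exists $kv->{_expires} ? "expires $kv->{_expires}" : "session";

            push @lines, "$host$path $name=$kv->{value} ($kind)";
         }
      }
   }

   @lines
}

# the jar from the example above
my $jar = {
   version    => 1,
   "10.0.0.1" => {
      "/" => {
         "mythweb_id" => {
            _expires => 1293917923,
            value    => "ooRung9dThee3ooyXooM1Ohm",
         },
      },
   },
};

print "$_\n" for list_cookies ($jar);
```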

    $date = AnyEvent::HTTP::format_date $timestamp
        Takes a POSIX timestamp (seconds since the epoch) and formats it as
        a HTTP Date (RFC 2616).

    $timestamp = AnyEvent::HTTP::parse_date $date
        Takes a HTTP Date (RFC 2616) or a Cookie date (netscape cookie spec)
        or a bunch of minor variations of those, and returns the
        corresponding POSIX timestamp, or "undef" if the date cannot be
        parsed.
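
        Example: the two functions are meant to be inverses of each other,
        which the following short sketch checks for the current time. The
        variable names are arbitrary.

```perl
use strict;
use warnings;
use AnyEvent::HTTP;

my $now  = time;
my $date = AnyEvent::HTTP::format_date ($now);   # HTTP Date string
my $back = AnyEvent::HTTP::parse_date ($date);   # POSIX timestamp again

print "$date\n";
print "round trip ", ($back == $now ? "ok" : "failed"), "\n";

# input that cannot be parsed yields undef rather than an exception
my $bad = AnyEvent::HTTP::parse_date ("not a date");
print "garbage rejected\n" unless defined $bad;
```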

    $AnyEvent::HTTP::MAX_RECURSE
        The default value for the "recurse" request parameter (default: 10).

    $AnyEvent::HTTP::TIMEOUT
        The default timeout for connection operations (default: 300).

    $AnyEvent::HTTP::USERAGENT
        The default value for the "User-Agent" header (the default is
        "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION;
        +http://software.schmorp.de/pkg/AnyEvent)").

    $AnyEvent::HTTP::MAX_PER_HOST
        The maximum number of concurrent connections to the same host
        (identified by the hostname). If the limit is exceeded, then
        additional requests are queued until previous connections are
        closed. Both persistent and non-persistent connections are counted
        in this limit.

        The default value for this is 4, and it is highly advisable to not
        increase it much.

        For comparison: the RFCs recommend 4 non-persistent or 2 persistent
        connections, older browsers used 2, newer ones (such as firefox 3)
        typically use 6, and Opera uses 8 because like, they have the
        fastest browser and give a shit for everybody else on the planet.

    $AnyEvent::HTTP::PERSISTENT_TIMEOUT
        The time after which idle persistent connections get closed by
        AnyEvent::HTTP (default: 3).

    $AnyEvent::HTTP::ACTIVE
        The number of active connections. This is not the number of
        currently running requests, but the number of currently open and
        non-idle TCP connections. This number can be useful for
        load-leveling.

  SHOWCASE
    This section contains some more elaborate "real-world" examples or code
    snippets.

   HTTP/1.1 FILE DOWNLOAD
    Downloading files with HTTP can be quite tricky, especially when
    something goes wrong and you want to resume.

    Here is a function that initiates and resumes a download. It uses the
    last modified time to check for file content changes, and works with
    many HTTP/1.0 servers as well, and usually falls back to a complete
    re-download on older servers.

    It calls the completion callback with "undef" if a nonretryable error
    occurred, with 0 if the download was partial and should be retried, and
    with 1 if it was successful.

       use AnyEvent::HTTP;
       use Fcntl qw(O_CREAT O_RDWR);

       sub download($$$) {
          my ($url, $file, $cb) = @_;

          # open read-write without truncating, creating the file if needed
          sysopen my $fh, $file, O_CREAT | O_RDWR
             or die "$file: $!";

          my %hdr;
          my $ofs = 0;

          # a non-empty file is a partial download: ask for the remainder only
          if (stat $fh and -s _) {
             $ofs = -s _;
             $hdr{"if-unmodified-since"} = AnyEvent::HTTP::format_date +(stat _)[9];
             $hdr{"range"} = "bytes=$ofs-";
          }

          http_get $url,
             headers   => \%hdr,
             on_header => sub {
                my ($hdr) = @_;

                if ($hdr->{Status} == 200 && $ofs) {
                   # resume failed
                   truncate $fh, $ofs = 0;
                }

                sysseek $fh, $ofs, 0;

                1
             },
             on_body   => sub {
                my ($data, $hdr) = @_;

                if ($hdr->{Status} =~ /^2/) {
                   length $data == syswrite $fh, $data
                      or return; # abort on write errors
                }

                1
             },
             sub {
                my (undef, $hdr) = @_;

                my $status = $hdr->{Status};

                if (my $time = AnyEvent::HTTP::parse_date $hdr->{"last-modified"}) {
                   # utime expects the access and modification times first
                   utime $time, $time, $fh;
                }

                if ($status == 200 || $status == 206 || $status == 416) {
                   # download ok || resume ok || file already fully downloaded
                   $cb->(1, $hdr);

                } elsif ($status == 412) {
                   # file has changed while resuming, delete and retry
                   unlink $file;
                   $cb->(0, $hdr);

                } elsif ($status == 500 or $status == 503 or $status =~ /^59/) {
                   # retry later
                   $cb->(0, $hdr);

                } else {
                   $cb->(undef, $hdr);
                }
             }
          ;
       }

       download "http://server/somelargefile", "/tmp/somelargefile", sub {
          if ($_[0]) {
             print "OK!\n";
          } elsif (defined $_[0]) {
             print "please retry later\n";
          } else {
             print "ERROR\n";
          }
       };

   SOCKS PROXIES
    Socks proxies are not directly supported by AnyEvent::HTTP. You can
    compile your perl to support socks, or use an external program such as
    socksify (dante) or tsocks to make your program use a socks proxy
    transparently.

    Alternatively, for AnyEvent::HTTP only, you can use your own
    "tcp_connect" function that does the proxy handshake - here is an
    example that works with socks4a proxies:

       use Errno;
       use AnyEvent::Util;
       use AnyEvent::Socket;
       use AnyEvent::Handle;
       use AnyEvent::HTTP;

       # host, port and username of/for your socks4a proxy
       my $socks_host = "10.0.0.23";
       my $socks_port = 9050;
       my $socks_user = "";

       sub socks4a_connect {
          my ($host, $port, $connect_cb, $prepare_cb) = @_;

          my $hdl = AnyEvent::Handle->new (
             connect    => [$socks_host, $socks_port],
             on_prepare => sub { $prepare_cb->($_[0]{fh}) },
             on_error   => sub { $connect_cb->() },
          );

          # socks4a request: version 4, command 1 (connect), port, the
          # placeholder address 0.0.0.1, then user name and host name
          $hdl->push_write (pack "CCnNZ*Z*", 4, 1, $port, 1, $socks_user, $host);

          # the reply is 8 octets: null, status, port, IPv4 address
          $hdl->push_read (chunk => 8, sub {
             my ($hdl, $chunk) = @_;
             my ($status, $port, $ipn) = unpack "xCna4", $chunk;

             if ($status == 0x5a) { # request granted
                $connect_cb->($hdl->{fh}, (format_address $ipn) . ":$port");
             } else {
                $! = Errno::ENXIO; $connect_cb->();
             }
          });

          # the handle doubles as the connection guard
          $hdl
       }

    Use "socks4a_connect" instead of "tcp_connect" when doing
    "http_request"s, possibly after switching off other proxy types:

       AnyEvent::HTTP::set_proxy undef; # usually you do not want other proxies

       http_get 'http://www.google.com', tcp_connect => \&socks4a_connect, sub {
          my ($data, $headers) = @_;
          ...
       };

SEE ALSO
    AnyEvent.

AUTHOR
       Marc Lehmann <schmorp@schmorp.de>
       http://home.schmorp.de/

    With many thanks to Дмитрий Шалашов, who provided countless testcases
    and bugreports.