:mod:`urllib.request` --- Extensible library for opening URLs
=============================================================

.. module:: urllib.request
   :synopsis: Extensible library for opening URLs.

.. moduleauthor:: Jeremy Hylton <jeremy@alum.mit.edu>
.. sectionauthor:: Moshe Zadka <moshez@users.sourceforge.net>
.. sectionauthor:: Senthil Kumaran <senthil@uthcode.com>

**Source code:** :source:`Lib/urllib/request.py`

--------------

The :mod:`urllib.request` module defines functions and classes which help in
opening URLs (mostly HTTP) in a complex world --- basic and digest
authentication, redirections, cookies and more.

.. seealso::

    The `Requests package <https://requests.readthedocs.io/en/master/>`_
    is recommended for a higher-level HTTP client interface.


The :mod:`urllib.request` module defines the following functions:


.. function:: urlopen(url, data=None[, timeout], *, cafile=None, capath=None, cadefault=False, context=None)

   Open the URL *url*, which can be either a string or a
   :class:`Request` object.

   *data* must be an object specifying additional data to be sent to the
   server, or ``None`` if no such data is needed.  See :class:`Request`
   for details.

   The :mod:`urllib.request` module uses HTTP/1.1 and includes a
   ``Connection: close`` header in its HTTP requests.

   The optional *timeout* parameter specifies a timeout in seconds for
   blocking operations like the connection attempt (if not specified,
   the global default timeout setting will be used).  This only works
   for HTTP, HTTPS and FTP connections.

   If *context* is specified, it must be a :class:`ssl.SSLContext` instance
   describing the various SSL options. See :class:`~http.client.HTTPSConnection`
   for more details.

   The optional *cafile* and *capath* parameters specify a set of trusted
   CA certificates for HTTPS requests.
   *cafile* should point to a single
   file containing a bundle of CA certificates, whereas *capath* should
   point to a directory of hashed certificate files.  More information can
   be found in :meth:`ssl.SSLContext.load_verify_locations`.

   The *cadefault* parameter is ignored.

   This function always returns an object which can work as a
   :term:`context manager` and has methods such as

   * :meth:`~urllib.response.addinfourl.geturl` --- return the URL of the resource retrieved,
     commonly used to determine if a redirect was followed

   * :meth:`~urllib.response.addinfourl.info` --- return the meta-information of the page, such as headers,
     in the form of an :func:`email.message_from_string` instance (see
     `Quick Reference to HTTP Headers <http://jkorpela.fi/http.html>`_)

   * :meth:`~urllib.response.addinfourl.getcode` --- return the HTTP status code of the response.

   For HTTP and HTTPS URLs, this function returns a slightly modified
   :class:`http.client.HTTPResponse` object.  In addition
   to the three new methods above, the msg attribute contains the
   same information as the :attr:`~http.client.HTTPResponse.reason`
   attribute --- the reason phrase returned by the server --- instead of
   the response headers as specified in the documentation for
   :class:`~http.client.HTTPResponse`.

   For FTP, file, and data URLs and requests explicitly handled by legacy
   :class:`URLopener` and :class:`FancyURLopener` classes, this function
   returns a :class:`urllib.response.addinfourl` object.

   Raises :exc:`~urllib.error.URLError` on protocol errors.

   Note that ``None`` may be returned if no handler handles the request (though
   the default installed global :class:`OpenerDirector` uses
   :class:`UnknownHandler` to ensure this never happens).

   In addition, if proxy settings are detected (for example, when a ``*_proxy``
   environment variable like :envvar:`http_proxy` is set),
   :class:`ProxyHandler` is installed by default and makes sure the requests are
   handled through the proxy.

   The legacy ``urllib.urlopen`` function from Python 2.6 and earlier has been
   discontinued; :func:`urllib.request.urlopen` corresponds to the old
   ``urllib2.urlopen``.  Proxy handling, which was done by passing a dictionary
   parameter to ``urllib.urlopen``, can be obtained by using
   :class:`ProxyHandler` objects.

   .. audit-event:: urllib.Request fullurl,data,headers,method urllib.request.urlopen

      The default opener raises an :ref:`auditing event <auditing>`
      ``urllib.Request`` with arguments ``fullurl``, ``data``, ``headers``,
      ``method`` taken from the request object.

   .. versionchanged:: 3.2
      *cafile* and *capath* were added.

   .. versionchanged:: 3.2
      HTTPS virtual hosts are now supported if possible (that is, if
      :data:`ssl.HAS_SNI` is true).

   .. versionadded:: 3.2
      *data* can be an iterable object.

   .. versionchanged:: 3.3
      *cadefault* was added.

   .. versionchanged:: 3.4.3
      *context* was added.

   .. deprecated:: 3.6

       *cafile*, *capath* and *cadefault* are deprecated in favor of *context*.
       Please use :meth:`ssl.SSLContext.load_cert_chain` instead, or let
       :func:`ssl.create_default_context` select the system's trusted CA
       certificates for you.


.. function:: install_opener(opener)

   Install an :class:`OpenerDirector` instance as the default global opener.
   Installing an opener is only necessary if you want urlopen to use that
   opener; otherwise, simply call :meth:`OpenerDirector.open` instead of
   :func:`~urllib.request.urlopen`.  The code does not check for a real
   :class:`OpenerDirector`, and any class with the appropriate interface will
   work.


.. function:: build_opener([handler, ...])

   Return an :class:`OpenerDirector` instance, which chains the handlers in the
   order given. *handler*\s can be either instances of :class:`BaseHandler`, or
   subclasses of :class:`BaseHandler` (in which case it must be possible to call
   the constructor without any parameters).  Instances of the following classes
   will be in front of the *handler*\s, unless the *handler*\s contain them,
   instances of them or subclasses of them: :class:`ProxyHandler` (if proxy
   settings are detected), :class:`UnknownHandler`, :class:`HTTPHandler`,
   :class:`HTTPDefaultErrorHandler`, :class:`HTTPRedirectHandler`,
   :class:`FTPHandler`, :class:`FileHandler`, :class:`HTTPErrorProcessor`.

   If the Python installation has SSL support (i.e., if the :mod:`ssl` module
   can be imported), :class:`HTTPSHandler` will also be added.

   A :class:`BaseHandler` subclass may also change its :attr:`handler_order`
   attribute to modify its position in the handlers list.


.. function:: pathname2url(path)

   Convert the pathname *path* from the local syntax for a path to the form used in
   the path component of a URL.  This does not produce a complete URL.  The return
   value will already be quoted using the :func:`~urllib.parse.quote` function.


.. function:: url2pathname(path)

   Convert the path component *path* from a percent-encoded URL to the local syntax for a
   path.  This does not accept a complete URL.  This function uses
   :func:`~urllib.parse.unquote` to decode *path*.

.. function:: getproxies()

   This helper function returns a dictionary of scheme to proxy server URL
   mappings. On all operating systems, it first scans the environment
   case-insensitively for variables named ``<scheme>_proxy``. If it finds
   none, it looks for proxy information in the Mac OS X System
   Configuration on Mac OS X and in the Windows Systems Registry on Windows.
   If both lowercase and uppercase environment variables exist (and disagree),
   lowercase is preferred.

   .. note::

      If the environment variable ``REQUEST_METHOD`` is set, which usually
      indicates your script is running in a CGI environment, the environment
      variable ``HTTP_PROXY`` (uppercase ``_PROXY``) will be ignored. This is
      because that variable can be injected by a client using the "Proxy:" HTTP
      header. If you need to use an HTTP proxy in a CGI environment, either use
      ``ProxyHandler`` explicitly, or make sure the variable name is in
      lowercase (or at least the ``_proxy`` suffix).


The following classes are provided:

.. class:: Request(url, data=None, headers={}, origin_req_host=None, unverifiable=False, method=None)

   This class is an abstraction of a URL request.

   *url* should be a string containing a valid URL.

   *data* must be an object specifying additional data to send to the
   server, or ``None`` if no such data is needed.  Currently HTTP
   requests are the only ones that use *data*.  The supported object
   types include bytes, file-like objects, and iterables of bytes-like objects.
   If neither a ``Content-Length`` nor a ``Transfer-Encoding`` header field
   has been provided, :class:`HTTPHandler` will set these headers according
   to the type of *data*.  ``Content-Length`` will be used to send
   bytes objects, while ``Transfer-Encoding: chunked`` as specified in
   :rfc:`7230`, Section 3.3.1 will be used to send files and other iterables.

   For an HTTP POST request method, *data* should be a buffer in the
   standard :mimetype:`application/x-www-form-urlencoded` format.  The
   :func:`urllib.parse.urlencode` function takes a mapping or sequence
   of 2-tuples and returns an ASCII string in this format. It should
   be encoded to bytes before being used as the *data* parameter.

   *headers* should be a dictionary, and will be treated as if
   :meth:`add_header` was called with each key and value as arguments.
   This is often used to "spoof" the ``User-Agent`` header value, which is
   used by a browser to identify itself --- some HTTP servers only
   allow requests coming from common browsers as opposed to scripts.
   For example, Mozilla Firefox may identify itself as ``"Mozilla/5.0
   (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11"``, while
   :mod:`urllib`'s default user agent string is
   ``"Python-urllib/2.6"`` (on Python 2.6).

   An appropriate ``Content-Type`` header should be included if the *data*
   argument is present.  If this header has not been provided and *data*
   is not ``None``, ``Content-Type: application/x-www-form-urlencoded`` will
   be added as a default.

   The next two arguments are only of interest for correct handling
   of third-party HTTP cookies:

   *origin_req_host* should be the request-host of the origin
   transaction, as defined by :rfc:`2965`.  It defaults to
   ``http.cookiejar.request_host(self)``.  This is the host name or IP
   address of the original request that was initiated by the user.
   For example, if the request is for an image in an HTML document,
   this should be the request-host of the request for the page
   containing the image.

   *unverifiable* should indicate whether the request is unverifiable,
   as defined by :rfc:`2965`.  It defaults to ``False``.  An unverifiable
   request is one whose URL the user did not have the option to
   approve.  For example, if the request is for an image in an HTML
   document, and the user had no option to approve the automatic
   fetching of the image, this should be true.

   *method* should be a string that indicates the HTTP request method that
   will be used (e.g. ``'HEAD'``).  If provided, its value is stored in the
   :attr:`~Request.method` attribute and is used by :meth:`get_method()`.
   The default is ``'GET'`` if *data* is ``None`` or ``'POST'`` otherwise.
   Subclasses may indicate a different default method by setting the
   :attr:`~Request.method` attribute in the class itself.

   .. note::
      The request will not work as expected if the data object is unable
      to deliver its content more than once (e.g. a file or an iterable
      that can produce the content only once) and the request is retried
      for HTTP redirects or authentication.  The *data* is sent to the
      HTTP server right away after the headers.  There is no support for
      a 100-continue expectation in the library.

   .. versionchanged:: 3.3
      :attr:`Request.method` argument is added to the Request class.

   .. versionchanged:: 3.4
      Default :attr:`Request.method` may be indicated at the class level.

   .. versionchanged:: 3.6
      Do not raise an error if the ``Content-Length`` has not been
      provided and *data* is neither ``None`` nor a bytes object.
      Fall back to use chunked transfer encoding instead.

.. class:: OpenerDirector()

   The :class:`OpenerDirector` class opens URLs via :class:`BaseHandler`\ s chained
   together. It manages the chaining of handlers, and recovery from errors.


.. class:: BaseHandler()

   This is the base class for all registered handlers --- and handles only the
   simple mechanics of registration.


.. class:: HTTPDefaultErrorHandler()

   A class which defines a default handler for HTTP error responses; all responses
   are turned into :exc:`~urllib.error.HTTPError` exceptions.


.. class:: HTTPRedirectHandler()

   A class to handle redirections.


.. class:: HTTPCookieProcessor(cookiejar=None)

   A class to handle HTTP Cookies.


.. class:: ProxyHandler(proxies=None)

   Cause requests to go through a proxy. If *proxies* is given, it must be a
   dictionary mapping protocol names to URLs of proxies.
   The default is to read
   the list of proxies from the environment variables
   ``<protocol>_proxy``.  If no proxy environment variables are set, then
   in a Windows environment proxy settings are obtained from the registry's
   Internet Settings section, and in a Mac OS X environment proxy information
   is retrieved from the OS X System Configuration Framework.

   To disable autodetected proxies, pass an empty dictionary.

   The :envvar:`no_proxy` environment variable can be used to specify hosts
   which shouldn't be reached via proxy; if set, it should be a comma-separated
   list of hostname suffixes, optionally with ``:port`` appended, for example
   ``cern.ch,ncsa.uiuc.edu,some.host:8080``.

   .. note::

      ``HTTP_PROXY`` will be ignored if a variable ``REQUEST_METHOD`` is set;
      see the documentation on :func:`~urllib.request.getproxies`.


.. class:: HTTPPasswordMgr()

   Keep a database of  ``(realm, uri) -> (user, password)`` mappings.


.. class:: HTTPPasswordMgrWithDefaultRealm()

   Keep a database of  ``(realm, uri) -> (user, password)`` mappings. A realm of
   ``None`` is considered a catch-all realm, which is searched if no other realm
   fits.


.. class:: HTTPPasswordMgrWithPriorAuth()

   A variant of :class:`HTTPPasswordMgrWithDefaultRealm` that also has a
   database of ``uri -> is_authenticated`` mappings.  Can be used by a
   BasicAuth handler to determine when to send authentication credentials
   immediately instead of waiting for a ``401`` response first.

   .. versionadded:: 3.5


.. class:: AbstractBasicAuthHandler(password_mgr=None)

   This is a mixin class that helps with HTTP authentication, both to the remote
   host and to a proxy. *password_mgr*, if given, should be something that is
   compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.
   If *password_mgr* also provides ``is_authenticated`` and
   ``update_authenticated`` methods (see
   :ref:`http-password-mgr-with-prior-auth`), then the handler will use the
   ``is_authenticated`` result for a given URI to determine whether or not to
   send authentication credentials with the request.  If ``is_authenticated``
   returns ``True`` for the URI, credentials are sent.  If ``is_authenticated``
   returns ``False``, credentials are not sent, and then if a ``401`` response is
   received the request is re-sent with the authentication credentials.  If
   authentication succeeds, ``update_authenticated`` is called to set
   ``is_authenticated`` to ``True`` for the URI, so that subsequent requests to
   the URI or any of its super-URIs will automatically include the
   authentication credentials.

   .. versionadded:: 3.5
      Added ``is_authenticated`` support.


.. class:: HTTPBasicAuthHandler(password_mgr=None)

   Handle authentication with the remote host. *password_mgr*, if given, should
   be something that is compatible with :class:`HTTPPasswordMgr`; refer to
   section :ref:`http-password-mgr` for information on the interface that must
   be supported. HTTPBasicAuthHandler will raise a :exc:`ValueError` when
   presented with a wrong Authentication scheme.


.. class:: ProxyBasicAuthHandler(password_mgr=None)

   Handle authentication with the proxy. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: AbstractDigestAuthHandler(password_mgr=None)

   This is a mixin class that helps with HTTP authentication, both to the remote
   host and to a proxy.
   *password_mgr*, if given, should be something that is
   compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPDigestAuthHandler(password_mgr=None)

   Handle authentication with the remote host. *password_mgr*, if given, should
   be something that is compatible with :class:`HTTPPasswordMgr`; refer to
   section :ref:`http-password-mgr` for information on the interface that must
   be supported. When both the Digest Authentication Handler and the Basic
   Authentication Handler are added, Digest Authentication is always tried
   first. If the Digest Authentication returns a 40x response again, it is sent
   to the Basic Authentication handler to handle.  This handler method will raise a
   :exc:`ValueError` when presented with an authentication scheme other than
   Digest or Basic.

   .. versionchanged:: 3.3
      Raise :exc:`ValueError` on unsupported Authentication Scheme.



.. class:: ProxyDigestAuthHandler(password_mgr=None)

   Handle authentication with the proxy. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPHandler()

   A class to handle opening of HTTP URLs.


.. class:: HTTPSHandler(debuglevel=0, context=None, check_hostname=None)

   A class to handle opening of HTTPS URLs.  *context* and *check_hostname*
   have the same meaning as in :class:`http.client.HTTPSConnection`.

   .. versionchanged:: 3.2
      *context* and *check_hostname* were added.


.. class:: FileHandler()

   Open local files.

.. class:: DataHandler()

   Open data URLs.

   .. versionadded:: 3.4

.. class:: FTPHandler()

   Open FTP URLs.


.. class:: CacheFTPHandler()

   Open FTP URLs, keeping a cache of open FTP connections to minimize delays.


.. class:: UnknownHandler()

   A catch-all class to handle unknown URLs.


.. class:: HTTPErrorProcessor()

   Process HTTP error responses.


.. _request-objects:

Request Objects
---------------

The following methods describe :class:`Request`'s public interface,
and so all may be overridden in subclasses.  It also defines several
public attributes that can be used by clients to inspect the parsed
request.

.. attribute:: Request.full_url

   The original URL passed to the constructor.

   .. versionchanged:: 3.4

   Request.full_url is a property with setter, getter and a deleter. Getting
   :attr:`~Request.full_url` returns the original request URL with the
   fragment, if it was present.

.. attribute:: Request.type

   The URI scheme.

.. attribute:: Request.host

   The URI authority, typically a host, but may also contain a port
   separated by a colon.

.. attribute:: Request.origin_req_host

   The original host for the request, without port.

.. attribute:: Request.selector

   The URI path.  If the :class:`Request` uses a proxy, then selector
   will be the full URL that is passed to the proxy.

.. attribute:: Request.data

   The entity body for the request, or ``None`` if not specified.

   .. versionchanged:: 3.4
      Changing value of :attr:`Request.data` now deletes "Content-Length"
      header if it was previously set or calculated.

.. attribute:: Request.unverifiable

   boolean, indicates whether the request is unverifiable as defined
   by :rfc:`2965`.

.. attribute:: Request.method

   The HTTP request method to use.  By default its value is :const:`None`,
   which means that :meth:`~Request.get_method` will do its normal computation
   of the method to be used.
   Its value can be set (thus overriding the default
   computation in :meth:`~Request.get_method`) either by providing a default
   value by setting it at the class level in a :class:`Request` subclass, or by
   passing a value in to the :class:`Request` constructor via the *method*
   argument.

   .. versionadded:: 3.3

   .. versionchanged:: 3.4
      A default value can now be set in subclasses; previously it could only
      be set via the constructor argument.


.. method:: Request.get_method()

   Return a string indicating the HTTP request method.  If
   :attr:`Request.method` is not ``None``, return its value, otherwise return
   ``'GET'`` if :attr:`Request.data` is ``None``, or ``'POST'`` if it's not.
   This is only meaningful for HTTP requests.

   .. versionchanged:: 3.3
      get_method now looks at the value of :attr:`Request.method`.


.. method:: Request.add_header(key, val)

   Add another header to the request.  Headers are currently ignored by all
   handlers except HTTP handlers, where they are added to the list of headers sent
   to the server.  Note that there cannot be more than one header with the same
   name, and later calls will overwrite previous calls in case the *key* collides.
   Currently, this is no loss of HTTP functionality, since all headers which have
   meaning when used more than once have a (header-specific) way of gaining the
   same functionality using only one header.


.. method:: Request.add_unredirected_header(key, header)

   Add a header that will not be added to a redirected request.


.. method:: Request.has_header(header)

   Return whether the instance has the named header (checks both regular and
   unredirected).


.. method:: Request.remove_header(header)

   Remove named header from the request instance (both from regular and
   unredirected headers).

   .. versionadded:: 3.4


.. method:: Request.get_full_url()

   Return the URL given in the constructor.

   .. versionchanged:: 3.4

   Returns :attr:`Request.full_url`


.. method:: Request.set_proxy(host, type)

   Prepare the request by connecting to a proxy server. The *host* and *type* will
   replace those of the instance, and the instance's selector will be the original
   URL given in the constructor.


.. method:: Request.get_header(header_name, default=None)

   Return the value of the given header. If the header is not present, return
   the default value.


.. method:: Request.header_items()

   Return a list of tuples (header_name, header_value) of the Request headers.

.. versionchanged:: 3.4
   The request methods add_data, has_data, get_data, get_type, get_host,
   get_selector, get_origin_req_host and is_unverifiable that were deprecated
   since 3.3 have been removed.


.. _opener-director-objects:

OpenerDirector Objects
----------------------

:class:`OpenerDirector` instances have the following methods:


.. method:: OpenerDirector.add_handler(handler)

   *handler* should be an instance of :class:`BaseHandler`.  The following methods
   are searched, and added to the possible chains (note that HTTP errors are a
   special case).  Note that, in the following, *protocol* should be replaced
   with the actual protocol to handle, for example :meth:`http_response` would
   be the HTTP protocol response handler.  Also *type* should be replaced with
   the actual HTTP code, for example :meth:`http_error_404` would handle HTTP
   404 errors.

   * :meth:`<protocol>_open` --- signal that the handler knows how to open *protocol*
     URLs.

     See |protocol_open|_ for more information.

   * :meth:`http_error_\<type\>` --- signal that the handler knows how to handle HTTP
     errors with HTTP error code *type*.

     See |http_error_nnn|_ for more information.

   * :meth:`<protocol>_error` --- signal that the handler knows how to handle errors
     from (non-\ ``http``) *protocol*.

   * :meth:`<protocol>_request` --- signal that the handler knows how to pre-process
     *protocol* requests.

     See |protocol_request|_ for more information.

   * :meth:`<protocol>_response` --- signal that the handler knows how to
     post-process *protocol* responses.

     See |protocol_response|_ for more information.

.. |protocol_open| replace:: :meth:`BaseHandler.<protocol>_open`
.. |http_error_nnn| replace:: :meth:`BaseHandler.http_error_\<nnn\>`
.. |protocol_request| replace:: :meth:`BaseHandler.<protocol>_request`
.. |protocol_response| replace:: :meth:`BaseHandler.<protocol>_response`

.. method:: OpenerDirector.open(url, data=None[, timeout])

   Open the given *url* (which can be a request object or a string), optionally
   passing the given *data*. Arguments, return values and exceptions raised are
   the same as those of :func:`urlopen` (which simply calls the :meth:`open`
   method on the currently installed global :class:`OpenerDirector`).  The
   optional *timeout* parameter specifies a timeout in seconds for blocking
   operations like the connection attempt (if not specified, the global default
   timeout setting will be used). The timeout feature only works for
   HTTP, HTTPS and FTP connections.


.. method:: OpenerDirector.error(proto, *args)

   Handle an error of the given protocol.  This will call the registered error
   handlers for the given protocol with the given arguments (which are protocol
   specific).  The HTTP protocol is a special case which uses the HTTP response
   code to determine the specific error handler; refer to the :meth:`http_error_\<type\>`
   methods of the handler classes.

   Return values and exceptions raised are the same as those of :func:`urlopen`.
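
The name-based lookup performed by :meth:`~OpenerDirector.add_handler` can be
illustrated with a minimal sketch that needs no network access. It registers a
handler for a made-up ``demo`` scheme and opens a URL through a bare
:class:`OpenerDirector`. The scheme name, the ``DemoHandler`` class, the
``X-Demo`` header and the ``seen_by_demo_response`` attribute are assumptions
chosen for illustration; they are not part of the library.

```python
import io
from email.message import Message
from urllib.request import BaseHandler, OpenerDirector, Request
from urllib.response import addinfourl


class DemoHandler(BaseHandler):
    """Serve the hypothetical ``demo`` scheme entirely from memory."""

    def demo_request(self, req):
        # Pre-processing hook, picked up because of its "_request" suffix.
        req.add_header("X-Demo", "1")
        return req

    def demo_open(self, req):
        # Open hook: return a file-like response, or None to decline.
        body = ("opened " + req.full_url).encode("ascii")
        return addinfourl(io.BytesIO(body), Message(), req.full_url)

    def demo_response(self, req, response):
        # Post-processing hook: must return a response-like object.
        response.seen_by_demo_response = True  # illustrative marker only
        return response


opener = OpenerDirector()
opener.add_handler(DemoHandler())

request = Request("demo://example/resource")
with opener.open(request) as response:
    payload = response.read()
```

Only the method names connect ``DemoHandler`` to the ``demo`` scheme; no
explicit registration of the scheme is involved in this sketch.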

OpenerDirector objects open URLs in three stages:

The order in which these methods are called within each stage is determined by
sorting the handler instances.

#. Every handler with a method named like :meth:`<protocol>_request` has that
   method called to pre-process the request.

#. Handlers with a method named like :meth:`<protocol>_open` are called to handle
   the request. This stage ends when a handler either returns a non-\ :const:`None`
   value (i.e. a response), or raises an exception (usually
   :exc:`~urllib.error.URLError`).  Exceptions are allowed to propagate.

   In fact, the above algorithm is first tried for methods named
   :meth:`default_open`.  If all such methods return :const:`None`, the algorithm
   is repeated for methods named like :meth:`<protocol>_open`.  If all such methods
   return :const:`None`, the algorithm is repeated for methods named
   :meth:`unknown_open`.

   Note that the implementation of these methods may involve calls of the parent
   :class:`OpenerDirector` instance's :meth:`~OpenerDirector.open` and
   :meth:`~OpenerDirector.error` methods.

#. Every handler with a method named like :meth:`<protocol>_response` has that
   method called to post-process the response.


.. _base-handler-objects:

BaseHandler Objects
-------------------

:class:`BaseHandler` objects provide a couple of methods that are directly
useful, and others that are meant to be used by derived classes.  These are
intended for direct use:


.. method:: BaseHandler.add_parent(director)

   Add a director as parent.


.. method:: BaseHandler.close()

   Remove any parents.

The following attribute and methods should only be used by classes derived from
:class:`BaseHandler`.

.. note::

   The convention has been adopted that subclasses defining
   :meth:`<protocol>_request` or :meth:`<protocol>_response` methods are named
   :class:`\*Processor`; all others are named :class:`\*Handler`.


.. attribute:: BaseHandler.parent

   A valid :class:`OpenerDirector`, which can be used to open using a different
   protocol, or handle errors.


.. method:: BaseHandler.default_open(req)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to catch all URLs.

   This method, if implemented, will be called by the parent
   :class:`OpenerDirector`.  It should return a file-like object as described in
   the return value of the :meth:`open` of :class:`OpenerDirector`, or ``None``.
   It should raise :exc:`~urllib.error.URLError`, unless a truly exceptional
   thing happens (for example, :exc:`MemoryError` should not be mapped to
   :exc:`URLError`).

   This method will be called before any protocol-specific open method.


.. _protocol_open:
.. method:: BaseHandler.<protocol>_open(req)
   :noindex:

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to handle URLs with the given protocol.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   Return values should be the same as for  :meth:`default_open`.


.. method:: BaseHandler.unknown_open(req)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to catch all URLs with no specific registered handler to
   open it.

   This method, if implemented, will be called by the :attr:`parent`
   :class:`OpenerDirector`.  Return values should be the same as for
   :meth:`default_open`.


.. method:: BaseHandler.http_error_default(req, fp, code, msg, hdrs)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   override it if they intend to provide a catch-all for otherwise unhandled HTTP
   errors.  It will be called automatically by the  :class:`OpenerDirector` getting
   the error, and should not normally be called in other circumstances.

   *req* will be a :class:`Request` object, *fp* will be a file-like object with
   the HTTP error body, *code* will be the three-digit code of the error, *msg*
   will be the user-visible explanation of the code and *hdrs* will be a mapping
   object with the headers of the error.

   Return values and exceptions raised should be the same as those of
   :func:`urlopen`.


.. _http_error_nnn:
.. method:: BaseHandler.http_error_<nnn>(req, fp, code, msg, hdrs)

   *nnn* should be a three-digit HTTP error code.  This method is also not defined
   in :class:`BaseHandler`, but will be called, if it exists, on an instance of a
   subclass, when an HTTP error with code *nnn* occurs.

   Subclasses should override this method to handle specific HTTP errors.

   Arguments, return values and exceptions raised should be the same as for
   :meth:`http_error_default`.


.. _protocol_request:
.. method:: BaseHandler.<protocol>_request(req)
   :noindex:

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to pre-process requests of the given protocol.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   *req* will be a :class:`Request` object. The return value should be a
   :class:`Request` object.


.. _protocol_response:
.. method:: BaseHandler.<protocol>_response(req, response)
   :noindex:

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to post-process responses of the given protocol.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   *req* will be a :class:`Request` object. *response* will be an object
   implementing the same interface as the return value of :func:`urlopen`.  The
   return value should implement the same interface as the return value of
   :func:`urlopen`.


.. _http-redirect-handler:

HTTPRedirectHandler Objects
---------------------------

.. note::

   Some HTTP redirections require action from this module's client code.  If this
   is the case, :exc:`~urllib.error.HTTPError` is raised.  See :rfc:`2616` for
   details of the precise meanings of the various redirection codes.

   An :class:`HTTPError` exception is raised as a security consideration if the
   HTTPRedirectHandler is presented with a redirected URL which is not an HTTP,
   HTTPS or FTP URL.


.. method:: HTTPRedirectHandler.redirect_request(req, fp, code, msg, hdrs, newurl)

   Return a :class:`Request` or ``None`` in response to a redirect. This is called
   by the default implementations of the :meth:`http_error_30\*` methods when a
   redirection is received from the server.  If a redirection should take place,
   return a new :class:`Request` to allow :meth:`http_error_30\*` to perform the
   redirect to *newurl*.  Otherwise, raise :exc:`~urllib.error.HTTPError` if
   no other handler should try to handle this URL, or return ``None`` if you
   can't but another handler might.

   .. note::

      The default implementation of this method does not strictly follow :rfc:`2616`,
      which says that 301 and 302 responses to ``POST`` requests must not be
      automatically redirected without confirmation by the user.
      In reality, browsers do allow automatic redirection of these responses,
      changing the ``POST`` to a ``GET``, and the default implementation
      reproduces this behavior.


.. method:: HTTPRedirectHandler.http_error_301(req, fp, code, msg, hdrs)

   Redirect to the ``Location:`` or ``URI:`` URL. This method is called by the
   parent :class:`OpenerDirector` when getting an HTTP 'moved permanently' response.


.. method:: HTTPRedirectHandler.http_error_302(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'found' response.


.. method:: HTTPRedirectHandler.http_error_303(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'see other' response.


.. method:: HTTPRedirectHandler.http_error_307(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'temporary redirect'
   response.


.. _http-cookie-processor:

HTTPCookieProcessor Objects
---------------------------

:class:`HTTPCookieProcessor` instances have one attribute:

.. attribute:: HTTPCookieProcessor.cookiejar

   The :class:`http.cookiejar.CookieJar` in which cookies are stored.


.. _proxy-handler:

ProxyHandler Objects
--------------------


.. method:: ProxyHandler.<protocol>_open(request)
   :noindex:

   The :class:`ProxyHandler` will have a method :meth:`<protocol>_open` for every
   *protocol* which has a proxy in the *proxies* dictionary given in the
   constructor. The method will modify requests to go through the proxy, by
   calling ``request.set_proxy()``, and call the next handler in the chain to
   actually execute the protocol.


.. _http-password-mgr:

HTTPPasswordMgr Objects
-----------------------

These methods are available on :class:`HTTPPasswordMgr` and
:class:`HTTPPasswordMgrWithDefaultRealm` objects.


.. method:: HTTPPasswordMgr.add_password(realm, uri, user, passwd)

   *uri* can be either a single URI, or a sequence of URIs. *realm*, *user* and
   *passwd* must be strings. This causes ``(user, passwd)`` to be used as
   authentication tokens when authentication for *realm* and a super-URI of any of
   the given URIs is given.


.. method:: HTTPPasswordMgr.find_user_password(realm, authuri)

   Get user/password for given realm and URI, if any. This method will return
   ``(None, None)`` if there is no matching user/password.

   For :class:`HTTPPasswordMgrWithDefaultRealm` objects, the realm ``None`` will be
   searched if the given *realm* has no matching user/password.


.. _http-password-mgr-with-prior-auth:

HTTPPasswordMgrWithPriorAuth Objects
------------------------------------

This password manager extends :class:`HTTPPasswordMgrWithDefaultRealm` to support
tracking URIs for which authentication credentials should always be sent.


.. method:: HTTPPasswordMgrWithPriorAuth.add_password(realm, uri, user, \
            passwd, is_authenticated=False)

   *realm*, *uri*, *user*, *passwd* are as for
   :meth:`HTTPPasswordMgr.add_password`. *is_authenticated* sets the initial
   value of the ``is_authenticated`` flag for the given URI or list of URIs.
   If *is_authenticated* is specified as ``True``, *realm* is ignored.


.. method:: HTTPPasswordMgrWithPriorAuth.find_user_password(realm, authuri)

   Same as for :class:`HTTPPasswordMgrWithDefaultRealm` objects.


.. method:: HTTPPasswordMgrWithPriorAuth.update_authenticated(uri, \
            is_authenticated=False)

   Update the ``is_authenticated`` flag for the given *uri* or list
   of URIs.


.. method:: HTTPPasswordMgrWithPriorAuth.is_authenticated(authuri)

   Returns the current state of the ``is_authenticated`` flag for
   the given URI.


.. _abstract-basic-auth-handler:

AbstractBasicAuthHandler Objects
--------------------------------


.. method:: AbstractBasicAuthHandler.http_error_auth_reqed(authreq, host, req, headers)

   Handle an authentication request by getting a user/password pair, and re-trying
   the request. *authreq* should be the name of the header where the information
   about the realm is included in the request, *host* specifies the URL and path to
   authenticate for, *req* should be the (failed) :class:`Request` object, and
   *headers* should be the error headers.

   *host* is either an authority (e.g. ``"python.org"``) or a URL containing an
   authority component (e.g. ``"http://python.org/"``). In either case, the
   authority must not contain a userinfo component (so, ``"python.org"`` and
   ``"python.org:80"`` are fine, ``"joe:password@python.org"`` is not).


.. _http-basic-auth-handler:

HTTPBasicAuthHandler Objects
----------------------------


.. method:: HTTPBasicAuthHandler.http_error_401(req, fp, code,  msg, hdrs)

   Retry the request with authentication information, if available.


.. _proxy-basic-auth-handler:

ProxyBasicAuthHandler Objects
-----------------------------


.. method:: ProxyBasicAuthHandler.http_error_407(req, fp, code,  msg, hdrs)

   Retry the request with authentication information, if available.


.. _abstract-digest-auth-handler:

AbstractDigestAuthHandler Objects
---------------------------------


.. method:: AbstractDigestAuthHandler.http_error_auth_reqed(authreq, host, req, headers)

   *authreq* should be the name of the header where the information about the realm
   is included in the request, *host* should be the host to authenticate to, *req*
   should be the (failed) :class:`Request` object, and *headers* should be the
   error headers.


.. _http-digest-auth-handler:

HTTPDigestAuthHandler Objects
-----------------------------


.. method:: HTTPDigestAuthHandler.http_error_401(req, fp, code,  msg, hdrs)

   Retry the request with authentication information, if available.


.. _proxy-digest-auth-handler:

ProxyDigestAuthHandler Objects
------------------------------


.. method:: ProxyDigestAuthHandler.http_error_407(req, fp, code,  msg, hdrs)

   Retry the request with authentication information, if available.


.. _http-handler-objects:

HTTPHandler Objects
-------------------


.. method:: HTTPHandler.http_open(req)

   Send an HTTP request, which can be either GET or POST, depending on
   whether ``req.data`` is ``None``.


.. _https-handler-objects:

HTTPSHandler Objects
--------------------


.. method:: HTTPSHandler.https_open(req)

   Send an HTTPS request, which can be either GET or POST, depending on
   whether ``req.data`` is ``None``.


.. _file-handler-objects:

FileHandler Objects
-------------------


.. method:: FileHandler.file_open(req)

   Open the file locally, if there is no host name, or the host name is
   ``'localhost'``.

   .. versionchanged:: 3.2
      This method is applicable only for local hostnames. When a remote
      hostname is given, an :exc:`~urllib.error.URLError` is raised.


.. _data-handler-objects:

DataHandler Objects
-------------------

.. method:: DataHandler.data_open(req)

   Read a data URL. This kind of URL contains the content encoded in the URL
   itself. The data URL syntax is specified in :rfc:`2397`. This implementation
   ignores whitespace in base64 encoded data URLs so the URL may be wrapped
   in whatever source file it comes from.
   But even though some browsers don't mind about missing padding at the end
   of a base64 encoded data URL, this implementation will raise a
   :exc:`ValueError` in that case.


.. _ftp-handler-objects:

FTPHandler Objects
------------------


.. method:: FTPHandler.ftp_open(req)

   Open the FTP file indicated by *req*. The login is always done with empty
   username and password.


.. _cacheftp-handler-objects:

CacheFTPHandler Objects
-----------------------

:class:`CacheFTPHandler` objects are :class:`FTPHandler` objects with the
following additional methods:


.. method:: CacheFTPHandler.setTimeout(t)

   Set timeout of connections to *t* seconds.


.. method:: CacheFTPHandler.setMaxConns(m)

   Set maximum number of cached connections to *m*.


.. _unknown-handler-objects:

UnknownHandler Objects
----------------------


.. method:: UnknownHandler.unknown_open()

   Raise a :exc:`~urllib.error.URLError` exception.


.. _http-error-processor-objects:

HTTPErrorProcessor Objects
--------------------------

.. method:: HTTPErrorProcessor.http_response(request, response)

   Process HTTP error responses.

   For successful (2xx) status codes, the response object is returned
   immediately.

   For all other status codes, this simply passes the job on to the
   :meth:`http_error_\<type\>` handler methods, via :meth:`OpenerDirector.error`.
   Eventually, :class:`HTTPDefaultErrorHandler` will raise an
   :exc:`~urllib.error.HTTPError` if no other handler handles the error.


.. method:: HTTPErrorProcessor.https_response(request, response)

   Process HTTPS error responses.

   The behavior is the same as :meth:`http_response`.


.. _urllib-request-examples:

Examples
--------

In addition to the examples below, more examples are given in
:ref:`urllib-howto`.

This example gets the python.org main page and displays the first 300 bytes of
it. ::

   >>> import urllib.request
   >>> with urllib.request.urlopen('http://www.python.org/') as f:
   ...     print(f.read(300))
   ...
   b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n\n\n<html
   xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">\n\n<head>\n
   <meta http-equiv="content-type" content="text/html; charset=utf-8" />\n
   <title>Python Programming '

Note that ``read()`` returns a bytes object. This is because there is no way
for urlopen to automatically determine the encoding of the byte stream
it receives from the HTTP server. In general, a program will decode
the returned bytes object to string once it determines or guesses
the appropriate encoding.

The following W3C document, https://www.w3.org/International/O-charset\ , lists
the various ways in which an (X)HTML or an XML document could have specified its
encoding information.

As the python.org website uses *utf-8* encoding as specified in its meta tag, we
will use the same for decoding the bytes object. ::

   >>> with urllib.request.urlopen('http://www.python.org/') as f:
   ...     print(f.read(100).decode('utf-8'))
   ...
   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtm

It is also possible to achieve the same result without using the
:term:`context manager` approach. ::

   >>> import urllib.request
   >>> f = urllib.request.urlopen('http://www.python.org/')
   >>> print(f.read(100).decode('utf-8'))
   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtm

In the following example, we are sending a data-stream to the stdin of a CGI
and reading the data it returns to us. Note that this example will only work
when the Python installation supports SSL. ::

   >>> import urllib.request
   >>> req = urllib.request.Request(url='https://localhost/cgi-bin/test.cgi',
   ...                       data=b'This data is passed to stdin of the CGI')
   >>> with urllib.request.urlopen(req) as f:
   ...     print(f.read().decode('utf-8'))
   ...
   Got Data: "This data is passed to stdin of the CGI"

The code for the sample CGI used in the above example is::

   #!/usr/bin/env python
   import sys
   data = sys.stdin.read()
   print('Content-type: text/plain\n\nGot Data: "%s"' % data)

Here is an example of doing a ``PUT`` request using :class:`Request`::

   import urllib.request
   DATA = b'some data'
   req = urllib.request.Request(url='http://localhost:8080', data=DATA, method='PUT')
   with urllib.request.urlopen(req) as f:
       pass
   print(f.status)
   print(f.reason)

Use of Basic HTTP Authentication::

   import urllib.request
   # Create an OpenerDirector with support for Basic HTTP Authentication...
   auth_handler = urllib.request.HTTPBasicAuthHandler()
   auth_handler.add_password(realm='PDQ Application',
                             uri='https://mahler:8092/site-updates.py',
                             user='klem',
                             passwd='kadidd!ehopper')
   opener = urllib.request.build_opener(auth_handler)
   # ...and install it globally so it can be used with urlopen.
   urllib.request.install_opener(opener)
   urllib.request.urlopen('http://www.example.com/login.html')

:func:`build_opener` provides many handlers by default, including a
:class:`ProxyHandler`. By default, :class:`ProxyHandler` uses the environment
variables named ``<scheme>_proxy``, where ``<scheme>`` is the URL scheme
involved. For example, the :envvar:`http_proxy` environment variable is read to
obtain the HTTP proxy's URL.

This example replaces the default :class:`ProxyHandler` with one that uses
programmatically-supplied proxy URLs, and adds proxy authorization support with
:class:`ProxyBasicAuthHandler`. ::

   proxy_handler = urllib.request.ProxyHandler({'http': 'http://www.example.com:3128/'})
   proxy_auth_handler = urllib.request.ProxyBasicAuthHandler()
   proxy_auth_handler.add_password('realm', 'host', 'username', 'password')

   opener = urllib.request.build_opener(proxy_handler, proxy_auth_handler)
   # This time, rather than install the OpenerDirector, we use it directly:
   opener.open('http://www.example.com/login.html')

Adding HTTP headers:

Use the *headers* argument to the :class:`Request` constructor, or::

   import urllib.request
   req = urllib.request.Request('http://www.example.com/')
   req.add_header('Referer', 'http://www.python.org/')
   # Customize the default User-Agent header value:
   req.add_header('User-Agent', 'urllib-example/0.1 (Contact: . . .)')
   r = urllib.request.urlopen(req)

:class:`OpenerDirector` automatically adds a :mailheader:`User-Agent` header to
every :class:`Request`.
To change this::

   import urllib.request
   opener = urllib.request.build_opener()
   opener.addheaders = [('User-agent', 'Mozilla/5.0')]
   opener.open('http://www.example.com/')

Also, remember that a few standard headers (:mailheader:`Content-Length`,
:mailheader:`Content-Type` and :mailheader:`Host`)
are added when the :class:`Request` is passed to :func:`urlopen` (or
:meth:`OpenerDirector.open`).

.. _urllib-examples:

Here is an example session that uses the ``GET`` method to retrieve a URL
containing parameters::

   >>> import urllib.request
   >>> import urllib.parse
   >>> params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
   >>> url = "http://www.musi-cal.com/cgi-bin/query?%s" % params
   >>> with urllib.request.urlopen(url) as f:
   ...     print(f.read().decode('utf-8'))
   ...

The following example uses the ``POST`` method instead. Note that params output
from urlencode is encoded to bytes before it is sent to urlopen as data::

   >>> import urllib.request
   >>> import urllib.parse
   >>> data = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
   >>> data = data.encode('ascii')
   >>> with urllib.request.urlopen("http://requestb.in/xrbl82xr", data) as f:
   ...     print(f.read().decode('utf-8'))
   ...

The following example uses an explicitly specified HTTP proxy, overriding
environment settings::

   >>> import urllib.request
   >>> proxies = {'http': 'http://proxy.example.com:8080/'}
   >>> opener = urllib.request.FancyURLopener(proxies)
   >>> with opener.open("http://www.python.org") as f:
   ...     f.read().decode('utf-8')
   ...

The following example uses no proxies at all, overriding environment settings::

   >>> import urllib.request
   >>> opener = urllib.request.FancyURLopener({})
   >>> with opener.open("http://www.python.org/") as f:
   ...     f.read().decode('utf-8')
   ...


Legacy interface
----------------

The following functions and classes are ported from the Python 2 module
``urllib`` (as opposed to ``urllib2``). They might become deprecated at
some point in the future.

.. function:: urlretrieve(url, filename=None, reporthook=None, data=None)

   Copy a network object denoted by a URL to a local file. If the URL
   points to a local file, the object will not be copied unless *filename* is supplied.
   Return a tuple ``(filename, headers)`` where *filename* is the
   local file name under which the object can be found, and *headers* is whatever
   the :meth:`info` method of the object returned by :func:`urlopen` returned (for
   a remote object). Exceptions are the same as for :func:`urlopen`.

   The second argument, if present, specifies the file location to copy to (if
   absent, the location will be a tempfile with a generated name). The third
   argument, if present, is a callable that will be called once on
   establishment of the network connection and once after each block read
   thereafter. The callable will be passed three arguments: a count of blocks
   transferred so far, a block size in bytes, and the total size of the file. The
   third argument may be ``-1`` on older FTP servers which do not return a file
   size in response to a retrieval request.

   The following example illustrates the most common usage scenario::

      >>> import urllib.request
      >>> local_filename, headers = urllib.request.urlretrieve('http://python.org/')
      >>> html = open(local_filename)
      >>> html.close()

   If the *url* uses the :file:`http:` scheme identifier, the optional *data*
   argument may be given to specify a ``POST`` request (normally the request
   type is ``GET``).
   The *data* argument must be a bytes object in standard
   :mimetype:`application/x-www-form-urlencoded` format; see the
   :func:`urllib.parse.urlencode` function.

   :func:`urlretrieve` will raise :exc:`ContentTooShortError` when it detects that
   the amount of data available was less than the expected amount (which is the
   size reported by a *Content-Length* header). This can occur, for example, when
   the download is interrupted.

   The *Content-Length* is treated as a lower bound: if there's more data to read,
   urlretrieve reads more data, but if less data is available, it raises the
   exception.

   You can still retrieve the downloaded data in this case; it is stored in the
   :attr:`content` attribute of the exception instance.

   If no *Content-Length* header was supplied, urlretrieve cannot check the size
   of the data it has downloaded, and just returns it. In this case you just have
   to assume that the download was successful.

.. function:: urlcleanup()

   Cleans up temporary files that may have been left behind by previous
   calls to :func:`urlretrieve`.

.. class:: URLopener(proxies=None, **x509)

   .. deprecated:: 3.3

   Base class for opening and reading URLs. Unless you need to support opening
   objects using schemes other than :file:`http:`, :file:`ftp:`, or :file:`file:`,
   you probably want to use :class:`FancyURLopener`.

   By default, the :class:`URLopener` class sends a :mailheader:`User-Agent` header
   of ``urllib/VVV``, where *VVV* is the :mod:`urllib` version number.
   Applications can define their own :mailheader:`User-Agent` header by subclassing
   :class:`URLopener` or :class:`FancyURLopener` and setting the class attribute
   :attr:`version` to an appropriate string value in the subclass definition.

   The optional *proxies* parameter should be a dictionary mapping scheme names to
   proxy URLs, where an empty dictionary turns proxies off completely. Its default
   value is ``None``, in which case environmental proxy settings will be used if
   present, as discussed in the definition of :func:`urlopen`, above.

   Additional keyword parameters, collected in *x509*, may be used for
   authentication of the client when using the :file:`https:` scheme. The keywords
   *key_file* and *cert_file* are supported to provide an SSL key and certificate;
   both are needed to support client authentication.

   :class:`URLopener` objects will raise an :exc:`OSError` exception if the server
   returns an error code.

   .. method:: open(fullurl, data=None)

      Open *fullurl* using the appropriate protocol. This method sets up cache and
      proxy information, then calls the appropriate open method with its input
      arguments. If the scheme is not recognized, :meth:`open_unknown` is called.
      The *data* argument has the same meaning as the *data* argument of
      :func:`urlopen`.

      This method always quotes *fullurl* using :func:`~urllib.parse.quote`.

   .. method:: open_unknown(fullurl, data=None)

      Overridable interface to open unknown URL types.


   .. method:: retrieve(url, filename=None, reporthook=None, data=None)

      Retrieves the contents of *url* and places it in *filename*. The return value
      is a tuple consisting of a local filename and either an
      :class:`email.message.Message` object containing the response headers (for remote
      URLs) or ``None`` (for local URLs). The caller must then open and read the
      contents of *filename*. If *filename* is not given and the URL refers to a
      local file, the input filename is returned.
      If the URL is non-local and *filename* is not given, the filename is the
      output of :func:`tempfile.mktemp` with a suffix that matches the suffix of
      the last path component of the input URL. If *reporthook* is given, it
      must be a function accepting three numeric parameters: a chunk number, the
      maximum size chunks are read in and the total size of the download
      (-1 if unknown). It will be called once at the start and after each chunk
      of data is read from the network. *reporthook* is ignored for local URLs.

      If the *url* uses the :file:`http:` scheme identifier, the optional *data*
      argument may be given to specify a ``POST`` request (normally the request type
      is ``GET``). The *data* argument must be in standard
      :mimetype:`application/x-www-form-urlencoded` format; see the
      :func:`urllib.parse.urlencode` function.


   .. attribute:: version

      Variable that specifies the user agent of the opener object. To get
      :mod:`urllib` to tell servers that it is a particular user agent, set this in a
      subclass as a class variable or in the constructor before calling the base
      constructor.


.. class:: FancyURLopener(...)

   .. deprecated:: 3.3

   :class:`FancyURLopener` subclasses :class:`URLopener` providing default handling
   for the following HTTP response codes: 301, 302, 303, 307 and 401. For the 30x
   response codes listed above, the :mailheader:`Location` header is used to fetch
   the actual URL. For 401 response codes (authentication required), basic HTTP
   authentication is performed. For the 30x response codes, recursion is bounded
   by the value of the *maxtries* attribute, which defaults to 10.

   For all other response codes, the method :meth:`http_error_default` is called
   which you can override in subclasses to handle the error appropriately.

   .. note::

      According to the letter of :rfc:`2616`, 301 and 302 responses to POST requests
      must not be automatically redirected without confirmation by the user. In
      reality, browsers do allow automatic redirection of these responses, changing
      the POST to a GET, and :mod:`urllib` reproduces this behaviour.

   The parameters to the constructor are the same as those for :class:`URLopener`.

   .. note::

      When performing basic authentication, a :class:`FancyURLopener` instance calls
      its :meth:`prompt_user_passwd` method. The default implementation asks the
      users for the required information on the controlling terminal. A subclass may
      override this method to support more appropriate behavior if needed.

   The :class:`FancyURLopener` class offers one additional method that should be
   overridden to provide the appropriate behavior:

   .. method:: prompt_user_passwd(host, realm)

      Return information needed to authenticate the user at the given host in the
      specified security realm. The return value should be a tuple, ``(user,
      password)``, which can be used for basic authentication.

      The implementation prompts for this information on the terminal; an application
      should override this method to use an appropriate interaction model in the local
      environment.


:mod:`urllib.request` Restrictions
----------------------------------

  .. index::
     pair: HTTP; protocol
     pair: FTP; protocol

* Currently, only the following protocols are supported: HTTP (versions 0.9 and
  1.0), FTP, local files, and data URLs.

  .. versionchanged:: 3.4 Added support for data URLs.

* The caching feature of :func:`urlretrieve` has been disabled until someone
  finds the time to hack proper processing of Expiration time headers.

* There should be a function to query whether a particular URL is in the cache.

* For backward compatibility, if a URL appears to point to a local file but the
  file can't be opened, the URL is re-interpreted using the FTP protocol. This
  can sometimes cause confusing error messages.

* The :func:`urlopen` and :func:`urlretrieve` functions can cause arbitrarily
  long delays while waiting for a network connection to be set up. This means
  that it is difficult to build an interactive Web client using these functions
  without using threads.

  .. index::
     single: HTML
     pair: HTTP; protocol

* The data returned by :func:`urlopen` or :func:`urlretrieve` is the raw data
  returned by the server. This may be binary data (such as an image), plain text
  or (for example) HTML. The HTTP protocol provides type information in the reply
  header, which can be inspected by looking at the :mailheader:`Content-Type`
  header. If the returned data is HTML, you can use the module
  :mod:`html.parser` to parse it.

  .. index:: single: FTP

* The code handling the FTP protocol cannot differentiate between a file and a
  directory. This can lead to unexpected behavior when attempting to read a URL
  that points to a file that is not accessible. If the URL ends in a ``/``, it is
  assumed to refer to a directory and will be handled accordingly. But if an
  attempt to read a file leads to a 550 error (meaning the URL cannot be found or
  is not accessible, often for permission reasons), then the path is treated as a
  directory in order to handle the case when a directory is specified by a URL but
  the trailing ``/`` has been left off. This can cause misleading results when
  you try to fetch a file whose read permissions make it inaccessible; the FTP
  code will try to read it, fail with a 550 error, and then perform a directory
  listing for the unreadable file.
  If fine-grained control is needed, consider
  using the :mod:`ftplib` module, subclassing :class:`FancyURLopener`, or changing
  *_urlopener* to meet your needs.



:mod:`urllib.response` --- Response classes used by urllib
==========================================================

.. module:: urllib.response
   :synopsis: Response classes used by urllib.

The :mod:`urllib.response` module defines functions and classes which define a
minimal file-like interface, including ``read()`` and ``readline()``. The
typical response object is an addinfourl instance, which defines an ``info()``
method that returns headers and a ``geturl()`` method that returns the URL.
Functions defined by this module are used internally by the
:mod:`urllib.request` module.

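As a minimal sketch of the file-like interface described above, the following
opens a ``data:`` URL, which needs no network access, and calls ``read()``,
``geturl()`` and ``info()`` on the result. The URL payload and the variable
names are illustrative assumptions, not part of the module:

```python
import urllib.request

# A data: URL carries its content inline (RFC 2397), so this runs offline.
# "SGVsbG8=" is the base64 encoding of b"Hello" (illustrative payload).
url = "data:text/plain;charset=utf-8;base64,SGVsbG8="

with urllib.request.urlopen(url) as response:
    body = response.read()         # the decoded payload, as bytes
    final_url = response.geturl()  # URL of the resource retrieved
    headers = response.info()      # message object holding the headers

print(type(response).__name__)
print(body.decode("utf-8"))
print(final_url == url)
print(headers.get_content_type())
```

For HTTP and HTTPS URLs the object returned by :func:`urlopen` is a modified
:class:`http.client.HTTPResponse` instead, but it offers the same three
methods.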