.. _API-documentation:

API documentation
=================

.. module:: h11

.. contents::

h11 has a fairly small public API, with all public symbols available
directly at the top level:

.. ipython::

   In [2]: import h11

   @verbatim
   In [3]: h11.<TAB>
   h11.CLIENT                   h11.MUST_CLOSE
   h11.CLOSED                   h11.NEED_DATA
   h11.Connection               h11.PAUSED
   h11.ConnectionClosed         h11.PRODUCT_ID
   h11.Data                     h11.ProtocolError
   h11.DONE                     h11.RemoteProtocolError
   h11.EndOfMessage             h11.Request
   h11.ERROR                    h11.Response
   h11.IDLE                     h11.SEND_BODY
   h11.InformationalResponse    h11.SEND_RESPONSE
   h11.LocalProtocolError       h11.SERVER
   h11.MIGHT_SWITCH_PROTOCOL    h11.SWITCHED_PROTOCOL

These symbols fall into three main categories: event classes, special
constants used to track different connection states, and the
:class:`Connection` class itself. We'll describe them in that order.

.. _events:

Events
------

*Events* are the core of h11: the whole point of h11 is to let you
think about HTTP transactions as being a series of events sent back
and forth between a client and a server, instead of thinking in terms
of bytes.

All events behave in essentially similar ways. Let's take
:class:`Request` as an example. Like all events, this is a "final"
class -- you cannot subclass it. And like all events, it has several
fields. For :class:`Request`, there are four of them:
:attr:`~Request.method`, :attr:`~Request.target`,
:attr:`~Request.headers`, and
:attr:`~Request.http_version`. :attr:`~Request.http_version`
defaults to ``b"1.1"``; the rest have no default, so to create a
:class:`Request` you have to specify their values:

.. ipython:: python

   req = h11.Request(method="GET",
                     target="/",
                     headers=[("Host", "example.com")])

Event constructors accept only keyword arguments, not positional
arguments.

Events have a useful repr:

.. ipython:: python

   req

And their fields are available as regular attributes:

.. ipython:: python

   req.method
   req.target
   req.headers
   req.http_version

Notice that these attributes have been normalized to byte-strings. In
general, events normalize and validate their fields when they're
constructed. Some of these normalizations and checks are specific to a
particular event -- for example, :class:`Request` enforces RFC 7230's
requirement that HTTP/1.1 requests must always contain a ``"Host"``
header:

.. ipython:: python

   # HTTP/1.0 requests don't require a Host: header
   h11.Request(method="GET", target="/", headers=[], http_version="1.0")

.. ipython:: python
   :okexcept:

   # But HTTP/1.1 requests do
   h11.Request(method="GET", target="/", headers=[])

This helps protect you from accidentally violating the protocol, and
also helps protect you from remote peers who attempt to violate the
protocol.

A few of these normalization rules are standard across multiple
events, so we document them here:

.. _headers-format:

:attr:`headers`: In h11, headers are represented internally as a list
of (*name*, *value*) pairs, where *name* and *value* are both
byte-strings, *name* is always lowercase, and *name* and *value* are
both guaranteed not to have any leading or trailing whitespace. When
constructing an event, we accept any iterable of pairs like this, and
will automatically convert native strings containing ascii or
:term:`bytes-like object`\s to byte-strings and convert names to
lowercase:

.. ipython:: python

   original_headers = [("HOST", bytearray(b"Example.Com"))]
   req = h11.Request(method="GET", target="/", headers=original_headers)
   original_headers
   req.headers

If any names are detected with leading or trailing whitespace, then
this is an error ("in the past, differences in the handling of such
whitespace have led to security vulnerabilities" -- `RFC 7230
<https://tools.ietf.org/html/rfc7230#section-3.2.4>`_). We also check
for certain other protocol violations, e.g. it's always illegal to
have a newline inside a header value, and ``Content-Length: hello`` is
an error because ``Content-Length`` should always be an integer. We
may add additional checks in the future.

While we make sure to expose header names as lowercased bytes, we also
preserve the original header casing that is used. Compliant HTTP
agents should always treat headers in a case-insensitive manner, but
this may not always be the case. When sending bytes over the wire we
send headers preserving whatever original header casing was used.

It is possible to access the headers in their raw original casing,
which may be useful for some user output or debugging purposes.

.. ipython:: python

   original_headers = [("Host", "example.com")]
   req = h11.Request(method="GET", target="/", headers=original_headers)
   req.headers.raw_items()

.. _http_version-format:

It's not just headers we normalize to byte-strings: the same
type-conversion logic is also applied to the :attr:`Request.method`
and :attr:`Request.target` fields, and -- for consistency -- all
:attr:`http_version` fields. In particular, we always represent HTTP
version numbers as byte-strings like ``b"1.1"``. :term:`Bytes-like
object`\s and native strings will be automatically converted to
byte-strings.
Note that the HTTP standard `specifically guarantees
<https://tools.ietf.org/html/rfc7230#section-2.6>`_ that all HTTP
version numbers will consist of exactly two digits separated by a dot,
so comparisons like ``req.http_version < b"1.1"`` are safe and valid.

When manually constructing an event, you generally shouldn't specify
:attr:`http_version`, because it defaults to ``b"1.1"``, and if you
attempt to override this to some other value then
:meth:`Connection.send` will reject your event -- h11 only speaks
HTTP/1.1. But it does understand other versions of HTTP, so you might
receive events with other ``http_version`` values from remote peers.

Here's the complete set of events supported by h11:

.. autoclass:: Request

.. autoclass:: InformationalResponse

.. autoclass:: Response

.. autoclass:: Data

.. autoclass:: EndOfMessage

.. autoclass:: ConnectionClosed


.. _state-machine:

The state machine
-----------------

Now that you know what the different events are, the next question is:
what can you do with them?

A basic HTTP request/response cycle looks like this:

* The client sends:

  * one :class:`Request` event with request metadata and headers,
  * zero or more :class:`Data` events with the request body (if any),
  * and an :class:`EndOfMessage` event.

* And then the server replies with:

  * zero or more :class:`InformationalResponse` events,
  * one :class:`Response` event,
  * zero or more :class:`Data` events with the response body (if any),
  * and an :class:`EndOfMessage` event.

And once that's finished, both sides either close the connection, or
they go back to the top and re-use it for another request/response
cycle.
To coordinate this interaction, the h11 :class:`Connection` object
maintains several state machines: one that tracks what the client is
doing, one that tracks what the server is doing, and a few more tiny
ones to track whether :ref:`keep-alive <keepalive-and-pipelining>` is
enabled and whether the client has proposed to :ref:`switch protocols
<switching-protocols>`. h11 always keeps track of all of these state
machines, regardless of whether it's currently playing the client or
server role.

The state machines look like this (click on each to expand):

.. ipython:: python
   :suppress:

   import sys
   import subprocess
   subprocess.check_call([sys.executable,
                          sys._h11_hack_docs_source_path
                          + "/make-state-diagrams.py"])

.. |client-image| image:: _static/CLIENT.svg
   :target: _static/CLIENT.svg
   :width: 100%
   :align: top

.. |server-image| image:: _static/SERVER.svg
   :target: _static/SERVER.svg
   :width: 100%
   :align: top

.. |special-image| image:: _static/special-states.svg
   :target: _static/special-states.svg
   :width: 100%

+----------------+----------------+
| |client-image| | |server-image| |
+----------------+----------------+
|        |special-image|          |
+---------------------------------+

If you squint at the first two diagrams, you can see the client's IDLE
-> SEND_BODY -> DONE path and the server's IDLE -> SEND_RESPONSE ->
SEND_BODY -> DONE path, which encode the basic sequence of events we
described above. But there's a fair amount of other stuff going on
here as well.

The first thing you should notice is the different colors. These
correspond to the different ways that our state machines can change
state.

* Dark blue arcs are *event-triggered transitions*: if we're in state
  A, and this event happens, then we switch to state B.
  For the client
  machine, these transitions always happen when the client *sends* an
  event. For the server machine, most of them involve the server
  sending an event, except that the server also goes from IDLE ->
  SEND_RESPONSE when the client sends a :class:`Request`.

* Green arcs are *state-triggered transitions*: these are somewhat
  unusual, and are used to couple together the different state
  machines -- if, at any moment, one machine is in state A and another
  machine is in state B, then the first machine immediately
  transitions to state C. For example, if the CLIENT machine is in
  state DONE, and the SERVER machine is in the CLOSED state, then the
  CLIENT machine transitions to MUST_CLOSE. And the same thing happens
  if the CLIENT machine is in the state DONE and the keep-alive
  machine is in the state disabled.

* There are also two purple arcs labeled
  :meth:`~Connection.start_next_cycle`: these correspond to an explicit
  method call documented below.

Here's why we have all the stuff in those diagrams above, beyond
what's needed to handle the basic request/response cycle:

* Server sending a :class:`Response` directly from :data:`IDLE`: This
  is used for error responses, when the client's request never arrived
  (e.g. 408 Request Timeout) or was unparseable gibberish (400 Bad
  Request) and thus didn't register with our state machine as a real
  :class:`Request`.

* The transitions involving :data:`MUST_CLOSE` and :data:`CLOSED`:
  keep-alive and shutdown handling; see
  :ref:`keepalive-and-pipelining` and :ref:`closing`.

* The transitions involving :data:`MIGHT_SWITCH_PROTOCOL` and
  :data:`SWITCHED_PROTOCOL`: See :ref:`switching-protocols`.

* That weird :data:`ERROR` state hanging out all lonely on the bottom:
  to avoid cluttering the diagram, we don't draw any arcs coming into
  this node, but that doesn't mean it can't be entered.
  In fact, it
  can be entered from any state: if any exception occurs while trying
  to send/receive data, then the corresponding machine will transition
  directly to this state. Once there, though, it can never leave --
  that part of the diagram is accurate. See :ref:`error-handling`.

And finally, note that in these diagrams, all the labels that are in
*italics* are informal English descriptions of things that happen in
the code, while the labels in upright text correspond to actual
objects in the public API. You've already seen the event objects like
:class:`Request` and :class:`Response`; there is also a set of opaque
sentinel values that you can use to track and query the client and
server's states.


Special constants
-----------------

h11 exposes some special constants corresponding to the different
states in the client and server state machines described above. The
complete list is:

.. data:: IDLE
          SEND_RESPONSE
          SEND_BODY
          DONE
          MUST_CLOSE
          CLOSED
          MIGHT_SWITCH_PROTOCOL
          SWITCHED_PROTOCOL
          ERROR

For example, we can see that initially the client and server start in
state :data:`IDLE` / :data:`IDLE`:

.. ipython:: python

   conn = h11.Connection(our_role=h11.CLIENT)
   conn.states

And then if the client sends a :class:`Request`, then the client
switches to state :data:`SEND_BODY`, while the server switches to
state :data:`SEND_RESPONSE`:

.. ipython:: python

   conn.send(h11.Request(method="GET", target="/", headers=[("Host", "example.com")]));
   conn.states

And we can test these values directly using constants like :data:`SEND_BODY`:

.. ipython:: python

   conn.states[h11.CLIENT] is h11.SEND_BODY

This shows how the :class:`Connection` type tracks these state
machines and lets you query their current state.
The above also showed the special constants that can be used to
indicate the two different roles that a peer can play in an HTTP
connection:

.. data:: CLIENT
          SERVER

And finally, there are also two special constants that can be returned
from :meth:`Connection.next_event`:

.. data:: NEED_DATA
          PAUSED

All of these behave the same, and their behavior is modeled after
:data:`None`: they're opaque singletons, their :meth:`__repr__` is
their name, and you compare them with ``is``.

.. _sentinel-type-trickiness:

Finally, h11's constants have a quirky feature that can sometimes be
useful: they are instances of themselves.

.. ipython:: python

   type(h11.NEED_DATA) is h11.NEED_DATA
   type(h11.PAUSED) is h11.PAUSED

The main application of this is that when handling the return value
from :meth:`Connection.next_event`, which is sometimes an instance of
an event class and sometimes :data:`NEED_DATA` or :data:`PAUSED`, you
can always call ``type(event)`` to get something useful to dispatch
on, using e.g. a handler table, :func:`functools.singledispatch`, or
calling ``getattr(some_object, "handle_" +
type(event).__name__)``. Not that this kind of dispatch-based strategy
is always the best approach -- but the option is there if you want it.


The Connection object
---------------------

.. autoclass:: Connection

   .. automethod:: receive_data
   .. automethod:: next_event
   .. automethod:: send
   .. automethod:: send_with_data_passthrough
   .. automethod:: send_failed

   .. automethod:: start_next_cycle

   .. attribute:: our_role

      :data:`CLIENT` if this is a client; :data:`SERVER` if this is a server.

   .. attribute:: their_role

      :data:`SERVER` if this is a client; :data:`CLIENT` if this is a server.

   .. autoattribute:: states
   .. autoattribute:: our_state
   .. autoattribute:: their_state

   .. attribute:: their_http_version

      The version of HTTP that our peer claims to support. ``None`` if
      we haven't yet received a request/response.

      This is preserved by :meth:`start_next_cycle`, so it can be
      handy for a client making multiple requests on the same
      connection: normally you don't know what version of HTTP the
      server supports until after you do a request and get a response
      -- so on an initial request you might have to assume the
      worst. But on later requests on the same connection, the
      information will be available here.

   .. attribute:: client_is_waiting_for_100_continue

      True if the client sent a request with the ``Expect:
      100-continue`` header, and is still waiting for a response
      (i.e., the server has not sent a 100 Continue or any other kind
      of response, and the client has not gone ahead and started
      sending the body anyway).

      See `RFC 7231 section 5.1.1
      <https://tools.ietf.org/html/rfc7231#section-5.1.1>`_ for details.

   .. attribute:: they_are_waiting_for_100_continue

      True if :attr:`their_role` is :data:`CLIENT` and
      :attr:`client_is_waiting_for_100_continue`.

   .. autoattribute:: trailing_data


.. _error-handling:

Error handling
--------------

Given the vagaries of networks and the folks on the other side of
them, it's extremely important to be prepared for errors.

Most errors in h11 are signaled by raising one of
:exc:`ProtocolError`'s two concrete subclasses,
:exc:`LocalProtocolError` and :exc:`RemoteProtocolError`:

.. autoexception:: ProtocolError
.. autoexception:: LocalProtocolError
.. autoexception:: RemoteProtocolError

There are four cases where these exceptions might be raised:

* When trying to instantiate an event object
  (:exc:`LocalProtocolError`): This indicates that something about
  your event is invalid.
  Your event wasn't constructed, but there are
  no other consequences -- feel free to try again.

* When calling :meth:`Connection.start_next_cycle`
  (:exc:`LocalProtocolError`): This indicates that the connection is
  not ready to be re-used, because one or both of the peers are not in
  the :data:`DONE` state. The :class:`Connection` object remains
  usable, and you can try again later.

* When calling :meth:`Connection.next_event`
  (:exc:`RemoteProtocolError`): This indicates that the remote peer
  has violated our protocol assumptions. This is unrecoverable -- we
  don't know what they're doing and we cannot safely
  proceed. :attr:`Connection.their_state` immediately becomes
  :data:`ERROR`, and all further calls to
  :meth:`~Connection.next_event` will also raise
  :exc:`RemoteProtocolError`. :meth:`Connection.send` still works as
  normal, so if you're implementing a server and this happens then you
  have an opportunity to send back a 400 Bad Request response. But
  aside from that, your only real option is to close your socket and
  make a new connection.

* When calling :meth:`Connection.send` or
  :meth:`Connection.send_with_data_passthrough`
  (:exc:`LocalProtocolError`): This indicates that *you* violated our
  protocol assumptions. This is also unrecoverable -- h11 doesn't know
  what you're doing, its internal state may be inconsistent, and we
  cannot safely proceed. :attr:`Connection.our_state` immediately
  becomes :data:`ERROR`, and all further calls to
  :meth:`~Connection.send` will also raise
  :exc:`LocalProtocolError`. The only thing you can reasonably do at
  this point is to close your socket and make a new connection.

So that's how h11 tells you about errors that it detects. In some
cases, it's also useful to be able to tell h11 about an error that you
detected.
In particular, the :class:`Connection` object assumes that
after you call :meth:`Connection.send`, you actually send that data to
the remote peer. But sometimes, for one reason or another, this
doesn't actually happen.

Here's a concrete example. Suppose you're using h11 to implement an
HTTP client that keeps a pool of connections so it can re-use them
when possible (see :ref:`keepalive-and-pipelining`). You take a
connection from the pool, and start to do a large upload... but then
for some reason this gets cancelled (maybe you have a GUI and a user
clicked "cancel"). This can cause h11's model of this connection to
diverge from reality: for example, h11 might think that you
successfully sent the full request, because you passed an
:class:`EndOfMessage` object to :meth:`Connection.send`, but in fact
you didn't, because you never sent the resulting bytes. And then --
here's the really tricky part! -- if you're not careful, you might
think that it's OK to put this connection back into the connection
pool and re-use it, because h11 is telling you that a full
request/response cycle was completed. But this is wrong; in fact you
have to close this connection and open a new one.

The solution is simple: call :meth:`Connection.send_failed`, and now
h11 knows that your send failed. In this case,
:attr:`Connection.our_state` immediately becomes :data:`ERROR`, just
like if you had tried to do something that violated the protocol.


.. _framing:

Message body framing: ``Content-Length`` and all that
-----------------------------------------------------

There are two different headers that HTTP/1.1 uses to indicate a
framing mechanism for request/response bodies: ``Content-Length`` and
``Transfer-Encoding``.
Our general philosophy is that the way you tell
h11 what configuration you want to use is by setting the appropriate
headers in your request / response, and then h11 will both pass those
headers on to the peer and encode the body appropriately.

Currently, the only supported ``Transfer-Encoding`` is ``chunked``.

On requests, this means:

* No ``Content-Length`` or ``Transfer-Encoding``: no body, equivalent
  to ``Content-Length: 0``.

* ``Content-Length: ...``: You're going to send exactly the specified
  number of bytes. h11 will keep track and signal an error if your
  :class:`EndOfMessage` doesn't happen at the right place.

* ``Transfer-Encoding: chunked``: You're going to send a variable /
  not yet known number of bytes.

  Note 1: only HTTP/1.1 servers are required to support
  ``Transfer-Encoding: chunked``, and as a client you have to decide
  whether to send this header before you get to see what protocol
  version the server is using.

  Note 2: even though HTTP/1.1 servers are required to support
  ``Transfer-Encoding: chunked``, this doesn't necessarily mean that
  they actually do -- e.g., applications using Python's standard WSGI
  API cannot accept chunked requests.

  Nonetheless, this is the only way to send a request where you don't
  know the size of the body ahead of time, so if that's the situation
  you find yourself in then you might as well try it and hope.

On responses, things are a bit more subtle. There are effectively two
cases:

* ``Content-Length: ...``: You're going to send exactly the specified
  number of bytes. h11 will keep track and signal an error if your
  :class:`EndOfMessage` doesn't happen at the right place.
* ``Transfer-Encoding: chunked``, *or*, neither framing header is
  provided: These two cases are handled differently at the wire level,
  but as far as the application is concerned they provide (almost)
  exactly the same semantics: in either case, you'll send a variable /
  not yet known number of bytes. The difference between them is that
  ``Transfer-Encoding: chunked`` works better (compatible with
  keep-alive, allows trailing headers, clearly distinguishes between
  successful completion and network errors), but requires an HTTP/1.1
  client; for HTTP/1.0 clients the only option is the no-headers
  approach where you have to close the socket to indicate completion.

  Since this is (almost) entirely a wire-level-encoding concern, h11
  abstracts it: when sending a response you can set either
  ``Transfer-Encoding: chunked`` or leave off both framing headers,
  and h11 will treat both cases identically: it will automatically
  pick the best option given the client's advertised HTTP protocol
  level.

  You need to watch out for this if you're using trailing headers
  (i.e., a non-empty ``headers`` attribute on :class:`EndOfMessage`),
  since trailing headers are only legal if we actually ended up using
  ``Transfer-Encoding: chunked``. Trying to send a non-empty set of
  trailing headers to an HTTP/1.0 client will raise a
  :exc:`LocalProtocolError`. If this use case is important to you, check
  :attr:`Connection.their_http_version` to confirm that the client
  speaks HTTP/1.1 before you attempt to send any trailing headers.


.. _keepalive-and-pipelining:

Re-using a connection: keep-alive and pipelining
------------------------------------------------

HTTP/1.1 allows a connection to be re-used for multiple
request/response cycles (also known as "keep-alive").
This can make
things faster by letting us skip the costly connection setup, but it
does create some complexities: we have to keep track of whether a
connection is reusable, and when there are multiple requests and
responses flowing through the same connection we need to be careful
not to get confused about which request goes with which response.

h11 considers a connection to be reusable if, and only if, both
sides (a) speak HTTP/1.1 (HTTP/1.0 did have some complex and fragile
support for keep-alive bolted on, but h11 currently doesn't support
that -- possibly this will be added in the future), and (b) neither
side has explicitly disabled keep-alive by sending a ``Connection:
close`` header.

If you plan to make only a single request or response and then close
the connection, you should manually set the ``Connection: close``
header in your request/response. h11 will notice and update its state
appropriately.

There are also some situations where you are required to send a
``Connection: close`` header, e.g. if you are a server talking to a
client that doesn't support keep-alive. You don't need to worry about
these cases -- h11 will automatically add this header when
necessary. Just worry about setting it when it's actually something
that you're actively choosing.

If you want to re-use a connection, you have to wait until both the
request and the response have been completed, bringing both the client
and server to the :data:`DONE` state. Once this has happened, you can
explicitly call :meth:`Connection.start_next_cycle` to reset both
sides back to the :data:`IDLE` state. This makes sure that the client
and server remain synced up.
If keep-alive is disabled for whatever reason -- someone set
``Connection: close``, lack of protocol support, one of the sides just
unilaterally closed the connection -- then the state machines will
skip past the :data:`DONE` state directly to the :data:`MUST_CLOSE` or
:data:`CLOSED` states. In this case, trying to call
:meth:`~Connection.start_next_cycle` will raise an error, and the only
thing you can legally do is to close this connection and make a new
one.

HTTP/1.1 also allows for a more aggressive form of connection re-use,
in which a client sends multiple requests in quick succession, and
then waits for the responses to stream back in order
("pipelining"). This is generally considered to have been a bad idea,
because it makes things like error recovery very complicated.

As a client, h11 does not support pipelining. This is enforced by the
structure of the state machine: after sending one :class:`Request`,
you can't send another until after calling
:meth:`~Connection.start_next_cycle`, and you can't call
:meth:`~Connection.start_next_cycle` until the server has entered the
:data:`DONE` state, which requires reading the server's full
response.

As a server, h11 provides the minimal support for pipelining required
to comply with the HTTP/1.1 standard: if the client sends multiple
pipelined requests, then we handle the first request until we reach the
:data:`DONE` state, and then :meth:`~Connection.next_event` will
pause and refuse to parse any more events until the response is
completed and :meth:`~Connection.start_next_cycle` is called. See the
next section for more details.


.. _flow-control:

Flow control
------------

Presumably you know when you want to send things, and the
:meth:`~Connection.send` interface is very simple: it just immediately
returns all the data you need to send for the given event, so you can
apply whatever send buffer strategy you want. But reading from the
remote peer is a bit trickier: you don't want to read data from the
remote peer if it can't be processed (i.e., you want to apply
backpressure and avoid building arbitrarily large in-memory buffers),
and you definitely don't want to block waiting on data from the remote
peer at the same time that it's blocked waiting for you, because that
will cause a deadlock.

One complication here is that if you're implementing a server, you
have to be prepared to handle :class:`Request`\s that have an
``Expect: 100-continue`` header. You can `read the spec
<https://tools.ietf.org/html/rfc7231#section-5.1.1>`_ for the full
details, but basically what this header means is that after sending
the :class:`Request`, the client plans to pause and wait until they
see some response from the server before they send that request's
:class:`Data`. The server's response would normally be an
:class:`InformationalResponse` with status ``100 Continue``, but it
could be anything really (e.g. a full :class:`Response` with a 4xx
status code). The crucial thing as a server, though, is that you
should never block trying to read a request body if the client is
blocked waiting for you to tell them to send the request body.

Fortunately, h11 makes this easy, because it tracks whether the client
is in the waiting-for-100-continue state, and exposes this as
:attr:`Connection.they_are_waiting_for_100_continue`. So you don't
have to pay attention to the ``Expect`` header yourself; you just have
to make sure that before you block waiting to read a request body, you
execute some code like:

.. code-block:: python

   if conn.they_are_waiting_for_100_continue:
       do_send(conn, h11.InformationalResponse(status_code=100, headers=[...]))
   do_read(...)

In fact, if you're lazy (and what programmer isn't?) then you can just
do this check before all reads -- it's mandatory before blocking to
read a request body, but it's safe at any time.

And the other thing you want to pay attention to is the special values
that :meth:`~Connection.next_event` might return: :data:`NEED_DATA`
and :data:`PAUSED`.

:data:`NEED_DATA` is what it sounds like: it means that
:meth:`~Connection.next_event` is guaranteed not to return any more
real events until you've called :meth:`~Connection.receive_data` at
least once.

:data:`PAUSED` is a little more subtle: it means that
:meth:`~Connection.next_event` is guaranteed not to return any more
real events until something else has happened to clear up the paused
state. There are three cases where this can happen:

1) We received a full request/response from the remote peer, and then
   we received some more data after that. (The main situation where
   this might happen is a server responding to a pipelining client.)
   The :data:`PAUSED` state will go away after you call
   :meth:`~Connection.start_next_cycle`.

2) A successful ``CONNECT`` or ``Upgrade:`` request has caused the
   connection to switch to some other protocol (see
   :ref:`switching-protocols`). This :data:`PAUSED` state is
   permanent; you should abandon this :class:`Connection` and go do
   whatever it is you're going to do with your new protocol.

3) We're a server, and the client we're talking to proposed to switch
   protocols (see :ref:`switching-protocols`), and now is waiting to
   find out whether their request was successful or not.
Once we 753 either accept or deny their request then this will turn into one of 754 the above two states, so you probably don't need to worry about 755 handling it specially. 756 757Putting all this together -- 758 759If your I/O is organized around a "pull" strategy, where your code 760requests events as its ready to handle them (e.g. classic synchronous 761code, or asyncio's ``await loop.sock_recv(...)``, or `Trio's streams 762<http://https://trio.readthedocs.io/en/latest/reference-io.html#the-abstract-stream-api>`__), 763then you'll probably want logic that looks something like: 764 765.. code-block:: python 766 767 # Replace do_sendall and do_recv with your I/O code 768 def get_next_event(): 769 while True: 770 event = conn.next_event() 771 if event is h11.NEED_DATA: 772 if conn.they_are_waiting_for_100_continue: 773 do_sendall(conn, h11.InformationalResponse(100, ...)) 774 conn.receive_data(do_recv()) 775 continue 776 return event 777 778And then your code that calls this will need to make sure to call it 779only at appropriate times (e.g., not immediately after receiving 780:class:`EndOfMessage` or :data:`PAUSED`). 781 782If your I/O is organized around a "push" strategy, where the network 783drives processing (e.g. you're using `Twisted 784<https://twistedmatrix.com/>`_, or implementing an 785:class:`asyncio.Protocol`), then you'll want to internally apply 786back-pressure whenever you see :data:`PAUSED`, remove back-pressure 787when you call :meth:`~Connection.start_next_cycle`, and otherwise just 788deliver events as they arrive. Something like: 789 790.. code-block:: python 791 792 class HTTPProtocol(asyncio.Protocol): 793 # Save the transport for later -- needed to access the 794 # backpressure API. 
       def connection_made(self, transport):
           self._transport = transport

       # Internal helper function -- deliver all pending events
       def _deliver_events(self):
           while True:
               event = self.conn.next_event()
               if event is h11.NEED_DATA:
                   break
               elif event is h11.PAUSED:
                   # Apply back-pressure
                   self._transport.pause_reading()
                   break
               else:
                   self.event_received(event)

       # Called by "someone" whenever new data appears on our socket
       def data_received(self, data):
           self.conn.receive_data(data)
           self._deliver_events()

       # Called by "someone" whenever the peer closes their socket
       def eof_received(self):
           self.conn.receive_data(b"")
           self._deliver_events()
           # asyncio will close our socket unless we return True here.
           return True

       # Called by your code when it's ready to start a new
       # request/response cycle
       def start_next_cycle(self):
           self.conn.start_next_cycle()
           # New events might have been buffered internally, and only
           # become deliverable after calling start_next_cycle
           self._deliver_events()
           # Remove back-pressure
           self._transport.resume_reading()

       # Fill in your code here
       def event_received(self, event):
           ...

And your code that uses this will have to remember to check for
:attr:`~Connection.they_are_waiting_for_100_continue` at the
appropriate time.


.. _closing:

Closing connections
-------------------

h11 represents a connection shutdown with the special event type
:class:`ConnectionClosed`. You can send this event, in which case
:meth:`~Connection.send` will simply update the state machine and
then return ``None``. You can receive this event, if you call
``conn.receive_data(b"")``. (The actual receipt might be delayed if
the connection is :ref:`paused <flow-control>`.)
It's safe and legal
to call ``conn.receive_data(b"")`` multiple times, and once you've
done this once, then all future calls to
:meth:`~Connection.receive_data` will also return
``ConnectionClosed()``:

.. ipython:: python

   conn = h11.Connection(our_role=h11.CLIENT)
   conn.receive_data(b"")
   conn.receive_data(b"")
   conn.receive_data(None)

(Or if you try to actually pass new data in after calling
``conn.receive_data(b"")``, that will raise an exception.)

h11 is careful about interpreting connection closure in a *half-duplex
fashion*. TCP sockets pretend to be a two-way connection, but really
they're two one-way connections. In particular, it's possible for one
party to shut down their sending connection -- which causes the other
side to be notified that the connection has closed via the usual
``socket.recv(...) -> b""`` mechanism -- while still being able to
read from their receiving connection. (On Unix, this is generally
accomplished via the ``shutdown(2)`` system call.) So, for example, a
client could send a request, and then close their socket for writing
to indicate that they won't be sending any more requests, and then
read the response. It's this kind of closure that is indicated by
h11's :class:`ConnectionClosed`: it means that this party will not be
sending any more data -- nothing more, nothing less. You can see this
reflected in the :ref:`state machine <state-machine>`, in which one
party transitioning to :data:`CLOSED` doesn't immediately halt the
connection, but merely prevents it from continuing for another
request/response cycle.

The state machine also indicates that :class:`ConnectionClosed` events
can only happen in certain states. This isn't true, of course -- any
party can close their connection at any time, and h11 can't stop
them. But what h11 can do is distinguish between clean and unclean
closes.
For example, if both sides complete a request/response cycle
and then close the connection, that's a clean closure and everyone
will transition to the :data:`CLOSED` state in an orderly fashion. On
the other hand, if one party suddenly closes the connection while
they're in the middle of sending a chunked response body, or when they
promised a ``Content-Length:`` of 1000 bytes but have only sent 500,
then h11 knows that this is a violation of the HTTP protocol, and will
raise a :exc:`ProtocolError`. Basically h11 treats an unexpected
close the same way it would treat unexpected, uninterpretable data
arriving -- it lets you know that something has gone wrong.

As a client, the proper way to perform a single request and then close
the connection is:

1) Send a :class:`Request` with ``Connection: close``

2) Send the rest of the request body

3) Read the server's :class:`Response` and body

4) ``conn.our_state is h11.MUST_CLOSE`` will now be true. Call
   ``conn.send(ConnectionClosed())`` and then close the socket. Or
   really you could just close the socket -- the only thing calling
   ``send`` here will do is raise an error if you're not in
   :data:`MUST_CLOSE` as expected. So it's between you and your
   conscience and your code reviewers.

(Technically it would also be legal to shut down your socket for
writing as step 2.5, but this doesn't serve any purpose and some
buggy servers might get annoyed, so it's not recommended.)

As a server, the proper way to perform a response is:

1) Send your :class:`Response` and body

2) Check if ``conn.our_state is h11.MUST_CLOSE``. This might happen
   for a variety of reasons; for example, if the response had unknown
   length and the client speaks only HTTP/1.0, then the client will
   not consider the connection complete until we issue a close.
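Since the half-duplex behaviour described above can be surprising, here's a self-contained demonstration using plain sockets, with no h11 involved at all. One side shuts down only its *write* half; the peer observes EOF via ``recv()`` returning ``b""``, yet data can still flow in the other direction:

```python
import socket

# A connected pair of sockets standing in for client and server.
client, server = socket.socketpair()

# The client sends a request, then closes only its write side.
client.sendall(b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n")
client.shutdown(socket.SHUT_WR)  # "I won't be sending any more data"

request = server.recv(4096)  # the request arrives intact
eof = server.recv(4096)      # then the server sees EOF: b""

# The server's write side is unaffected, so it can still respond...
server.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n")
server.close()

# ...and the client's read side still works, despite having "closed".
response = client.recv(4096)
client.close()
```

This is exactly the kind of closure that :class:`ConnectionClosed` represents: "this party will not send any more data" -- nothing more, nothing less.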
You should be particularly careful to take into consideration the
following note from `RFC 7230 section 6.6
<https://tools.ietf.org/html/rfc7230#section-6.6>`_:

   If a server performs an immediate close of a TCP connection, there is
   a significant risk that the client will not be able to read the last
   HTTP response. If the server receives additional data from the
   client on a fully closed connection, such as another request that was
   sent by the client before receiving the server's response, the
   server's TCP stack will send a reset packet to the client;
   unfortunately, the reset packet might erase the client's
   unacknowledged input buffers before they can be read and interpreted
   by the client's HTTP parser.

   To avoid the TCP reset problem, servers typically close a connection
   in stages. First, the server performs a half-close by closing only
   the write side of the read/write connection. The server then
   continues to read from the connection until it receives a
   corresponding close by the client, or until the server is reasonably
   certain that its own TCP stack has received the client's
   acknowledgement of the packet(s) containing the server's last
   response. Finally, the server fully closes the connection.


.. _switching-protocols:

Switching protocols
-------------------

h11 supports two kinds of "protocol switches": requests with method
``CONNECT``, and the newer ``Upgrade:`` header, most commonly used for
negotiating WebSocket connections. Both follow the same pattern: the
client proposes that they switch from regular HTTP to some other kind
of interaction, and then the server either rejects the suggestion --
in which case we return to regular HTTP rules -- or else accepts it.
(For ``CONNECT``, acceptance means a response with 2xx status
code; for ``Upgrade:``, acceptance means an
:class:`InformationalResponse` with status ``101 Switching
Protocols``.) If the proposal is accepted, then both sides switch to
doing something else with their socket, and h11's job is done.

As a developer using h11, it's your responsibility to send and
interpret the actual ``CONNECT`` or ``Upgrade:`` request and response,
and to figure out what to do after the handover; it's h11's job to
understand what's going on, and help you make the handover
smoothly.

Specifically, what h11 does is :ref:`pause <flow-control>` parsing
incoming data at the boundary between the two protocols, and then you
can retrieve any unprocessed data from the
:attr:`Connection.trailing_data` attribute.


.. _sendfile:

Support for ``sendfile()``
--------------------------

Many networking APIs provide some efficient way to send particular
data, e.g. asking the operating system to stream files directly off of
the disk and into a socket without passing through userspace.

It's possible to use these APIs together with h11. The basic strategy
is:

* Create some placeholder object representing the special data, that
  your networking code knows how to "send" by invoking whatever the
  appropriate underlying APIs are.

* Make sure your placeholder object implements a ``__len__`` method
  returning its size in bytes.

* Call ``conn.send_with_data_passthrough(Data(data=<your placeholder
  object>))``

* This returns a list whose contents are a mixture of (a) bytes-like
  objects, and (b) your placeholder object. You should send them to
  the network in order.

Here's a sketch of what this might look like:

.. code-block:: python

   class FilePlaceholder:
       def __init__(self, file, offset, count):
           self.file = file
           self.offset = offset
           self.count = count

       def __len__(self):
           return self.count

   def send_data(sock, data):
       if isinstance(data, FilePlaceholder):
           # socket.sendfile was added in Python 3.5
           sock.sendfile(data.file, data.offset, data.count)
       else:
           # data is a bytes-like object to be sent directly
           sock.sendall(data)

   placeholder = FilePlaceholder(open("...", "rb"), 0, 200)
   for data in conn.send_with_data_passthrough(Data(data=placeholder)):
       send_data(sock, data)

This works with all the different framing modes (``Content-Length``,
``Transfer-Encoding: chunked``, etc.) -- h11 will add any necessary
framing data, update its internal state, and away you go.


Identifying h11 in requests and responses
-----------------------------------------

According to RFC 7231, client requests are supposed to include a
``User-Agent:`` header identifying what software they're using, and
servers are supposed to respond with a ``Server:`` header doing the
same. h11 doesn't construct these headers for you, but to make it
easier for you to construct this header, it provides:

.. data:: PRODUCT_ID

   A string suitable for identifying the current version of h11 in a
   ``User-Agent:`` or ``Server:`` header.

   The version of h11 that was used to build these docs identified
   itself as:

   .. ipython:: python

      h11.PRODUCT_ID


.. _chunk-delimiters-are-bad:

Chunked Transfer Encoding Delimiters
------------------------------------

.. versionadded:: 0.7.0

HTTP/1.1 allows for the use of Chunked Transfer Encoding to frame request and
response bodies.
This form of transfer encoding allows the implementation to
provide its body data in the form of length-prefixed "chunks" of data.

RFC 7230 is extremely clear that the breaking points between chunks of data are
non-semantic: that is, users should not rely on them or assign any meaning to
them. This is particularly important given that RFC 7230 also allows
intermediaries such as proxies and caches to change the chunk boundaries as
they see fit, or even to remove the chunked transfer encoding entirely.

However, for some applications it is valuable or essential to see the chunk
boundaries because the peer implementation has assigned meaning to them. While
this is against the specification, if you really do need access to this
information h11 makes it available to you in the form of the
:data:`Data.chunk_start` and :data:`Data.chunk_end` properties of the
:class:`Data` event.

:data:`Data.chunk_start` is set to ``True`` for the first :class:`Data` event
for a given chunk of data. :data:`Data.chunk_end` is set to ``True`` for the
last :class:`Data` event that is emitted for a given chunk of data. h11
guarantees that it will always emit at least one :class:`Data` event for each
chunk of data received from the remote peer, but due to its internal buffering
logic it may return more than one. It is possible for a single :class:`Data`
event to have both :data:`Data.chunk_start` and :data:`Data.chunk_end` set to
``True``, in which case it will be the only :class:`Data` event for that chunk
of data.

Again, it is *strongly encouraged* that you avoid relying on this information
if at all possible. This functionality should be considered an escape hatch for
when there is no alternative but to rely on the information, rather than a
general source of data that is worth relying on.
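For the rare case where you genuinely do need the boundaries, the two flags combine naturally into a reassembly loop. The sketch below uses a minimal stand-in class in place of h11's :class:`Data` event so that it is self-contained; real code would instead consume the events returned by :meth:`~Connection.next_event`:

```python
from dataclasses import dataclass

# Minimal stand-in for h11's Data event, so this sketch is
# self-contained; real code would receive h11.Data events from
# Connection.next_event().
@dataclass
class Data:
    data: bytes
    chunk_start: bool = False
    chunk_end: bool = False

def reassemble_chunks(events):
    """Group Data events back into the peer's wire-level chunks.

    (Remember: relying on chunk boundaries is discouraged; this is
    an escape hatch for peers that assign meaning to them.)
    """
    buf = bytearray()
    for event in events:
        if event.chunk_start:
            buf = bytearray()   # a new chunk begins here
        buf += event.data
        if event.chunk_end:
            yield bytes(buf)    # this chunk is now complete

# One chunk split across two Data events, then one delivered whole:
events = [
    Data(b"hel", chunk_start=True),
    Data(b"lo", chunk_end=True),
    Data(b"world", chunk_start=True, chunk_end=True),
]
chunks = list(reassemble_chunks(events))
```

Note that ``reassemble_chunks`` buffers an entire chunk in memory before yielding it, which is only reasonable if you trust the peer to send chunks of bounded size.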