1.. _API-documentation:
2
3API documentation
4=================
5
6.. module:: h11
7
8.. contents::
9
10h11 has a fairly small public API, with all public symbols available
11directly at the top level:
12
13.. ipython::
14
15   In [2]: import h11
16
17   @verbatim
18   In [3]: h11.<TAB>
19   h11.CLIENT                 h11.MUST_CLOSE
20   h11.CLOSED                 h11.NEED_DATA
21   h11.Connection             h11.PAUSED
22   h11.ConnectionClosed       h11.PRODUCT_ID
23   h11.Data                   h11.ProtocolError
24   h11.DONE                   h11.RemoteProtocolError
25   h11.EndOfMessage           h11.Request
26   h11.ERROR                  h11.Response
27   h11.IDLE                   h11.SEND_BODY
28   h11.InformationalResponse  h11.SEND_RESPONSE
29   h11.LocalProtocolError     h11.SERVER
30   h11.MIGHT_SWITCH_PROTOCOL  h11.SWITCHED_PROTOCOL
31
32These symbols fall into three main categories: event classes, special
33constants used to track different connection states, and the
34:class:`Connection` class itself. We'll describe them in that order.
35
36.. _events:
37
38Events
39------
40
41*Events* are the core of h11: the whole point of h11 is to let you
42think about HTTP transactions as being a series of events sent back
43and forth between a client and a server, instead of thinking in terms
44of bytes.
45
46All events behave in essentially similar ways. Let's take
47:class:`Request` as an example. Like all events, this is a "final"
48class -- you cannot subclass it. And like all events, it has several
49fields. For :class:`Request`, there are four of them:
50:attr:`~Request.method`, :attr:`~Request.target`,
51:attr:`~Request.headers`, and
52:attr:`~Request.http_version`. :attr:`~Request.http_version`
53defaults to ``b"1.1"``; the rest have no default, so to create a
54:class:`Request` you have to specify their values:
55
56.. ipython:: python
57
58   req = h11.Request(method="GET",
59                     target="/",
60                     headers=[("Host", "example.com")])
61
62Event constructors accept only keyword arguments, not positional arguments.
63
64Events have a useful repr:
65
66.. ipython:: python
67
68   req
69
70And their fields are available as regular attributes:
71
72.. ipython:: python
73
74   req.method
75   req.target
76   req.headers
77   req.http_version
78
79Notice that these attributes have been normalized to byte-strings. In
80general, events normalize and validate their fields when they're
81constructed. Some of these normalizations and checks are specific to a
82particular event -- for example, :class:`Request` enforces RFC 7230's
83requirement that HTTP/1.1 requests must always contain a ``"Host"``
84header:
85
86.. ipython:: python
87
88   # HTTP/1.0 requests don't require a Host: header
89   h11.Request(method="GET", target="/", headers=[], http_version="1.0")
90
91.. ipython:: python
92   :okexcept:
93
94   # But HTTP/1.1 requests do
95   h11.Request(method="GET", target="/", headers=[])
96
97This helps protect you from accidentally violating the protocol, and
98also helps protect you from remote peers who attempt to violate the
99protocol.
100
101A few of these normalization rules are standard across multiple
102events, so we document them here:
103
104.. _headers-format:
105
106:attr:`headers`: In h11, headers are represented internally as a list
107of (*name*, *value*) pairs, where *name* and *value* are both
108byte-strings, *name* is always lowercase, and *name* and *value* are
109both guaranteed not to have any leading or trailing whitespace. When
110constructing an event, we accept any iterable of pairs like this, and
111will automatically convert native strings containing ascii or
112:term:`bytes-like object`\s to byte-strings and convert names to
113lowercase:
114
115.. ipython:: python
116
117   original_headers = [("HOST", bytearray(b"Example.Com"))]
118   req = h11.Request(method="GET", target="/", headers=original_headers)
119   original_headers
120   req.headers
121
122If any names are detected with leading or trailing whitespace, then
123this is an error ("in the past, differences in the handling of such
124whitespace have led to security vulnerabilities" -- `RFC 7230
125<https://tools.ietf.org/html/rfc7230#section-3.2.4>`_). We also check
126for certain other protocol violations, e.g. it's always illegal to
127have a newline inside a header value, and ``Content-Length: hello`` is
an error because ``Content-Length`` should always be an integer. We may
129add additional checks in the future.
130
While we expose header names as lowercased bytes, we also preserve
the original header casing that was used. Compliant HTTP agents should
always treat header names case-insensitively, but not all of them do,
so when sending bytes over the wire h11 preserves whatever casing the
headers were originally given.
136
137It is possible to access the headers in their raw original casing,
138which may be useful for some user output or debugging purposes.
139
140.. ipython:: python
141
142    original_headers = [("Host", "example.com")]
143    req = h11.Request(method="GET", target="/", headers=original_headers)
144    req.headers.raw_items()
145
146.. _http_version-format:
147
148It's not just headers we normalize to being byte-strings: the same
149type-conversion logic is also applied to the :attr:`Request.method`
and :attr:`Request.target` fields, and -- for consistency -- all
151:attr:`http_version` fields. In particular, we always represent HTTP
152version numbers as byte-strings like ``b"1.1"``. :term:`Bytes-like
object`\s and native strings will be automatically converted to
byte-strings. Note that the HTTP standard `specifically guarantees
155<https://tools.ietf.org/html/rfc7230#section-2.6>`_ that all HTTP
156version numbers will consist of exactly two digits separated by a dot,
157so comparisons like ``req.http_version < b"1.1"`` are safe and valid.
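
For example, a server that wants to know whether it can rely on
HTTP/1.1-only features might branch on the version of a request it
received. A minimal sketch, assuming ``req`` is a :class:`Request`
obtained from :meth:`Connection.next_event`:

.. code-block:: python

   if req.http_version < b"1.1":
       # Peer speaks HTTP/1.0 (or 0.9): features like chunked transfer
       # encoding and trailing headers are off the table.
       can_use_chunked = False
   else:
       can_use_chunked = True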
158
159When manually constructing an event, you generally shouldn't specify
160:attr:`http_version`, because it defaults to ``b"1.1"``, and if you
161attempt to override this to some other value then
162:meth:`Connection.send` will reject your event -- h11 only speaks
163HTTP/1.1. But it does understand other versions of HTTP, so you might
164receive events with other ``http_version`` values from remote peers.
165
166Here's the complete set of events supported by h11:
167
168.. autoclass:: Request
169
170.. autoclass:: InformationalResponse
171
172.. autoclass:: Response
173
174.. autoclass:: Data
175
176.. autoclass:: EndOfMessage
177
178.. autoclass:: ConnectionClosed
179
180
181.. _state-machine:
182
183The state machine
184-----------------
185
186Now that you know what the different events are, the next question is:
187what can you do with them?
188
189A basic HTTP request/response cycle looks like this:
190
191* The client sends:
192
193  * one :class:`Request` event with request metadata and headers,
194  * zero or more :class:`Data` events with the request body (if any),
195  * and an :class:`EndOfMessage` event.
196
197* And then the server replies with:
198
199  * zero or more :class:`InformationalResponse` events,
200  * one :class:`Response` event,
201  * zero or more :class:`Data` events with the response body (if any),
  * and an :class:`EndOfMessage` event.
203
204And once that's finished, both sides either close the connection, or
205they go back to the top and re-use it for another request/response
206cycle.
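
As a concrete illustration of the client's half of this cycle, here's
a minimal sketch that only builds the outgoing bytes -- actually
writing them to a socket is up to you:

.. code-block:: python

   import h11

   conn = h11.Connection(our_role=h11.CLIENT)
   to_send = b""
   # One Request event with the request metadata and headers...
   to_send += conn.send(h11.Request(method="POST", target="/upload",
                                    headers=[("Host", "example.com"),
                                             ("Content-Length", "5")]))
   # ...zero or more Data events carrying the body...
   to_send += conn.send(h11.Data(data=b"hello"))
   # ...and an EndOfMessage event to finish it off.
   to_send += conn.send(h11.EndOfMessage())
   # to_send now holds the complete wire-format request.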
207
208To coordinate this interaction, the h11 :class:`Connection` object
209maintains several state machines: one that tracks what the client is
210doing, one that tracks what the server is doing, and a few more tiny
211ones to track whether :ref:`keep-alive <keepalive-and-pipelining>` is
212enabled and whether the client has proposed to :ref:`switch protocols
213<switching-protocols>`. h11 always keeps track of all of these state
214machines, regardless of whether it's currently playing the client or
215server role.
216
217The state machines look like this (click on each to expand):
218
219.. ipython:: python
220   :suppress:
221
222   import sys
223   import subprocess
224   subprocess.check_call([sys.executable,
225                          sys._h11_hack_docs_source_path
226                          + "/make-state-diagrams.py"])
227
228.. |client-image| image:: _static/CLIENT.svg
229      :target: _static/CLIENT.svg
230      :width: 100%
231      :align: top
232
233.. |server-image| image:: _static/SERVER.svg
234      :target: _static/SERVER.svg
235      :width: 100%
236      :align: top
237
238.. |special-image| image:: _static/special-states.svg
239   :target: _static/special-states.svg
240   :width: 100%
241
242+----------------+----------------+
243| |client-image| | |server-image| |
244+----------------+----------------+
245|        |special-image|          |
246+---------------------------------+
247
248If you squint at the first two diagrams, you can see the client's IDLE
249-> SEND_BODY -> DONE path and the server's IDLE -> SEND_RESPONSE ->
250SEND_BODY -> DONE path, which encode the basic sequence of events we
251described above. But there's a fair amount of other stuff going on
252here as well.
253
254The first thing you should notice is the different colors. These
255correspond to the different ways that our state machines can change
256state.
257
258* Dark blue arcs are *event-triggered transitions*: if we're in state
  A, and this event happens, then we switch to state B. For the client
260  machine, these transitions always happen when the client *sends* an
261  event. For the server machine, most of them involve the server
262  sending an event, except that the server also goes from IDLE ->
263  SEND_RESPONSE when the client sends a :class:`Request`.
264
265* Green arcs are *state-triggered transitions*: these are somewhat
266  unusual, and are used to couple together the different state
267  machines -- if, at any moment, one machine is in state A and another
268  machine is in state B, then the first machine immediately
269  transitions to state C. For example, if the CLIENT machine is in
270  state DONE, and the SERVER machine is in the CLOSED state, then the
271  CLIENT machine transitions to MUST_CLOSE. And the same thing happens
272  if the CLIENT machine is in the state DONE and the keep-alive
273  machine is in the state disabled.
274
275* There are also two purple arcs labeled
276  :meth:`~Connection.start_next_cycle`: these correspond to an explicit
277  method call documented below.
278
279Here's why we have all the stuff in those diagrams above, beyond
280what's needed to handle the basic request/response cycle:
281
282* Server sending a :class:`Response` directly from :data:`IDLE`: This
283  is used for error responses, when the client's request never arrived
  (e.g. 408 Request Timeout) or was unparseable gibberish (400 Bad
285  Request) and thus didn't register with our state machine as a real
286  :class:`Request`.
287
* The transitions involving :data:`MUST_CLOSE` and :data:`CLOSED`:
289  keep-alive and shutdown handling; see
290  :ref:`keepalive-and-pipelining` and :ref:`closing`.
291
292* The transitions involving :data:`MIGHT_SWITCH_PROTOCOL` and
293  :data:`SWITCHED_PROTOCOL`: See :ref:`switching-protocols`.
294
295* That weird :data:`ERROR` state hanging out all lonely on the bottom:
296  to avoid cluttering the diagram, we don't draw any arcs coming into
297  this node, but that doesn't mean it can't be entered. In fact, it
298  can be entered from any state: if any exception occurs while trying
299  to send/receive data, then the corresponding machine will transition
300  directly to this state. Once there, though, it can never leave --
301  that part of the diagram is accurate. See :ref:`error-handling`.
302
303And finally, note that in these diagrams, all the labels that are in
304*italics* are informal English descriptions of things that happen in
305the code, while the labels in upright text correspond to actual
306objects in the public API. You've already seen the event objects like
307:class:`Request` and :class:`Response`; there are also a set of opaque
308sentinel values that you can use to track and query the client and
309server's states.
310
311
312Special constants
313-----------------
314
315h11 exposes some special constants corresponding to the different
316states in the client and server state machines described above. The
317complete list is:
318
319.. data:: IDLE
320          SEND_RESPONSE
321          SEND_BODY
322          DONE
323          MUST_CLOSE
324          CLOSED
325          MIGHT_SWITCH_PROTOCOL
326          SWITCHED_PROTOCOL
327          ERROR
328
329For example, we can see that initially the client and server start in
330state :data:`IDLE` / :data:`IDLE`:
331
332.. ipython:: python
333
334   conn = h11.Connection(our_role=h11.CLIENT)
335   conn.states
336
337And then if the client sends a :class:`Request`, then the client
338switches to state :data:`SEND_BODY`, while the server switches to
339state :data:`SEND_RESPONSE`:
340
341.. ipython:: python
342
343   conn.send(h11.Request(method="GET", target="/", headers=[("Host", "example.com")]));
344   conn.states
345
346And we can test these values directly using constants like :data:`SEND_BODY`:
347
348.. ipython:: python
349
350   conn.states[h11.CLIENT] is h11.SEND_BODY
351
352This shows how the :class:`Connection` type tracks these state
353machines and lets you query their current state.
354
355The above also showed the special constants that can be used to
356indicate the two different roles that a peer can play in an HTTP
357connection:
358
359.. data:: CLIENT
360          SERVER
361
362And finally, there are also two special constants that can be returned
363from :meth:`Connection.next_event`:
364
365.. data:: NEED_DATA
366          PAUSED
367
368All of these behave the same, and their behavior is modeled after
369:data:`None`: they're opaque singletons, their :meth:`__repr__` is
370their name, and you compare them with ``is``.
371
372.. _sentinel-type-trickiness:
373
374Finally, h11's constants have a quirky feature that can sometimes be
375useful: they are instances of themselves.
376
377.. ipython:: python
378
379   type(h11.NEED_DATA) is h11.NEED_DATA
380   type(h11.PAUSED) is h11.PAUSED
381
382The main application of this is that when handling the return value
383from :meth:`Connection.next_event`, which is sometimes an instance of
384an event class and sometimes :data:`NEED_DATA` or :data:`PAUSED`, you
385can always call ``type(event)`` to get something useful to dispatch
on, using e.g. a handler table, :func:`functools.singledispatch`, or
387calling ``getattr(some_object, "handle_" +
388type(event).__name__)``. Not that this kind of dispatch-based strategy
389is always the best approach -- but the option is there if you want it.
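
For instance, a sketch of the handler-table approach might look like
this (``handle_request`` and friends are hypothetical functions you
would define yourself):

.. code-block:: python

   HANDLERS = {
       h11.Request: handle_request,
       h11.Data: handle_data,
       h11.EndOfMessage: handle_end_of_message,
       # These entries work because the sentinels are instances of
       # themselves, so type(NEED_DATA) is NEED_DATA:
       h11.NEED_DATA: handle_need_data,
       h11.PAUSED: handle_paused,
   }

   def dispatch(event):
       return HANDLERS[type(event)](event)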
390
391
392The Connection object
393---------------------
394
395.. autoclass:: Connection
396
397   .. automethod:: receive_data
398   .. automethod:: next_event
399   .. automethod:: send
400   .. automethod:: send_with_data_passthrough
401   .. automethod:: send_failed
402
403   .. automethod:: start_next_cycle
404
405   .. attribute:: our_role
406
407      :data:`CLIENT` if this is a client; :data:`SERVER` if this is a server.
408
409   .. attribute:: their_role
410
411      :data:`SERVER` if this is a client; :data:`CLIENT` if this is a server.
412
413   .. autoattribute:: states
414   .. autoattribute:: our_state
415   .. autoattribute:: their_state
416
417   .. attribute:: their_http_version
418
419      The version of HTTP that our peer claims to support. ``None`` if
420      we haven't yet received a request/response.
421
422      This is preserved by :meth:`start_next_cycle`, so it can be
423      handy for a client making multiple requests on the same
424      connection: normally you don't know what version of HTTP the
425      server supports until after you do a request and get a response
426      -- so on an initial request you might have to assume the
427      worst. But on later requests on the same connection, the
428      information will be available here.
429
430   .. attribute:: client_is_waiting_for_100_continue
431
432      True if the client sent a request with the ``Expect:
433      100-continue`` header, and is still waiting for a response
434      (i.e., the server has not sent a 100 Continue or any other kind
435      of response, and the client has not gone ahead and started
436      sending the body anyway).
437
438      See `RFC 7231 section 5.1.1
439      <https://tools.ietf.org/html/rfc7231#section-5.1.1>`_ for details.
440
441   .. attribute:: they_are_waiting_for_100_continue
442
443      True if :attr:`their_role` is :data:`CLIENT` and
444      :attr:`client_is_waiting_for_100_continue`.
445
446   .. autoattribute:: trailing_data
447
448
449.. _error-handling:
450
451Error handling
452--------------
453
454Given the vagaries of networks and the folks on the other side of
455them, it's extremely important to be prepared for errors.
456
457Most errors in h11 are signaled by raising one of
:exc:`ProtocolError`'s two concrete subclasses,
459:exc:`LocalProtocolError` and :exc:`RemoteProtocolError`:
460
461.. autoexception:: ProtocolError
462.. autoexception:: LocalProtocolError
463.. autoexception:: RemoteProtocolError
464
465There are four cases where these exceptions might be raised:
466
467* When trying to instantiate an event object
468  (:exc:`LocalProtocolError`): This indicates that something about
469  your event is invalid. Your event wasn't constructed, but there are
470  no other consequences -- feel free to try again.
471
472* When calling :meth:`Connection.start_next_cycle`
473  (:exc:`LocalProtocolError`): This indicates that the connection is
474  not ready to be re-used, because one or both of the peers are not in
475  the :data:`DONE` state. The :class:`Connection` object remains
476  usable, and you can try again later.
477
478* When calling :meth:`Connection.next_event`
479  (:exc:`RemoteProtocolError`): This indicates that the remote peer
480  has violated our protocol assumptions. This is unrecoverable -- we
481  don't know what they're doing and we cannot safely
482  proceed. :attr:`Connection.their_state` immediately becomes
483  :data:`ERROR`, and all further calls to
484  :meth:`~Connection.next_event` will also raise
485  :exc:`RemoteProtocolError`. :meth:`Connection.send` still works as
486  normal, so if you're implementing a server and this happens then you
  have an opportunity to send back a 400 Bad Request response (see the
  sketch after this list). But aside from that, your only real option
  is to close your socket and make a new connection.
490
491* When calling :meth:`Connection.send` or
492  :meth:`Connection.send_with_data_passthrough`
493  (:exc:`LocalProtocolError`): This indicates that *you* violated our
494  protocol assumptions. This is also unrecoverable -- h11 doesn't know
495  what you're doing, its internal state may be inconsistent, and we
496  cannot safely proceed. :attr:`Connection.our_state` immediately
497  becomes :data:`ERROR`, and all further calls to
498  :meth:`~Connection.send` will also raise
  :exc:`LocalProtocolError`. The only thing you can reasonably do at
500  this point is to close your socket and make a new connection.
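
To make the :meth:`Connection.next_event` case concrete, here's a
sketch of how a server might react to a :exc:`RemoteProtocolError`
(``sock`` stands in for your own socket code, and the exact error
response is up to you):

.. code-block:: python

   try:
       event = conn.next_event()
   except h11.RemoteProtocolError:
       # The peer broke the protocol. Assuming we haven't started our
       # own response yet, we can still send one last error response
       # before abandoning this connection entirely.
       error = h11.Response(status_code=400,
                            headers=[("Content-Length", "0")])
       for part in [error, h11.EndOfMessage()]:
           sock.sendall(conn.send(part))
       sock.close()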
501
502So that's how h11 tells you about errors that it detects. In some
503cases, it's also useful to be able to tell h11 about an error that you
504detected. In particular, the :class:`Connection` object assumes that
505after you call :meth:`Connection.send`, you actually send that data to
506the remote peer. But sometimes, for one reason or another, this
507doesn't actually happen.
508
509Here's a concrete example. Suppose you're using h11 to implement an
510HTTP client that keeps a pool of connections so it can re-use them
511when possible (see :ref:`keepalive-and-pipelining`). You take a
512connection from the pool, and start to do a large upload... but then
513for some reason this gets cancelled (maybe you have a GUI and a user
514clicked "cancel"). This can cause h11's model of this connection to
515diverge from reality: for example, h11 might think that you
516successfully sent the full request, because you passed an
517:class:`EndOfMessage` object to :meth:`Connection.send`, but in fact
518you didn't, because you never sent the resulting bytes. And then –
519here's the really tricky part! – if you're not careful, you might
520think that it's OK to put this connection back into the connection
521pool and re-use it, because h11 is telling you that a full
522request/response cycle was completed. But this is wrong; in fact you
523have to close this connection and open a new one.
524
525The solution is simple: call :meth:`Connection.send_failed`, and now
526h11 knows that your send failed. In this case,
527:attr:`Connection.our_state` immediately becomes :data:`ERROR`, just
528like if you had tried to do something that violated the protocol.
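
A sketch of what that looks like in practice (``sock`` stands in for
your own I/O layer):

.. code-block:: python

   data = conn.send(h11.EndOfMessage())
   try:
       sock.sendall(data)
   except BaseException:
       # We can't know how much of 'data' actually reached the peer,
       # so h11's model of the connection can no longer be trusted.
       conn.send_failed()
       raise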
529
530
531.. _framing:
532
533Message body framing: ``Content-Length`` and all that
534-----------------------------------------------------
535
536There are two different headers that HTTP/1.1 uses to indicate a
537framing mechanism for request/response bodies: ``Content-Length`` and
538``Transfer-Encoding``. Our general philosophy is that the way you tell
539h11 what configuration you want to use is by setting the appropriate
540headers in your request / response, and then h11 will both pass those
541headers on to the peer and encode the body appropriately.
542
543Currently, the only supported ``Transfer-Encoding`` is ``chunked``.
544
545On requests, this means:
546
547* No ``Content-Length`` or ``Transfer-Encoding``: no body, equivalent
548  to ``Content-Length: 0``.
549
550* ``Content-Length: ...``: You're going to send exactly the specified
551  number of bytes. h11 will keep track and signal an error if your
552  :class:`EndOfMessage` doesn't happen at the right place.
553
554* ``Transfer-Encoding: chunked``: You're going to send a variable /
555  not yet known number of bytes.
556
557  Note 1: only HTTP/1.1 servers are required to support
558  ``Transfer-Encoding: chunked``, and as a client you have to decide
559  whether to send this header before you get to see what protocol
560  version the server is using.
561
562  Note 2: even though HTTP/1.1 servers are required to support
563  ``Transfer-Encoding: chunked``, this doesn't necessarily mean that
564  they actually do -- e.g., applications using Python's standard WSGI
565  API cannot accept chunked requests.
566
  Nonetheless, this is the only way to send a request where you don't
568  know the size of the body ahead of time, so if that's the situation
569  you find yourself in then you might as well try it and hope.
570
571On responses, things are a bit more subtle. There are effectively two
572cases:
573
574* ``Content-Length: ...``: You're going to send exactly the specified
575  number of bytes. h11 will keep track and signal an error if your
576  :class:`EndOfMessage` doesn't happen at the right place.
577
578* ``Transfer-Encoding: chunked``, *or*, neither framing header is
579  provided: These two cases are handled differently at the wire level,
580  but as far as the application is concerned they provide (almost)
581  exactly the same semantics: in either case, you'll send a variable /
582  not yet known number of bytes. The difference between them is that
583  ``Transfer-Encoding: chunked`` works better (compatible with
584  keep-alive, allows trailing headers, clearly distinguishes between
585  successful completion and network errors), but requires an HTTP/1.1
586  client; for HTTP/1.0 clients the only option is the no-headers
587  approach where you have to close the socket to indicate completion.
588
589  Since this is (almost) entirely a wire-level-encoding concern, h11
590  abstracts it: when sending a response you can set either
591  ``Transfer-Encoding: chunked`` or leave off both framing headers,
592  and h11 will treat both cases identically: it will automatically
593  pick the best option given the client's advertised HTTP protocol
594  level.
595
596  You need to watch out for this if you're using trailing headers
597  (i.e., a non-empty ``headers`` attribute on :class:`EndOfMessage`),
598  since trailing headers are only legal if we actually ended up using
599  ``Transfer-Encoding: chunked``. Trying to send a non-empty set of
  trailing headers to an HTTP/1.0 client will raise a
601  :exc:`LocalProtocolError`. If this use case is important to you, check
602  :attr:`Connection.their_http_version` to confirm that the client
603  speaks HTTP/1.1 before you attempt to send any trailing headers.
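
As a sketch of that second case, here's a server sending a body whose
length isn't known in advance, and letting h11 pick the framing. This
assumes ``conn`` is a server-role :class:`Connection` that has already
received the client's :class:`Request`:

.. code-block:: python

   wire_data = b""
   # No Content-Length and no Transfer-Encoding: h11 will pick chunked
   # or close-delimited framing based on the client's HTTP version.
   wire_data += conn.send(h11.Response(status_code=200, headers=[]))
   for piece in [b"part one, ", b"part two"]:
       wire_data += conn.send(h11.Data(data=piece))
   wire_data += conn.send(h11.EndOfMessage())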
604
605
606.. _keepalive-and-pipelining:
607
608Re-using a connection: keep-alive and pipelining
609------------------------------------------------
610
611HTTP/1.1 allows a connection to be re-used for multiple
612request/response cycles (also known as "keep-alive"). This can make
613things faster by letting us skip the costly connection setup, but it
614does create some complexities: we have to keep track of whether a
615connection is reusable, and when there are multiple requests and
616responses flowing through the same connection we need to be careful
617not to get confused about which request goes with which response.
618
619h11 considers a connection to be reusable if, and only if, both
620sides (a) speak HTTP/1.1 (HTTP/1.0 did have some complex and fragile
621support for keep-alive bolted on, but h11 currently doesn't support
622that -- possibly this will be added in the future), and (b) neither
623side has explicitly disabled keep-alive by sending a ``Connection:
624close`` header.
625
626If you plan to make only a single request or response and then close
627the connection, you should manually set the ``Connection: close``
628header in your request/response. h11 will notice and update its state
629appropriately.
630
631There are also some situations where you are required to send a
632``Connection: close`` header, e.g. if you are a server talking to a
633client that doesn't support keep-alive. You don't need to worry about
634these cases -- h11 will automatically add this header when
635necessary. Just worry about setting it when it's actually something
636that you're actively choosing.
637
638If you want to re-use a connection, you have to wait until both the
639request and the response have been completed, bringing both the client
640and server to the :data:`DONE` state. Once this has happened, you can
641explicitly call :meth:`Connection.start_next_cycle` to reset both
642sides back to the :data:`IDLE` state. This makes sure that the client
643and server remain synched up.
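
In code, the re-use decision might look something like this sketch
(``make_new_connection`` is a hypothetical helper of yours):

.. code-block:: python

   if conn.our_state is h11.DONE and conn.their_state is h11.DONE:
       # Both sides finished cleanly; reset for another cycle.
       conn.start_next_cycle()
   else:
       # Keep-alive is off (MUST_CLOSE, CLOSED, ...): this connection
       # can't be re-used.
       sock.close()
       conn = make_new_connection()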
644
645If keep-alive is disabled for whatever reason -- someone set
646``Connection: close``, lack of protocol support, one of the sides just
647unilaterally closed the connection -- then the state machines will
648skip past the :data:`DONE` state directly to the :data:`MUST_CLOSE` or
649:data:`CLOSED` states. In this case, trying to call
650:meth:`~Connection.start_next_cycle` will raise an error, and the only
651thing you can legally do is to close this connection and make a new
652one.
653
654HTTP/1.1 also allows for a more aggressive form of connection re-use,
655in which a client sends multiple requests in quick succession, and
656then waits for the responses to stream back in order
657("pipelining"). This is generally considered to have been a bad idea,
658because it makes things like error recovery very complicated.
659
660As a client, h11 does not support pipelining. This is enforced by the
661structure of the state machine: after sending one :class:`Request`,
662you can't send another until after calling
663:meth:`~Connection.start_next_cycle`, and you can't call
664:meth:`~Connection.start_next_cycle` until the server has entered the
665:data:`DONE` state, which requires reading the server's full
666response.
667
668As a server, h11 provides the minimal support for pipelining required
669to comply with the HTTP/1.1 standard: if the client sends multiple
670pipelined requests, then we handle the first request until we reach the
671:data:`DONE` state, and then :meth:`~Connection.next_event` will
672pause and refuse to parse any more events until the response is
673completed and :meth:`~Connection.start_next_cycle` is called. See the
674next section for more details.
675
676
677.. _flow-control:
678
679Flow control
680------------
681
682Presumably you know when you want to send things, and the
683:meth:`~Connection.send` interface is very simple: it just immediately
684returns all the data you need to send for the given event, so you can
685apply whatever send buffer strategy you want. But reading from the
686remote peer is a bit trickier: you don't want to read data from the
687remote peer if it can't be processed (i.e., you want to apply
688backpressure and avoid building arbitrarily large in-memory buffers),
689and you definitely don't want to block waiting on data from the remote
690peer at the same time that it's blocked waiting for you, because that
691will cause a deadlock.
692
693One complication here is that if you're implementing a server, you
694have to be prepared to handle :class:`Request`\s that have an
695``Expect: 100-continue`` header. You can `read the spec
696<https://tools.ietf.org/html/rfc7231#section-5.1.1>`_ for the full
697details, but basically what this header means is that after sending
698the :class:`Request`, the client plans to pause and wait until they
699see some response from the server before they send that request's
700:class:`Data`. The server's response would normally be an
701:class:`InformationalResponse` with status ``100 Continue``, but it
702could be anything really (e.g. a full :class:`Response` with a 4xx
703status code). The crucial thing as a server, though, is that you
704should never block trying to read a request body if the client is
705blocked waiting for you to tell them to send the request body.
706
707Fortunately, h11 makes this easy, because it tracks whether the client
708is in the waiting-for-100-continue state, and exposes this as
709:attr:`Connection.they_are_waiting_for_100_continue`. So you don't
710have to pay attention to the ``Expect`` header yourself; you just have
711to make sure that before you block waiting to read a request body, you
712execute some code like:
713
714.. code-block:: python
715
716   if conn.they_are_waiting_for_100_continue:
       do_send(conn, h11.InformationalResponse(status_code=100, headers=[...]))
718   do_read(...)
719
720In fact, if you're lazy (and what programmer isn't?) then you can just
721do this check before all reads -- it's mandatory before blocking to
722read a request body, but it's safe at any time.
723
724And the other thing you want to pay attention to is the special values
725that :meth:`~Connection.next_event` might return: :data:`NEED_DATA`
726and :data:`PAUSED`.
727
728:data:`NEED_DATA` is what it sounds like: it means that
729:meth:`~Connection.next_event` is guaranteed not to return any more
730real events until you've called :meth:`~Connection.receive_data` at
731least once.
732
733:data:`PAUSED` is a little more subtle: it means that
734:meth:`~Connection.next_event` is guaranteed not to return any more
735real events until something else has happened to clear up the paused
736state. There are three cases where this can happen:
737
7381) We received a full request/response from the remote peer, and then
739   we received some more data after that. (The main situation where
740   this might happen is a server responding to a pipelining client.)
741   The :data:`PAUSED` state will go away after you call
742   :meth:`~Connection.start_next_cycle`.
743
7442) A successful ``CONNECT`` or ``Upgrade:`` request has caused the
745   connection to switch to some other protocol (see
746   :ref:`switching-protocols`). This :data:`PAUSED` state is
747   permanent; you should abandon this :class:`Connection` and go do
748   whatever it is you're going to do with your new protocol.
749
7503) We're a server, and the client we're talking to proposed to switch
751   protocols (see :ref:`switching-protocols`), and now is waiting to
752   find out whether their request was successful or not. Once we
753   either accept or deny their request then this will turn into one of
754   the above two states, so you probably don't need to worry about
755   handling it specially.
756
757Putting all this together --
758
759If your I/O is organized around a "pull" strategy, where your code
requests events as it's ready to handle them (e.g. classic synchronous
code, or asyncio's ``await loop.sock_recv(...)``, or `Trio's streams
<https://trio.readthedocs.io/en/latest/reference-io.html#the-abstract-stream-api>`__),
763then you'll probably want logic that looks something like:
764
765.. code-block:: python
766
767   # Replace do_sendall and do_recv with your I/O code
768   def get_next_event():
769       while True:
770           event = conn.next_event()
771           if event is h11.NEED_DATA:
772               if conn.they_are_waiting_for_100_continue:
                   do_sendall(conn, h11.InformationalResponse(status_code=100, headers=[...]))
774               conn.receive_data(do_recv())
775               continue
776           return event
777
778And then your code that calls this will need to make sure to call it
779only at appropriate times (e.g., not immediately after receiving
780:class:`EndOfMessage` or :data:`PAUSED`).
781
782If your I/O is organized around a "push" strategy, where the network
783drives processing (e.g. you're using `Twisted
784<https://twistedmatrix.com/>`_, or implementing an
785:class:`asyncio.Protocol`), then you'll want to internally apply
786back-pressure whenever you see :data:`PAUSED`, remove back-pressure
787when you call :meth:`~Connection.start_next_cycle`, and otherwise just
788deliver events as they arrive. Something like:
789
790.. code-block:: python
791
792   class HTTPProtocol(asyncio.Protocol):
793       # Save the transport for later -- needed to access the
794       # backpressure API.
795       def connection_made(self, transport):
796           self._transport = transport
797
798       # Internal helper function -- deliver all pending events
799       def _deliver_events(self):
800           while True:
801               event = self.conn.next_event()
802               if event is h11.NEED_DATA:
803                   break
804               elif event is h11.PAUSED:
805                   # Apply back-pressure
806                   self._transport.pause_reading()
807                   break
808               else:
809                   self.event_received(event)
810
811       # Called by "someone" whenever new data appears on our socket
812       def data_received(self, data):
813           self.conn.receive_data(data)
814           self._deliver_events()
815
816       # Called by "someone" whenever the peer closes their socket
817       def eof_received(self):
818           self.conn.receive_data(b"")
819           self._deliver_events()
820           # asyncio will close our socket unless we return True here.
821           return True
822
       # Called by your code when it's ready to start a new
824       # request/response cycle
825       def start_next_cycle(self):
826           self.conn.start_next_cycle()
827           # New events might have been buffered internally, and only
828           # become deliverable after calling start_next_cycle
829           self._deliver_events()
830           # Remove back-pressure
831           self._transport.resume_reading()
832
833       # Fill in your code here
834       def event_received(self, event):
835           ...
836
837And your code that uses this will have to remember to check for
838:attr:`~Connection.they_are_waiting_for_100_continue` at the
839appropriate time.
840
841
842.. _closing:
843
844Closing connections
845-------------------
846
847h11 represents a connection shutdown with the special event type
848:class:`ConnectionClosed`. You can send this event, in which case
849:meth:`~Connection.send` will simply update the state machine and
850then return ``None``. You can receive this event, if you call
851``conn.receive_data(b"")``. (The actual receipt might be delayed if
852the connection is :ref:`paused <flow-control>`.) It's safe and legal
to call ``conn.receive_data(b"")`` multiple times; after the first such
call, additional empty calls to :meth:`~Connection.receive_data` simply
have no further effect:
857
858.. ipython:: python
859
860   conn = h11.Connection(our_role=h11.CLIENT)
861   conn.receive_data(b"")
862   conn.receive_data(b"")
863   conn.receive_data(None)
864
865(Or if you try to actually pass new data in after calling
866``conn.receive_data(b"")``, that will raise an exception.)
867
868h11 is careful about interpreting connection closure in a *half-duplex
869fashion*. TCP sockets pretend to be a two-way connection, but really
870they're two one-way connections. In particular, it's possible for one
871party to shut down their sending connection -- which causes the other
872side to be notified that the connection has closed via the usual
873``socket.recv(...) -> b""`` mechanism -- while still being able to
874read from their receiving connection. (On Unix, this is generally
875accomplished via the ``shutdown(2)`` system call.) So, for example, a
876client could send a request, and then close their socket for writing
877to indicate that they won't be sending any more requests, and then
878read the response. It's this kind of closure that is indicated by
879h11's :class:`ConnectionClosed`: it means that this party will not be
880sending any more data -- nothing more, nothing less. You can see this
881reflected in the :ref:`state machine <state-machine>`, in which one
882party transitioning to :data:`CLOSED` doesn't immediately halt the
883connection, but merely prevents it from continuing for another
884request/response cycle.
885
886The state machine also indicates that :class:`ConnectionClosed` events
887can only happen in certain states. This isn't true, of course -- any
888party can close their connection at any time, and h11 can't stop
889them. But what h11 can do is distinguish between clean and unclean
890closes. For example, if both sides complete a request/response cycle
891and then close the connection, that's a clean closure and everyone
892will transition to the :data:`CLOSED` state in an orderly fashion. On
893the other hand, if one party suddenly closes the connection while
894they're in the middle of sending a chunked response body, or when they
895promised a ``Content-Length:`` of 1000 bytes but have only sent 500,
896then h11 knows that this is a violation of the HTTP protocol, and will
897raise a :exc:`ProtocolError`. Basically h11 treats an unexpected
898close the same way it would treat unexpected, uninterpretable data
899arriving -- it lets you know that something has gone wrong.
900
901As a client, the proper way to perform a single request and then close
902the connection is:
903
9041) Send a :class:`Request` with ``Connection: close``
905
9062) Send the rest of the request body
907
9083) Read the server's :class:`Response` and body
909
9104) ``conn.our_state is h11.MUST_CLOSE`` will now be true. Call
911   ``conn.send(ConnectionClosed())`` and then close the socket. Or
912   really you could just close the socket -- the thing calling
913   ``send`` will do is raise an error if you're not in
914   :data:`MUST_CLOSE` as expected. So it's between you and your
915   conscience and your code reviewers.
916
917(Technically it would also be legal to shutdown your socket for
918writing as step 2.5, but this doesn't serve any purpose and some
919buggy servers might get annoyed, so it's not recommended.)
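
Putting those steps together, a rough client-side sketch might look
like this (``read_response`` is a hypothetical stand-in for your own
code that reads the :class:`Response` and body via
:meth:`~Connection.receive_data` and :meth:`~Connection.next_event`):

.. code-block:: python

   conn = h11.Connection(our_role=h11.CLIENT)
   # Step 1: the request, with Connection: close
   sock.sendall(conn.send(h11.Request(method="GET", target="/",
                                      headers=[("Host", "example.com"),
                                               ("Connection", "close")])))
   # Step 2: the (empty) request body
   sock.sendall(conn.send(h11.EndOfMessage()))
   # Step 3: read the server's Response and body
   read_response(conn, sock)
   # Step 4: confirm the state and close
   assert conn.our_state is h11.MUST_CLOSE
   conn.send(h11.ConnectionClosed())   # just updates the state machine
   sock.close()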
920
921As a server, the proper way to perform a response is:
922
9231) Send your :class:`Response` and body
924
9252) Check if ``conn.our_state is h11.MUST_CLOSE``. This might happen
926   for a variety of reasons; for example, if the response had unknown
927   length and the client speaks only HTTP/1.0, then the client will
928   not consider the connection complete until we issue a close.
929
930You should be particularly careful to take into consideration the
following note from `RFC 7230 section 6.6
932<https://tools.ietf.org/html/rfc7230#section-6.6>`_:
933
934   If a server performs an immediate close of a TCP connection, there is
935   a significant risk that the client will not be able to read the last
936   HTTP response.  If the server receives additional data from the
937   client on a fully closed connection, such as another request that was
938   sent by the client before receiving the server's response, the
939   server's TCP stack will send a reset packet to the client;
940   unfortunately, the reset packet might erase the client's
941   unacknowledged input buffers before they can be read and interpreted
942   by the client's HTTP parser.
943
944   To avoid the TCP reset problem, servers typically close a connection
945   in stages.  First, the server performs a half-close by closing only
946   the write side of the read/write connection.  The server then
947   continues to read from the connection until it receives a
948   corresponding close by the client, or until the server is reasonably
949   certain that its own TCP stack has received the client's
950   acknowledgement of the packet(s) containing the server's last
951   response.  Finally, the server fully closes the connection.
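
One way a server might implement that staged close, as a rough sketch
using the standard :mod:`socket` module (a real implementation would
also want a timeout on the draining loop):

.. code-block:: python

   import socket

   def staged_close(sock):
       # Half-close: we stop sending, but keep reading until the client
       # has seen our response and closed their side.
       sock.shutdown(socket.SHUT_WR)
       try:
           while sock.recv(4096):
               pass
       except OSError:
           pass
       finally:
           sock.close()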
952
953
954.. _switching-protocols:
955
956Switching protocols
957-------------------
958
959h11 supports two kinds of "protocol switches": requests with method
960``CONNECT``, and the newer ``Upgrade:`` header, most commonly used for
961negotiating WebSocket connections. Both follow the same pattern: the
962client proposes that they switch from regular HTTP to some other kind
963of interaction, and then the server either rejects the suggestion --
964in which case we return to regular HTTP rules -- or else accepts
965it. (For ``CONNECT``, acceptance means a response with 2xx status
966code; for ``Upgrade:``, acceptance means an
967:class:`InformationalResponse` with status ``101 Switching
Protocols``.) If the proposal is accepted, then both sides switch to
969doing something else with their socket, and h11's job is done.
970
971As a developer using h11, it's your responsibility to send and
972interpret the actual ``CONNECT`` or ``Upgrade:`` request and response,
973and to figure out what to do after the handover; it's h11's job to
974understand what's going on, and help you make the handover
975smoothly.
976
977Specifically, what h11 does is :ref:`pause <flow-control>` parsing
978incoming data at the boundary between the two protocols, and then you
979can retrieve any unprocessed data from the
980:attr:`Connection.trailing_data` attribute.
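
A sketch of the hand-off (``start_new_protocol`` is a hypothetical
function standing in for whatever you do next -- a WebSocket
implementation, a tunnelled TCP stream, etc.):

.. code-block:: python

   # After the switch is accepted, next_event() returns PAUSED forever.
   leftover_bytes, closed = conn.trailing_data
   # Any bytes h11 had already buffered past the end of the HTTP
   # conversation belong to the new protocol; hand them over along
   # with the socket.
   start_new_protocol(sock, initial_data=leftover_bytes)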
981
982
983.. _sendfile:
984
985Support for ``sendfile()``
986--------------------------
987
988Many networking APIs provide some efficient way to send particular
989data, e.g. asking the operating system to stream files directly off of
990the disk and into a socket without passing through userspace.
991
992It's possible to use these APIs together with h11. The basic strategy
993is:
994
995* Create some placeholder object representing the special data, that
996  your networking code knows how to "send" by invoking whatever the
997  appropriate underlying APIs are.
998
999* Make sure your placeholder object implements a ``__len__`` method
1000  returning its size in bytes.
1001
1002* Call ``conn.send_with_data_passthrough(Data(data=<your placeholder
1003  object>))``
1004
1005* This returns a list whose contents are a mixture of (a) bytes-like
1006  objects, and (b) your placeholder object. You should send them to
1007  the network in order.
1008
1009Here's a sketch of what this might look like:
1010
1011.. code-block:: python
1012
1013   class FilePlaceholder:
1014       def __init__(self, file, offset, count):
1015           self.file = file
1016           self.offset = offset
1017           self.count = count
1018
1019       def __len__(self):
1020           return self.count
1021
1022   def send_data(sock, data):
1023       if isinstance(data, FilePlaceholder):
1024           # socket.sendfile added in Python 3.5
1025           sock.sendfile(data.file, data.offset, data.count)
1026       else:
1027           # data is a bytes-like object to be sent directly
1028           sock.sendall(data)
1029
1030   placeholder = FilePlaceholder(open("...", "rb"), 0, 200)
1031   for data in conn.send_with_data_passthrough(Data(data=placeholder)):
1032       send_data(sock, data)
1033
1034This works with all the different framing modes (``Content-Length``,
1035``Transfer-Encoding: chunked``, etc.) -- h11 will add any necessary
1036framing data, update its internal state, and away you go.
1037
1038
1039Identifying h11 in requests and responses
1040-----------------------------------------
1041
1042According to RFC 7231, client requests are supposed to include a
1043``User-Agent:`` header identifying what software they're using, and
1044servers are supposed to respond with a ``Server:`` header doing the
1045same. h11 doesn't construct these headers for you, but to make it
easier for you to construct them, it provides:
1047
1048.. data:: PRODUCT_ID
1049
1050   A string suitable for identifying the current version of h11 in a
1051   ``User-Agent:`` or ``Server:`` header.
1052
1053   The version of h11 that was used to build these docs identified
1054   itself as:
1055
1056   .. ipython:: python
1057
1058      h11.PRODUCT_ID
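
   For example, a client might combine it with its own identifier when
   building a ``User-Agent:`` header (``"myprogram/1.0"`` is just a
   placeholder):

   .. code-block:: python

      headers = [("Host", "example.com"),
                 ("User-Agent", "myprogram/1.0 " + h11.PRODUCT_ID)]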
1059
1060
1061.. _chunk-delimiters-are-bad:
1062
1063Chunked Transfer Encoding Delimiters
1064------------------------------------
1065
1066.. versionadded:: 0.7.0
1067
1068HTTP/1.1 allows for the use of Chunked Transfer Encoding to frame request and
1069response bodies. This form of transfer encoding allows the implementation to
1070provide its body data in the form of length-prefixed "chunks" of data.
1071
1072RFC 7230 is extremely clear that the breaking points between chunks of data are
1073non-semantic: that is, users should not rely on them or assign any meaning to
1074them. This is particularly important given that RFC 7230 also allows
1075intermediaries such as proxies and caches to change the chunk boundaries as
1076they see fit, or even to remove the chunked transfer encoding entirely.
1077
1078However, for some applications it is valuable or essential to see the chunk
1079boundaries because the peer implementation has assigned meaning to them. While
1080this is against the specification, if you do really need access to this
1081information h11 makes it available to you in the form of the
1082:data:`Data.chunk_start` and :data:`Data.chunk_end` properties of the
1083:class:`Data` event.
1084
1085:data:`Data.chunk_start` is set to ``True`` for the first :class:`Data` event
1086for a given chunk of data. :data:`Data.chunk_end` is set to ``True`` for the
1087last :class:`Data` event that is emitted for a given chunk of data. h11
1088guarantees that it will always emit at least one :class:`Data` event for each
1089chunk of data received from the remote peer, but due to its internal buffering
1090logic it may return more than one. It is possible for a single :class:`Data`
1091event to have both :data:`Data.chunk_start` and :data:`Data.chunk_end` set to
1092``True``, in which case it will be the only :class:`Data` event for that chunk
1093of data.
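
If you do find yourself needing it, a sketch of reassembling the
peer's chunks might look like this (``handle_complete_chunk`` and
``do_recv`` are hypothetical stand-ins for your own code):

.. code-block:: python

   current_chunk = bytearray()
   while True:
       event = conn.next_event()
       if event is h11.NEED_DATA:
           conn.receive_data(do_recv())
       elif isinstance(event, h11.Data):
           if event.chunk_start:
               current_chunk = bytearray()
           current_chunk += event.data
           if event.chunk_end:
               handle_complete_chunk(bytes(current_chunk))
       elif isinstance(event, h11.EndOfMessage):
           break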
1094
1095Again, it is *strongly encouraged* that you avoid relying on this information
1096if at all possible. This functionality should be considered an escape hatch for
1097when there is no alternative but to rely on the information, rather than a
1098general source of data that is worth relying on.
1099