Change Log
----------

1.0.1
~~~~~

Released on December 7, 2017

Breaking changes:

* Drop support for Python 2.6. (#330) (Thank you, Hugo, Will Kahn-Greene!)
* Remove ``utils/spider.py`` (#353) (Thank you, Jon Dufresne!)

Features:

* Improve documentation. (#300, #307) (Thank you, Jon Dufresne, Tom Most,
  Will Kahn-Greene!)
* Add iframe seamless boolean attribute. (Thank you, Ritwik Gupta!)
* Add itemscope as a boolean attribute. (#194) (Thank you, Jonathan Vanasco!)
* Support Python 3.6. (#333) (Thank you, Jon Dufresne!)
* Add CI support for Windows using AppVeyor. (Thank you, John Vandenberg!)
* Improve testing and CI and add code coverage. (#323, #334) (Thank you, Jon
  Dufresne, John Vandenberg, Geoffrey Sneddon, Will Kahn-Greene!)
* Semver-compliant version number.

Bug fixes:

* Add support for setuptools < 18.5 to support environment markers. (Thank you,
  John Vandenberg!)
* Add explicit dependency for six >= 1.9. (Thank you, Eric Amorde!)
* Fix regexes to work with Python 3.7 regex adjustments. (#318, #379) (Thank
  you, Benedikt Morbach, Ville Skyttä, Mark Vasilkov!)
* Fix alphabeticalattributes filter namespace bug. (#324) (Thank you, Will
  Kahn-Greene!)
* Include license file in generated wheel package. (#350) (Thank you, Jon
  Dufresne!)
* Fix annotation-xml typo. (#339) (Thank you, Will Kahn-Greene!)
* Allow uppercase hex characters in CSS colour check. (#377) (Thank you,
  Komal Dembla, Hugo!)
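The last fix above concerns hex colours such as ``#FFAA00``. A minimal sketch of a case-insensitive check using only the standard library; the function name ``is_hex_colour`` and the pattern are illustrative assumptions, not the sanitizer's actual code:

```python
import re

# Hypothetical pattern: '#' followed by one or more hex digits of either case.
_HEX_COLOUR = re.compile(r"#[0-9a-fA-F]+")


def is_hex_colour(value):
    """Return True if the whole value looks like a hex colour, e.g. #FFAA00."""
    return _HEX_COLOUR.fullmatch(value) is not None


print(is_hex_colour("#ffaa00"))  # lowercase digits
print(is_hex_colour("#FFAA00"))  # uppercase digits
print(is_hex_colour("#GG0000"))  # not hex digits
```

Listing both ``a-f`` and ``A-F`` in the character class keeps the check independent of any regex flags.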


1.0
~~~

Released and unreleased on December 7, 2017. Badly packaged release.


0.999999999/1.0b10
~~~~~~~~~~~~~~~~~~

Released on July 15, 2016

* Fix attribute order going to the tree builder to be document order
  instead of reverse document order(!).
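To illustrate what document order means here, a minimal sketch using only the standard library. The helper ``build_attributes`` is hypothetical and is not html5lib's code; it assumes attributes arrive as ``(name, value)`` pairs in source order and that the first occurrence of a duplicated attribute wins:

```python
from collections import OrderedDict


def build_attributes(pairs):
    """Build an attribute mapping in document order; first duplicate wins."""
    attrs = OrderedDict()
    for name, value in pairs:
        # Keep the first value seen for a duplicated attribute name.
        if name not in attrs:
            attrs[name] = value
    return attrs


pairs = [("id", "main"), ("class", "a"), ("class", "ignored")]

# Document order: "id" first, then "class".
print(list(build_attributes(pairs)))

# Building from the reversed list also lets the first duplicate win,
# but the keys come out in reverse document order.
print(list(OrderedDict(reversed(pairs))))
```

Both variants keep ``class="a"``; only the first preserves the order in which the attributes were written.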


0.99999999/1.0b9
~~~~~~~~~~~~~~~~

Released on July 14, 2016

* **Added ordereddict as a mandatory dependency on Python 2.6.**

* Added ``lxml``, ``genshi``, ``datrie``, ``charade``, and ``all``
  extras that will do the right thing based on the specific
  interpreter implementation.

* Now requires the ``mock`` package for the testsuite.

* Cease supporting DATrie under PyPy.

* **Remove PullDOM support, as this hasn't ever been properly
  tested, doesn't entirely work, and as far as I can tell is
  completely unused by anyone.**

* Move testsuite to ``py.test``.

* **Fix #124: move to webencodings for decoding the input byte stream;
  this makes html5lib compliant with the Encoding Standard, and
  introduces a required dependency on webencodings.**

* **Cease supporting Python 3.2 (in both CPython and PyPy forms).**

* **Fix comments containing double-dash with lxml 3.5 and above.**

* **Use scripting disabled by default (as we don't implement
  scripting).**

* **Fix #11, avoiding the XSS bug potentially caused by serializer
  allowing attribute values to be escaped out of in old browser versions,
  changing the quote_attr_values option on serializer to take one of
  three values, "always" (the old True value), "legacy" (the new option,
  and the new default), and "spec" (the old False value, and the old
  default).**

* **Fix #72 by rewriting the sanitizer to apply only to treewalkers
  (instead of the tokenizer); as such, this will require amending all
  callers of it to use it via the treewalker API.**

* **Drop support of charade, now that chardet is supported once more.**

* **Replace the charset keyword argument on parse and related methods
  with a set of keyword arguments: override_encoding, transport_encoding,
  same_origin_parent_encoding, likely_encoding, and default_encoding.**

* **Move filters._base, treebuilder._base, and treewalkers._base to .base
  to clarify their status as public.**

* **Get rid of the sanitizer package. Merge sanitizer.sanitize into the
  sanitizer.htmlsanitizer module and move that to sanitizer. This means
  anyone who used sanitizer.sanitize or sanitizer.HTMLSanitizer needs no
  code changes.**

* **Rename treewalkers.lxmletree to .etree_lxml and
  treewalkers.genshistream to .genshi to have a consistent API.**

* Move a whole load of stuff (inputstream, ihatexml, trie, tokenizer,
  utils) to be underscore prefixed to clarify their status as private.
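Several of the changes above affect calling code. A minimal sketch of the treewalker-based sanitizer, the ``quote_attr_values`` serializer option, and the ``transport_encoding`` parse argument, assuming html5lib 1.0 or later is installed; the sample markup and variable names are illustrative only:

```python
import html5lib
from html5lib.filters import sanitizer
from html5lib.serializer import HTMLSerializer

markup = '<a href="javascript:alert(1)" title=x>link</a><script>bad()</script>'
tree = html5lib.parse(markup, treebuilder="etree")
walker = html5lib.getTreeWalker("etree")

# The sanitizer is now a filter wrapped around a treewalker stream.
clean = sanitizer.Filter(walker(tree))

# "always" quotes every attribute value (the old True behaviour).
quoted = HTMLSerializer(quote_attr_values="always").render(clean)
print(quoted)

# "legacy" (the new default) may leave simple values such as x unquoted.
legacy = HTMLSerializer(quote_attr_values="legacy").render(
    sanitizer.Filter(walker(tree))
)
print(legacy)

# transport_encoding is one of the keyword arguments that replaced charset.
parser = html5lib.HTMLParser()
parser.parse(b"<p>caf\xe9</p>", transport_encoding="windows-1252")
print(parser.documentEncoding)
```

The stream is rebuilt with ``walker(tree)`` for the second serialization because a filter's token stream is consumed once it has been rendered.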


0.9999999/1.0b8
~~~~~~~~~~~~~~~

Released on September 10, 2015

* Fix #195: fix the sanitizer to drop broken URLs (it threw an
  exception between 0.9999 and 0.999999).


0.999999/1.0b7
~~~~~~~~~~~~~~

Released on July 7, 2015

* Fix #189: fix the sanitizer to allow relative URLs again (as it did
  prior to 0.9999/1.0b5).


0.99999/1.0b6
~~~~~~~~~~~~~

Released on April 30, 2015

* Fix #188: fix the sanitizer to not throw an exception when sanitizing
  bogus data URLs.


0.9999/1.0b5
~~~~~~~~~~~~

Released on April 29, 2015

* Fix #153: Sanitizer fails to treat some attributes as URLs. Despite how
  this sounds, this has no known security implications.  No known version
  of IE (5.5 to current), Firefox (3 to current), Safari (6 to current),
  Chrome (1 to current), or Opera (12 to current) will run any script
  provided in these attributes.

* Pass error message to the ParseError exception in strict parsing mode.

* Allow data URIs in the sanitizer, with a whitelist of content-types.

* Add support for Python implementations that don't support lone
  surrogates (read: Jython). Fixes #2.

* Remove localization of error messages. This functionality was totally
  unused (and untested that everything was localizable), so we may as
  well follow numerous browsers in not supporting translating technical
  strings.

* Expose treewalkers.pprint as a public API.

* Add a documentEncoding property to HTML5Parser, fix #121.


0.999
~~~~~

Released on December 23, 2013

* Fix #127: add work-around for CPython issue #20007: .read(0) on
  http.client.HTTPResponse drops the rest of the content.

* Fix #115: lxml treewalker can now deal with fragments containing, at
  their root level, text nodes with non-ASCII characters on Python 2.


0.99
~~~~

Released on September 10, 2013

* No library changes from 1.0b3; released as 0.99 as pip has changed
  behaviour from 1.4 to avoid installing pre-release versions per
  PEP 440.


1.0b3
~~~~~

Released on July 24, 2013

* Removed ``RecursiveTreeWalker`` from ``treewalkers._base``. Any
  implementation using it should be moved to
  ``NonRecursiveTreeWalker``, as everything bundled with html5lib has
  for years.

* Fix #67 so that ``BufferedStream`` correctly returns a bytes
  object, thereby fixing any case where html5lib is passed a
  non-seekable RawIOBase-like object.


1.0b2
~~~~~

Released on June 27, 2013

* Removed reordering of attributes within the serializer. There is now
  an ``alphabetical_attributes`` option which preserves the previous
  behaviour through a new filter. This allows attribute order to be
  preserved through html5lib if the tree builder preserves order.

* Removed ``dom2sax`` from DOM treebuilders. It has been replaced by
  ``treeadapters.sax.to_sax`` which is generic and supports any
  treewalker; it also resolves all known bugs with ``dom2sax``.

* Fix treewalker assertions on hitting bytes strings on
  Python 2. Previous to 1.0b1, treewalkers coped with mixed
  bytes/unicode data on Python 2; this reintroduces this prior
  behaviour on Python 2. Behaviour is unchanged on Python 3.
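A minimal sketch of the two replacement APIs named above, assuming a current html5lib release is installed. The ``NameCollector`` handler class and the sample markup are hypothetical and exist only for illustration:

```python
from xml.sax.handler import ContentHandler

import html5lib
from html5lib.serializer import HTMLSerializer
from html5lib.treeadapters import sax

tree = html5lib.parse('<p id="b" class="a">text</p>', treebuilder="etree")
walker = html5lib.getTreeWalker("etree")

# Opt back in to alphabetical attribute order when serializing.
sorted_html = HTMLSerializer(alphabetical_attributes=True).render(walker(tree))
print(sorted_html)


class NameCollector(ContentHandler):
    """Hypothetical SAX handler that records element local names."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.names = []

    def startElementNS(self, name, qname, attrs):
        # name is a (namespace, localname) pair.
        self.names.append(name[1])


# to_sax feeds any treewalker stream to a SAX content handler.
handler = NameCollector()
sax.to_sax(walker(tree), handler)
print(handler.names)
```

Without ``alphabetical_attributes=True`` the serializer emits attributes in whatever order the tree builder kept them.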


1.0b1
~~~~~

Released on May 17, 2013

* Implementation updated to implement the `HTML specification
  <http://www.whatwg.org/specs/web-apps/current-work/>`_ as of 5th May
  2013 (`SVN <http://svn.whatwg.org/webapps/>`_ revision r7867).

* Python 3.2+ supported in a single codebase using the ``six`` library.

* Removed support for Python 2.5 and older.

* Removed the deprecated Beautiful Soup 3 treebuilder.
  ``beautifulsoup4`` can use ``html5lib`` as a parser instead. Note that
  since it doesn't support namespaces, foreign content like SVG and
  MathML is parsed incorrectly.

* Removed ``simpletree`` from the package. The default tree builder is
  now ``etree`` (using the ``xml.etree.cElementTree`` implementation if
  available, and ``xml.etree.ElementTree`` otherwise).

* Removed the ``XHTMLSerializer`` as it never actually guaranteed its
  output was well-formed XML, and hence provided little of use.

* Removed default DOM treebuilder, so ``html5lib.treebuilders.dom`` is no
  longer supported. ``html5lib.treebuilders.getTreeBuilder("dom")`` will
  return the default DOM treebuilder, which uses ``xml.dom.minidom``.

* Optional heuristic character encoding detection now based on
  ``charade`` for Python 2.6 - 3.3 compatibility.

* Optional ``Genshi`` treewalker support fixed.

* Many bugfixes, including:

  * #33: null in attribute value breaks XML AttValue;

  * #4: nested, indirect descendant, <button> causes infinite loop;

  * `Google Code 215
    <http://code.google.com/p/html5lib/issues/detail?id=215>`_: Properly
    detect seekable streams;

  * `Google Code 206
    <http://code.google.com/p/html5lib/issues/detail?id=206>`_: add
    support for <video preload=...>, <audio preload=...>;

  * `Google Code 205
    <http://code.google.com/p/html5lib/issues/detail?id=205>`_: add
    support for <video poster=...>;

  * `Google Code 202
    <http://code.google.com/p/html5lib/issues/detail?id=202>`_: Unicode
    file breaks InputStream.

* Source code is now mostly PEP 8 compliant.

* Test harness has been improved and now depends on ``nose``.

* Documentation updated and moved to https://html5lib.readthedocs.io/.
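For the DOM treebuilder change above, a minimal sketch of requesting the ``xml.dom.minidom``-based builder through ``getTreeBuilder("dom")``, assuming a current html5lib release is installed; the sample markup and variable names are illustrative:

```python
import html5lib

# Ask for the DOM treebuilder by name rather than importing a module.
dom_builder = html5lib.getTreeBuilder("dom")
parser = html5lib.HTMLParser(tree=dom_builder)

# The result is an xml.dom.minidom document.
document = parser.parse("<p>Hello</p>")
paragraph = document.getElementsByTagName("p")[0]
print(paragraph.firstChild.data)
```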


0.95
~~~~

Released on February 11, 2012


0.90
~~~~

Released on January 17, 2010


0.11.1
~~~~~~

Released on June 12, 2008


0.11
~~~~

Released on June 10, 2008


0.10
~~~~

Released on October 7, 2007


0.9
~~~

Released on March 11, 2007


0.2
~~~

Released on January 8, 2007
