Change Log
----------

1.0.1
~~~~~

Released on December 7, 2017

Breaking changes:

* Drop support for Python 2.6. (#330) (Thank you, Hugo, Will Kahn-Greene!)
* Remove ``utils/spider.py`` (#353) (Thank you, Jon Dufresne!)

Features:

* Improve documentation. (#300, #307) (Thank you, Jon Dufresne, Tom Most,
  Will Kahn-Greene!)
* Add iframe seamless boolean attribute. (Thank you, Ritwik Gupta!)
* Add itemscope as a boolean attribute. (#194) (Thank you, Jonathan Vanasco!)
* Support Python 3.6. (#333) (Thank you, Jon Dufresne!)
* Add CI support for Windows using AppVeyor. (Thank you, John Vandenberg!)
* Improve testing and CI and add code coverage. (#323, #334) (Thank you, Jon
  Dufresne, John Vandenberg, Geoffrey Sneddon, Will Kahn-Greene!)
* Semver-compliant version number.

Bug fixes:

* Add support for setuptools < 18.5 to support environment markers. (Thank you,
  John Vandenberg!)
* Add explicit dependency for six >= 1.9. (Thank you, Eric Amorde!)
* Fix regexes to work with Python 3.7 regex adjustments. (#318, #379) (Thank
  you, Benedikt Morbach, Ville Skyttä, Mark Vasilkov!)
* Fix alphabeticalattributes filter namespace bug. (#324) (Thank you, Will
  Kahn-Greene!)
* Include license file in generated wheel package. (#350) (Thank you, Jon
  Dufresne!)
* Fix annotation-xml typo. (#339) (Thank you, Will Kahn-Greene!)
* Allow uppercase hex characters in CSS colour check. (#377) (Thank you,
  Komal Dembla, Hugo!)


1.0
~~~

Released and unreleased on December 7, 2017. Badly packaged release.


0.999999999/1.0b10
~~~~~~~~~~~~~~~~~~

Released on July 15, 2016

* Fix attribute order going to the tree builder to be document order
  instead of reverse document order(!).


0.99999999/1.0b9
~~~~~~~~~~~~~~~~

Released on July 14, 2016

* **Added ordereddict as a mandatory dependency on Python 2.6.**

* Added ``lxml``, ``genshi``, ``datrie``, ``charade``, and ``all``
  extras that will do the right thing based on the specific
  interpreter implementation.

* Now requires the ``mock`` package for the testsuite.

* Cease supporting DATrie under PyPy.

* **Remove PullDOM support, as this hasn't ever been properly
  tested, doesn't entirely work, and as far as I can tell is
  completely unused by anyone.**

* Move testsuite to ``py.test``.

* **Fix #124: move to webencodings for decoding the input byte stream;
  this makes html5lib compliant with the Encoding Standard, and
  introduces a required dependency on webencodings.**

* **Cease supporting Python 3.2 (in both CPython and PyPy forms).**

* **Fix comments containing double-dash with lxml 3.5 and above.**

* **Use scripting disabled by default (as we don't implement
  scripting).**

* **Fix #11, avoiding the XSS bug potentially caused by serializer
  allowing attribute values to be escaped out of in old browser versions,
  changing the quote_attr_values option on serializer to take one of
  three values, "always" (the old True value), "legacy" (the new option,
  and the new default), and "spec" (the old False value, and the old
  default).**

* **Fix #72 by rewriting the sanitizer to apply only to treewalkers
  (instead of the tokenizer); as such, this will require amending all
  callers of it to use it via the treewalker API.**

* **Drop support for charade, now that chardet is supported once more.**

* **Replace the charset keyword argument on parse and related methods
  with a set of keyword arguments: override_encoding, transport_encoding,
  same_origin_parent_encoding, likely_encoding, and default_encoding.**

* **Move filters._base, treebuilder._base, and treewalkers._base to .base
  to clarify their status as public.**

* **Get rid of the sanitizer package. Merge sanitizer.sanitize into the
  sanitizer.htmlsanitizer module and move that to sanitizer. This means
  anyone who used sanitizer.sanitize or sanitizer.HTMLSanitizer needs no
  code changes.**

* **Rename treewalkers.lxmletree to .etree_lxml and
  treewalkers.genshistream to .genshi to have a consistent API.**

* Move a whole load of stuff (inputstream, ihatexml, trie, tokenizer,
  utils) to be underscore prefixed to clarify their status as private.


0.9999999/1.0b8
~~~~~~~~~~~~~~~

Released on September 10, 2015

* Fix #195: fix the sanitizer to drop broken URLs (it threw an
  exception between 0.9999 and 0.999999).


0.999999/1.0b7
~~~~~~~~~~~~~~

Released on July 7, 2015

* Fix #189: fix the sanitizer to allow relative URLs again (as it did
  prior to 0.9999/1.0b5).


0.99999/1.0b6
~~~~~~~~~~~~~

Released on April 30, 2015

* Fix #188: fix the sanitizer to not throw an exception when sanitizing
  bogus data URLs.


0.9999/1.0b5
~~~~~~~~~~~~

Released on April 29, 2015

* Fix #153: Sanitizer fails to treat some attributes as URLs. Despite how
  this sounds, this has no known security implications. No known version
  of IE (5.5 to current), Firefox (3 to current), Safari (6 to current),
  Chrome (1 to current), or Opera (12 to current) will run any script
  provided in these attributes.

* Pass error message to the ParseError exception in strict parsing mode.

* Allow data URIs in the sanitizer, with a whitelist of content-types.

* Add support for Python implementations that don't support lone
  surrogates (read: Jython). Fixes #2.

* Remove localization of error messages. This functionality was totally
  unused (and it was never tested that everything was localizable), so we
  may as well follow numerous browsers in not supporting translating
  technical strings.

* Expose treewalkers.pprint as a public API.

* Add a documentEncoding property to HTML5Parser, fix #121.


0.999
~~~~~

Released on December 23, 2013

* Fix #127: add work-around for CPython issue #20007: .read(0) on
  http.client.HTTPResponse drops the rest of the content.

* Fix #115: lxml treewalker can now deal with fragments containing, at
  their root level, text nodes with non-ASCII characters on Python 2.


0.99
~~~~

Released on September 10, 2013

* No library changes from 1.0b3; released as 0.99 as pip has changed
  behaviour from 1.4 to avoid installing pre-release versions per
  PEP 440.


1.0b3
~~~~~

Released on July 24, 2013

* Removed ``RecursiveTreeWalker`` from ``treewalkers._base``. Any
  implementation using it should be moved to
  ``NonRecursiveTreeWalker``, as everything bundled with html5lib has
  been for years.

* Fix #67 so that ``BufferedStream`` correctly returns a bytes
  object, thereby fixing any case where html5lib is passed a
  non-seekable RawIOBase-like object.


1.0b2
~~~~~

Released on June 27, 2013

* Removed reordering of attributes within the serializer. There is now
  an ``alphabetical_attributes`` option which preserves the previous
  behaviour through a new filter. This allows attribute order to be
  preserved through html5lib if the tree builder preserves order.

* Removed ``dom2sax`` from DOM treebuilders. It has been replaced by
  ``treeadapters.sax.to_sax`` which is generic and supports any
  treewalker; it also resolves all known bugs with ``dom2sax``.

* Fix treewalker assertions on hitting bytes strings on
  Python 2. Prior to 1.0b1, treewalkers coped with mixed
  bytes/unicode data on Python 2; this restores that earlier
  behaviour on Python 2. Behaviour is unchanged on Python 3.


1.0b1
~~~~~

Released on May 17, 2013

* Implementation updated to implement the `HTML specification
  <http://www.whatwg.org/specs/web-apps/current-work/>`_ as of 5th May
  2013 (`SVN <http://svn.whatwg.org/webapps/>`_ revision r7867).

* Python 3.2+ supported in a single codebase using the ``six`` library.

* Removed support for Python 2.5 and older.

* Removed the deprecated Beautiful Soup 3 treebuilder.
  ``beautifulsoup4`` can use ``html5lib`` as a parser instead. Note that
  since it doesn't support namespaces, foreign content like SVG and
  MathML is parsed incorrectly.

* Removed ``simpletree`` from the package. The default tree builder is
  now ``etree`` (using the ``xml.etree.cElementTree`` implementation if
  available, and ``xml.etree.ElementTree`` otherwise).

* Removed the ``XHTMLSerializer`` as it never actually guaranteed its
  output was well-formed XML, and hence provided little of use.

* Removed default DOM treebuilder, so ``html5lib.treebuilders.dom`` is no
  longer supported. ``html5lib.treebuilders.getTreeBuilder("dom")`` will
  return the default DOM treebuilder, which uses ``xml.dom.minidom``.

* Optional heuristic character encoding detection now based on
  ``charade`` for Python 2.6 - 3.3 compatibility.

* Optional ``Genshi`` treewalker support fixed.

* Many bugfixes, including:

  * #33: null in attribute value breaks XML AttValue;

  * #4: nested, indirect descendant, <button> causes infinite loop;

  * `Google Code 215
    <http://code.google.com/p/html5lib/issues/detail?id=215>`_: Properly
    detect seekable streams;

  * `Google Code 206
    <http://code.google.com/p/html5lib/issues/detail?id=206>`_: add
    support for <video preload=...>, <audio preload=...>;

  * `Google Code 205
    <http://code.google.com/p/html5lib/issues/detail?id=205>`_: add
    support for <video poster=...>;

  * `Google Code 202
    <http://code.google.com/p/html5lib/issues/detail?id=202>`_: Unicode
    file breaks InputStream.

* Source code is now mostly PEP 8 compliant.

* Test harness has been improved and now depends on ``nose``.

* Documentation updated and moved to https://html5lib.readthedocs.io/.


0.95
~~~~

Released on February 11, 2012


0.90
~~~~

Released on January 17, 2010


0.11.1
~~~~~~

Released on June 12, 2008


0.11
~~~~

Released on June 10, 2008


0.10
~~~~

Released on October 7, 2007


0.9
~~~

Released on March 11, 2007


0.2
~~~

Released on January 8, 2007