Metadata-Version: 1.1
Name: parsel
Version: 1.5.1
Summary: Parsel is a library to extract data from HTML and XML using XPath and CSS selectors
Home-page: https://github.com/scrapy/parsel
Author: Scrapy project
Author-email: info@scrapy.org
License: BSD
Description: ===============================
        Parsel
        ===============================
        
        .. image:: https://img.shields.io/travis/scrapy/parsel/master.svg
           :target: https://travis-ci.org/scrapy/parsel
           :alt: Build Status
        
        .. image:: https://img.shields.io/pypi/v/parsel.svg
           :target: https://pypi.python.org/pypi/parsel
           :alt: PyPI Version
        
        .. image:: https://img.shields.io/codecov/c/github/scrapy/parsel/master.svg
           :target: http://codecov.io/github/scrapy/parsel?branch=master
           :alt: Coverage report
        
        
        Parsel is a library to extract data from HTML and XML using XPath and CSS selectors
        
        * Free software: BSD license
        * Documentation: https://parsel.readthedocs.org.
        
        Features
        --------
        
        * Extract text using CSS or XPath selectors
        * Regular expression helper methods
        
        Example::
        
            >>> from parsel import Selector
            >>> sel = Selector(text=u"""<html>
                    <body>
                        <h1>Hello, Parsel!</h1>
                        <ul>
                            <li><a href="http://example.com">Link 1</a></li>
                            <li><a href="http://scrapy.org">Link 2</a></li>
                        </ul>
                    </body>
                    </html>""")
            >>>
            >>> sel.css('h1::text').get()
            'Hello, Parsel!'
            >>>
            >>> sel.css('h1::text').re(r'\w+')
            ['Hello', 'Parsel']
            >>>
            >>> for e in sel.css('ul > li'):
            ...     print(e.xpath('.//a/@href').get())
            http://example.com
            http://scrapy.org
        
        
        
        
        History
        -------
        
        1.5.1 (2018-10-25)
        ~~~~~~~~~~~~~~~~~~
        
        * ``has-class`` XPath function handles newlines and other separators
          in class names properly;
        * fixed parsing of HTML documents with null bytes;
        * documentation improvements;
        * Python 3.7 tests are run on CI; other test improvements.
        
        1.5.0 (2018-07-04)
        ~~~~~~~~~~~~~~~~~~
        
        * New ``Selector.attrib`` and ``SelectorList.attrib`` properties which make
          it easier to get attributes of HTML elements.
        * CSS selectors became faster: compilation results are cached
          (LRU cache is used for ``css2xpath``), so there is
          less overhead when the same CSS expression is used several times.
        * ``.get()`` and ``.getall()`` selector methods are documented and recommended
          over ``.extract_first()`` and ``.extract()``.
        * Various documentation tweaks and improvements.
        
        One more change is that ``.extract()`` and ``.extract_first()`` methods
        are now implemented using ``.get()`` and ``.getall()``, not the other
        way around, and instead of calling ``Selector.extract`` all other methods
        now call ``Selector.get`` internally. This can be **backwards incompatible**
        for custom Selector subclasses which override ``Selector.extract``
        without doing the same for ``Selector.get``. If you have such a Selector
        subclass, make sure the ``get`` method is also overridden. For example, this::
        
            class MySelector(parsel.Selector):
                def extract(self):
                    return super().extract() + " foo"
        
        should be changed to this::
        
            class MySelector(parsel.Selector):
                def get(self):
                    return super().get() + " foo"
                extract = get
        
        
        1.4.0 (2018-02-08)
        ~~~~~~~~~~~~~~~~~~
        
        * ``Selector`` and ``SelectorList`` can't be pickled because
          pickling/unpickling doesn't work for ``lxml.html.HtmlElement``;
          parsel now raises TypeError explicitly instead of allowing pickle to
          silently produce wrong output. This is technically backwards-incompatible
          if you're using Python < 3.6.
        
        
        1.3.1 (2017-12-28)
        ~~~~~~~~~~~~~~~~~~
        
        * Fix artifact uploads to PyPI.
        
        
        1.3.0 (2017-12-28)
        ~~~~~~~~~~~~~~~~~~
        
        * ``has-class`` XPath extension function;
        * ``parsel.xpathfuncs.set_xpathfunc`` is a simplified way to register
          XPath extensions;
        * ``Selector.remove_namespaces`` now removes namespace declarations;
        * Python 3.3 support is dropped;
        * ``make htmlview`` command for easier Parsel docs development.
        * CI: PyPy installation is fixed; parsel now runs tests for PyPy3 as well.
        
        
        1.2.0 (2017-05-17)
        ~~~~~~~~~~~~~~~~~~
        
        * Add ``SelectorList.get`` and ``SelectorList.getall``
          methods as aliases for ``SelectorList.extract_first``
          and ``SelectorList.extract`` respectively
        * Add default value parameter to ``SelectorList.re_first`` method
        * Add ``Selector.re_first`` method
        * Add ``replace_entities`` argument on ``.re()`` and ``.re_first()``
          to turn off replacing of character entity references
        * Bug fix: detect ``None`` result from lxml parsing and fall back to an empty document
        * Rearrange XML/HTML examples in the selectors usage docs
        * Travis CI:
        
          * Test against Python 3.6
          * Test against PyPy using "Portable PyPy for Linux" distribution
        
        
        1.1.0 (2016-11-22)
        ~~~~~~~~~~~~~~~~~~
        
        * Change default HTML parser to `lxml.html.HTMLParser <http://lxml.de/api/lxml.html.HTMLParser-class.html>`_,
          which makes it easier to use some HTML-specific features
        * Add css2xpath function to translate CSS to XPath
        * Add support for ad-hoc namespace declarations
        * Add support for XPath variables
        * Documentation improvements and updates
        
        
        1.0.3 (2016-07-29)
        ~~~~~~~~~~~~~~~~~~
        
        * Add BSD-3-Clause license file
        * Re-enable PyPy tests
        * Integrate py.test runs with setuptools (needed for Debian packaging)
        * Changelog is now called ``NEWS``
        
        
        1.0.2 (2016-04-26)
        ~~~~~~~~~~~~~~~~~~
        
        * Fix bug in exception handling causing original traceback to be lost
        * Added docstrings and other doc fixes
        
        
        1.0.1 (2015-08-24)
        ~~~~~~~~~~~~~~~~~~
        
        * Updated PyPI classifiers
        * Added docstrings for csstranslator module and other doc fixes
        
        
        1.0.0 (2015-08-22)
        ~~~~~~~~~~~~~~~~~~
        
        * Documentation fixes
        
        
        0.9.6 (2015-08-14)
        ~~~~~~~~~~~~~~~~~~
        
        * Updated documentation
        * Extended test coverage
        
        
        0.9.5 (2015-08-11)
        ~~~~~~~~~~~~~~~~~~
        
        * Support for extending SelectorList
        
        
        0.9.4 (2015-08-10)
        ~~~~~~~~~~~~~~~~~~
        
        * Try workaround for travis-ci/dpl#253
        
        
        0.9.3 (2015-08-07)
        ~~~~~~~~~~~~~~~~~~
        
        * Add base_url argument
        
        
        0.9.2 (2015-08-07)
        ~~~~~~~~~~~~~~~~~~
        
        * Rename module unified -> selector and promote root attribute
        * Add create_root_node function
        
        
        0.9.1 (2015-08-04)
        ~~~~~~~~~~~~~~~~~~
        
        * Setup Sphinx build and docs structure
        * Build universal wheels
        * Rename some leftovers from package extraction
        
        
        0.9.0 (2015-07-30)
        ~~~~~~~~~~~~~~~~~~
        
        * First release on PyPI.
Keywords: parsel
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
Classifier: Natural Language :: English
Classifier: Topic :: Text Processing :: Markup
Classifier: Topic :: Text Processing :: Markup :: HTML
Classifier: Topic :: Text Processing :: Markup :: XML
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy