Metadata-Version: 2.1
Name: Protego
Version: 0.1.16
Summary: Pure-Python robots.txt parser with support for modern conventions
Home-page: UNKNOWN
Author: Anubhav Patel
Author-email: anubhavp28@gmail.com
License: BSD
Description: # Protego

![build-badge](https://api.travis-ci.com/scrapy/protego.svg?branch=master)
[![made-with-python](https://img.shields.io/badge/Made%20with-Python-1f425f.svg)](https://www.python.org/)

## Overview

Protego is a pure-Python `robots.txt` parser with support for modern conventions.

## Requirements

* Python 2.7 or Python 3.5+
* Works on Linux, Windows, macOS, BSD

## Install

To install Protego, simply use pip:

```
pip install protego
```

## Usage

```python
>>> from protego import Protego
>>> robotstxt = """
... User-agent: *
... Disallow: /
... Allow: /about
... Allow: /account
... Disallow: /account/contact$
... Disallow: /account/*/profile
... Crawl-delay: 4
... Request-rate: 10/1m # 10 requests every 1 minute
...
... Sitemap: http://example.com/sitemap-index.xml
... Host: http://example.co.in
... """
>>> rp = Protego.parse(robotstxt)
>>> rp.can_fetch("http://example.com/profiles", "mybot")
False
>>> rp.can_fetch("http://example.com/about", "mybot")
True
>>> rp.can_fetch("http://example.com/account", "mybot")
True
>>> rp.can_fetch("http://example.com/account/myuser/profile", "mybot")
False
>>> rp.can_fetch("http://example.com/account/contact", "mybot")
False
>>> rp.crawl_delay("mybot")
4.0
>>> rp.request_rate("mybot")
RequestRate(requests=10, seconds=60, start_time=None, end_time=None)
>>> list(rp.sitemaps)
['http://example.com/sitemap-index.xml']
>>> rp.preferred_host
'http://example.co.in'
```

Using Protego with [Requests](https://3.python-requests.org/):

```python
>>> from protego import Protego
>>> import requests
>>> r = requests.get("https://google.com/robots.txt")
>>> rp = Protego.parse(r.text)
>>> rp.can_fetch("https://google.com/search", "mybot")
False
>>> rp.can_fetch("https://google.com/search/about", "mybot")
True
>>> list(rp.sitemaps)
['https://www.google.com/sitemap.xml']
```

## Documentation

Class `protego.Protego`:

### Properties

* `sitemaps` {`list_iterator`} An iterator over the sitemap URLs specified in `robots.txt`.
* `preferred_host` {string} Preferred host specified in `robots.txt`.

### Methods

* `parse(robotstxt_body)` Parse the given `robots.txt` body and return a new `protego.Protego` instance.
* `can_fetch(url, user_agent)` Return `True` if the user agent can fetch the URL, otherwise return `False`.
* `crawl_delay(user_agent)` Return the crawl delay specified for the user agent as a float. If nothing is specified, return `None` (see the example below).
* `request_rate(user_agent)` Return the request rate specified for the user agent as a named tuple `RequestRate(requests, seconds, start_time, end_time)`. If nothing is specified, return `None`.
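For example, when the matched rules contain no `Crawl-delay` or `Request-rate` directive, both methods return `None`, and `sitemaps` yields nothing. A minimal sketch of that behavior (the `robots.txt` body and user agent name here are illustrative):

```python
>>> from protego import Protego
>>> rp = Protego.parse("User-agent: *\nDisallow: /private\n")
>>> rp.can_fetch("http://example.com/private/data", "mybot")  # matches Disallow: /private
False
>>> rp.can_fetch("http://example.com/public", "mybot")  # no matching rule, so allowed
True
>>> rp.crawl_delay("mybot") is None  # no Crawl-delay directive present
True
>>> rp.request_rate("mybot") is None  # no Request-rate directive present
True
>>> list(rp.sitemaps)  # no Sitemap directives present
[]
```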
Keywords: robots.txt,parser,robots,rep
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Requires-Python: >=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*
Description-Content-Type: text/markdown