.. _grab_proxy:

Proxy Server Support
====================

Basic Usage
-----------

To make Grab send requests through a proxy server, use the :ref:`option_proxy` option::

    g.setup(proxy='example.com:8080')

If the proxy server requires authentication, use the :ref:`option_proxy_userpwd` option
to specify the username and password::

    g.setup(proxy='example.com:8080', proxy_userpwd='root:777')

You can also specify the type of proxy server: "http", "socks4" or "socks5". By default,
Grab assumes that the proxy is of type "http"::

    g.setup(proxy='example.com:8080', proxy_userpwd='root:777', proxy_type='socks5')

You can always see which proxy is currently in use by checking `g.config['proxy']`::

    >>> g = Grab()
    >>> g.setup(proxy='example.com:8080')
    >>> g.config['proxy']
    'example.com:8080'

Proxy List Support
------------------

Grab supports working with a list of multiple proxies. Use the `g.proxylist`
attribute to get access to the proxy manager. By default, the proxy manager is created and initialized with an empty proxy list::

    >>> g = Grab()
    >>> g.proxylist
    <grab.proxy.ProxyList object at 0x2e15b10>
    >>> g.proxylist.proxy_list
    []


Proxy List Source
-----------------

You need to set up the proxy list manager with details of the source that the
manager will load proxies from. Call the `g.proxylist.set_source` method; its first
positional argument defines the type of source. Currently, two types are supported:
"file" and "remote".
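
The layout of the proxy source is not spelled out in this section. As an illustration only, the sketch below assumes a plain-text source with one proxy per line in ``host:port`` form, optionally followed by ``:user:password``. The ``parse_proxy_line`` helper is hypothetical and is not part of Grab; it only shows how such a line could map onto the ``proxy`` and ``proxy_userpwd`` options.

```python
def parse_proxy_line(line):
    """Parse one line of an assumed 'host:port[:user:password]' proxy file.

    Hypothetical helper for illustration; not Grab's own parser.
    Returns None for blank lines.
    """
    line = line.strip()
    if not line:
        return None
    parts = line.split(':')
    if len(parts) == 2:
        host, port = parts
        userpwd = None
    elif len(parts) == 4:
        host, port, user, password = parts
        userpwd = '%s:%s' % (user, password)
    else:
        raise ValueError('Unrecognized proxy line: %r' % line)
    # Keys mirror the option names used with g.setup().
    return {'proxy': '%s:%s' % (host, port), 'proxy_userpwd': userpwd}
```

A result such as ``parse_proxy_line('example.com:8080:root:777')`` could then be passed to ``g.setup(**result)``, assuming the source really uses this layout.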

Example of loading proxies from a local file::

    >>> g = Grab()
    >>> g.proxylist.set_source('file', location='/web/proxy.txt')
    >>> g.proxylist.get_next_proxy()
    <grab.proxy.Proxy object at 0x2d7c610>
    >>> g.proxylist.get_next_proxy().server
    'example.com'
    >>> g.proxylist.get_next_proxy().address
    'example.com:8080'
    >>> len(g.proxylist.proxy_list)
    1000


And here is how to load proxies from the web::

    >>> g = Grab()
    >>> g.proxylist.set_source('remote', url='http://example.com/proxy.txt')


Automatic Proxy Rotation
------------------------

By default, if you set up any non-empty proxy source, Grab rotates through the proxies from the proxy list, using a different one for each request.
You can disable proxy rotation by setting the :ref:`option_proxy_auto_change` option to False. With the default setting, each request goes through a different proxy::

    >>> from grab import Grab
    >>> import logging
    >>> logging.basicConfig(level=logging.DEBUG)
    >>> g = Grab()
    >>> g.proxylist.set_source('file', location='/web/proxy.txt')
    >>> g.go('http://yandex.ru/')
    DEBUG:grab.network:[02] GET http://yandex.ru/ via 91.210.101.31:8080 proxy of type http with authorization
    <grab.response.Response object at 0x109d9f0>
    >>> g.go('http://rambler.ru/')
    DEBUG:grab.network:[03] GET http://rambler.ru/ via 194.29.185.38:8080 proxy of type http with authorization
    <grab.response.Response object at 0x109d9f0>

Now let's see how Grab works when `proxy_auto_change` is False. No proxy is used until you call `g.change_proxy()`, and the selected proxy is then reused for subsequent requests::

    >>> from grab import Grab
    >>> import logging
    >>> logging.basicConfig(level=logging.DEBUG)
    >>> g = Grab()
    >>> g.proxylist.set_source('file', location='/web/proxy.txt')
    >>> g.setup(proxy_auto_change=False)
    >>> g.go('http://ya.ru')
    DEBUG:grab.network:[04] GET http://ya.ru
    <grab.response.Response object at 0x109de50>
    >>> g.change_proxy()
    >>> g.go('http://ya.ru')
    DEBUG:grab.network:[05] GET http://ya.ru via 62.122.73.30:8080 proxy of type http with authorization
    <grab.response.Response object at 0x109d9f0>
    >>> g.go('http://ya.ru')
    DEBUG:grab.network:[06] GET http://ya.ru via 62.122.73.30:8080 proxy of type http with authorization
    <grab.response.Response object at 0x109d9f0>


Getting Proxy From Proxy List
-----------------------------

Each time you call `g.proxylist.get_next_proxy`, you get the next proxy from the proxy list.
After you receive the last proxy in the list, the next call starts again from the beginning of the list.
You can also use `g.proxylist.get_random_proxy` to pick a random proxy from the proxy list.

Automatic Proxy List Reloading
------------------------------

Grab automatically rereads the proxy source every `g.proxylist.reload_time`
seconds. You can set the value of this option as follows::

    >>> g = Grab()
    >>> g.proxylist.setup(reload_time=3600) # reload the proxy list once per hour


Proxy Accumulating
------------------

By default, Grab overwrites the proxy list each time it reloads the proxy source. You can change that behaviour::

    >>> g.proxylist.setup(accumulate_updates=True)

This sets up Grab to append new proxies to the existing ones.
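
The wrap-around selection and the overwrite/accumulate behaviours described above can be pictured with a small stand-alone model. This is a minimal sketch, not Grab's implementation: the ``SketchProxyList`` class and its ``load`` method are hypothetical, proxies are represented as plain ``host:port`` strings, and skipping already-known addresses when accumulating is an assumption made for the illustration.

```python
import random


class SketchProxyList:
    """Hypothetical model of a rotating proxy list; not Grab's actual code."""

    def __init__(self, accumulate_updates=False):
        self.accumulate_updates = accumulate_updates
        self.proxy_list = []
        self._index = 0  # position of the next proxy to hand out

    def load(self, addresses):
        """Simulate one reload of the proxy source."""
        if self.accumulate_updates:
            # Assumption: only addresses not seen before are appended.
            for address in addresses:
                if address not in self.proxy_list:
                    self.proxy_list.append(address)
        else:
            # Default behaviour: the new list replaces the old one.
            self.proxy_list = list(addresses)
            self._index = 0

    def get_next_proxy(self):
        """Return proxies in order, wrapping around after the last one."""
        proxy = self.proxy_list[self._index % len(self.proxy_list)]
        self._index += 1
        return proxy

    def get_random_proxy(self):
        """Return a randomly chosen proxy from the current list."""
        return random.choice(self.proxy_list)
```

With two loaded addresses, three consecutive ``get_next_proxy()`` calls in this model return the first, the second, and then the first address again. A second ``load`` replaces the list unless ``accumulate_updates=True`` was passed.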