.. _grab_proxy:

Proxy Server Support
====================

Basic Usage
-----------

To make Grab send requests through a proxy server, use the :ref:`option_proxy` option::

    g.setup(proxy='example.com:8080')

If the proxy server requires authentication, use the :ref:`option_proxy_userpwd` option
to specify the username and password::

    g.setup(proxy='example.com:8080', proxy_userpwd='root:777')

You can also specify the type of proxy server: "http", "socks4" or "socks5". By default,
Grab assumes that the proxy is of type "http"::

    g.setup(proxy='example.com:8080', proxy_userpwd='root:777', proxy_type='socks5')

The proxy currently in use is always available as `g.config['proxy']`::

    >>> g = Grab()
    >>> g.setup(proxy='example.com:8080')
    >>> g.config['proxy']
    'example.com:8080'

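The three options above can be collected in one place before they are passed to `g.setup`.
The following is a minimal sketch and not part of Grab: `build_proxy_options` and
`ALLOWED_PROXY_TYPES` are hypothetical names, and the validation step is an assumption
added for illustration::

    # Proxy types named in this document.
    ALLOWED_PROXY_TYPES = ('http', 'socks4', 'socks5')


    def build_proxy_options(address, userpwd=None, proxy_type='http'):
        """Return a dict of proxy options (hypothetical helper, not a Grab API)."""
        if proxy_type not in ALLOWED_PROXY_TYPES:
            raise ValueError('Unsupported proxy type: %r' % proxy_type)
        options = {'proxy': address, 'proxy_type': proxy_type}
        # Only include credentials when they were actually given.
        if userpwd is not None:
            options['proxy_userpwd'] = userpwd
        return options


    options = build_proxy_options('example.com:8080', userpwd='root:777',
                                  proxy_type='socks5')

The resulting dictionary could then be handed over with `g.setup(**options)`.
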
Proxy List Support
------------------

Grab can work with a list of proxies. Use the `g.proxylist` attribute to access
the proxy manager. By default, the proxy manager is created with an empty proxy list::

    >>> g = Grab()
    >>> g.proxylist
    <grab.proxy.ProxyList object at 0x2e15b10>
    >>> g.proxylist.proxy_list
    []


Proxy List Source
-----------------

You need to tell the proxy list manager which source to load proxies from.
Do this with the `g.proxylist.set_source` method; its first positional argument
defines the type of source. Currently, two types are supported:
"file" and "remote".

Example of loading proxies from a local file::

    >>> g = Grab()
    >>> g.proxylist.set_source('file', location='/web/proxy.txt')
    >>> g.proxylist.get_next_proxy()
    <grab.proxy.Proxy object at 0x2d7c610>
    >>> g.proxylist.get_next_proxy().server
    'example.com'
    >>> g.proxylist.get_next_proxy().address
    'example.com:8080'
    >>> len(g.proxylist.proxy_list)
    1000


And here is how to load proxies from the web::

    >>> g = Grab()
    >>> g.proxylist.set_source('remote', url='http://example.com/proxy.txt')


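The format of the proxy list itself is not described in this section. Purely as an
illustration, the sketch below assumes one proxy per line, written either as
``host:port`` or as ``host:port:user:password``, and turns such text into simple records.
`ProxyRecord` and `parse_proxy_lines` are hypothetical names; this is not how Grab's own
loader is implemented::

    from collections import namedtuple

    # Hypothetical record type; Grab's real Proxy object may differ.
    ProxyRecord = namedtuple('ProxyRecord', 'server port user password')


    def parse_proxy_lines(text):
        """Parse proxy list text under the assumed line format."""
        records = []
        for line in text.splitlines():
            line = line.strip()
            if not line:
                continue  # skip blank lines
            parts = line.split(':')
            if len(parts) == 2:
                server, port = parts
                user = password = None
            elif len(parts) == 4:
                server, port, user, password = parts
            else:
                raise ValueError('Unrecognized proxy line: %r' % line)
            records.append(ProxyRecord(server, int(port), user, password))
        return records


    sample = """
    example.com:8080
    10.0.0.1:3128:root:777
    """
    proxies = parse_proxy_lines(sample)

The same function would work for text read from a local file or downloaded from a URL,
since it only depends on the content.
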
Automatic Proxy Rotation
------------------------

By default, once you set up a non-empty proxy source, Grab switches to the next proxy from the proxy list on each request.
You can disable proxy rotation by setting the :ref:`option_proxy_auto_change` option to False.

With rotation enabled (the default), each request goes through a different proxy::

    >>> from grab import Grab
    >>> import logging
    >>> logging.basicConfig(level=logging.DEBUG)
    >>> g = Grab()
    >>> g.proxylist.set_source('file', location='/web/proxy.txt')
    >>> g.go('http://yandex.ru/')
    DEBUG:grab.network:[02] GET http://yandex.ru/ via 91.210.101.31:8080 proxy of type http with authorization
    <grab.response.Response object at 0x109d9f0>
    >>> g.go('http://rambler.ru/')
    DEBUG:grab.network:[03] GET http://rambler.ru/ via 194.29.185.38:8080 proxy of type http with authorization
    <grab.response.Response object at 0x109d9f0>

Now let's see how Grab works when `proxy_auto_change` is False. In the session below, no proxy
is used until `g.change_proxy()` is called, and the selected proxy is then kept for the following requests::

    >>> from grab import Grab
    >>> import logging
    >>> logging.basicConfig(level=logging.DEBUG)
    >>> g = Grab()
    >>> g.proxylist.set_source('file', location='/web/proxy.txt')
    >>> g.setup(proxy_auto_change=False)
    >>> g.go('http://ya.ru')
    DEBUG:grab.network:[04] GET http://ya.ru
    <grab.response.Response object at 0x109de50>
    >>> g.change_proxy()
    >>> g.go('http://ya.ru')
    DEBUG:grab.network:[05] GET http://ya.ru via 62.122.73.30:8080 proxy of type http with authorization
    <grab.response.Response object at 0x109d9f0>
    >>> g.go('http://ya.ru')
    DEBUG:grab.network:[06] GET http://ya.ru via 62.122.73.30:8080 proxy of type http with authorization
    <grab.response.Response object at 0x109d9f0>


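The difference between the two modes can be modelled with a small toy class. This is only
a sketch of the behaviour visible in the sessions above, not Grab's implementation:
`ToyClient`, its methods and the proxy addresses are all hypothetical::

    import itertools


    class ToyClient:
        """Toy model of a client with optional automatic proxy rotation."""

        def __init__(self, proxies, auto_change=True):
            self._cycle = itertools.cycle(proxies)
            self.auto_change = auto_change
            self.current = None  # no proxy selected yet

        def change_proxy(self):
            # Move on to the next proxy in the list.
            self.current = next(self._cycle)

        def request(self, url):
            # With auto_change enabled, a new proxy is picked for every request.
            if self.auto_change:
                self.change_proxy()
            return (url, self.current)


    proxies = ['10.0.0.1:8080', '10.0.0.2:8080']

    rotating = ToyClient(proxies)
    rotating_log = [rotating.request('http://example.com/') for _ in range(2)]

    manual = ToyClient(proxies, auto_change=False)
    manual_log = [manual.request('http://example.com/')]
    manual.change_proxy()
    manual_log += [manual.request('http://example.com/') for _ in range(2)]

In this model the rotating client uses a different proxy for each request, while the manual
client sends its first request directly and then keeps the proxy chosen by `change_proxy`.
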
Getting Proxy From Proxy List
-----------------------------

Each time you call `g.proxylist.get_next_proxy`, you get the next proxy from the proxy list.
After the last proxy in the list has been returned, the method starts again from the beginning of the list.
You can also use `g.proxylist.get_random_proxy` to pick a random proxy from the proxy list.

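The wrap-around behaviour can be sketched with a running index taken modulo the list length.
The class below is a hypothetical stand-in (`ToyProxyList`, `next_proxy` and `random_proxy`
are not Grab names), and the optional seed exists only to make the random choice repeatable
in experiments::

    import random


    class ToyProxyList:
        """Toy proxy list with sequential and random selection."""

        def __init__(self, proxies, seed=None):
            self.proxies = list(proxies)
            self._index = 0
            self._rng = random.Random(seed)

        def next_proxy(self):
            # The modulo makes the index wrap back to 0 after the last item.
            proxy = self.proxies[self._index % len(self.proxies)]
            self._index += 1
            return proxy

        def random_proxy(self):
            return self._rng.choice(self.proxies)


    plist = ToyProxyList(['a.example:8080', 'b.example:8080', 'c.example:8080'])
    sequence = [plist.next_proxy() for _ in range(4)]
    picked = plist.random_proxy()

Here the fourth call to `next_proxy` returns the first proxy again.
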
Automatic Proxy List Reloading
------------------------------

Grab automatically rereads the proxy source every `g.proxylist.reload_time`
seconds. You can set the value of this option as follows::

    >>> g = Grab()
    >>> g.proxylist.setup(reload_time=3600)  # reload the proxy list once per hour


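How Grab decides that a reload is due is not described here. One plausible approach,
shown as a sketch under that assumption, is to compare the time elapsed since the last load
with `reload_time`. `ReloadTimer` is a hypothetical name, and the injectable clock is only
there so the logic can be exercised without waiting::

    import time


    class ReloadTimer:
        """Tracks whether a periodic reload is due (illustrative only)."""

        def __init__(self, reload_time, clock=time.time):
            self.reload_time = reload_time
            self._clock = clock
            self._last_reload = clock()

        def reload_due(self):
            # Due once at least reload_time seconds have passed.
            return self._clock() - self._last_reload >= self.reload_time

        def mark_reloaded(self):
            self._last_reload = self._clock()


    fake_now = [0.0]  # mutable cell used as a fake clock
    timer = ReloadTimer(3600, clock=lambda: fake_now[0])
    due_at_start = timer.reload_due()

    fake_now[0] = 3600.0
    due_after_hour = timer.reload_due()
    timer.mark_reloaded()
    due_after_reload = timer.reload_due()
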
Proxy Accumulating
------------------

By default, Grab overwrites the proxy list each time it reloads the proxy source. You can change that behaviour::

    >>> g.proxylist.setup(accumulate_updates=True)

This makes Grab append new proxies to the existing ones.

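The two update modes can be compared with a small stand-alone function. This is a sketch,
not Grab's code: `merge_proxy_lists` is a hypothetical name, and skipping proxies that are
already in the list is an assumption about what "new" means::

    def merge_proxy_lists(current, fresh, accumulate_updates=False):
        """Return the proxy list that results from one reload."""
        if not accumulate_updates:
            # Overwrite mode: the freshly loaded list replaces the old one.
            return list(fresh)
        merged = list(current)
        for proxy in fresh:
            # Assumption: proxies already present are not added twice.
            if proxy not in merged:
                merged.append(proxy)
        return merged


    current = ['a.example:8080', 'b.example:8080']
    fresh = ['b.example:8080', 'c.example:8080']

    overwritten = merge_proxy_lists(current, fresh)
    accumulated = merge_proxy_lists(current, fresh, accumulate_updates=True)

In overwrite mode the first proxy is lost after the reload, while in accumulate mode it is
kept and the one previously unseen proxy is appended.
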