1.. _news:
2
3Release notes
4=============
5
6.. _release-2.5.1:
7
8Scrapy 2.5.1 (2021-10-05)
9-------------------------
10
11*   **Security bug fix:**
12
13    If you use
14    :class:`~scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware`
15    (i.e. the ``http_user`` and ``http_pass`` spider attributes) for HTTP
16    authentication, any request exposes your credentials to the request target.
17
18    To prevent unintended exposure of authentication credentials to unintended
19    domains, you must now additionally set a new, additional spider attribute,
20    ``http_auth_domain``, and point it to the specific domain to which the
21    authentication credentials must be sent.
22
23    If the ``http_auth_domain`` spider attribute is not set, the domain of the
24    first request will be considered the HTTP authentication target, and
25    authentication credentials will only be sent in requests targeting that
26    domain.
27
28    If you need to send the same HTTP authentication credentials to multiple
29    domains, you can use :func:`w3lib.http.basic_auth_header` instead to
30    set the value of the ``Authorization`` header of your requests.
31
32    If you *really* want your spider to send the same HTTP authentication
33    credentials to any domain, set the ``http_auth_domain`` spider attribute
34    to ``None``.
35
36    Finally, if you are a user of `scrapy-splash`_, know that this version of
37    Scrapy breaks compatibility with scrapy-splash 0.7.2 and earlier. You will
38    need to upgrade scrapy-splash to a greater version for it to continue to
39    work.
40
41.. _scrapy-splash: https://github.com/scrapy-plugins/scrapy-splash
42
43
44.. _release-2.5.0:
45
46Scrapy 2.5.0 (2021-04-06)
47-------------------------
48
49Highlights:
50
51-   Official Python 3.9 support
52
53-   Experimental :ref:`HTTP/2 support <http2>`
54
55-   New :func:`~scrapy.downloadermiddlewares.retry.get_retry_request` function
56    to retry requests from spider callbacks
57
58-   New :class:`~scrapy.signals.headers_received` signal that allows stopping
59    downloads early
60
61-   New :class:`Response.protocol <scrapy.http.Response.protocol>` attribute
62
63Deprecation removals
64~~~~~~~~~~~~~~~~~~~~
65
66-   Removed all code that :ref:`was deprecated in 1.7.0 <1.7-deprecations>` and
67    had not :ref:`already been removed in 2.4.0 <2.4-deprecation-removals>`.
68    (:issue:`4901`)
69
70-   Removed support for the ``SCRAPY_PICKLED_SETTINGS_TO_OVERRIDE`` environment
71    variable, :ref:`deprecated in 1.8.0 <1.8-deprecations>`. (:issue:`4912`)
72
73
74Deprecations
75~~~~~~~~~~~~
76
77-   The :mod:`scrapy.utils.py36` module is now deprecated in favor of
78    :mod:`scrapy.utils.asyncgen`. (:issue:`4900`)
79
80
81New features
82~~~~~~~~~~~~
83
84-   Experimental :ref:`HTTP/2 support <http2>` through a new download handler
85    that can be assigned to the ``https`` protocol in the
86    :setting:`DOWNLOAD_HANDLERS` setting.
87    (:issue:`1854`, :issue:`4769`, :issue:`5058`, :issue:`5059`, :issue:`5066`)
88
89-   The new :func:`scrapy.downloadermiddlewares.retry.get_retry_request`
90    function may be used from spider callbacks or middlewares to handle the
91    retrying of a request beyond the scenarios that
92    :class:`~scrapy.downloadermiddlewares.retry.RetryMiddleware` supports.
93    (:issue:`3590`, :issue:`3685`, :issue:`4902`)
94
95-   The new :class:`~scrapy.signals.headers_received` signal gives early access
96    to response headers and allows :ref:`stopping downloads
97    <topics-stop-response-download>`.
98    (:issue:`1772`, :issue:`4897`)
99
100-   The new :attr:`Response.protocol <scrapy.http.Response.protocol>`
101    attribute gives access to the string that identifies the protocol used to
102    download a response. (:issue:`4878`)
103
104-   :ref:`Stats <topics-stats>` now include the following entries that indicate
105    the number of successes and failures in storing
106    :ref:`feeds <topics-feed-exports>`::
107
108        feedexport/success_count/<storage type>
109        feedexport/failed_count/<storage type>
110
111    Where ``<storage type>`` is the feed storage backend class name, such as
112    :class:`~scrapy.extensions.feedexport.FileFeedStorage` or
113    :class:`~scrapy.extensions.feedexport.FTPFeedStorage`.
114
115    (:issue:`3947`, :issue:`4850`)
116
117-   The :class:`~scrapy.spidermiddlewares.urllength.UrlLengthMiddleware` spider
118    middleware now logs ignored URLs with ``INFO`` :ref:`logging level
119    <levels>` instead of ``DEBUG``, and it now includes the following entry
120    into :ref:`stats <topics-stats>` to keep track of the number of ignored
121    URLs::
122
123        urllength/request_ignored_count
124
125    (:issue:`5036`)
126
127-   The
128    :class:`~scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware`
129    downloader middleware now logs the number of decompressed responses and the
130    total count of resulting bytes::
131
132        httpcompression/response_bytes
133        httpcompression/response_count
134
135    (:issue:`4797`, :issue:`4799`)
136
137
138Bug fixes
139~~~~~~~~~
140
141-   Fixed installation on PyPy installing PyDispatcher in addition to
142    PyPyDispatcher, which could prevent Scrapy from working depending on which
143    package got imported. (:issue:`4710`, :issue:`4814`)
144
145-   When inspecting a callback to check if it is a generator that also returns
146    a value, an exception is no longer raised if the callback has a docstring
147    with lower indentation than the following code.
148    (:issue:`4477`, :issue:`4935`)
149
150-   The `Content-Length <https://tools.ietf.org/html/rfc2616#section-14.13>`_
151    header is no longer omitted from responses when using the default, HTTP/1.1
152    download handler (see :setting:`DOWNLOAD_HANDLERS`).
153    (:issue:`5009`, :issue:`5034`, :issue:`5045`, :issue:`5057`, :issue:`5062`)
154
155-   Setting the :reqmeta:`handle_httpstatus_all` request meta key to ``False``
156    now has the same effect as not setting it at all, instead of having the
157    same effect as setting it to ``True``.
158    (:issue:`3851`, :issue:`4694`)
159
160
161Documentation
162~~~~~~~~~~~~~
163
164-   Added instructions to :ref:`install Scrapy in Windows using pip
165    <intro-install-windows>`.
166    (:issue:`4715`, :issue:`4736`)
167
168-   Logging documentation now includes :ref:`additional ways to filter logs
169    <topics-logging-advanced-customization>`.
170    (:issue:`4216`, :issue:`4257`, :issue:`4965`)
171
172-   Covered how to deal with long lists of allowed domains in the :ref:`FAQ
173    <faq>`. (:issue:`2263`, :issue:`3667`)
174
175-   Covered scrapy-bench_ in :ref:`benchmarking`.
176    (:issue:`4996`, :issue:`5016`)
177
178-   Clarified that one :ref:`extension <topics-extensions>` instance is created
179    per crawler.
180    (:issue:`5014`)
181
182-   Fixed some errors in examples.
183    (:issue:`4829`, :issue:`4830`, :issue:`4907`, :issue:`4909`,
184    :issue:`5008`)
185
186-   Fixed some external links, typos, and so on.
187    (:issue:`4892`, :issue:`4899`, :issue:`4936`, :issue:`4942`, :issue:`5005`,
188    :issue:`5063`)
189
190-   The :ref:`list of Request.meta keys <topics-request-meta>` is now sorted
191    alphabetically.
192    (:issue:`5061`, :issue:`5065`)
193
194-   Updated references to Scrapinghub, which is now called Zyte.
195    (:issue:`4973`, :issue:`5072`)
196
197-   Added a mention to contributors in the README. (:issue:`4956`)
198
199-   Reduced the top margin of lists. (:issue:`4974`)
200
201
202Quality Assurance
203~~~~~~~~~~~~~~~~~
204
205-   Made Python 3.9 support official (:issue:`4757`, :issue:`4759`)
206
207-   Extended typing hints (:issue:`4895`)
208
209-   Fixed deprecated uses of the Twisted API.
210    (:issue:`4940`, :issue:`4950`, :issue:`5073`)
211
212-   Made our tests run with the new pip resolver.
213    (:issue:`4710`, :issue:`4814`)
214
215-   Added tests to ensure that :ref:`coroutine support <coroutine-support>`
216    is tested. (:issue:`4987`)
217
218-   Migrated from Travis CI to GitHub Actions. (:issue:`4924`)
219
220-   Fixed CI issues.
221    (:issue:`4986`, :issue:`5020`, :issue:`5022`, :issue:`5027`, :issue:`5052`,
222    :issue:`5053`)
223
224-   Implemented code refactorings, style fixes and cleanups.
225    (:issue:`4911`, :issue:`4982`, :issue:`5001`, :issue:`5002`, :issue:`5076`)
226
227
228.. _release-2.4.1:
229
230Scrapy 2.4.1 (2020-11-17)
231-------------------------
232
233-   Fixed :ref:`feed exports <topics-feed-exports>` overwrite support (:issue:`4845`, :issue:`4857`, :issue:`4859`)
234
235-   Fixed the AsyncIO event loop handling, which could make code hang
236    (:issue:`4855`, :issue:`4872`)
237
238-   Fixed the IPv6-capable DNS resolver
239    :class:`~scrapy.resolver.CachingHostnameResolver` for download handlers
240    that call
241    :meth:`reactor.resolve <twisted.internet.interfaces.IReactorCore.resolve>`
242    (:issue:`4802`, :issue:`4803`)
243
244-   Fixed the output of the :command:`genspider` command showing placeholders
245    instead of the import path of the generated spider module (:issue:`4874`)
246
247-   Migrated Windows CI from Azure Pipelines to GitHub Actions (:issue:`4869`,
248    :issue:`4876`)
249
250
251.. _release-2.4.0:
252
253Scrapy 2.4.0 (2020-10-11)
254-------------------------
255
256Highlights:
257
258*   Python 3.5 support has been dropped.
259
260*   The ``file_path`` method of :ref:`media pipelines <topics-media-pipeline>`
261    can now access the source :ref:`item <topics-items>`.
262
263    This allows you to set a download file path based on item data.
264
265*   The new ``item_export_kwargs`` key of the :setting:`FEEDS` setting allows
266    to define keyword parameters to pass to :ref:`item exporter classes
267    <topics-exporters>`
268
269*   You can now choose whether :ref:`feed exports <topics-feed-exports>`
270    overwrite or append to the output file.
271
272    For example, when using the :command:`crawl` or :command:`runspider`
273    commands, you can use the ``-O`` option instead of ``-o`` to overwrite the
274    output file.
275
276*   Zstd-compressed responses are now supported if zstandard_ is installed.
277
278*   In settings, where the import path of a class is required, it is now
279    possible to pass a class object instead.
280
281Modified requirements
282~~~~~~~~~~~~~~~~~~~~~
283
284*   Python 3.6 or greater is now required; support for Python 3.5 has been
285    dropped
286
287    As a result:
288
289    -   When using PyPy, PyPy 7.2.0 or greater :ref:`is now required
290        <faq-python-versions>`
291
292    -   For Amazon S3 storage support in :ref:`feed exports
293        <topics-feed-storage-s3>` or :ref:`media pipelines
294        <media-pipelines-s3>`, botocore_ 1.4.87 or greater is now required
295
296    -   To use the :ref:`images pipeline <images-pipeline>`, Pillow_ 4.0.0 or
297        greater is now required
298
299    (:issue:`4718`, :issue:`4732`, :issue:`4733`, :issue:`4742`, :issue:`4743`,
300    :issue:`4764`)
301
302
303Backward-incompatible changes
304~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
305
306*   :class:`~scrapy.downloadermiddlewares.cookies.CookiesMiddleware` once again
307    discards cookies defined in :attr:`Request.headers
308    <scrapy.http.Request.headers>`.
309
310    We decided to revert this bug fix, introduced in Scrapy 2.2.0, because it
311    was reported that the current implementation could break existing code.
312
313    If you need to set cookies for a request, use the :class:`Request.cookies
314    <scrapy.http.Request>` parameter.
315
316    A future version of Scrapy will include a new, better implementation of the
317    reverted bug fix.
318
319    (:issue:`4717`, :issue:`4823`)
320
321
322.. _2.4-deprecation-removals:
323
324Deprecation removals
325~~~~~~~~~~~~~~~~~~~~
326
327*   :class:`scrapy.extensions.feedexport.S3FeedStorage` no longer reads the
328    values of ``access_key`` and ``secret_key`` from the running project
329    settings when they are not passed to its ``__init__`` method; you must
330    either pass those parameters to its ``__init__`` method or use
331    :class:`S3FeedStorage.from_crawler
332    <scrapy.extensions.feedexport.S3FeedStorage.from_crawler>`
333    (:issue:`4356`, :issue:`4411`, :issue:`4688`)
334
335*   :attr:`Rule.process_request <scrapy.spiders.crawl.Rule.process_request>`
336    no longer admits callables which expect a single ``request`` parameter,
337    rather than both ``request`` and ``response`` (:issue:`4818`)
338
339
340Deprecations
341~~~~~~~~~~~~
342
343*   In custom :ref:`media pipelines <topics-media-pipeline>`, signatures that
344    do not accept a keyword-only ``item`` parameter in any of the  methods that
345    :ref:`now support this parameter <media-pipeline-item-parameter>` are now
346    deprecated (:issue:`4628`, :issue:`4686`)
347
348*   In custom :ref:`feed storage backend classes <topics-feed-storage>`,
349    ``__init__`` method signatures that do not accept a keyword-only
350    ``feed_options`` parameter are now deprecated (:issue:`547`, :issue:`716`,
351    :issue:`4512`)
352
353*   The :class:`scrapy.utils.python.WeakKeyCache` class is now deprecated
354    (:issue:`4684`, :issue:`4701`)
355
356*   The :func:`scrapy.utils.boto.is_botocore` function is now deprecated, use
357    :func:`scrapy.utils.boto.is_botocore_available` instead (:issue:`4734`,
358    :issue:`4776`)
359
360
361New features
362~~~~~~~~~~~~
363
364.. _media-pipeline-item-parameter:
365
366*   The following methods of :ref:`media pipelines <topics-media-pipeline>` now
367    accept an ``item`` keyword-only parameter containing the source
368    :ref:`item <topics-items>`:
369
370    -   In :class:`scrapy.pipelines.files.FilesPipeline`:
371
372        -   :meth:`~scrapy.pipelines.files.FilesPipeline.file_downloaded`
373
374        -   :meth:`~scrapy.pipelines.files.FilesPipeline.file_path`
375
376        -   :meth:`~scrapy.pipelines.files.FilesPipeline.media_downloaded`
377
378        -   :meth:`~scrapy.pipelines.files.FilesPipeline.media_to_download`
379
380    -   In :class:`scrapy.pipelines.images.ImagesPipeline`:
381
382        -   :meth:`~scrapy.pipelines.images.ImagesPipeline.file_downloaded`
383
384        -   :meth:`~scrapy.pipelines.images.ImagesPipeline.file_path`
385
386        -   :meth:`~scrapy.pipelines.images.ImagesPipeline.get_images`
387
388        -   :meth:`~scrapy.pipelines.images.ImagesPipeline.image_downloaded`
389
390        -   :meth:`~scrapy.pipelines.images.ImagesPipeline.media_downloaded`
391
392        -   :meth:`~scrapy.pipelines.images.ImagesPipeline.media_to_download`
393
394    (:issue:`4628`, :issue:`4686`)
395
396*   The new ``item_export_kwargs`` key of the :setting:`FEEDS` setting allows
397    to define keyword parameters to pass to :ref:`item exporter classes
398    <topics-exporters>` (:issue:`4606`, :issue:`4768`)
399
400*   :ref:`Feed exports <topics-feed-exports>` gained overwrite support:
401
402    *   When using the :command:`crawl` or :command:`runspider` commands, you
403        can use the ``-O`` option instead of ``-o`` to overwrite the output
404        file
405
406    *   You can use the ``overwrite`` key in the :setting:`FEEDS` setting to
407        configure whether to overwrite the output file (``True``) or append to
408        its content (``False``)
409
410    *   The ``__init__`` and ``from_crawler`` methods of :ref:`feed storage
411        backend classes <topics-feed-storage>` now receive a new keyword-only
412        parameter, ``feed_options``, which is a dictionary of :ref:`feed
413        options <feed-options>`
414
415    (:issue:`547`, :issue:`716`, :issue:`4512`)
416
417*   Zstd-compressed responses are now supported if zstandard_ is installed
418    (:issue:`4831`)
419
420*   In settings, where the import path of a class is required, it is now
421    possible to pass a class object instead (:issue:`3870`, :issue:`3873`).
422
423    This includes also settings where only part of its value is made of an
424    import path, such as :setting:`DOWNLOADER_MIDDLEWARES` or
425    :setting:`DOWNLOAD_HANDLERS`.
426
427*   :ref:`Downloader middlewares <topics-downloader-middleware>` can now
428    override :class:`response.request <scrapy.http.Response.request>`.
429
430    If a :ref:`downloader middleware <topics-downloader-middleware>` returns
431    a :class:`~scrapy.http.Response` object from
432    :meth:`~scrapy.downloadermiddlewares.DownloaderMiddleware.process_response`
433    or
434    :meth:`~scrapy.downloadermiddlewares.DownloaderMiddleware.process_exception`
435    with a custom :class:`~scrapy.http.Request` object assigned to
436    :class:`response.request <scrapy.http.Response.request>`:
437
438    -   The response is handled by the callback of that custom
439        :class:`~scrapy.http.Request` object, instead of being handled by the
440        callback of the original :class:`~scrapy.http.Request` object
441
442    -   That custom :class:`~scrapy.http.Request` object is now sent as the
443        ``request`` argument to the :signal:`response_received` signal, instead
444        of the original :class:`~scrapy.http.Request` object
445
446    (:issue:`4529`, :issue:`4632`)
447
448*   When using the :ref:`FTP feed storage backend <topics-feed-storage-ftp>`:
449
450    -   It is now possible to set the new ``overwrite`` :ref:`feed option
451        <feed-options>` to ``False`` to append to an existing file instead of
452        overwriting it
453
454    -   The FTP password can now be omitted if it is not necessary
455
456    (:issue:`547`, :issue:`716`, :issue:`4512`)
457
458*   The ``__init__`` method of :class:`~scrapy.exporters.CsvItemExporter` now
459    supports an ``errors`` parameter to indicate how to handle encoding errors
460    (:issue:`4755`)
461
462*   When :ref:`using asyncio <using-asyncio>`, it is now possible to
463    :ref:`set a custom asyncio loop <using-custom-loops>` (:issue:`4306`,
464    :issue:`4414`)
465
466*   Serialized requests (see :ref:`topics-jobs`) now support callbacks that are
467    spider methods that delegate on other callable (:issue:`4756`)
468
469*   When a response is larger than :setting:`DOWNLOAD_MAXSIZE`, the logged
470    message is now a warning, instead of an error (:issue:`3874`,
471    :issue:`3886`, :issue:`4752`)
472
473
474Bug fixes
475~~~~~~~~~
476
477*   The :command:`genspider` command no longer overwrites existing files
478    unless the ``--force`` option is used (:issue:`4561`, :issue:`4616`,
479    :issue:`4623`)
480
481*   Cookies with an empty value are no longer considered invalid cookies
482    (:issue:`4772`)
483
484*   The :command:`runspider` command now supports files with the ``.pyw`` file
485    extension (:issue:`4643`, :issue:`4646`)
486
487*   The :class:`~scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware`
488    middleware now simply ignores unsupported proxy values (:issue:`3331`,
489    :issue:`4778`)
490
491*   Checks for generator callbacks with a ``return`` statement no longer warn
492    about ``return`` statements in nested functions (:issue:`4720`,
493    :issue:`4721`)
494
495*   The system file mode creation mask no longer affects the permissions of
496    files generated using the :command:`startproject` command (:issue:`4722`)
497
498*   :func:`scrapy.utils.iterators.xmliter` now supports namespaced node names
499    (:issue:`861`, :issue:`4746`)
500
501*   :class:`~scrapy.Request` objects can now have ``about:`` URLs, which can
502    work when using a headless browser (:issue:`4835`)
503
504
505Documentation
506~~~~~~~~~~~~~
507
508*   The :setting:`FEED_URI_PARAMS` setting is now documented (:issue:`4671`,
509    :issue:`4724`)
510
511*   Improved the documentation of
512    :ref:`link extractors <topics-link-extractors>` with an usage example from
513    a spider callback and reference documentation for the
514    :class:`~scrapy.link.Link` class (:issue:`4751`, :issue:`4775`)
515
516*   Clarified the impact of :setting:`CONCURRENT_REQUESTS` when using the
517    :class:`~scrapy.extensions.closespider.CloseSpider` extension
518    (:issue:`4836`)
519
520*   Removed references to Python 2’s ``unicode`` type (:issue:`4547`,
521    :issue:`4703`)
522
523*   We now have an :ref:`official deprecation policy <deprecation-policy>`
524    (:issue:`4705`)
525
526*   Our :ref:`documentation policies <documentation-policies>` now cover usage
527    of Sphinx’s :rst:dir:`versionadded` and :rst:dir:`versionchanged`
528    directives, and we have removed usages referencing Scrapy 1.4.0 and earlier
529    versions (:issue:`3971`, :issue:`4310`)
530
531*   Other documentation cleanups (:issue:`4090`, :issue:`4782`, :issue:`4800`,
532    :issue:`4801`, :issue:`4809`, :issue:`4816`, :issue:`4825`)
533
534
535Quality assurance
536~~~~~~~~~~~~~~~~~
537
538*   Extended typing hints (:issue:`4243`, :issue:`4691`)
539
540*   Added tests for the :command:`check` command (:issue:`4663`)
541
542*   Fixed test failures on Debian (:issue:`4726`, :issue:`4727`, :issue:`4735`)
543
544*   Improved Windows test coverage (:issue:`4723`)
545
546*   Switched to :ref:`formatted string literals <f-strings>` where possible
547    (:issue:`4307`, :issue:`4324`, :issue:`4672`)
548
549*   Modernized :func:`super` usage (:issue:`4707`)
550
551*   Other code and test cleanups (:issue:`1790`, :issue:`3288`, :issue:`4165`,
552    :issue:`4564`, :issue:`4651`, :issue:`4714`, :issue:`4738`, :issue:`4745`,
553    :issue:`4747`, :issue:`4761`, :issue:`4765`, :issue:`4804`, :issue:`4817`,
554    :issue:`4820`, :issue:`4822`, :issue:`4839`)
555
556
557.. _release-2.3.0:
558
559Scrapy 2.3.0 (2020-08-04)
560-------------------------
561
562Highlights:
563
564*   :ref:`Feed exports <topics-feed-exports>` now support :ref:`Google Cloud
565    Storage <topics-feed-storage-gcs>` as a storage backend
566
567*   The new :setting:`FEED_EXPORT_BATCH_ITEM_COUNT` setting allows to deliver
568    output items in batches of up to the specified number of items.
569
570    It also serves as a workaround for :ref:`delayed file delivery
571    <delayed-file-delivery>`, which causes Scrapy to only start item delivery
572    after the crawl has finished when using certain storage backends
573    (:ref:`S3 <topics-feed-storage-s3>`, :ref:`FTP <topics-feed-storage-ftp>`,
574    and now :ref:`GCS <topics-feed-storage-gcs>`).
575
576*   The base implementation of :ref:`item loaders <topics-loaders>` has been
577    moved into a separate library, :doc:`itemloaders <itemloaders:index>`,
578    allowing usage from outside Scrapy and a separate release schedule
579
580Deprecation removals
581~~~~~~~~~~~~~~~~~~~~
582
583*   Removed the following classes and their parent modules from
584    ``scrapy.linkextractors``:
585
586    *   ``htmlparser.HtmlParserLinkExtractor``
587    *   ``regex.RegexLinkExtractor``
588    *   ``sgml.BaseSgmlLinkExtractor``
589    *   ``sgml.SgmlLinkExtractor``
590
591    Use
592    :class:`LinkExtractor <scrapy.linkextractors.lxmlhtml.LxmlLinkExtractor>`
593    instead (:issue:`4356`, :issue:`4679`)
594
595
596Deprecations
597~~~~~~~~~~~~
598
599*   The ``scrapy.utils.python.retry_on_eintr`` function is now deprecated
600    (:issue:`4683`)
601
602
603New features
604~~~~~~~~~~~~
605
606*   :ref:`Feed exports <topics-feed-exports>` support :ref:`Google Cloud
607    Storage <topics-feed-storage-gcs>` (:issue:`685`, :issue:`3608`)
608
609*   New :setting:`FEED_EXPORT_BATCH_ITEM_COUNT` setting for batch deliveries
610    (:issue:`4250`, :issue:`4434`)
611
612*   The :command:`parse` command now allows specifying an output file
613    (:issue:`4317`, :issue:`4377`)
614
615*   :meth:`Request.from_curl <scrapy.http.Request.from_curl>` and
616    :func:`~scrapy.utils.curl.curl_to_request_kwargs` now also support
617    ``--data-raw`` (:issue:`4612`)
618
619*   A ``parse`` callback may now be used in built-in spider subclasses, such
620    as :class:`~scrapy.spiders.CrawlSpider` (:issue:`712`, :issue:`732`,
621    :issue:`781`, :issue:`4254` )
622
623
624Bug fixes
625~~~~~~~~~
626
627*   Fixed the :ref:`CSV exporting <topics-feed-format-csv>` of
628    :ref:`dataclass items <dataclass-items>` and :ref:`attr.s items
629    <attrs-items>` (:issue:`4667`, :issue:`4668`)
630
631*   :meth:`Request.from_curl <scrapy.http.Request.from_curl>` and
632    :func:`~scrapy.utils.curl.curl_to_request_kwargs` now set the request
633    method to ``POST`` when a request body is specified and no request method
634    is specified (:issue:`4612`)
635
636*   The processing of ANSI escape sequences in enabled in Windows 10.0.14393
637    and later, where it is required for colored output (:issue:`4393`,
638    :issue:`4403`)
639
640
641Documentation
642~~~~~~~~~~~~~
643
644*   Updated the `OpenSSL cipher list format`_ link in the documentation about
645    the :setting:`DOWNLOADER_CLIENT_TLS_CIPHERS` setting (:issue:`4653`)
646
647*   Simplified the code example in :ref:`topics-loaders-dataclass`
648    (:issue:`4652`)
649
650.. _OpenSSL cipher list format: https://www.openssl.org/docs/manmaster/man1/openssl-ciphers.html#CIPHER-LIST-FORMAT
651
652
653Quality assurance
654~~~~~~~~~~~~~~~~~
655
656*   The base implementation of :ref:`item loaders <topics-loaders>` has been
657    moved into :doc:`itemloaders <itemloaders:index>` (:issue:`4005`,
658    :issue:`4516`)
659
660*   Fixed a silenced error in some scheduler tests (:issue:`4644`,
661    :issue:`4645`)
662
663*   Renewed the localhost certificate used for SSL tests (:issue:`4650`)
664
665*   Removed cookie-handling code specific to Python 2 (:issue:`4682`)
666
667*   Stopped using Python 2 unicode literal syntax (:issue:`4704`)
668
669*   Stopped using a backlash for line continuation (:issue:`4673`)
670
671*   Removed unneeded entries from the MyPy exception list (:issue:`4690`)
672
673*   Automated tests now pass on Windows as part of our continuous integration
674    system (:issue:`4458`)
675
676*   Automated tests now pass on the latest PyPy version for supported Python
677    versions in our continuous integration system (:issue:`4504`)
678
679
680.. _release-2.2.1:
681
682Scrapy 2.2.1 (2020-07-17)
683-------------------------
684
685*   The :command:`startproject` command no longer makes unintended changes to
686    the permissions of files in the destination folder, such as removing
687    execution permissions (:issue:`4662`, :issue:`4666`)
688
689
690.. _release-2.2.0:
691
692Scrapy 2.2.0 (2020-06-24)
693-------------------------
694
695Highlights:
696
697* Python 3.5.2+ is required now
698* :ref:`dataclass objects <dataclass-items>` and
699  :ref:`attrs objects <attrs-items>` are now valid :ref:`item types
700  <item-types>`
701* New :meth:`TextResponse.json <scrapy.http.TextResponse.json>` method
702* New :signal:`bytes_received` signal that allows canceling response download
703* :class:`~scrapy.downloadermiddlewares.cookies.CookiesMiddleware` fixes
704
705Backward-incompatible changes
706~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
707
708*   Support for Python 3.5.0 and 3.5.1 has been dropped; Scrapy now refuses to
709    run with a Python version lower than 3.5.2, which introduced
710    :class:`typing.Type` (:issue:`4615`)
711
712
713Deprecations
714~~~~~~~~~~~~
715
716*   :meth:`TextResponse.body_as_unicode
717    <scrapy.http.TextResponse.body_as_unicode>` is now deprecated, use
718    :attr:`TextResponse.text <scrapy.http.TextResponse.text>` instead
719    (:issue:`4546`, :issue:`4555`, :issue:`4579`)
720
721*   :class:`scrapy.item.BaseItem` is now deprecated, use
722    :class:`scrapy.item.Item` instead (:issue:`4534`)
723
724
725New features
726~~~~~~~~~~~~
727
728*   :ref:`dataclass objects <dataclass-items>` and
729    :ref:`attrs objects <attrs-items>` are now valid :ref:`item types
730    <item-types>`, and a new itemadapter_ library makes it easy to
731    write code that :ref:`supports any item type <supporting-item-types>`
732    (:issue:`2749`, :issue:`2807`, :issue:`3761`, :issue:`3881`, :issue:`4642`)
733
734*   A new :meth:`TextResponse.json <scrapy.http.TextResponse.json>` method
735    allows to deserialize JSON responses (:issue:`2444`, :issue:`4460`,
736    :issue:`4574`)
737
738*   A new :signal:`bytes_received` signal allows monitoring response download
739    progress and :ref:`stopping downloads <topics-stop-response-download>`
740    (:issue:`4205`, :issue:`4559`)
741
742*   The dictionaries in the result list of a :ref:`media pipeline
743    <topics-media-pipeline>` now include a new key, ``status``, which indicates
744    if the file was downloaded or, if the file was not downloaded, why it was
745    not downloaded; see :meth:`FilesPipeline.get_media_requests
746    <scrapy.pipelines.files.FilesPipeline.get_media_requests>` for more
747    information (:issue:`2893`, :issue:`4486`)
748
749*   When using :ref:`Google Cloud Storage <media-pipeline-gcs>` for
750    a :ref:`media pipeline <topics-media-pipeline>`, a warning is now logged if
751    the configured credentials do not grant the required permissions
752    (:issue:`4346`, :issue:`4508`)
753
754*   :ref:`Link extractors <topics-link-extractors>` are now serializable,
755    as long as you do not use :ref:`lambdas <lambda>` for parameters; for
756    example, you can now pass link extractors in :attr:`Request.cb_kwargs
757    <scrapy.http.Request.cb_kwargs>` or
758    :attr:`Request.meta <scrapy.http.Request.meta>` when :ref:`persisting
759    scheduled requests <topics-jobs>` (:issue:`4554`)
760
761*   Upgraded the :ref:`pickle protocol <pickle-protocols>` that Scrapy uses
762    from protocol 2 to protocol 4, improving serialization capabilities and
763    performance (:issue:`4135`, :issue:`4541`)
764
765*   :func:`scrapy.utils.misc.create_instance` now raises a :exc:`TypeError`
766    exception if the resulting instance is ``None`` (:issue:`4528`,
767    :issue:`4532`)
768
769.. _itemadapter: https://github.com/scrapy/itemadapter
770
771
772Bug fixes
773~~~~~~~~~
774
775*   :class:`~scrapy.downloadermiddlewares.cookies.CookiesMiddleware` no longer
776    discards cookies defined in :attr:`Request.headers
777    <scrapy.http.Request.headers>` (:issue:`1992`, :issue:`2400`)
778
779*   :class:`~scrapy.downloadermiddlewares.cookies.CookiesMiddleware` no longer
780    re-encodes cookies defined as :class:`bytes` in the ``cookies`` parameter
781    of the ``__init__`` method of :class:`~scrapy.http.Request`
782    (:issue:`2400`, :issue:`3575`)
783
784*   When :setting:`FEEDS` defines multiple URIs, :setting:`FEED_STORE_EMPTY` is
785    ``False`` and the crawl yields no items, Scrapy no longer stops feed
786    exports after the first URI (:issue:`4621`, :issue:`4626`)
787
788*   :class:`~scrapy.spiders.Spider` callbacks defined using :doc:`coroutine
789    syntax <topics/coroutines>` no longer need to return an iterable, and may
790    instead return a :class:`~scrapy.http.Request` object, an
791    :ref:`item <topics-items>`, or ``None`` (:issue:`4609`)
792
793*   The :command:`startproject` command now ensures that the generated project
794    folders and files have the right permissions (:issue:`4604`)
795
796*   Fix a :exc:`KeyError` exception being sometimes raised from
797    :class:`scrapy.utils.datatypes.LocalWeakReferencedCache` (:issue:`4597`,
798    :issue:`4599`)
799
800*   When :setting:`FEEDS` defines multiple URIs, log messages about items being
801    stored now contain information from the corresponding feed, instead of
802    always containing information about only one of the feeds (:issue:`4619`,
803    :issue:`4629`)
804
805
806Documentation
807~~~~~~~~~~~~~
808
809*   Added a new section about :ref:`accessing cb_kwargs from errbacks
810    <errback-cb_kwargs>` (:issue:`4598`, :issue:`4634`)
811
812*   Covered chompjs_ in :ref:`topics-parsing-javascript` (:issue:`4556`,
813    :issue:`4562`)
814
815*   Removed from :doc:`topics/coroutines` the warning about the API being
816    experimental (:issue:`4511`, :issue:`4513`)
817
818*   Removed references to unsupported versions of :doc:`Twisted
819    <twisted:index>` (:issue:`4533`)
820
821*   Updated the description of the :ref:`screenshot pipeline example
822    <ScreenshotPipeline>`, which now uses :doc:`coroutine syntax
823    <topics/coroutines>` instead of returning a
824    :class:`~twisted.internet.defer.Deferred` (:issue:`4514`, :issue:`4593`)
825
826*   Removed a misleading import line from the
827    :func:`scrapy.utils.log.configure_logging` code example (:issue:`4510`,
828    :issue:`4587`)
829
830*   The display-on-hover behavior of internal documentation references now also
831    covers links to :ref:`commands <topics-commands>`, :attr:`Request.meta
832    <scrapy.http.Request.meta>` keys, :ref:`settings <topics-settings>` and
833    :ref:`signals <topics-signals>` (:issue:`4495`, :issue:`4563`)
834
835*   It is again possible to download the documentation for offline reading
836    (:issue:`4578`, :issue:`4585`)
837
838*   Removed backslashes preceding ``*args`` and ``**kwargs`` in some function
839    and method signatures (:issue:`4592`, :issue:`4596`)
840
841.. _chompjs: https://github.com/Nykakin/chompjs
842
843
844Quality assurance
845~~~~~~~~~~~~~~~~~
846
847*   Adjusted the code base further to our :ref:`style guidelines
848    <coding-style>` (:issue:`4237`, :issue:`4525`, :issue:`4538`,
849    :issue:`4539`, :issue:`4540`, :issue:`4542`, :issue:`4543`, :issue:`4544`,
850    :issue:`4545`, :issue:`4557`, :issue:`4558`, :issue:`4566`, :issue:`4568`,
851    :issue:`4572`)
852
853*   Removed remnants of Python 2 support (:issue:`4550`, :issue:`4553`,
854    :issue:`4568`)
855
856*   Improved code sharing between the :command:`crawl` and :command:`runspider`
857    commands (:issue:`4548`, :issue:`4552`)
858
859*   Replaced ``chain(*iterable)`` with ``chain.from_iterable(iterable)``
860    (:issue:`4635`)
861
862*   You may now run the :mod:`asyncio` tests with Tox on any Python version
863    (:issue:`4521`)
864
865*   Updated test requirements to reflect an incompatibility with pytest 5.4 and
866    5.4.1 (:issue:`4588`)
867
868*   Improved :class:`~scrapy.spiderloader.SpiderLoader` test coverage for
869    scenarios involving duplicate spider names (:issue:`4549`, :issue:`4560`)
870
871*   Configured Travis CI to also run the tests with Python 3.5.2
872    (:issue:`4518`, :issue:`4615`)
873
874*   Added a `Pylint <https://www.pylint.org/>`_ job to Travis CI
875    (:issue:`3727`)
876
877*   Added a `Mypy <http://mypy-lang.org/>`_ job to Travis CI (:issue:`4637`)
878
879*   Made use of set literals in tests (:issue:`4573`)
880
881*   Cleaned up the Travis CI configuration (:issue:`4517`, :issue:`4519`,
882    :issue:`4522`, :issue:`4537`)
883
884
885.. _release-2.1.0:
886
887Scrapy 2.1.0 (2020-04-24)
888-------------------------
889
890Highlights:
891
892* New :setting:`FEEDS` setting to export to multiple feeds
893* New :attr:`Response.ip_address <scrapy.http.Response.ip_address>` attribute
894
895Backward-incompatible changes
896~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
897
898*   :exc:`AssertionError` exceptions triggered by :ref:`assert <assert>`
899    statements have been replaced by new exception types, to support running
900    Python in optimized mode (see :option:`-O`) without changing Scrapy’s
901    behavior in any unexpected ways.
902
903    If you catch an :exc:`AssertionError` exception from Scrapy, update your
904    code to catch the corresponding new exception.
905
906    (:issue:`4440`)
907
908
909Deprecation removals
910~~~~~~~~~~~~~~~~~~~~
911
912*   The ``LOG_UNSERIALIZABLE_REQUESTS`` setting is no longer supported, use
913    :setting:`SCHEDULER_DEBUG` instead (:issue:`4385`)
914
915*   The ``REDIRECT_MAX_METAREFRESH_DELAY`` setting is no longer supported, use
916    :setting:`METAREFRESH_MAXDELAY` instead (:issue:`4385`)
917
918*   The :class:`~scrapy.downloadermiddlewares.chunked.ChunkedTransferMiddleware`
919    middleware has been removed, including the entire
920    :class:`scrapy.downloadermiddlewares.chunked` module; chunked transfers
921    work out of the box (:issue:`4431`)
922
923*   The ``spiders`` property has been removed from
924    :class:`~scrapy.crawler.Crawler`, use :class:`CrawlerRunner.spider_loader
925    <scrapy.crawler.CrawlerRunner.spider_loader>` or instantiate
926    :setting:`SPIDER_LOADER_CLASS` with your settings instead (:issue:`4398`)
927
928*   The ``MultiValueDict``, ``MultiValueDictKeyError``, and ``SiteNode``
929    classes have been removed from :mod:`scrapy.utils.datatypes`
930    (:issue:`4400`)
931
932
933Deprecations
934~~~~~~~~~~~~
935
936*   The ``FEED_FORMAT`` and ``FEED_URI`` settings have been deprecated in
937    favor of the new :setting:`FEEDS` setting (:issue:`1336`, :issue:`3858`,
938    :issue:`4507`)
939
940
941New features
942~~~~~~~~~~~~
943
944*   A new setting, :setting:`FEEDS`, allows configuring multiple output feeds
945    with different settings each (:issue:`1336`, :issue:`3858`, :issue:`4507`)
946
947*   The :command:`crawl` and :command:`runspider` commands now support multiple
948    ``-o`` parameters (:issue:`1336`, :issue:`3858`, :issue:`4507`)
949
950*   The :command:`crawl` and :command:`runspider` commands now support
951    specifying an output format by appending ``:<format>`` to the output file
952    (:issue:`1336`, :issue:`3858`, :issue:`4507`)
953
954*   The new :attr:`Response.ip_address <scrapy.http.Response.ip_address>`
955    attribute gives access to the IP address that originated a response
956    (:issue:`3903`, :issue:`3940`)
957
958*   A warning is now issued when a value in
959    :attr:`~scrapy.spiders.Spider.allowed_domains` includes a port
960    (:issue:`50`, :issue:`3198`, :issue:`4413`)
961
962*   Zsh completion now excludes used option aliases from the completion list
963    (:issue:`4438`)
964
965
966Bug fixes
967~~~~~~~~~
968
969*   :ref:`Request serialization <request-serialization>` no longer breaks for
970    callbacks that are spider attributes which are assigned a function with a
971    different name (:issue:`4500`)
972
973*   ``None`` values in :attr:`~scrapy.spiders.Spider.allowed_domains` no longer
974    cause a :exc:`TypeError` exception (:issue:`4410`)
975
976*   Zsh completion no longer allows options after arguments (:issue:`4438`)
977
978*   zope.interface 5.0.0 and later versions are now supported
979    (:issue:`4447`, :issue:`4448`)
980
981*   :meth:`Spider.make_requests_from_url
982    <scrapy.spiders.Spider.make_requests_from_url>`, deprecated in Scrapy
983    1.4.0, now issues a warning when used (:issue:`4412`)
984
985
986Documentation
987~~~~~~~~~~~~~
988
989*   Improved the documentation about signals that allow their handlers to
990    return a :class:`~twisted.internet.defer.Deferred` (:issue:`4295`,
991    :issue:`4390`)
992
993*   Our PyPI entry now includes links for our documentation, our source code
994    repository and our issue tracker (:issue:`4456`)
995
996*   Covered the `curl2scrapy <https://michael-shub.github.io/curl2scrapy/>`_
997    service in the documentation (:issue:`4206`, :issue:`4455`)
998
999*   Removed references to the Guppy library, which only works in Python 2
1000    (:issue:`4285`, :issue:`4343`)
1001
1002*   Extended use of InterSphinx to link to Python 3 documentation
1003    (:issue:`4444`, :issue:`4445`)
1004
1005*   Added support for Sphinx 3.0 and later (:issue:`4475`, :issue:`4480`,
1006    :issue:`4496`, :issue:`4503`)
1007
1008
1009Quality assurance
1010~~~~~~~~~~~~~~~~~
1011
1012*   Removed warnings about using old, removed settings (:issue:`4404`)
1013
1014*   Removed a warning about importing
1015    :class:`~twisted.internet.testing.StringTransport` from
1016    ``twisted.test.proto_helpers`` in Twisted 19.7.0 or newer (:issue:`4409`)
1017
1018*   Removed outdated Debian package build files (:issue:`4384`)
1019
1020*   Removed :class:`object` usage as a base class (:issue:`4430`)
1021
1022*   Removed code that added support for old versions of Twisted that we no
1023    longer support (:issue:`4472`)
1024
1025*   Fixed code style issues (:issue:`4468`, :issue:`4469`, :issue:`4471`,
1026    :issue:`4481`)
1027
1028*   Removed :func:`twisted.internet.defer.returnValue` calls (:issue:`4443`,
1029    :issue:`4446`, :issue:`4489`)
1030
1031
1032.. _release-2.0.1:
1033
1034Scrapy 2.0.1 (2020-03-18)
1035-------------------------
1036
1037*   :meth:`Response.follow_all <scrapy.http.Response.follow_all>` now supports
1038    an empty URL iterable as input (:issue:`4408`, :issue:`4420`)
1039
1040*   Removed top-level :mod:`~twisted.internet.reactor` imports to prevent
1041    errors about the wrong Twisted reactor being installed when setting a
1042    different Twisted reactor using :setting:`TWISTED_REACTOR` (:issue:`4401`,
1043    :issue:`4406`)
1044
1045*   Fixed tests (:issue:`4422`)
1046
1047
1048.. _release-2.0.0:
1049
1050Scrapy 2.0.0 (2020-03-03)
1051-------------------------
1052
1053Highlights:
1054
1055* Python 2 support has been removed
1056* :doc:`Partial <topics/coroutines>` :ref:`coroutine syntax <async>` support
1057  and :doc:`experimental <topics/asyncio>` :mod:`asyncio` support
1058* New :meth:`Response.follow_all <scrapy.http.Response.follow_all>` method
1059* :ref:`FTP support <media-pipeline-ftp>` for media pipelines
1060* New :attr:`Response.certificate <scrapy.http.Response.certificate>`
1061  attribute
1062* IPv6 support through :setting:`DNS_RESOLVER`
1063
1064Backward-incompatible changes
1065~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1066
1067*   Python 2 support has been removed, following `Python 2 end-of-life on
1068    January 1, 2020`_ (:issue:`4091`, :issue:`4114`, :issue:`4115`,
1069    :issue:`4121`, :issue:`4138`, :issue:`4231`, :issue:`4242`, :issue:`4304`,
1070    :issue:`4309`, :issue:`4373`)
1071
1072*   Retry gaveups (see :setting:`RETRY_TIMES`) are now logged as errors instead
1073    of as debug information (:issue:`3171`, :issue:`3566`)
1074
1075*   File extensions that
1076    :class:`LinkExtractor <scrapy.linkextractors.lxmlhtml.LxmlLinkExtractor>`
1077    ignores by default now also include ``7z``, ``7zip``, ``apk``, ``bz2``,
1078    ``cdr``, ``dmg``, ``ico``, ``iso``, ``tar``, ``tar.gz``, ``webm``, and
1079    ``xz`` (:issue:`1837`, :issue:`2067`, :issue:`4066`)
1080
1081*   The :setting:`METAREFRESH_IGNORE_TAGS` setting is now an empty list by
1082    default, following web browser behavior (:issue:`3844`, :issue:`4311`)
1083
1084*   The
1085    :class:`~scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware`
1086    now includes spaces after commas in the value of the ``Accept-Encoding``
1087    header that it sets, following web browser behavior (:issue:`4293`)
1088
1089*   The ``__init__`` method of custom download handlers (see
1090    :setting:`DOWNLOAD_HANDLERS`) or subclasses of the following downloader
1091    handlers  no longer receives a ``settings`` parameter:
1092
1093    *   :class:`scrapy.core.downloader.handlers.datauri.DataURIDownloadHandler`
1094
1095    *   :class:`scrapy.core.downloader.handlers.file.FileDownloadHandler`
1096
1097    Use the ``from_settings`` or ``from_crawler`` class methods to expose such
1098    a parameter to your custom download handlers.
1099
1100    (:issue:`4126`)
1101
1102*   We have refactored the :class:`scrapy.core.scheduler.Scheduler` class and
1103    related queue classes (see :setting:`SCHEDULER_PRIORITY_QUEUE`,
1104    :setting:`SCHEDULER_DISK_QUEUE` and :setting:`SCHEDULER_MEMORY_QUEUE`) to
1105    make it easier to implement custom scheduler queue classes. See
1106    :ref:`2-0-0-scheduler-queue-changes` below for details.
1107
1108*   Overridden settings are now logged in a different format. This is more in
1109    line with similar information logged at startup (:issue:`4199`)
1110
1111.. _Python 2 end-of-life on January 1, 2020: https://www.python.org/doc/sunset-python-2/
1112
1113
1114Deprecation removals
1115~~~~~~~~~~~~~~~~~~~~
1116
1117*   The :ref:`Scrapy shell <topics-shell>` no longer provides a `sel` proxy
1118    object, use :meth:`response.selector <scrapy.http.Response.selector>`
1119    instead (:issue:`4347`)
1120
1121*   LevelDB support has been removed (:issue:`4112`)
1122
1123*   The following functions have been removed from :mod:`scrapy.utils.python`:
1124    ``isbinarytext``, ``is_writable``, ``setattr_default``, ``stringify_dict``
1125    (:issue:`4362`)
1126
1127
1128Deprecations
1129~~~~~~~~~~~~
1130
1131*   Using environment variables prefixed with ``SCRAPY_`` to override settings
1132    is deprecated (:issue:`4300`, :issue:`4374`, :issue:`4375`)
1133
1134*   :class:`scrapy.linkextractors.FilteringLinkExtractor` is deprecated, use
1135    :class:`scrapy.linkextractors.LinkExtractor
1136    <scrapy.linkextractors.lxmlhtml.LxmlLinkExtractor>` instead (:issue:`4045`)
1137
1138*   The ``noconnect`` query string argument of proxy URLs is deprecated and
1139    should be removed from proxy URLs (:issue:`4198`)
1140
1141*   The :meth:`next <scrapy.utils.python.MutableChain.next>` method of
1142    :class:`scrapy.utils.python.MutableChain` is deprecated, use the global
1143    :func:`next` function or :meth:`MutableChain.__next__
1144    <scrapy.utils.python.MutableChain.__next__>` instead (:issue:`4153`)
1145
1146
1147New features
1148~~~~~~~~~~~~
1149
1150*   Added :doc:`partial support <topics/coroutines>` for Python’s
1151    :ref:`coroutine syntax <async>` and :doc:`experimental support
1152    <topics/asyncio>` for :mod:`asyncio` and :mod:`asyncio`-powered libraries
1153    (:issue:`4010`, :issue:`4259`, :issue:`4269`, :issue:`4270`, :issue:`4271`,
1154    :issue:`4316`, :issue:`4318`)
1155
1156*   The new :meth:`Response.follow_all <scrapy.http.Response.follow_all>`
1157    method offers the same functionality as
1158    :meth:`Response.follow <scrapy.http.Response.follow>` but supports an
1159    iterable of URLs as input and returns an iterable of requests
1160    (:issue:`2582`, :issue:`4057`, :issue:`4286`)
1161
1162*   :ref:`Media pipelines <topics-media-pipeline>` now support :ref:`FTP
1163    storage <media-pipeline-ftp>` (:issue:`3928`, :issue:`3961`)
1164
1165*   The new :attr:`Response.certificate <scrapy.http.Response.certificate>`
1166    attribute exposes the SSL certificate of the server as a
1167    :class:`twisted.internet.ssl.Certificate` object for HTTPS responses
1168    (:issue:`2726`, :issue:`4054`)
1169
1170*   A new :setting:`DNS_RESOLVER` setting allows enabling IPv6 support
1171    (:issue:`1031`, :issue:`4227`)
1172
1173*   A new :setting:`SCRAPER_SLOT_MAX_ACTIVE_SIZE` setting allows configuring
1174    the existing soft limit that pauses request downloads when the total
1175    response data being processed is too high (:issue:`1410`, :issue:`3551`)
1176
1177*   A new :setting:`TWISTED_REACTOR` setting allows customizing the
1178    :mod:`~twisted.internet.reactor` that Scrapy uses, allowing to
1179    :doc:`enable asyncio support <topics/asyncio>` or deal with a
1180    :ref:`common macOS issue <faq-specific-reactor>` (:issue:`2905`,
1181    :issue:`4294`)
1182
1183*   Scheduler disk and memory queues may now use the class methods
1184    ``from_crawler`` or ``from_settings`` (:issue:`3884`)
1185
1186*   The new :attr:`Response.cb_kwargs <scrapy.http.Response.cb_kwargs>`
1187    attribute serves as a shortcut for :attr:`Response.request.cb_kwargs
1188    <scrapy.http.Request.cb_kwargs>` (:issue:`4331`)
1189
1190*   :meth:`Response.follow <scrapy.http.Response.follow>` now supports a
1191    ``flags`` parameter, for consistency with :class:`~scrapy.http.Request`
1192    (:issue:`4277`, :issue:`4279`)
1193
1194*   :ref:`Item loader processors <topics-loaders-processors>` can now be
1195    regular functions, they no longer need to be methods (:issue:`3899`)
1196
1197*   :class:`~scrapy.spiders.Rule` now accepts an ``errback`` parameter
1198    (:issue:`4000`)
1199
1200*   :class:`~scrapy.http.Request` no longer requires a ``callback`` parameter
1201    when an ``errback`` parameter is specified (:issue:`3586`, :issue:`4008`)
1202
1203*   :class:`~scrapy.logformatter.LogFormatter` now supports some additional
1204    methods:
1205
1206    *   :class:`~scrapy.logformatter.LogFormatter.download_error` for
1207        download errors
1208
1209    *   :class:`~scrapy.logformatter.LogFormatter.item_error` for exceptions
1210        raised during item processing by :ref:`item pipelines
1211        <topics-item-pipeline>`
1212
1213    *   :class:`~scrapy.logformatter.LogFormatter.spider_error` for exceptions
1214        raised from :ref:`spider callbacks <topics-spiders>`
1215
1216    (:issue:`374`, :issue:`3986`, :issue:`3989`, :issue:`4176`, :issue:`4188`)
1217
1218*   The :setting:`FEED_URI` setting now supports :class:`pathlib.Path` values
1219    (:issue:`3731`, :issue:`4074`)
1220
1221*   A new :signal:`request_left_downloader` signal is sent when a request
1222    leaves the downloader (:issue:`4303`)
1223
1224*   Scrapy logs a warning when it detects a request callback or errback that
1225    uses ``yield`` but also returns a value, since the returned value would be
1226    lost (:issue:`3484`, :issue:`3869`)
1227
1228*   :class:`~scrapy.spiders.Spider` objects now raise an :exc:`AttributeError`
1229    exception if they do not have a :class:`~scrapy.spiders.Spider.start_urls`
1230    attribute nor reimplement :class:`~scrapy.spiders.Spider.start_requests`,
1231    but have a ``start_url`` attribute (:issue:`4133`, :issue:`4170`)
1232
1233*   :class:`~scrapy.exporters.BaseItemExporter` subclasses may now use
1234    ``super().__init__(**kwargs)`` instead of ``self._configure(kwargs)`` in
1235    their ``__init__`` method, passing ``dont_fail=True`` to the parent
1236    ``__init__`` method if needed, and accessing ``kwargs`` at ``self._kwargs``
1237    after calling their parent ``__init__`` method (:issue:`4193`,
1238    :issue:`4370`)
1239
1240*   A new ``keep_fragments`` parameter of
1241    :func:`scrapy.utils.request.request_fingerprint` allows to generate
1242    different fingerprints for requests with different fragments in their URL
1243    (:issue:`4104`)
1244
1245*   Download handlers (see :setting:`DOWNLOAD_HANDLERS`) may now use the
1246    ``from_settings`` and ``from_crawler`` class methods that other Scrapy
1247    components already supported (:issue:`4126`)
1248
1249*   :class:`scrapy.utils.python.MutableChain.__iter__` now returns ``self``,
1250    `allowing it to be used as a sequence <https://lgtm.com/rules/4850080/>`_
1251    (:issue:`4153`)
1252
1253
1254Bug fixes
1255~~~~~~~~~
1256
1257*   The :command:`crawl` command now also exits with exit code 1 when an
1258    exception happens before the crawling starts (:issue:`4175`, :issue:`4207`)
1259
1260*   :class:`LinkExtractor.extract_links
1261    <scrapy.linkextractors.lxmlhtml.LxmlLinkExtractor.extract_links>` no longer
1262    re-encodes the query string or URLs from non-UTF-8 responses in UTF-8
1263    (:issue:`998`, :issue:`1403`, :issue:`1949`, :issue:`4321`)
1264
1265*   The first spider middleware (see :setting:`SPIDER_MIDDLEWARES`) now also
1266    processes exceptions raised from callbacks that are generators
1267    (:issue:`4260`, :issue:`4272`)
1268
1269*   Redirects to URLs starting with 3 slashes (``///``) are now supported
1270    (:issue:`4032`, :issue:`4042`)
1271
1272*   :class:`~scrapy.http.Request` no longer accepts strings as ``url`` simply
1273    because they have a colon (:issue:`2552`, :issue:`4094`)
1274
1275*   The correct encoding is now used for attach names in
1276    :class:`~scrapy.mail.MailSender` (:issue:`4229`, :issue:`4239`)
1277
1278*   :class:`~scrapy.dupefilters.RFPDupeFilter`, the default
1279    :setting:`DUPEFILTER_CLASS`, no longer writes an extra ``\r`` character on
1280    each line in Windows, which made the size of the ``requests.seen`` file
1281    unnecessarily large on that platform (:issue:`4283`)
1282
1283*   Z shell auto-completion now looks for ``.html`` files, not ``.http`` files,
1284    and covers the ``-h`` command-line switch (:issue:`4122`, :issue:`4291`)
1285
1286*   Adding items to a :class:`scrapy.utils.datatypes.LocalCache` object
1287    without a ``limit`` defined no longer raises a :exc:`TypeError` exception
1288    (:issue:`4123`)
1289
1290*   Fixed a typo in the message of the :exc:`ValueError` exception raised when
1291    :func:`scrapy.utils.misc.create_instance` gets both ``settings`` and
1292    ``crawler`` set to ``None`` (:issue:`4128`)
1293
1294
1295Documentation
1296~~~~~~~~~~~~~
1297
1298*   API documentation now links to an online, syntax-highlighted view of the
1299    corresponding source code (:issue:`4148`)
1300
1301*   Links to unexisting documentation pages now allow access to the sidebar
1302    (:issue:`4152`, :issue:`4169`)
1303
1304*   Cross-references within our documentation now display a tooltip when
1305    hovered (:issue:`4173`, :issue:`4183`)
1306
1307*   Improved the documentation about :meth:`LinkExtractor.extract_links
1308    <scrapy.linkextractors.lxmlhtml.LxmlLinkExtractor.extract_links>` and
1309    simplified :ref:`topics-link-extractors` (:issue:`4045`)
1310
1311*   Clarified how :class:`ItemLoader.item <scrapy.loader.ItemLoader.item>`
1312    works (:issue:`3574`, :issue:`4099`)
1313
1314*   Clarified that :func:`logging.basicConfig` should not be used when also
1315    using :class:`~scrapy.crawler.CrawlerProcess` (:issue:`2149`,
1316    :issue:`2352`, :issue:`3146`, :issue:`3960`)
1317
1318*   Clarified the requirements for :class:`~scrapy.http.Request` objects
1319    :ref:`when using persistence <request-serialization>` (:issue:`4124`,
1320    :issue:`4139`)
1321
1322*   Clarified how to install a :ref:`custom image pipeline
1323    <media-pipeline-example>` (:issue:`4034`, :issue:`4252`)
1324
1325*   Fixed the signatures of the ``file_path`` method in :ref:`media pipeline
1326    <topics-media-pipeline>` examples (:issue:`4290`)
1327
1328*   Covered a backward-incompatible change in Scrapy 1.7.0 affecting custom
1329    :class:`scrapy.core.scheduler.Scheduler` subclasses (:issue:`4274`)
1330
1331*   Improved the ``README.rst`` and ``CODE_OF_CONDUCT.md`` files
1332    (:issue:`4059`)
1333
1334*   Documentation examples are now checked as part of our test suite and we
1335    have fixed some of the issues detected (:issue:`4142`, :issue:`4146`,
1336    :issue:`4171`, :issue:`4184`, :issue:`4190`)
1337
1338*   Fixed logic issues, broken links and typos (:issue:`4247`, :issue:`4258`,
1339    :issue:`4282`, :issue:`4288`, :issue:`4305`, :issue:`4308`, :issue:`4323`,
1340    :issue:`4338`, :issue:`4359`, :issue:`4361`)
1341
1342*   Improved consistency when referring to the ``__init__`` method of an object
1343    (:issue:`4086`, :issue:`4088`)
1344
1345*   Fixed an inconsistency between code and output in :ref:`intro-overview`
1346    (:issue:`4213`)
1347
1348*   Extended :mod:`~sphinx.ext.intersphinx` usage (:issue:`4147`,
1349    :issue:`4172`, :issue:`4185`, :issue:`4194`, :issue:`4197`)
1350
1351*   We now use a recent version of Python to build the documentation
1352    (:issue:`4140`, :issue:`4249`)
1353
1354*   Cleaned up documentation (:issue:`4143`, :issue:`4275`)
1355
1356
1357Quality assurance
1358~~~~~~~~~~~~~~~~~
1359
1360*   Re-enabled proxy ``CONNECT`` tests (:issue:`2545`, :issue:`4114`)
1361
1362*   Added Bandit_ security checks to our test suite (:issue:`4162`,
1363    :issue:`4181`)
1364
1365*   Added Flake8_ style checks to our test suite and applied many of the
1366    corresponding changes (:issue:`3944`, :issue:`3945`, :issue:`4137`,
1367    :issue:`4157`, :issue:`4167`, :issue:`4174`, :issue:`4186`, :issue:`4195`,
1368    :issue:`4238`, :issue:`4246`, :issue:`4355`, :issue:`4360`, :issue:`4365`)
1369
1370*   Improved test coverage (:issue:`4097`, :issue:`4218`, :issue:`4236`)
1371
1372*   Started reporting slowest tests, and improved the performance of some of
1373    them (:issue:`4163`, :issue:`4164`)
1374
1375*   Fixed broken tests and refactored some tests (:issue:`4014`, :issue:`4095`,
1376    :issue:`4244`, :issue:`4268`, :issue:`4372`)
1377
1378*   Modified the :doc:`tox <tox:index>` configuration to allow running tests
1379    with any Python version, run Bandit_ and Flake8_ tests by default, and
1380    enforce a minimum tox version programmatically (:issue:`4179`)
1381
1382*   Cleaned up code (:issue:`3937`, :issue:`4208`, :issue:`4209`,
1383    :issue:`4210`, :issue:`4212`, :issue:`4369`, :issue:`4376`, :issue:`4378`)
1384
1385.. _Bandit: https://bandit.readthedocs.io/
1386.. _Flake8: https://flake8.pycqa.org/en/latest/
1387
1388
1389.. _2-0-0-scheduler-queue-changes:
1390
1391Changes to scheduler queue classes
1392~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1393
1394The following changes may impact any custom queue classes of all types:
1395
1396*   The ``push`` method no longer receives a second positional parameter
1397    containing ``request.priority * -1``. If you need that value, get it
1398    from the first positional parameter, ``request``, instead, or use
1399    the new :meth:`~scrapy.core.scheduler.ScrapyPriorityQueue.priority`
1400    method in :class:`scrapy.core.scheduler.ScrapyPriorityQueue`
1401    subclasses.
1402
1403The following changes may impact custom priority queue classes:
1404
1405*   In the ``__init__`` method or the ``from_crawler`` or ``from_settings``
1406    class methods:
1407
1408    *   The parameter that used to contain a factory function,
1409        ``qfactory``, is now passed as a keyword parameter named
1410        ``downstream_queue_cls``.
1411
1412    *   A new keyword parameter has been added: ``key``. It is a string
1413        that is always an empty string for memory queues and indicates the
1414        :setting:`JOB_DIR` value for disk queues.
1415
1416    *   The parameter for disk queues that contains data from the previous
1417        crawl, ``startprios`` or ``slot_startprios``, is now passed as a
1418        keyword parameter named ``startprios``.
1419
1420    *   The ``serialize`` parameter is no longer passed. The disk queue
1421        class must take care of request serialization on its own before
1422        writing to disk, using the
1423        :func:`~scrapy.utils.reqser.request_to_dict` and
1424        :func:`~scrapy.utils.reqser.request_from_dict` functions from the
1425        :mod:`scrapy.utils.reqser` module.
1426
1427The following changes may impact custom disk and memory queue classes:
1428
1429*   The signature of the ``__init__`` method is now
1430    ``__init__(self, crawler, key)``.
1431
1432The following changes affect specifically the
1433:class:`~scrapy.core.scheduler.ScrapyPriorityQueue` and
1434:class:`~scrapy.core.scheduler.DownloaderAwarePriorityQueue` classes from
1435:mod:`scrapy.core.scheduler` and may affect subclasses:
1436
1437*   In the ``__init__`` method, most of the changes described above apply.
1438
1439    ``__init__`` may still receive all parameters as positional parameters,
1440    however:
1441
1442    *   ``downstream_queue_cls``, which replaced ``qfactory``, must be
1443        instantiated differently.
1444
1445        ``qfactory`` was instantiated with a priority value (integer).
1446
1447        Instances of ``downstream_queue_cls`` should be created using
1448        the new
1449        :meth:`ScrapyPriorityQueue.qfactory <scrapy.core.scheduler.ScrapyPriorityQueue.qfactory>`
1450        or
1451        :meth:`DownloaderAwarePriorityQueue.pqfactory <scrapy.core.scheduler.DownloaderAwarePriorityQueue.pqfactory>`
1452        methods.
1453
1454    *   The new ``key`` parameter displaced the ``startprios``
1455        parameter 1 position to the right.
1456
1457*   The following class attributes have been added:
1458
1459    *   :attr:`~scrapy.core.scheduler.ScrapyPriorityQueue.crawler`
1460
1461    *   :attr:`~scrapy.core.scheduler.ScrapyPriorityQueue.downstream_queue_cls`
1462        (details above)
1463
1464    *   :attr:`~scrapy.core.scheduler.ScrapyPriorityQueue.key` (details above)
1465
1466*   The ``serialize`` attribute has been removed (details above)
1467
1468The following changes affect specifically the
1469:class:`~scrapy.core.scheduler.ScrapyPriorityQueue` class and may affect
1470subclasses:
1471
1472*   A new :meth:`~scrapy.core.scheduler.ScrapyPriorityQueue.priority`
1473    method has been added which, given a request, returns
1474    ``request.priority * -1``.
1475
1476    It is used in :meth:`~scrapy.core.scheduler.ScrapyPriorityQueue.push`
1477    to make up for the removal of its ``priority`` parameter.
1478
1479*   The ``spider`` attribute has been removed. Use
1480    :attr:`crawler.spider <scrapy.core.scheduler.ScrapyPriorityQueue.crawler>`
1481    instead.
1482
1483The following changes affect specifically the
1484:class:`~scrapy.core.scheduler.DownloaderAwarePriorityQueue` class and may
1485affect subclasses:
1486
1487*   A new :attr:`~scrapy.core.scheduler.DownloaderAwarePriorityQueue.pqueues`
1488    attribute offers a mapping of downloader slot names to the
1489    corresponding instances of
1490    :attr:`~scrapy.core.scheduler.DownloaderAwarePriorityQueue.downstream_queue_cls`.
1491
1492(:issue:`3884`)
1493
1494
1495.. _release-1.8.0:
1496
1497Scrapy 1.8.0 (2019-10-28)
1498-------------------------
1499
1500Highlights:
1501
1502* Dropped Python 3.4 support and updated minimum requirements; made Python 3.8
1503  support official
1504* New :meth:`Request.from_curl <scrapy.http.Request.from_curl>` class method
1505* New :setting:`ROBOTSTXT_PARSER` and :setting:`ROBOTSTXT_USER_AGENT` settings
1506* New :setting:`DOWNLOADER_CLIENT_TLS_CIPHERS` and
1507  :setting:`DOWNLOADER_CLIENT_TLS_VERBOSE_LOGGING` settings
1508
1509Backward-incompatible changes
1510~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1511
1512*   Python 3.4 is no longer supported, and some of the minimum requirements of
1513    Scrapy have also changed:
1514
1515    *   :doc:`cssselect <cssselect:index>` 0.9.1
1516    *   cryptography_ 2.0
1517    *   lxml_ 3.5.0
1518    *   pyOpenSSL_ 16.2.0
1519    *   queuelib_ 1.4.2
1520    *   service_identity_ 16.0.0
1521    *   six_ 1.10.0
1522    *   Twisted_ 17.9.0 (16.0.0 with Python 2)
1523    *   zope.interface_ 4.1.3
1524
1525    (:issue:`3892`)
1526
1527*   ``JSONRequest`` is now called :class:`~scrapy.http.JsonRequest` for
1528    consistency with similar classes (:issue:`3929`, :issue:`3982`)
1529
1530*   If you are using a custom context factory
1531    (:setting:`DOWNLOADER_CLIENTCONTEXTFACTORY`), its ``__init__`` method must
1532    accept two new parameters: ``tls_verbose_logging`` and ``tls_ciphers``
1533    (:issue:`2111`, :issue:`3392`, :issue:`3442`, :issue:`3450`)
1534
1535*   :class:`~scrapy.loader.ItemLoader` now turns the values of its input item
1536    into lists:
1537
1538    >>> item = MyItem()
1539    >>> item['field'] = 'value1'
1540    >>> loader = ItemLoader(item=item)
1541    >>> item['field']
1542    ['value1']
1543
1544    This is needed to allow adding values to existing fields
1545    (``loader.add_value('field', 'value2')``).
1546
1547    (:issue:`3804`, :issue:`3819`, :issue:`3897`, :issue:`3976`, :issue:`3998`,
1548    :issue:`4036`)
1549
1550See also :ref:`1.8-deprecation-removals` below.
1551
1552
1553New features
1554~~~~~~~~~~~~
1555
1556*   A new :meth:`Request.from_curl <scrapy.http.Request.from_curl>` class
1557    method allows :ref:`creating a request from a cURL command
1558    <requests-from-curl>` (:issue:`2985`, :issue:`3862`)
1559
1560*   A new :setting:`ROBOTSTXT_PARSER` setting allows choosing which robots.txt_
1561    parser to use. It includes built-in support for
1562    :ref:`RobotFileParser <python-robotfileparser>`,
1563    :ref:`Protego <protego-parser>` (default), :ref:`Reppy <reppy-parser>`, and
1564    :ref:`Robotexclusionrulesparser <rerp-parser>`, and allows you to
1565    :ref:`implement support for additional parsers
1566    <support-for-new-robots-parser>` (:issue:`754`, :issue:`2669`,
1567    :issue:`3796`, :issue:`3935`, :issue:`3969`, :issue:`4006`)
1568
1569*   A new :setting:`ROBOTSTXT_USER_AGENT` setting allows defining a separate
1570    user agent string to use for robots.txt_ parsing (:issue:`3931`,
1571    :issue:`3966`)
1572
1573*   :class:`~scrapy.spiders.Rule` no longer requires a :class:`LinkExtractor
1574    <scrapy.linkextractors.lxmlhtml.LxmlLinkExtractor>` parameter
1575    (:issue:`781`, :issue:`4016`)
1576
1577*   Use the new :setting:`DOWNLOADER_CLIENT_TLS_CIPHERS` setting to customize
1578    the TLS/SSL ciphers used by the default HTTP/1.1 downloader (:issue:`3392`,
1579    :issue:`3442`)
1580
1581*   Set the new :setting:`DOWNLOADER_CLIENT_TLS_VERBOSE_LOGGING` setting to
1582    ``True`` to enable debug-level messages about TLS connection parameters
1583    after establishing HTTPS connections (:issue:`2111`, :issue:`3450`)
1584
1585*   Callbacks that receive keyword arguments
1586    (see :attr:`Request.cb_kwargs <scrapy.http.Request.cb_kwargs>`) can now be
1587    tested using the new :class:`@cb_kwargs
1588    <scrapy.contracts.default.CallbackKeywordArgumentsContract>`
1589    :ref:`spider contract <topics-contracts>` (:issue:`3985`, :issue:`3988`)
1590
1591*   When a :class:`@scrapes <scrapy.contracts.default.ScrapesContract>` spider
1592    contract fails, all missing fields are now reported (:issue:`766`,
1593    :issue:`3939`)
1594
1595*   :ref:`Custom log formats <custom-log-formats>` can now drop messages by
1596    having the corresponding methods of the configured :setting:`LOG_FORMATTER`
1597    return ``None`` (:issue:`3984`, :issue:`3987`)
1598
1599*   A much improved completion definition is now available for Zsh_
1600    (:issue:`4069`)
1601
1602
1603Bug fixes
1604~~~~~~~~~
1605
1606*   :meth:`ItemLoader.load_item() <scrapy.loader.ItemLoader.load_item>` no
1607    longer makes later calls to :meth:`ItemLoader.get_output_value()
1608    <scrapy.loader.ItemLoader.get_output_value>` or
1609    :meth:`ItemLoader.load_item() <scrapy.loader.ItemLoader.load_item>` return
1610    empty data (:issue:`3804`, :issue:`3819`, :issue:`3897`, :issue:`3976`,
1611    :issue:`3998`, :issue:`4036`)
1612
1613*   Fixed :class:`~scrapy.statscollectors.DummyStatsCollector` raising a
1614    :exc:`TypeError` exception (:issue:`4007`, :issue:`4052`)
1615
1616*   :meth:`FilesPipeline.file_path
1617    <scrapy.pipelines.files.FilesPipeline.file_path>` and
1618    :meth:`ImagesPipeline.file_path
1619    <scrapy.pipelines.images.ImagesPipeline.file_path>` no longer choose
1620    file extensions that are not `registered with IANA`_ (:issue:`1287`,
1621    :issue:`3953`, :issue:`3954`)
1622
1623*   When using botocore_ to persist files in S3, all botocore-supported headers
1624    are properly mapped now (:issue:`3904`, :issue:`3905`)
1625
1626*   FTP passwords in :setting:`FEED_URI` containing percent-escaped characters
1627    are now properly decoded (:issue:`3941`)
1628
1629*   A memory-handling and error-handling issue in
1630    :func:`scrapy.utils.ssl.get_temp_key_info` has been fixed (:issue:`3920`)
1631
1632
1633Documentation
1634~~~~~~~~~~~~~
1635
1636*   The documentation now covers how to define and configure a :ref:`custom log
1637    format <custom-log-formats>` (:issue:`3616`, :issue:`3660`)
1638
1639*   API documentation added for :class:`~scrapy.exporters.MarshalItemExporter`
1640    and :class:`~scrapy.exporters.PythonItemExporter` (:issue:`3973`)
1641
1642*   API documentation added for :class:`~scrapy.item.BaseItem` and
1643    :class:`~scrapy.item.ItemMeta` (:issue:`3999`)
1644
1645*   Minor documentation fixes (:issue:`2998`, :issue:`3398`, :issue:`3597`,
1646    :issue:`3894`, :issue:`3934`, :issue:`3978`, :issue:`3993`, :issue:`4022`,
1647    :issue:`4028`, :issue:`4033`, :issue:`4046`, :issue:`4050`, :issue:`4055`,
1648    :issue:`4056`, :issue:`4061`, :issue:`4072`, :issue:`4071`, :issue:`4079`,
1649    :issue:`4081`, :issue:`4089`, :issue:`4093`)
1650
1651
1652.. _1.8-deprecation-removals:
1653
1654Deprecation removals
1655~~~~~~~~~~~~~~~~~~~~
1656
1657*   ``scrapy.xlib`` has been removed (:issue:`4015`)
1658
1659
1660.. _1.8-deprecations:
1661
1662Deprecations
1663~~~~~~~~~~~~
1664
1665*   The LevelDB_ storage backend
1666    (``scrapy.extensions.httpcache.LeveldbCacheStorage``) of
1667    :class:`~scrapy.downloadermiddlewares.httpcache.HttpCacheMiddleware` is
1668    deprecated (:issue:`4085`, :issue:`4092`)
1669
1670*   Use of the undocumented ``SCRAPY_PICKLED_SETTINGS_TO_OVERRIDE`` environment
1671    variable is deprecated (:issue:`3910`)
1672
1673*   ``scrapy.item.DictItem`` is deprecated, use :class:`~scrapy.item.Item`
1674    instead (:issue:`3999`)
1675
1676
1677Other changes
1678~~~~~~~~~~~~~
1679
1680*   Minimum versions of optional Scrapy requirements that are covered by
1681    continuous integration tests have been updated:
1682
1683    *   botocore_ 1.3.23
1684    *   Pillow_ 3.4.2
1685
1686    Lower versions of these optional requirements may work, but it is not
1687    guaranteed (:issue:`3892`)
1688
1689*   GitHub templates for bug reports and feature requests (:issue:`3126`,
1690    :issue:`3471`, :issue:`3749`, :issue:`3754`)
1691
1692*   Continuous integration fixes (:issue:`3923`)
1693
1694*   Code cleanup (:issue:`3391`, :issue:`3907`, :issue:`3946`, :issue:`3950`,
1695    :issue:`4023`, :issue:`4031`)
1696
1697
1698.. _release-1.7.4:
1699
1700Scrapy 1.7.4 (2019-10-21)
1701-------------------------
1702
1703Revert the fix for :issue:`3804` (:issue:`3819`), which has a few undesired
1704side effects (:issue:`3897`, :issue:`3976`).
1705
1706As a result, when an item loader is initialized with an item,
1707:meth:`ItemLoader.load_item() <scrapy.loader.ItemLoader.load_item>` once again
1708makes later calls to :meth:`ItemLoader.get_output_value()
1709<scrapy.loader.ItemLoader.get_output_value>` or :meth:`ItemLoader.load_item()
1710<scrapy.loader.ItemLoader.load_item>` return empty data.
1711
1712
1713.. _release-1.7.3:
1714
1715Scrapy 1.7.3 (2019-08-01)
1716-------------------------
1717
1718Enforce lxml 4.3.5 or lower for Python 3.4 (:issue:`3912`, :issue:`3918`).
1719
1720
1721.. _release-1.7.2:
1722
1723Scrapy 1.7.2 (2019-07-23)
1724-------------------------
1725
1726Fix Python 2 support (:issue:`3889`, :issue:`3893`, :issue:`3896`).
1727
1728
1729.. _release-1.7.1:
1730
1731Scrapy 1.7.1 (2019-07-18)
1732-------------------------
1733
1734Re-packaging of Scrapy 1.7.0, which was missing some changes in PyPI.
1735
1736
1737.. _release-1.7.0:
1738
1739Scrapy 1.7.0 (2019-07-18)
1740-------------------------
1741
1742.. note:: Make sure you install Scrapy 1.7.1. The Scrapy 1.7.0 package in PyPI
1743          is the result of an erroneous commit tagging and does not include all
1744          the changes described below.
1745
1746Highlights:
1747
1748* Improvements for crawls targeting multiple domains
1749* A cleaner way to pass arguments to callbacks
1750* A new class for JSON requests
1751* Improvements for rule-based spiders
1752* New features for feed exports
1753
1754Backward-incompatible changes
1755~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1756
1757*   ``429`` is now part of the :setting:`RETRY_HTTP_CODES` setting by default
1758
1759    This change is **backward incompatible**. If you don’t want to retry
1760    ``429``, you must override :setting:`RETRY_HTTP_CODES` accordingly.
1761
1762*   :class:`~scrapy.crawler.Crawler`,
1763    :class:`CrawlerRunner.crawl <scrapy.crawler.CrawlerRunner.crawl>` and
1764    :class:`CrawlerRunner.create_crawler <scrapy.crawler.CrawlerRunner.create_crawler>`
1765    no longer accept a :class:`~scrapy.spiders.Spider` subclass instance, they
1766    only accept a :class:`~scrapy.spiders.Spider` subclass now.
1767
1768    :class:`~scrapy.spiders.Spider` subclass instances were never meant to
1769    work, and they were not working as one would expect: instead of using the
1770    passed :class:`~scrapy.spiders.Spider` subclass instance, their
1771    :class:`~scrapy.spiders.Spider.from_crawler` method was called to generate
1772    a new instance.
1773
1774*   Non-default values for the :setting:`SCHEDULER_PRIORITY_QUEUE` setting
1775    may stop working. Scheduler priority queue classes now need to handle
1776    :class:`~scrapy.http.Request` objects instead of arbitrary Python data
1777    structures.
1778
1779*   An additional ``crawler`` parameter has been added to the ``__init__``
1780    method of the :class:`~scrapy.core.scheduler.Scheduler` class. Custom
1781    scheduler subclasses which don't accept arbitrary parameters in their
1782    ``__init__`` method might break because of this change.
1783
1784    For more information, see :setting:`SCHEDULER`.
1785
1786See also :ref:`1.7-deprecation-removals` below.
1787
1788
1789New features
1790~~~~~~~~~~~~
1791
1792*   A new scheduler priority queue,
1793    ``scrapy.pqueues.DownloaderAwarePriorityQueue``, may be
1794    :ref:`enabled <broad-crawls-scheduler-priority-queue>` for a significant
1795    scheduling improvement on crawls targetting multiple web domains, at the
1796    cost of no :setting:`CONCURRENT_REQUESTS_PER_IP` support (:issue:`3520`)
1797
1798*   A new :attr:`Request.cb_kwargs <scrapy.http.Request.cb_kwargs>` attribute
1799    provides a cleaner way to pass keyword arguments to callback methods
1800    (:issue:`1138`, :issue:`3563`)
1801
1802*   A new :class:`JSONRequest <scrapy.http.JsonRequest>` class offers a more
1803    convenient way to build JSON requests (:issue:`3504`, :issue:`3505`)
1804
1805*   A ``process_request`` callback passed to the :class:`~scrapy.spiders.Rule`
1806    ``__init__`` method now receives the :class:`~scrapy.http.Response` object that
1807    originated the request as its second argument (:issue:`3682`)
1808
1809*   A new ``restrict_text`` parameter for the
1810    :attr:`LinkExtractor <scrapy.linkextractors.lxmlhtml.LxmlLinkExtractor>`
1811    ``__init__`` method allows filtering links by linking text (:issue:`3622`,
1812    :issue:`3635`)
1813
1814*   A new :setting:`FEED_STORAGE_S3_ACL` setting allows defining a custom ACL
1815    for feeds exported to Amazon S3 (:issue:`3607`)
1816
1817*   A new :setting:`FEED_STORAGE_FTP_ACTIVE` setting allows using FTP’s active
1818    connection mode for feeds exported to FTP servers (:issue:`3829`)
1819
1820*   A new :setting:`METAREFRESH_IGNORE_TAGS` setting allows overriding which
1821    HTML tags are ignored when searching a response for HTML meta tags that
1822    trigger a redirect (:issue:`1422`, :issue:`3768`)
1823
1824*   A new :reqmeta:`redirect_reasons` request meta key exposes the reason
1825    (status code, meta refresh) behind every followed redirect (:issue:`3581`,
1826    :issue:`3687`)
1827
1828*   The ``SCRAPY_CHECK`` variable is now set to the ``true`` string during runs
1829    of the :command:`check` command, which allows :ref:`detecting contract
1830    check runs from code <detecting-contract-check-runs>` (:issue:`3704`,
1831    :issue:`3739`)
1832
1833*   A new :meth:`Item.deepcopy() <scrapy.item.Item.deepcopy>` method makes it
1834    easier to :ref:`deep-copy items <copying-items>` (:issue:`1493`,
1835    :issue:`3671`)
1836
1837*   :class:`~scrapy.extensions.corestats.CoreStats` also logs
1838    ``elapsed_time_seconds`` now (:issue:`3638`)
1839
1840*   Exceptions from :class:`~scrapy.loader.ItemLoader` :ref:`input and output
1841    processors <topics-loaders-processors>` are now more verbose
1842    (:issue:`3836`, :issue:`3840`)
1843
1844*   :class:`~scrapy.crawler.Crawler`,
1845    :class:`CrawlerRunner.crawl <scrapy.crawler.CrawlerRunner.crawl>` and
1846    :class:`CrawlerRunner.create_crawler <scrapy.crawler.CrawlerRunner.create_crawler>`
1847    now fail gracefully if they receive a :class:`~scrapy.spiders.Spider`
1848    subclass instance instead of the subclass itself (:issue:`2283`,
1849    :issue:`3610`, :issue:`3872`)
1850
1851
1852Bug fixes
1853~~~~~~~~~
1854
1855*   :meth:`~scrapy.spidermiddlewares.SpiderMiddleware.process_spider_exception`
1856    is now also invoked for generators (:issue:`220`, :issue:`2061`)
1857
1858*   System exceptions like KeyboardInterrupt_ are no longer caught
1859    (:issue:`3726`)
1860
1861*   :meth:`ItemLoader.load_item() <scrapy.loader.ItemLoader.load_item>` no
1862    longer makes later calls to :meth:`ItemLoader.get_output_value()
1863    <scrapy.loader.ItemLoader.get_output_value>` or
1864    :meth:`ItemLoader.load_item() <scrapy.loader.ItemLoader.load_item>` return
1865    empty data (:issue:`3804`, :issue:`3819`)
1866
1867*   The images pipeline (:class:`~scrapy.pipelines.images.ImagesPipeline`) no
1868    longer ignores these Amazon S3 settings: :setting:`AWS_ENDPOINT_URL`,
1869    :setting:`AWS_REGION_NAME`, :setting:`AWS_USE_SSL`, :setting:`AWS_VERIFY`
1870    (:issue:`3625`)
1871
1872*   Fixed a memory leak in ``scrapy.pipelines.media.MediaPipeline`` affecting,
1873    for example, non-200 responses and exceptions from custom middlewares
1874    (:issue:`3813`)
1875
1876*   Requests with private callbacks are now correctly unserialized from disk
1877    (:issue:`3790`)
1878
1879*   :meth:`FormRequest.from_response() <scrapy.http.FormRequest.from_response>`
1880    now handles invalid methods like major web browsers (:issue:`3777`,
1881    :issue:`3794`)
1882
1883
1884Documentation
1885~~~~~~~~~~~~~
1886
1887*   A new topic, :ref:`topics-dynamic-content`, covers recommended approaches
1888    to read dynamically-loaded data (:issue:`3703`)
1889
1890*   :ref:`topics-broad-crawls` now features information about memory usage
1891    (:issue:`1264`, :issue:`3866`)
1892
1893*   The documentation of :class:`~scrapy.spiders.Rule` now covers how to access
1894    the text of a link when using :class:`~scrapy.spiders.CrawlSpider`
1895    (:issue:`3711`, :issue:`3712`)
1896
1897*   A new section, :ref:`httpcache-storage-custom`, covers writing a custom
1898    cache storage backend for
1899    :class:`~scrapy.downloadermiddlewares.httpcache.HttpCacheMiddleware`
1900    (:issue:`3683`, :issue:`3692`)
1901
1902*   A new :ref:`FAQ <faq>` entry, :ref:`faq-split-item`, explains what to do
1903    when you want to split an item into multiple items from an item pipeline
1904    (:issue:`2240`, :issue:`3672`)
1905
1906*   Updated the :ref:`FAQ entry about crawl order <faq-bfo-dfo>` to explain why
1907    the first few requests rarely follow the desired order (:issue:`1739`,
1908    :issue:`3621`)
1909
1910*   The :setting:`LOGSTATS_INTERVAL` setting (:issue:`3730`), the
1911    :meth:`FilesPipeline.file_path <scrapy.pipelines.files.FilesPipeline.file_path>`
1912    and
1913    :meth:`ImagesPipeline.file_path <scrapy.pipelines.images.ImagesPipeline.file_path>`
1914    methods (:issue:`2253`, :issue:`3609`) and the
1915    :meth:`Crawler.stop() <scrapy.crawler.Crawler.stop>` method (:issue:`3842`)
1916    are now documented
1917
1918*   Some parts of the documentation that were confusing or misleading are now
1919    clearer (:issue:`1347`, :issue:`1789`, :issue:`2289`, :issue:`3069`,
1920    :issue:`3615`, :issue:`3626`, :issue:`3668`, :issue:`3670`, :issue:`3673`,
1921    :issue:`3728`, :issue:`3762`, :issue:`3861`, :issue:`3882`)
1922
1923*   Minor documentation fixes (:issue:`3648`, :issue:`3649`, :issue:`3662`,
1924    :issue:`3674`, :issue:`3676`, :issue:`3694`, :issue:`3724`, :issue:`3764`,
1925    :issue:`3767`, :issue:`3791`, :issue:`3797`, :issue:`3806`, :issue:`3812`)
1926
1927.. _1.7-deprecation-removals:
1928
1929Deprecation removals
1930~~~~~~~~~~~~~~~~~~~~
1931
1932The following deprecated APIs have been removed (:issue:`3578`):
1933
1934*   ``scrapy.conf`` (use :attr:`Crawler.settings
1935    <scrapy.crawler.Crawler.settings>`)
1936
1937*   From ``scrapy.core.downloader.handlers``:
1938
1939    *   ``http.HttpDownloadHandler`` (use ``http10.HTTP10DownloadHandler``)
1940
1941*   ``scrapy.loader.ItemLoader._get_values`` (use ``_get_xpathvalues``)
1942
1943*   ``scrapy.loader.XPathItemLoader`` (use :class:`~scrapy.loader.ItemLoader`)
1944
1945*   ``scrapy.log`` (see :ref:`topics-logging`)
1946
1947*   From ``scrapy.pipelines``:
1948
1949    *   ``files.FilesPipeline.file_key`` (use ``file_path``)
1950
1951    *   ``images.ImagesPipeline.file_key`` (use ``file_path``)
1952
1953    *   ``images.ImagesPipeline.image_key`` (use ``file_path``)
1954
1955    *   ``images.ImagesPipeline.thumb_key`` (use ``thumb_path``)
1956
1957*   From both ``scrapy.selector`` and ``scrapy.selector.lxmlsel``:
1958
1959    *   ``HtmlXPathSelector`` (use :class:`~scrapy.selector.Selector`)
1960
1961    *   ``XmlXPathSelector`` (use :class:`~scrapy.selector.Selector`)
1962
1963    *   ``XPathSelector`` (use :class:`~scrapy.selector.Selector`)
1964
1965    *   ``XPathSelectorList`` (use :class:`~scrapy.selector.Selector`)
1966
1967*   From ``scrapy.selector.csstranslator``:
1968
1969    *   ``ScrapyGenericTranslator`` (use parsel.csstranslator.GenericTranslator_)
1970
1971    *   ``ScrapyHTMLTranslator`` (use parsel.csstranslator.HTMLTranslator_)
1972
1973    *   ``ScrapyXPathExpr`` (use parsel.csstranslator.XPathExpr_)
1974
1975*   From :class:`~scrapy.selector.Selector`:
1976
1977    *   ``_root`` (both the ``__init__`` method argument and the object property, use
1978        ``root``)
1979
1980    *   ``extract_unquoted`` (use ``getall``)
1981
1982    *   ``select`` (use ``xpath``)
1983
1984*   From :class:`~scrapy.selector.SelectorList`:
1985
1986    *   ``extract_unquoted`` (use ``getall``)
1987
1988    *   ``select`` (use ``xpath``)
1989
1990    *   ``x`` (use ``xpath``)
1991
1992*   ``scrapy.spiders.BaseSpider`` (use :class:`~scrapy.spiders.Spider`)
1993
1994*   From :class:`~scrapy.spiders.Spider` (and subclasses):
1995
1996    *   ``DOWNLOAD_DELAY`` (use :ref:`download_delay
1997        <spider-download_delay-attribute>`)
1998
1999    *   ``set_crawler`` (use :meth:`~scrapy.spiders.Spider.from_crawler`)
2000
2001*   ``scrapy.spiders.spiders`` (use :class:`~scrapy.spiderloader.SpiderLoader`)
2002
2003*   ``scrapy.telnet`` (use :mod:`scrapy.extensions.telnet`)
2004
2005*   From ``scrapy.utils.python``:
2006
2007    *   ``str_to_unicode`` (use ``to_unicode``)
2008
2009    *   ``unicode_to_str`` (use ``to_bytes``)
2010
2011*   ``scrapy.utils.response.body_or_str``
2012
2013The following deprecated settings have also been removed (:issue:`3578`):
2014
2015*   ``SPIDER_MANAGER_CLASS`` (use :setting:`SPIDER_LOADER_CLASS`)
2016
2017
2018.. _1.7-deprecations:
2019
2020Deprecations
2021~~~~~~~~~~~~
2022
2023*   The ``queuelib.PriorityQueue`` value for the
2024    :setting:`SCHEDULER_PRIORITY_QUEUE` setting is deprecated. Use
2025    ``scrapy.pqueues.ScrapyPriorityQueue`` instead.
2026
2027*   ``process_request`` callbacks passed to :class:`~scrapy.spiders.Rule` that
2028    do not accept two arguments are deprecated.
2029
2030*   The following modules are deprecated:
2031
2032    *   ``scrapy.utils.http`` (use `w3lib.http`_)
2033
2034    *   ``scrapy.utils.markup`` (use `w3lib.html`_)
2035
2036    *   ``scrapy.utils.multipart`` (use `urllib3`_)
2037
2038*   The ``scrapy.utils.datatypes.MergeDict`` class is deprecated for Python 3
2039    code bases. Use :class:`~collections.ChainMap` instead. (:issue:`3878`)
2040
2041*   The ``scrapy.utils.gz.is_gzipped`` function is deprecated. Use
2042    ``scrapy.utils.gz.gzip_magic_number`` instead.
2043
2044.. _urllib3: https://urllib3.readthedocs.io/en/latest/index.html
2045.. _w3lib.html: https://w3lib.readthedocs.io/en/latest/w3lib.html#module-w3lib.html
2046.. _w3lib.http: https://w3lib.readthedocs.io/en/latest/w3lib.html#module-w3lib.http
2047
2048
2049Other changes
2050~~~~~~~~~~~~~
2051
2052*   It is now possible to run all tests from the same tox_ environment in
2053    parallel; the documentation now covers :ref:`this and other ways to run
2054    tests <running-tests>` (:issue:`3707`)
2055
2056*   It is now possible to generate an API documentation coverage report
2057    (:issue:`3806`, :issue:`3810`, :issue:`3860`)
2058
2059*   The :ref:`documentation policies <documentation-policies>` now require
2060    docstrings_ (:issue:`3701`) that follow `PEP 257`_ (:issue:`3748`)
2061
2062*   Internal fixes and cleanup (:issue:`3629`, :issue:`3643`, :issue:`3684`,
2063    :issue:`3698`, :issue:`3734`, :issue:`3735`, :issue:`3736`, :issue:`3737`,
2064    :issue:`3809`, :issue:`3821`, :issue:`3825`, :issue:`3827`, :issue:`3833`,
2065    :issue:`3857`, :issue:`3877`)
2066
2067.. _release-1.6.0:
2068
2069Scrapy 1.6.0 (2019-01-30)
2070-------------------------
2071
2072Highlights:
2073
2074* better Windows support;
2075* Python 3.7 compatibility;
2076* big documentation improvements, including a switch
2077  from ``.extract_first()`` + ``.extract()`` API to ``.get()`` + ``.getall()``
2078  API;
2079* feed exports, FilePipeline and MediaPipeline improvements;
2080* better extensibility: :signal:`item_error` and
2081  :signal:`request_reached_downloader` signals; ``from_crawler`` support
2082  for feed exporters, feed storages and dupefilters.
2083* ``scrapy.contracts`` fixes and new features;
2084* telnet console security improvements, first released as a
2085  backport in :ref:`release-1.5.2`;
2086* clean-up of the deprecated code;
2087* various bug fixes, small new features and usability improvements across
2088  the codebase.
2089
2090Selector API changes
2091~~~~~~~~~~~~~~~~~~~~
2092
2093While these are not changes in Scrapy itself, but rather in the parsel_
2094library which Scrapy uses for xpath/css selectors, these changes are
2095worth mentioning here. Scrapy now depends on parsel >= 1.5, and
2096Scrapy documentation is updated to follow recent ``parsel`` API conventions.
2097
2098Most visible change is that ``.get()`` and ``.getall()`` selector
2099methods are now preferred over ``.extract_first()`` and ``.extract()``.
2100We feel that these new methods result in a more concise and readable code.
2101See :ref:`old-extraction-api` for more details.
2102
2103.. note::
2104    There are currently **no plans** to deprecate ``.extract()``
2105    and ``.extract_first()`` methods.
2106
2107Another useful new feature is the introduction of ``Selector.attrib`` and
2108``SelectorList.attrib`` properties, which make it easier to get
2109attributes of HTML elements. See :ref:`selecting-attributes`.
2110
2111CSS selectors are cached in parsel >= 1.5, which makes them faster
2112when the same CSS path is used many times. This is very common in
2113case of Scrapy spiders: callbacks are usually called several times,
2114on different pages.
2115
2116If you're using custom ``Selector`` or ``SelectorList`` subclasses,
2117a **backward incompatible** change in parsel may affect your code.
2118See `parsel changelog`_ for a detailed description, as well as for the
2119full list of improvements.
2120
2121.. _parsel changelog: https://parsel.readthedocs.io/en/latest/history.html
2122
2123Telnet console
2124~~~~~~~~~~~~~~
2125
2126**Backward incompatible**: Scrapy's telnet console now requires username
2127and password. See :ref:`topics-telnetconsole` for more details. This change
2128fixes a **security issue**; see :ref:`release-1.5.2` release notes for details.
2129
2130New extensibility features
2131~~~~~~~~~~~~~~~~~~~~~~~~~~
2132
2133* ``from_crawler`` support is added to feed exporters and feed storages. This,
2134  among other things, allows to access Scrapy settings from custom feed
2135  storages and exporters (:issue:`1605`, :issue:`3348`).
2136* ``from_crawler`` support is added to dupefilters (:issue:`2956`); this allows
2137  to access e.g. settings or a spider from a dupefilter.
2138* :signal:`item_error` is fired when an error happens in a pipeline
2139  (:issue:`3256`);
2140* :signal:`request_reached_downloader` is fired when Downloader gets
2141  a new Request; this signal can be useful e.g. for custom Schedulers
2142  (:issue:`3393`).
2143* new SitemapSpider :meth:`~.SitemapSpider.sitemap_filter` method which allows
2144  to select sitemap entries based on their attributes in SitemapSpider
2145  subclasses (:issue:`3512`).
2146* Lazy loading of Downloader Handlers is now optional; this enables better
2147  initialization error handling in custom Downloader Handlers (:issue:`3394`).
2148
2149New FilePipeline and MediaPipeline features
2150~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2151
2152* Expose more options for S3FilesStore: :setting:`AWS_ENDPOINT_URL`,
2153  :setting:`AWS_USE_SSL`, :setting:`AWS_VERIFY`, :setting:`AWS_REGION_NAME`.
2154  For example, this allows to use alternative or self-hosted
2155  AWS-compatible providers (:issue:`2609`, :issue:`3548`).
2156* ACL support for Google Cloud Storage: :setting:`FILES_STORE_GCS_ACL` and
2157  :setting:`IMAGES_STORE_GCS_ACL` (:issue:`3199`).
2158
2159``scrapy.contracts`` improvements
2160~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2161
2162* Exceptions in contracts code are handled better (:issue:`3377`);
2163* ``dont_filter=True`` is used for contract requests, which allows to test
2164  different callbacks with the same URL (:issue:`3381`);
2165* ``request_cls`` attribute in Contract subclasses allow to use different
2166  Request classes in contracts, for example FormRequest (:issue:`3383`).
2167* Fixed errback handling in contracts, e.g. for cases where a contract
2168  is executed for URL which returns non-200 response (:issue:`3371`).
2169
2170Usability improvements
2171~~~~~~~~~~~~~~~~~~~~~~
2172
2173* more stats for RobotsTxtMiddleware (:issue:`3100`)
2174* INFO log level is used to show telnet host/port (:issue:`3115`)
2175* a message is added to IgnoreRequest in RobotsTxtMiddleware (:issue:`3113`)
2176* better validation of ``url`` argument in ``Response.follow`` (:issue:`3131`)
2177* non-zero exit code is returned from Scrapy commands when error happens
2178  on spider initialization (:issue:`3226`)
2179* Link extraction improvements: "ftp" is added to scheme list (:issue:`3152`);
2180  "flv" is added to common video extensions (:issue:`3165`)
2181* better error message when an exporter is disabled (:issue:`3358`);
2182* ``scrapy shell --help`` mentions syntax required for local files
2183  (``./file.html``) - :issue:`3496`.
2184* Referer header value is added to RFPDupeFilter log messages (:issue:`3588`)
2185
2186Bug fixes
2187~~~~~~~~~
2188
2189* fixed issue with extra blank lines in .csv exports under Windows
2190  (:issue:`3039`);
2191* proper handling of pickling errors in Python 3 when serializing objects
2192  for disk queues (:issue:`3082`)
2193* flags are now preserved when copying Requests (:issue:`3342`);
2194* FormRequest.from_response clickdata shouldn't ignore elements with
2195  ``input[type=image]`` (:issue:`3153`).
2196* FormRequest.from_response should preserve duplicate keys (:issue:`3247`)
2197
2198Documentation improvements
2199~~~~~~~~~~~~~~~~~~~~~~~~~~
2200
2201* Docs are re-written to suggest .get/.getall API instead of
2202  .extract/.extract_first. Also, :ref:`topics-selectors` docs are updated
2203  and re-structured to match latest parsel docs; they now contain more topics,
2204  such as :ref:`selecting-attributes` or :ref:`topics-selectors-css-extensions`
2205  (:issue:`3390`).
2206* :ref:`topics-developer-tools` is a new tutorial which replaces
2207  old Firefox and Firebug tutorials (:issue:`3400`).
2208* SCRAPY_PROJECT environment variable is documented (:issue:`3518`);
2209* troubleshooting section is added to install instructions (:issue:`3517`);
2210* improved links to beginner resources in the tutorial
2211  (:issue:`3367`, :issue:`3468`);
2212* fixed :setting:`RETRY_HTTP_CODES` default values in docs (:issue:`3335`);
2213* remove unused ``DEPTH_STATS`` option from docs (:issue:`3245`);
2214* other cleanups (:issue:`3347`, :issue:`3350`, :issue:`3445`, :issue:`3544`,
2215  :issue:`3605`).
2216
2217Deprecation removals
2218~~~~~~~~~~~~~~~~~~~~
2219
2220Compatibility shims for pre-1.0 Scrapy module names are removed
2221(:issue:`3318`):
2222
2223* ``scrapy.command``
2224* ``scrapy.contrib`` (with all submodules)
2225* ``scrapy.contrib_exp`` (with all submodules)
2226* ``scrapy.dupefilter``
2227* ``scrapy.linkextractor``
2228* ``scrapy.project``
2229* ``scrapy.spider``
2230* ``scrapy.spidermanager``
2231* ``scrapy.squeue``
2232* ``scrapy.stats``
2233* ``scrapy.statscol``
2234* ``scrapy.utils.decorator``
2235
2236See :ref:`module-relocations` for more information, or use suggestions
2237from Scrapy 1.5.x deprecation warnings to update your code.
2238
2239Other deprecation removals:
2240
2241* Deprecated scrapy.interfaces.ISpiderManager is removed; please use
2242  scrapy.interfaces.ISpiderLoader.
2243* Deprecated ``CrawlerSettings`` class is removed (:issue:`3327`).
2244* Deprecated ``Settings.overrides`` and ``Settings.defaults`` attributes
2245  are removed (:issue:`3327`, :issue:`3359`).
2246
2247Other improvements, cleanups
2248~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2249
2250* All Scrapy tests now pass on Windows; Scrapy testing suite is executed
2251  in a Windows environment on CI (:issue:`3315`).
2252* Python 3.7 support (:issue:`3326`, :issue:`3150`, :issue:`3547`).
2253* Testing and CI fixes (:issue:`3526`, :issue:`3538`, :issue:`3308`,
2254  :issue:`3311`, :issue:`3309`, :issue:`3305`, :issue:`3210`, :issue:`3299`)
2255* ``scrapy.http.cookies.CookieJar.clear`` accepts "domain", "path" and "name"
2256  optional arguments (:issue:`3231`).
2257* additional files are included to sdist (:issue:`3495`);
2258* code style fixes (:issue:`3405`, :issue:`3304`);
2259* unneeded .strip() call is removed (:issue:`3519`);
2260* collections.deque is used to store MiddlewareManager methods instead
2261  of a list (:issue:`3476`)
2262
2263.. _release-1.5.2:
2264
2265Scrapy 1.5.2 (2019-01-22)
2266-------------------------
2267
2268* *Security bugfix*: Telnet console extension can be easily exploited by rogue
2269  websites POSTing content to http://localhost:6023, we haven't found a way to
2270  exploit it from Scrapy, but it is very easy to trick a browser to do so and
2271  elevates the risk for local development environment.
2272
2273  *The fix is backward incompatible*, it enables telnet user-password
2274  authentication by default with a random generated password. If you can't
2275  upgrade right away, please consider setting :setting:`TELNETCONSOLE_PORT`
2276  out of its default value.
2277
2278  See :ref:`telnet console <topics-telnetconsole>` documentation for more info
2279
2280* Backport CI build failure under GCE environment due to boto import error.
2281
2282.. _release-1.5.1:
2283
2284Scrapy 1.5.1 (2018-07-12)
2285-------------------------
2286
2287This is a maintenance release with important bug fixes, but no new features:
2288
2289* ``O(N^2)`` gzip decompression issue which affected Python 3 and PyPy
2290  is fixed (:issue:`3281`);
2291* skipping of TLS validation errors is improved (:issue:`3166`);
2292* Ctrl-C handling is fixed in Python 3.5+ (:issue:`3096`);
2293* testing fixes (:issue:`3092`, :issue:`3263`);
2294* documentation improvements (:issue:`3058`, :issue:`3059`, :issue:`3089`,
2295  :issue:`3123`, :issue:`3127`, :issue:`3189`, :issue:`3224`, :issue:`3280`,
2296  :issue:`3279`, :issue:`3201`, :issue:`3260`, :issue:`3284`, :issue:`3298`,
2297  :issue:`3294`).
2298
2299
2300.. _release-1.5.0:
2301
2302Scrapy 1.5.0 (2017-12-29)
2303-------------------------
2304
2305This release brings small new features and improvements across the codebase.
2306Some highlights:
2307
2308* Google Cloud Storage is supported in FilesPipeline and ImagesPipeline.
2309* Crawling with proxy servers becomes more efficient, as connections
2310  to proxies can be reused now.
2311* Warnings, exception and logging messages are improved to make debugging
2312  easier.
2313* ``scrapy parse`` command now allows to set custom request meta via
2314  ``--meta`` argument.
2315* Compatibility with Python 3.6, PyPy and PyPy3 is improved;
2316  PyPy and PyPy3 are now supported officially, by running tests on CI.
2317* Better default handling of HTTP 308, 522 and 524 status codes.
2318* Documentation is improved, as usual.
2319
2320Backward Incompatible Changes
2321~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2322
2323* Scrapy 1.5 drops support for Python 3.3.
2324* Default Scrapy User-Agent now uses https link to scrapy.org (:issue:`2983`).
2325  **This is technically backward-incompatible**; override
2326  :setting:`USER_AGENT` if you relied on old value.
2327* Logging of settings overridden by ``custom_settings`` is fixed;
2328  **this is technically backward-incompatible** because the logger
2329  changes from ``[scrapy.utils.log]`` to ``[scrapy.crawler]``. If you're
2330  parsing Scrapy logs, please update your log parsers (:issue:`1343`).
2331* LinkExtractor now ignores ``m4v`` extension by default, this is change
2332  in behavior.
2333* 522 and 524 status codes are added to ``RETRY_HTTP_CODES`` (:issue:`2851`)
2334
2335New features
2336~~~~~~~~~~~~
2337
2338- Support ``<link>`` tags in ``Response.follow`` (:issue:`2785`)
2339- Support for ``ptpython`` REPL (:issue:`2654`)
2340- Google Cloud Storage support for FilesPipeline and ImagesPipeline
2341  (:issue:`2923`).
2342- New ``--meta`` option of the "scrapy parse" command allows to pass additional
2343  request.meta (:issue:`2883`)
2344- Populate spider variable when using ``shell.inspect_response`` (:issue:`2812`)
2345- Handle HTTP 308 Permanent Redirect (:issue:`2844`)
2346- Add 522 and 524 to ``RETRY_HTTP_CODES`` (:issue:`2851`)
2347- Log versions information at startup (:issue:`2857`)
2348- ``scrapy.mail.MailSender`` now works in Python 3 (it requires Twisted 17.9.0)
2349- Connections to proxy servers are reused (:issue:`2743`)
2350- Add template for a downloader middleware (:issue:`2755`)
2351- Explicit message for NotImplementedError when parse callback not defined
2352  (:issue:`2831`)
2353- CrawlerProcess got an option to disable installation of root log handler
2354  (:issue:`2921`)
2355- LinkExtractor now ignores ``m4v`` extension by default
2356- Better log messages for responses over :setting:`DOWNLOAD_WARNSIZE` and
2357  :setting:`DOWNLOAD_MAXSIZE` limits (:issue:`2927`)
2358- Show warning when a URL is put to ``Spider.allowed_domains`` instead of
2359  a domain (:issue:`2250`).
2360
2361Bug fixes
2362~~~~~~~~~
2363
2364- Fix logging of settings overridden by ``custom_settings``;
2365  **this is technically backward-incompatible** because the logger
2366  changes from ``[scrapy.utils.log]`` to ``[scrapy.crawler]``, so please
2367  update your log parsers if needed (:issue:`1343`)
2368- Default Scrapy User-Agent now uses https link to scrapy.org (:issue:`2983`).
2369  **This is technically backward-incompatible**; override
2370  :setting:`USER_AGENT` if you relied on old value.
2371- Fix PyPy and PyPy3 test failures, support them officially
2372  (:issue:`2793`, :issue:`2935`, :issue:`2990`, :issue:`3050`, :issue:`2213`,
2373  :issue:`3048`)
2374- Fix DNS resolver when ``DNSCACHE_ENABLED=False`` (:issue:`2811`)
2375- Add ``cryptography`` for Debian Jessie tox test env (:issue:`2848`)
2376- Add verification to check if Request callback is callable (:issue:`2766`)
2377- Port ``extras/qpsclient.py`` to Python 3 (:issue:`2849`)
2378- Use getfullargspec under the scenes for Python 3 to stop DeprecationWarning
2379  (:issue:`2862`)
2380- Update deprecated test aliases (:issue:`2876`)
2381- Fix ``SitemapSpider`` support for alternate links (:issue:`2853`)
2382
2383Docs
2384~~~~
2385
2386- Added missing bullet point for the ``AUTOTHROTTLE_TARGET_CONCURRENCY``
2387  setting. (:issue:`2756`)
2388- Update Contributing docs, document new support channels
2389  (:issue:`2762`, issue:`3038`)
2390- Include references to Scrapy subreddit in the docs
2391- Fix broken links; use https:// for external links
2392  (:issue:`2978`, :issue:`2982`, :issue:`2958`)
2393- Document CloseSpider extension better (:issue:`2759`)
2394- Use ``pymongo.collection.Collection.insert_one()`` in MongoDB example
2395  (:issue:`2781`)
2396- Spelling mistake and typos
2397  (:issue:`2828`, :issue:`2837`, :issue:`2884`, :issue:`2924`)
2398- Clarify ``CSVFeedSpider.headers`` documentation (:issue:`2826`)
2399- Document ``DontCloseSpider`` exception and clarify ``spider_idle``
2400  (:issue:`2791`)
2401- Update "Releases" section in README (:issue:`2764`)
2402- Fix rst syntax in ``DOWNLOAD_FAIL_ON_DATALOSS`` docs (:issue:`2763`)
2403- Small fix in description of startproject arguments (:issue:`2866`)
2404- Clarify data types in Response.body docs (:issue:`2922`)
2405- Add a note about ``request.meta['depth']`` to DepthMiddleware docs (:issue:`2374`)
2406- Add a note about ``request.meta['dont_merge_cookies']`` to CookiesMiddleware
2407  docs (:issue:`2999`)
2408- Up-to-date example of project structure (:issue:`2964`, :issue:`2976`)
2409- A better example of ItemExporters usage (:issue:`2989`)
2410- Document ``from_crawler`` methods for spider and downloader middlewares
2411  (:issue:`3019`)
2412
2413.. _release-1.4.0:
2414
2415Scrapy 1.4.0 (2017-05-18)
2416-------------------------
2417
2418Scrapy 1.4 does not bring that many breathtaking new features
2419but quite a few handy improvements nonetheless.
2420
2421Scrapy now supports anonymous FTP sessions with customizable user and
2422password via the new :setting:`FTP_USER` and :setting:`FTP_PASSWORD` settings.
2423And if you're using Twisted version 17.1.0 or above, FTP is now available
2424with Python 3.
2425
2426There's a new :meth:`response.follow <scrapy.http.TextResponse.follow>` method
2427for creating requests; **it is now a recommended way to create Requests
2428in Scrapy spiders**. This method makes it easier to write correct
2429spiders; ``response.follow`` has several advantages over creating
2430``scrapy.Request`` objects directly:
2431
2432* it handles relative URLs;
2433* it works properly with non-ascii URLs on non-UTF8 pages;
2434* in addition to absolute and relative URLs it supports Selectors;
2435  for ``<a>`` elements it can also extract their href values.
2436
2437For example, instead of this::
2438
2439    for href in response.css('li.page a::attr(href)').extract():
2440        url = response.urljoin(href)
2441        yield scrapy.Request(url, self.parse, encoding=response.encoding)
2442
2443One can now write this::
2444
2445    for a in response.css('li.page a'):
2446        yield response.follow(a, self.parse)
2447
2448Link extractors are also improved. They work similarly to what a regular
2449modern browser would do: leading and trailing whitespace are removed
2450from attributes (think ``href="   http://example.com"``) when building
2451``Link`` objects. This whitespace-stripping also happens for ``action``
2452attributes with ``FormRequest``.
2453
2454**Please also note that link extractors do not canonicalize URLs by default
2455anymore.** This was puzzling users every now and then, and it's not what
2456browsers do in fact, so we removed that extra transformation on extracted
2457links.
2458
2459For those of you wanting more control on the ``Referer:`` header that Scrapy
2460sends when following links, you can set your own ``Referrer Policy``.
2461Prior to Scrapy 1.4, the default ``RefererMiddleware`` would simply and
2462blindly set it to the URL of the response that generated the HTTP request
2463(which could leak information on your URL seeds).
2464By default, Scrapy now behaves much like your regular browser does.
2465And this policy is fully customizable with W3C standard values
2466(or with something really custom of your own if you wish).
2467See :setting:`REFERRER_POLICY` for details.
2468
2469To make Scrapy spiders easier to debug, Scrapy logs more stats by default
2470in 1.4: memory usage stats, detailed retry stats, detailed HTTP error code
2471stats. A similar change is that HTTP cache path is also visible in logs now.
2472
2473Last but not least, Scrapy now has the option to make JSON and XML items
2474more human-readable, with newlines between items and even custom indenting
2475offset, using the new :setting:`FEED_EXPORT_INDENT` setting.
2476
2477Enjoy! (Or read on for the rest of changes in this release.)
2478
2479Deprecations and Backward Incompatible Changes
2480~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2481
2482- Default to ``canonicalize=False`` in
2483  :class:`scrapy.linkextractors.LinkExtractor
2484  <scrapy.linkextractors.lxmlhtml.LxmlLinkExtractor>`
2485  (:issue:`2537`, fixes :issue:`1941` and :issue:`1982`):
2486  **warning, this is technically backward-incompatible**
2487- Enable memusage extension by default (:issue:`2539`, fixes :issue:`2187`);
2488  **this is technically backward-incompatible** so please check if you have
2489  any non-default ``MEMUSAGE_***`` options set.
2490- ``EDITOR`` environment variable now takes precedence over ``EDITOR``
2491  option defined in settings.py (:issue:`1829`); Scrapy default settings
2492  no longer depend on environment variables. **This is technically a backward
2493  incompatible change**.
2494- ``Spider.make_requests_from_url`` is deprecated
2495  (:issue:`1728`, fixes :issue:`1495`).
2496
2497New Features
2498~~~~~~~~~~~~
2499
2500- Accept proxy credentials in :reqmeta:`proxy` request meta key (:issue:`2526`)
2501- Support `brotli`_-compressed content; requires optional `brotlipy`_
2502  (:issue:`2535`)
2503- New :ref:`response.follow <response-follow-example>` shortcut
2504  for creating requests (:issue:`1940`)
2505- Added ``flags`` argument and attribute to :class:`Request <scrapy.http.Request>`
2506  objects (:issue:`2047`)
2507- Support Anonymous FTP (:issue:`2342`)
2508- Added ``retry/count``, ``retry/max_reached`` and ``retry/reason_count/<reason>``
2509  stats to :class:`RetryMiddleware <scrapy.downloadermiddlewares.retry.RetryMiddleware>`
2510  (:issue:`2543`)
2511- Added ``httperror/response_ignored_count`` and ``httperror/response_ignored_status_count/<status>``
2512  stats to :class:`HttpErrorMiddleware <scrapy.spidermiddlewares.httperror.HttpErrorMiddleware>`
2513  (:issue:`2566`)
2514- Customizable :setting:`Referrer policy <REFERRER_POLICY>` in
2515  :class:`RefererMiddleware <scrapy.spidermiddlewares.referer.RefererMiddleware>`
2516  (:issue:`2306`)
2517- New ``data:`` URI download handler (:issue:`2334`, fixes :issue:`2156`)
2518- Log cache directory when HTTP Cache is used (:issue:`2611`, fixes :issue:`2604`)
2519- Warn users when project contains duplicate spider names (fixes :issue:`2181`)
2520- ``scrapy.utils.datatypes.CaselessDict`` now accepts ``Mapping`` instances and
2521  not only dicts (:issue:`2646`)
2522- :ref:`Media downloads <topics-media-pipeline>`, with
2523  :class:`~scrapy.pipelines.files.FilesPipeline` or
2524  :class:`~scrapy.pipelines.images.ImagesPipeline`, can now optionally handle
2525  HTTP redirects using the new :setting:`MEDIA_ALLOW_REDIRECTS` setting
2526  (:issue:`2616`, fixes :issue:`2004`)
2527- Accept non-complete responses from websites using a new
2528  :setting:`DOWNLOAD_FAIL_ON_DATALOSS` setting (:issue:`2590`, fixes :issue:`2586`)
2529- Optional pretty-printing of JSON and XML items via
2530  :setting:`FEED_EXPORT_INDENT` setting (:issue:`2456`, fixes :issue:`1327`)
2531- Allow dropping fields in ``FormRequest.from_response`` formdata when
2532  ``None`` value is passed (:issue:`667`)
2533- Per-request retry times with the new :reqmeta:`max_retry_times` meta key
2534  (:issue:`2642`)
2535- ``python -m scrapy`` as a more explicit alternative to ``scrapy`` command
2536  (:issue:`2740`)
2537
2538.. _brotli: https://github.com/google/brotli
2539.. _brotlipy: https://github.com/python-hyper/brotlipy/
2540
2541Bug fixes
2542~~~~~~~~~
2543
2544- LinkExtractor now strips leading and trailing whitespaces from attributes
2545  (:issue:`2547`, fixes :issue:`1614`)
2546- Properly handle whitespaces in action attribute in
2547  :class:`~scrapy.http.FormRequest` (:issue:`2548`)
2548- Buffer CONNECT response bytes from proxy until all HTTP headers are received
2549  (:issue:`2495`, fixes :issue:`2491`)
2550- FTP downloader now works on Python 3, provided you use Twisted>=17.1
2551  (:issue:`2599`)
2552- Use body to choose response type after decompressing content (:issue:`2393`,
2553  fixes :issue:`2145`)
2554- Always decompress ``Content-Encoding: gzip`` at :class:`HttpCompressionMiddleware
2555  <scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware>` stage (:issue:`2391`)
2556- Respect custom log level in ``Spider.custom_settings`` (:issue:`2581`,
2557  fixes :issue:`1612`)
2558- 'make htmlview' fix for macOS (:issue:`2661`)
2559- Remove "commands" from the command list  (:issue:`2695`)
2560- Fix duplicate Content-Length header for POST requests with empty body (:issue:`2677`)
2561- Properly cancel large downloads, i.e. above :setting:`DOWNLOAD_MAXSIZE` (:issue:`1616`)
2562- ImagesPipeline: fixed processing of transparent PNG images with palette
2563  (:issue:`2675`)
2564
2565Cleanups & Refactoring
2566~~~~~~~~~~~~~~~~~~~~~~
2567
2568- Tests: remove temp files and folders (:issue:`2570`),
2569  fixed ProjectUtilsTest on macOS (:issue:`2569`),
2570  use portable pypy for Linux on Travis CI (:issue:`2710`)
2571- Separate building request from ``_requests_to_follow`` in CrawlSpider (:issue:`2562`)
2572- Remove “Python 3 progress” badge (:issue:`2567`)
2573- Add a couple more lines to ``.gitignore`` (:issue:`2557`)
2574- Remove bumpversion prerelease configuration (:issue:`2159`)
2575- Add codecov.yml file (:issue:`2750`)
2576- Set context factory implementation based on Twisted version (:issue:`2577`,
2577  fixes :issue:`2560`)
2578- Add omitted ``self`` arguments in default project middleware template (:issue:`2595`)
2579- Remove redundant ``slot.add_request()`` call in ExecutionEngine (:issue:`2617`)
2580- Catch more specific ``os.error`` exception in
2581  ``scrapy.pipelines.files.FSFilesStore`` (:issue:`2644`)
2582- Change "localhost" test server certificate (:issue:`2720`)
2583- Remove unused ``MEMUSAGE_REPORT`` setting (:issue:`2576`)
2584
2585Documentation
2586~~~~~~~~~~~~~
2587
2588- Binary mode is required for exporters (:issue:`2564`, fixes :issue:`2553`)
2589- Mention issue with :meth:`FormRequest.from_response
2590  <scrapy.http.FormRequest.from_response>` due to bug in lxml (:issue:`2572`)
2591- Use single quotes uniformly in templates (:issue:`2596`)
2592- Document :reqmeta:`ftp_user` and :reqmeta:`ftp_password` meta keys (:issue:`2587`)
2593- Removed section on deprecated ``contrib/`` (:issue:`2636`)
2594- Recommend Anaconda when installing Scrapy on Windows
2595  (:issue:`2477`, fixes :issue:`2475`)
2596- FAQ: rewrite note on Python 3 support on Windows (:issue:`2690`)
2597- Rearrange selector sections (:issue:`2705`)
2598- Remove ``__nonzero__`` from :class:`~scrapy.selector.SelectorList`
2599  docs (:issue:`2683`)
2600- Mention how to disable request filtering in documentation of
2601  :setting:`DUPEFILTER_CLASS` setting (:issue:`2714`)
2602- Add sphinx_rtd_theme to docs setup readme (:issue:`2668`)
2603- Open file in text mode in JSON item writer example (:issue:`2729`)
2604- Clarify ``allowed_domains`` example (:issue:`2670`)
2605
2606
2607.. _release-1.3.3:
2608
2609Scrapy 1.3.3 (2017-03-10)
2610-------------------------
2611
2612Bug fixes
2613~~~~~~~~~
2614
2615- Make ``SpiderLoader`` raise ``ImportError`` again by default for missing
2616  dependencies and wrong :setting:`SPIDER_MODULES`.
2617  These exceptions were silenced as warnings since 1.3.0.
2618  A new setting is introduced to toggle between warning or exception if needed ;
2619  see :setting:`SPIDER_LOADER_WARN_ONLY` for details.
2620
2621.. _release-1.3.2:
2622
2623Scrapy 1.3.2 (2017-02-13)
2624-------------------------
2625
2626Bug fixes
2627~~~~~~~~~
2628
2629- Preserve request class when converting to/from dicts (utils.reqser) (:issue:`2510`).
2630- Use consistent selectors for author field in tutorial (:issue:`2551`).
2631- Fix TLS compatibility in Twisted 17+ (:issue:`2558`)
2632
2633.. _release-1.3.1:
2634
2635Scrapy 1.3.1 (2017-02-08)
2636-------------------------
2637
2638New features
2639~~~~~~~~~~~~
2640
2641- Support ``'True'`` and ``'False'`` string values for boolean settings (:issue:`2519`);
2642  you can now do something like ``scrapy crawl myspider -s REDIRECT_ENABLED=False``.
2643- Support kwargs with ``response.xpath()`` to use :ref:`XPath variables <topics-selectors-xpath-variables>`
2644  and ad-hoc namespaces declarations ;
2645  this requires at least Parsel v1.1 (:issue:`2457`).
2646- Add support for Python 3.6 (:issue:`2485`).
2647- Run tests on PyPy (warning: some tests still fail, so PyPy is not supported yet).
2648
2649Bug fixes
2650~~~~~~~~~
2651
2652- Enforce ``DNS_TIMEOUT`` setting (:issue:`2496`).
2653- Fix :command:`view` command ; it was a regression in v1.3.0 (:issue:`2503`).
2654- Fix tests regarding ``*_EXPIRES settings`` with Files/Images pipelines (:issue:`2460`).
2655- Fix name of generated pipeline class when using basic project template (:issue:`2466`).
2656- Fix compatibility with Twisted 17+ (:issue:`2496`, :issue:`2528`).
2657- Fix ``scrapy.Item`` inheritance on Python 3.6 (:issue:`2511`).
2658- Enforce numeric values for components order in ``SPIDER_MIDDLEWARES``,
2659  ``DOWNLOADER_MIDDLEWARES``, ``EXTENSIONS`` and ``SPIDER_CONTRACTS`` (:issue:`2420`).
2660
2661Documentation
2662~~~~~~~~~~~~~
2663
2664- Reword Code of Conduct section and upgrade to Contributor Covenant v1.4
2665  (:issue:`2469`).
2666- Clarify that passing spider arguments converts them to spider attributes
2667  (:issue:`2483`).
2668- Document ``formid`` argument on ``FormRequest.from_response()`` (:issue:`2497`).
2669- Add .rst extension to README files (:issue:`2507`).
2670- Mention LevelDB cache storage backend (:issue:`2525`).
2671- Use ``yield`` in sample callback code (:issue:`2533`).
2672- Add note about HTML entities decoding with ``.re()/.re_first()`` (:issue:`1704`).
2673- Typos (:issue:`2512`, :issue:`2534`, :issue:`2531`).
2674
2675Cleanups
2676~~~~~~~~
2677
2678- Remove redundant check in ``MetaRefreshMiddleware`` (:issue:`2542`).
2679- Faster checks in ``LinkExtractor`` for allow/deny patterns (:issue:`2538`).
2680- Remove dead code supporting old Twisted versions (:issue:`2544`).
2681
2682
2683.. _release-1.3.0:
2684
2685Scrapy 1.3.0 (2016-12-21)
2686-------------------------
2687
2688This release comes rather soon after 1.2.2 for one main reason:
2689it was found out that releases since 0.18 up to 1.2.2 (included) use
2690some backported code from Twisted (``scrapy.xlib.tx.*``),
2691even if newer Twisted modules are available.
2692Scrapy now uses ``twisted.web.client`` and ``twisted.internet.endpoints`` directly.
2693(See also cleanups below.)
2694
2695As it is a major change, we wanted to get the bug fix out quickly
2696while not breaking any projects using the 1.2 series.
2697
2698New Features
2699~~~~~~~~~~~~
2700
2701- ``MailSender`` now accepts single strings as values for ``to`` and ``cc``
2702  arguments (:issue:`2272`)
2703- ``scrapy fetch url``, ``scrapy shell url`` and ``fetch(url)`` inside
2704  Scrapy shell now follow HTTP redirections by default (:issue:`2290`);
2705  See :command:`fetch` and :command:`shell` for details.
2706- ``HttpErrorMiddleware`` now logs errors with ``INFO`` level instead of ``DEBUG``;
2707  this is technically **backward incompatible** so please check your log parsers.
2708- By default, logger names now use a long-form path, e.g. ``[scrapy.extensions.logstats]``,
2709  instead of the shorter "top-level" variant of prior releases (e.g. ``[scrapy]``);
2710  this is **backward incompatible** if you have log parsers expecting the short
2711  logger name part. You can switch back to short logger names using :setting:`LOG_SHORT_NAMES`
2712  set to ``True``.
2713
2714Dependencies & Cleanups
2715~~~~~~~~~~~~~~~~~~~~~~~
2716
2717- Scrapy now requires Twisted >= 13.1 which is the case for many Linux
2718  distributions already.
2719- As a consequence, we got rid of ``scrapy.xlib.tx.*`` modules, which
2720  copied some of Twisted code for users stuck with an "old" Twisted version
2721- ``ChunkedTransferMiddleware`` is deprecated and removed from the default
2722  downloader middlewares.
2723
2724.. _release-1.2.3:
2725
2726Scrapy 1.2.3 (2017-03-03)
2727-------------------------
2728
2729- Packaging fix: disallow unsupported Twisted versions in setup.py
2730
2731
2732.. _release-1.2.2:
2733
2734Scrapy 1.2.2 (2016-12-06)
2735-------------------------
2736
2737Bug fixes
2738~~~~~~~~~
2739
2740- Fix a cryptic traceback when a pipeline fails on ``open_spider()`` (:issue:`2011`)
2741- Fix embedded IPython shell variables (fixing :issue:`396` that re-appeared
2742  in 1.2.0, fixed in :issue:`2418`)
2743- A couple of patches when dealing with robots.txt:
2744
2745  - handle (non-standard) relative sitemap URLs (:issue:`2390`)
2746  - handle non-ASCII URLs and User-Agents in Python 2 (:issue:`2373`)
2747
2748Documentation
2749~~~~~~~~~~~~~
2750
2751- Document ``"download_latency"`` key in ``Request``'s ``meta`` dict (:issue:`2033`)
2752- Remove page on (deprecated & unsupported) Ubuntu packages from ToC (:issue:`2335`)
2753- A few fixed typos (:issue:`2346`, :issue:`2369`, :issue:`2369`, :issue:`2380`)
2754  and clarifications (:issue:`2354`, :issue:`2325`, :issue:`2414`)
2755
2756Other changes
2757~~~~~~~~~~~~~
2758
2759- Advertize `conda-forge`_ as Scrapy's official conda channel (:issue:`2387`)
2760- More helpful error messages when trying to use ``.css()`` or ``.xpath()``
2761  on non-Text Responses (:issue:`2264`)
2762- ``startproject`` command now generates a sample ``middlewares.py`` file (:issue:`2335`)
2763- Add more dependencies' version info in ``scrapy version`` verbose output (:issue:`2404`)
2764- Remove all ``*.pyc`` files from source distribution (:issue:`2386`)
2765
2766.. _conda-forge: https://anaconda.org/conda-forge/scrapy
2767
2768
2769.. _release-1.2.1:
2770
2771Scrapy 1.2.1 (2016-10-21)
2772-------------------------
2773
2774Bug fixes
2775~~~~~~~~~
2776
2777- Include OpenSSL's more permissive default ciphers when establishing
2778  TLS/SSL connections (:issue:`2314`).
2779- Fix "Location" HTTP header decoding on non-ASCII URL redirects (:issue:`2321`).
2780
2781Documentation
2782~~~~~~~~~~~~~
2783
2784- Fix JsonWriterPipeline example (:issue:`2302`).
2785- Various notes: :issue:`2330` on spider names,
2786  :issue:`2329` on middleware methods processing order,
2787  :issue:`2327` on getting multi-valued HTTP headers as lists.
2788
2789Other changes
2790~~~~~~~~~~~~~
2791
2792- Removed ``www.`` from ``start_urls`` in built-in spider templates (:issue:`2299`).
2793
2794
2795.. _release-1.2.0:
2796
2797Scrapy 1.2.0 (2016-10-03)
2798-------------------------
2799
2800New Features
2801~~~~~~~~~~~~
2802
2803- New :setting:`FEED_EXPORT_ENCODING` setting to customize the encoding
2804  used when writing items to a file.
2805  This can be used to turn off ``\uXXXX`` escapes in JSON output.
2806  This is also useful for those wanting something else than UTF-8
2807  for XML or CSV output (:issue:`2034`).
2808- ``startproject`` command now supports an optional destination directory
2809  to override the default one based on the project name (:issue:`2005`).
2810- New :setting:`SCHEDULER_DEBUG` setting to log requests serialization
2811  failures (:issue:`1610`).
2812- JSON encoder now supports serialization of ``set`` instances (:issue:`2058`).
2813- Interpret ``application/json-amazonui-streaming`` as ``TextResponse`` (:issue:`1503`).
2814- ``scrapy`` is imported by default when using shell tools (:command:`shell`,
2815  :ref:`inspect_response <topics-shell-inspect-response>`) (:issue:`2248`).
2816
2817Bug fixes
2818~~~~~~~~~
2819
2820- DefaultRequestHeaders middleware now runs before UserAgent middleware
2821  (:issue:`2088`). **Warning: this is technically backward incompatible**,
2822  though we consider this a bug fix.
2823- HTTP cache extension and plugins that use the ``.scrapy`` data directory now
2824  work outside projects (:issue:`1581`).  **Warning: this is technically
2825  backward incompatible**, though we consider this a bug fix.
2826- ``Selector`` does not allow passing both ``response`` and ``text`` anymore
2827  (:issue:`2153`).
2828- Fixed logging of wrong callback name with ``scrapy parse`` (:issue:`2169`).
2829- Fix for an odd gzip decompression bug (:issue:`1606`).
2830- Fix for selected callbacks when using ``CrawlSpider`` with :command:`scrapy parse <parse>`
2831  (:issue:`2225`).
2832- Fix for invalid JSON and XML files when spider yields no items (:issue:`872`).
2833- Implement ``flush()`` fpr ``StreamLogger`` avoiding a warning in logs (:issue:`2125`).
2834
2835Refactoring
2836~~~~~~~~~~~
2837
2838- ``canonicalize_url`` has been moved to `w3lib.url`_ (:issue:`2168`).
2839
2840.. _w3lib.url: https://w3lib.readthedocs.io/en/latest/w3lib.html#w3lib.url.canonicalize_url
2841
2842Tests & Requirements
2843~~~~~~~~~~~~~~~~~~~~
2844
2845Scrapy's new requirements baseline is Debian 8 "Jessie". It was previously
2846Ubuntu 12.04 Precise.
2847What this means in practice is that we run continuous integration tests
2848with these (main) packages versions at a minimum:
2849Twisted 14.0, pyOpenSSL 0.14, lxml 3.4.
2850
2851Scrapy may very well work with older versions of these packages
2852(the code base still has switches for older Twisted versions for example)
2853but it is not guaranteed (because it's not tested anymore).
2854
2855Documentation
2856~~~~~~~~~~~~~
2857
2858- Grammar fixes: :issue:`2128`, :issue:`1566`.
2859- Download stats badge removed from README (:issue:`2160`).
2860- New Scrapy :ref:`architecture diagram <topics-architecture>` (:issue:`2165`).
2861- Updated ``Response`` parameters documentation (:issue:`2197`).
2862- Reworded misleading :setting:`RANDOMIZE_DOWNLOAD_DELAY` description (:issue:`2190`).
2863- Add StackOverflow as a support channel (:issue:`2257`).
2864
2865.. _release-1.1.4:
2866
2867Scrapy 1.1.4 (2017-03-03)
2868-------------------------
2869
2870- Packaging fix: disallow unsupported Twisted versions in setup.py
2871
2872.. _release-1.1.3:
2873
2874Scrapy 1.1.3 (2016-09-22)
2875-------------------------
2876
2877Bug fixes
2878~~~~~~~~~
2879
2880- Class attributes for subclasses of ``ImagesPipeline`` and ``FilesPipeline``
2881  work as they did before 1.1.1 (:issue:`2243`, fixes :issue:`2198`)
2882
2883Documentation
2884~~~~~~~~~~~~~
2885
2886- :ref:`Overview <intro-overview>` and :ref:`tutorial <intro-tutorial>`
2887  rewritten to use http://toscrape.com websites
2888  (:issue:`2236`, :issue:`2249`, :issue:`2252`).
2889
2890.. _release-1.1.2:
2891
2892Scrapy 1.1.2 (2016-08-18)
2893-------------------------
2894
2895Bug fixes
2896~~~~~~~~~
2897
2898- Introduce a missing :setting:`IMAGES_STORE_S3_ACL` setting to override
2899  the default ACL policy in ``ImagesPipeline`` when uploading images to S3
2900  (note that default ACL policy is "private" -- instead of "public-read" --
2901  since Scrapy 1.1.0)
2902- :setting:`IMAGES_EXPIRES` default value set back to 90
2903  (the regression was introduced in 1.1.1)
2904
2905.. _release-1.1.1:
2906
2907Scrapy 1.1.1 (2016-07-13)
2908-------------------------
2909
2910Bug fixes
2911~~~~~~~~~
2912
2913- Add "Host" header in CONNECT requests to HTTPS proxies (:issue:`2069`)
2914- Use response ``body`` when choosing response class
2915  (:issue:`2001`, fixes :issue:`2000`)
2916- Do not fail on canonicalizing URLs with wrong netlocs
2917  (:issue:`2038`, fixes :issue:`2010`)
2918- a few fixes for ``HttpCompressionMiddleware`` (and ``SitemapSpider``):
2919
2920  - Do not decode HEAD responses (:issue:`2008`, fixes :issue:`1899`)
2921  - Handle charset parameter in gzip Content-Type header
2922    (:issue:`2050`, fixes :issue:`2049`)
2923  - Do not decompress gzip octet-stream responses
2924    (:issue:`2065`, fixes :issue:`2063`)
2925
2926- Catch (and ignore with a warning) exception when verifying certificate
2927  against IP-address hosts (:issue:`2094`, fixes :issue:`2092`)
2928- Make ``FilesPipeline`` and ``ImagesPipeline`` backward compatible again
2929  regarding the use of legacy class attributes for customization
2930  (:issue:`1989`, fixes :issue:`1985`)
2931
2932
2933New features
2934~~~~~~~~~~~~
2935
2936- Enable genspider command outside project folder (:issue:`2052`)
2937- Retry HTTPS CONNECT ``TunnelError`` by default (:issue:`1974`)
2938
2939
2940Documentation
2941~~~~~~~~~~~~~
2942
2943- ``FEED_TEMPDIR`` setting at lexicographical position (:commit:`9b3c72c`)
2944- Use idiomatic ``.extract_first()`` in overview (:issue:`1994`)
2945- Update years in copyright notice (:commit:`c2c8036`)
2946- Add information and example on errbacks (:issue:`1995`)
2947- Use "url" variable in downloader middleware example (:issue:`2015`)
2948- Grammar fixes (:issue:`2054`, :issue:`2120`)
2949- New FAQ entry on using BeautifulSoup in spider callbacks (:issue:`2048`)
2950- Add notes about Scrapy not working on Windows with Python 3 (:issue:`2060`)
2951- Encourage complete titles in pull requests (:issue:`2026`)
2952
2953Tests
2954~~~~~
2955
2956- Upgrade py.test requirement on Travis CI and Pin pytest-cov to 2.2.1 (:issue:`2095`)
2957
2958.. _release-1.1.0:
2959
2960Scrapy 1.1.0 (2016-05-11)
2961-------------------------
2962
2963This 1.1 release brings a lot of interesting features and bug fixes:
2964
2965- Scrapy 1.1 has beta Python 3 support (requires Twisted >= 15.5). See
2966  :ref:`news_betapy3` for more details and some limitations.
2967- Hot new features:
2968
2969  - Item loaders now support nested loaders (:issue:`1467`).
2970  - ``FormRequest.from_response`` improvements (:issue:`1382`, :issue:`1137`).
2971  - Added setting :setting:`AUTOTHROTTLE_TARGET_CONCURRENCY` and improved
2972    AutoThrottle docs (:issue:`1324`).
2973  - Added ``response.text`` to get body as unicode (:issue:`1730`).
2974  - Anonymous S3 connections (:issue:`1358`).
2975  - Deferreds in downloader middlewares (:issue:`1473`). This enables better
2976    robots.txt handling (:issue:`1471`).
2977  - HTTP caching now follows RFC2616 more closely, added settings
2978    :setting:`HTTPCACHE_ALWAYS_STORE` and
2979    :setting:`HTTPCACHE_IGNORE_RESPONSE_CACHE_CONTROLS` (:issue:`1151`).
2980  - Selectors were extracted to the parsel_ library (:issue:`1409`). This means
2981    you can use Scrapy Selectors without Scrapy and also upgrade the
2982    selectors engine without needing to upgrade Scrapy.
2983  - HTTPS downloader now does TLS protocol negotiation by default,
2984    instead of forcing TLS 1.0. You can also set the SSL/TLS method
2985    using the new :setting:`DOWNLOADER_CLIENT_TLS_METHOD`.
2986
2987- These bug fixes may require your attention:
2988
2989  - Don't retry bad requests (HTTP 400) by default (:issue:`1289`).
2990    If you need the old behavior, add ``400`` to :setting:`RETRY_HTTP_CODES`.
2991  - Fix shell files argument handling (:issue:`1710`, :issue:`1550`).
2992    If you try ``scrapy shell index.html`` it will try to load the URL http://index.html,
2993    use ``scrapy shell ./index.html`` to load a local file.
2994  - Robots.txt compliance is now enabled by default for newly-created projects
2995    (:issue:`1724`). Scrapy will also wait for robots.txt to be downloaded
2996    before proceeding with the crawl (:issue:`1735`). If you want to disable
2997    this behavior, update :setting:`ROBOTSTXT_OBEY` in ``settings.py`` file
2998    after creating a new project.
2999  - Exporters now work on unicode, instead of bytes by default (:issue:`1080`).
3000    If you use :class:`~scrapy.exporters.PythonItemExporter`, you may want to
3001    update your code to disable binary mode which is now deprecated.
3002  - Accept XML node names containing dots as valid (:issue:`1533`).
3003  - When uploading files or images to S3 (with ``FilesPipeline`` or
3004    ``ImagesPipeline``), the default ACL policy is now "private" instead
3005    of "public" **Warning: backward incompatible!**.
3006    You can use :setting:`FILES_STORE_S3_ACL` to change it.
3007  - We've reimplemented ``canonicalize_url()`` for more correct output,
3008    especially for URLs with non-ASCII characters (:issue:`1947`).
3009    This could change link extractors output compared to previous Scrapy versions.
3010    This may also invalidate some cache entries you could still have from pre-1.1 runs.
3011    **Warning: backward incompatible!**.
3012
3013Keep reading for more details on other improvements and bug fixes.
3014
3015.. _news_betapy3:
3016
3017Beta Python 3 Support
3018~~~~~~~~~~~~~~~~~~~~~
3019
3020We have been `hard at work to make Scrapy run on Python 3
3021<https://github.com/scrapy/scrapy/wiki/Python-3-Porting>`_. As a result, now
3022you can run spiders on Python 3.3, 3.4 and 3.5 (Twisted >= 15.5 required). Some
3023features are still missing (and some may never be ported).
3024
3025
3026Almost all builtin extensions/middlewares are expected to work.
3027However, we are aware of some limitations in Python 3:
3028
3029- Scrapy does not work on Windows with Python 3
3030- Sending emails is not supported
3031- FTP download handler is not supported
3032- Telnet console is not supported
3033
3034Additional New Features and Enhancements
3035~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3036
3037- Scrapy now has a `Code of Conduct`_ (:issue:`1681`).
3038- Command line tool now has completion for zsh (:issue:`934`).
3039- Improvements to ``scrapy shell``:
3040
3041  - Support for bpython and configure preferred Python shell via
3042    ``SCRAPY_PYTHON_SHELL`` (:issue:`1100`, :issue:`1444`).
3043  - Support URLs without scheme (:issue:`1498`)
3044    **Warning: backward incompatible!**
3045  - Bring back support for relative file path (:issue:`1710`, :issue:`1550`).
3046
3047- Added :setting:`MEMUSAGE_CHECK_INTERVAL_SECONDS` setting to change default check
3048  interval (:issue:`1282`).
3049- Download handlers are now lazy-loaded on first request using their
3050  scheme (:issue:`1390`, :issue:`1421`).
3051- HTTPS download handlers do not force TLS 1.0 anymore; instead,
3052  OpenSSL's ``SSLv23_method()/TLS_method()`` is used allowing to try
3053  negotiating with the remote hosts the highest TLS protocol version
3054  it can (:issue:`1794`, :issue:`1629`).
3055- ``RedirectMiddleware`` now skips the status codes from
3056  ``handle_httpstatus_list`` on spider attribute
3057  or in ``Request``'s ``meta`` key (:issue:`1334`, :issue:`1364`,
3058  :issue:`1447`).
3059- Form submission:
3060
3061  - now works with ``<button>`` elements too (:issue:`1469`).
3062  - an empty string is now used for submit buttons without a value
3063    (:issue:`1472`)
3064
3065- Dict-like settings now have per-key priorities
3066  (:issue:`1135`, :issue:`1149` and :issue:`1586`).
3067- Sending non-ASCII emails (:issue:`1662`)
3068- ``CloseSpider`` and ``SpiderState`` extensions now get disabled if no relevant
3069  setting is set (:issue:`1723`, :issue:`1725`).
3070- Added method ``ExecutionEngine.close`` (:issue:`1423`).
3071- Added method ``CrawlerRunner.create_crawler`` (:issue:`1528`).
3072- Scheduler priority queue can now be customized via
3073  :setting:`SCHEDULER_PRIORITY_QUEUE` (:issue:`1822`).
3074- ``.pps`` links are now ignored by default in link extractors (:issue:`1835`).
3075- temporary data folder for FTP and S3 feed storages can be customized
3076  using a new :setting:`FEED_TEMPDIR` setting (:issue:`1847`).
3077- ``FilesPipeline`` and ``ImagesPipeline`` settings are now instance attributes
3078  instead of class attributes, enabling spider-specific behaviors (:issue:`1891`).
3079- ``JsonItemExporter`` now formats opening and closing square brackets
3080  on their own line (first and last lines of output file) (:issue:`1950`).
3081- If available, ``botocore`` is used for ``S3FeedStorage``, ``S3DownloadHandler``
3082  and ``S3FilesStore`` (:issue:`1761`, :issue:`1883`).
3083- Tons of documentation updates and related fixes (:issue:`1291`, :issue:`1302`,
3084  :issue:`1335`, :issue:`1683`, :issue:`1660`, :issue:`1642`, :issue:`1721`,
3085  :issue:`1727`, :issue:`1879`).
3086- Other refactoring, optimizations and cleanup (:issue:`1476`, :issue:`1481`,
3087  :issue:`1477`, :issue:`1315`, :issue:`1290`, :issue:`1750`, :issue:`1881`).
3088
3089.. _`Code of Conduct`: https://github.com/scrapy/scrapy/blob/master/CODE_OF_CONDUCT.md
3090
3091
3092Deprecations and Removals
3093~~~~~~~~~~~~~~~~~~~~~~~~~
3094
3095- Added ``to_bytes`` and ``to_unicode``, deprecated ``str_to_unicode`` and
3096  ``unicode_to_str`` functions (:issue:`778`).
3097- ``binary_is_text`` is introduced, to replace use of ``isbinarytext``
3098  (but with inverse return value) (:issue:`1851`)
3099- The ``optional_features`` set has been removed (:issue:`1359`).
3100- The ``--lsprof`` command line option has been removed (:issue:`1689`).
3101  **Warning: backward incompatible**, but doesn't break user code.
3102- The following datatypes were deprecated (:issue:`1720`):
3103
3104  + ``scrapy.utils.datatypes.MultiValueDictKeyError``
3105  + ``scrapy.utils.datatypes.MultiValueDict``
3106  + ``scrapy.utils.datatypes.SiteNode``
3107
3108- The previously bundled ``scrapy.xlib.pydispatch`` library was deprecated and
3109  replaced by `pydispatcher <https://pypi.org/project/PyDispatcher/>`_.
3110
3111
3112Relocations
3113~~~~~~~~~~~
3114
3115- ``telnetconsole`` was relocated to ``extensions/`` (:issue:`1524`).
3116
3117  + Note: telnet is not enabled on Python 3
3118    (https://github.com/scrapy/scrapy/pull/1524#issuecomment-146985595)
3119
3120.. _parsel: https://github.com/scrapy/parsel
3121
3122
3123Bugfixes
3124~~~~~~~~
3125
3126- Scrapy does not retry requests that got a ``HTTP 400 Bad Request``
3127  response anymore (:issue:`1289`). **Warning: backward incompatible!**
3128- Support empty password for http_proxy config (:issue:`1274`).
3129- Interpret ``application/x-json`` as ``TextResponse`` (:issue:`1333`).
3130- Support link rel attribute with multiple values (:issue:`1201`).
3131- Fixed ``scrapy.http.FormRequest.from_response`` when there is a ``<base>``
3132  tag (:issue:`1564`).
3133- Fixed :setting:`TEMPLATES_DIR` handling (:issue:`1575`).
3134- Various ``FormRequest`` fixes (:issue:`1595`, :issue:`1596`, :issue:`1597`).
3135- Makes ``_monkeypatches`` more robust (:issue:`1634`).
3136- Fixed bug on ``XMLItemExporter`` with non-string fields in
3137  items (:issue:`1738`).
3138- Fixed startproject command in macOS (:issue:`1635`).
3139- Fixed :class:`~scrapy.exporters.PythonItemExporter` and CSVExporter for
3140  non-string item types (:issue:`1737`).
3141- Various logging related fixes (:issue:`1294`, :issue:`1419`, :issue:`1263`,
3142  :issue:`1624`, :issue:`1654`, :issue:`1722`, :issue:`1726` and :issue:`1303`).
3143- Fixed bug in ``utils.template.render_templatefile()`` (:issue:`1212`).
3144- sitemaps extraction from ``robots.txt`` is now case-insensitive (:issue:`1902`).
3145- HTTPS+CONNECT tunnels could get mixed up when using multiple proxies
3146  to same remote host (:issue:`1912`).
3147
3148.. _release-1.0.7:
3149
3150Scrapy 1.0.7 (2017-03-03)
3151-------------------------
3152
3153- Packaging fix: disallow unsupported Twisted versions in setup.py
3154
3155.. _release-1.0.6:
3156
3157Scrapy 1.0.6 (2016-05-04)
3158-------------------------
3159
3160- FIX: RetryMiddleware is now robust to non-standard HTTP status codes (:issue:`1857`)
3161- FIX: Filestorage HTTP cache was checking wrong modified time (:issue:`1875`)
3162- DOC: Support for Sphinx 1.4+ (:issue:`1893`)
3163- DOC: Consistency in selectors examples (:issue:`1869`)
3164
3165.. _release-1.0.5:
3166
3167Scrapy 1.0.5 (2016-02-04)
3168-------------------------
3169
3170- FIX: [Backport] Ignore bogus links in LinkExtractors (fixes :issue:`907`, :commit:`108195e`)
3171- TST: Changed buildbot makefile to use 'pytest' (:commit:`1f3d90a`)
3172- DOC: Fixed typos in tutorial and media-pipeline (:commit:`808a9ea` and :commit:`803bd87`)
3173- DOC: Add AjaxCrawlMiddleware to DOWNLOADER_MIDDLEWARES_BASE in settings docs (:commit:`aa94121`)
3174
3175.. _release-1.0.4:
3176
3177Scrapy 1.0.4 (2015-12-30)
3178-------------------------
3179
3180- Ignoring xlib/tx folder, depending on Twisted version. (:commit:`7dfa979`)
3181- Run on new travis-ci infra (:commit:`6e42f0b`)
3182- Spelling fixes (:commit:`823a1cc`)
3183- escape nodename in xmliter regex (:commit:`da3c155`)
3184- test xml nodename with dots (:commit:`4418fc3`)
3185- TST don't use broken Pillow version in tests (:commit:`a55078c`)
3186- disable log on version command. closes #1426 (:commit:`86fc330`)
3187- disable log on startproject command (:commit:`db4c9fe`)
3188- Add PyPI download stats badge (:commit:`df2b944`)
3189- don't run tests twice on Travis if a PR is made from a scrapy/scrapy branch (:commit:`a83ab41`)
3190- Add Python 3 porting status badge to the README (:commit:`73ac80d`)
3191- fixed RFPDupeFilter persistence (:commit:`97d080e`)
3192- TST a test to show that dupefilter persistence is not working (:commit:`97f2fb3`)
3193- explicit close file on file:// scheme handler (:commit:`d9b4850`)
3194- Disable dupefilter in shell (:commit:`c0d0734`)
3195- DOC: Add captions to toctrees which appear in sidebar (:commit:`aa239ad`)
3196- DOC Removed pywin32 from install instructions as it's already declared as dependency. (:commit:`10eb400`)
3197- Added installation notes about using Conda for Windows and other OSes. (:commit:`1c3600a`)
3198- Fixed minor grammar issues. (:commit:`7f4ddd5`)
3199- fixed a typo in the documentation. (:commit:`b71f677`)
3200- Version 1 now exists (:commit:`5456c0e`)
3201- fix another invalid xpath error (:commit:`0a1366e`)
3202- fix ValueError: Invalid XPath: //div/[id="not-exists"]/text() on selectors.rst (:commit:`ca8d60f`)
3203- Typos corrections (:commit:`7067117`)
3204- fix typos in downloader-middleware.rst and exceptions.rst, middlware -> middleware (:commit:`32f115c`)
3205- Add note to Ubuntu install section about Debian compatibility (:commit:`23fda69`)
3206- Replace alternative macOS install workaround with virtualenv (:commit:`98b63ee`)
3207- Reference Homebrew's homepage for installation instructions (:commit:`1925db1`)
3208- Add oldest supported tox version to contributing docs (:commit:`5d10d6d`)
3209- Note in install docs about pip being already included in python>=2.7.9 (:commit:`85c980e`)
3210- Add non-python dependencies to Ubuntu install section in the docs (:commit:`fbd010d`)
3211- Add macOS installation section to docs (:commit:`d8f4cba`)
3212- DOC(ENH): specify path to rtd theme explicitly (:commit:`de73b1a`)
3213- minor: scrapy.Spider docs grammar (:commit:`1ddcc7b`)
3214- Make common practices sample code match the comments (:commit:`1b85bcf`)
3215- nextcall repetitive calls (heartbeats). (:commit:`55f7104`)
3216- Backport fix compatibility with Twisted 15.4.0 (:commit:`b262411`)
3217- pin pytest to 2.7.3 (:commit:`a6535c2`)
3218- Merge pull request #1512 from mgedmin/patch-1 (:commit:`8876111`)
3219- Merge pull request #1513 from mgedmin/patch-2 (:commit:`5d4daf8`)
3220- Typo (:commit:`f8d0682`)
3221- Fix list formatting (:commit:`5f83a93`)
3222- fix Scrapy squeue tests after recent changes to queuelib (:commit:`3365c01`)
3223- Merge pull request #1475 from rweindl/patch-1 (:commit:`2d688cd`)
3224- Update tutorial.rst (:commit:`fbc1f25`)
3225- Merge pull request #1449 from rhoekman/patch-1 (:commit:`7d6538c`)
3226- Small grammatical change (:commit:`8752294`)
3227- Add openssl version to version command (:commit:`13c45ac`)
3228
3229.. _release-1.0.3:
3230
3231Scrapy 1.0.3 (2015-08-11)
3232-------------------------
3233
3234- add service_identity to Scrapy install_requires (:commit:`cbc2501`)
3235- Workaround for travis#296 (:commit:`66af9cd`)
3236
3237.. _release-1.0.2:
3238
3239Scrapy 1.0.2 (2015-08-06)
3240-------------------------
3241
3242- Twisted 15.3.0 does not raises PicklingError serializing lambda functions (:commit:`b04dd7d`)
3243- Minor method name fix (:commit:`6f85c7f`)
3244- minor: scrapy.Spider grammar and clarity (:commit:`9c9d2e0`)
3245- Put a blurb about support channels in CONTRIBUTING (:commit:`c63882b`)
3246- Fixed typos (:commit:`a9ae7b0`)
3247- Fix doc reference. (:commit:`7c8a4fe`)
3248
3249.. _release-1.0.1:
3250
3251Scrapy 1.0.1 (2015-07-01)
3252-------------------------
3253
3254- Unquote request path before passing to FTPClient, it already escape paths (:commit:`cc00ad2`)
3255- include tests/ to source distribution in MANIFEST.in (:commit:`eca227e`)
3256- DOC Fix SelectJmes documentation (:commit:`b8567bc`)
3257- DOC Bring Ubuntu and Archlinux outside of Windows subsection (:commit:`392233f`)
3258- DOC remove version suffix from Ubuntu package (:commit:`5303c66`)
3259- DOC Update release date for 1.0 (:commit:`c89fa29`)
3260
3261.. _release-1.0.0:
3262
3263Scrapy 1.0.0 (2015-06-19)
3264-------------------------
3265
3266You will find a lot of new features and bugfixes in this major release.  Make
3267sure to check our updated :ref:`overview <intro-overview>` to get a glance of
3268some of the changes, along with our brushed :ref:`tutorial <intro-tutorial>`.
3269
3270Support for returning dictionaries in spiders
3271~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3272
3273Declaring and returning Scrapy Items is no longer necessary to collect the
3274scraped data from your spider, you can now return explicit dictionaries
3275instead.
3276
3277*Classic version*
3278
3279::
3280
3281    class MyItem(scrapy.Item):
3282        url = scrapy.Field()
3283
3284    class MySpider(scrapy.Spider):
3285        def parse(self, response):
3286            return MyItem(url=response.url)
3287
3288*New version*
3289
3290::
3291
3292    class MySpider(scrapy.Spider):
3293        def parse(self, response):
3294            return {'url': response.url}
3295
3296Per-spider settings (GSoC 2014)
3297~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3298
3299Last Google Summer of Code project accomplished an important redesign of the
3300mechanism used for populating settings, introducing explicit priorities to
3301override any given setting. As an extension of that goal, we included a new
3302level of priority for settings that act exclusively for a single spider,
3303allowing them to redefine project settings.
3304
3305Start using it by defining a :attr:`~scrapy.spiders.Spider.custom_settings`
3306class variable in your spider::
3307
3308    class MySpider(scrapy.Spider):
3309        custom_settings = {
3310            "DOWNLOAD_DELAY": 5.0,
3311            "RETRY_ENABLED": False,
3312        }
3313
3314Read more about settings population: :ref:`topics-settings`
3315
3316Python Logging
3317~~~~~~~~~~~~~~
3318
3319Scrapy 1.0 has moved away from Twisted logging to support Python built in’s
3320as default logging system. We’re maintaining backward compatibility for most
3321of the old custom interface to call logging functions, but you’ll get
3322warnings to switch to the Python logging API entirely.
3323
3324*Old version*
3325
3326::
3327
3328    from scrapy import log
3329    log.msg('MESSAGE', log.INFO)
3330
3331*New version*
3332
3333::
3334
3335    import logging
3336    logging.info('MESSAGE')
3337
3338Logging with spiders remains the same, but on top of the
3339:meth:`~scrapy.spiders.Spider.log` method you’ll have access to a custom
3340:attr:`~scrapy.spiders.Spider.logger` created for the spider to issue log
3341events:
3342
3343::
3344
3345    class MySpider(scrapy.Spider):
3346        def parse(self, response):
3347            self.logger.info('Response received')
3348
3349Read more in the logging documentation: :ref:`topics-logging`
3350
3351Crawler API refactoring (GSoC 2014)
3352~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3353
3354Another milestone for last Google Summer of Code was a refactoring of the
3355internal API, seeking a simpler and easier usage. Check new core interface
3356in: :ref:`topics-api`
3357
3358A common situation where you will face these changes is while running Scrapy
3359from scripts. Here’s a quick example of how to run a Spider manually with the
3360new API:
3361
3362::
3363
3364    from scrapy.crawler import CrawlerProcess
3365
3366    process = CrawlerProcess({
3367        'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
3368    })
3369    process.crawl(MySpider)
3370    process.start()
3371
3372Bear in mind this feature is still under development and its API may change
3373until it reaches a stable status.
3374
3375See more examples for scripts running Scrapy: :ref:`topics-practices`
3376
3377.. _module-relocations:
3378
3379Module Relocations
3380~~~~~~~~~~~~~~~~~~
3381
3382There’s been a large rearrangement of modules trying to improve the general
3383structure of Scrapy. Main changes were separating various subpackages into
3384new projects and dissolving both ``scrapy.contrib`` and ``scrapy.contrib_exp``
3385into top level packages. Backward compatibility was kept among internal
3386relocations, while importing deprecated modules expect warnings indicating
3387their new place.
3388
3389Full list of relocations
3390************************
3391
3392Outsourced packages
3393
3394.. note::
3395    These extensions went through some minor changes, e.g. some setting names
3396    were changed. Please check the documentation in each new repository to
3397    get familiar with the new usage.
3398
3399+-------------------------------------+-------------------------------------+
3400| Old location                        | New location                        |
3401+=====================================+=====================================+
3402| scrapy.commands.deploy              | `scrapyd-client <https://github.com |
3403|                                     | /scrapy/scrapyd-client>`_           |
3404|                                     | (See other alternatives here:       |
3405|                                     | :ref:`topics-deploy`)               |
3406+-------------------------------------+-------------------------------------+
3407| scrapy.contrib.djangoitem           | `scrapy-djangoitem <https://github. |
3408|                                     | com/scrapy-plugins/scrapy-djangoite |
3409|                                     | m>`_                                |
3410+-------------------------------------+-------------------------------------+
3411| scrapy.webservice                   | `scrapy-jsonrpc <https://github.com |
3412|                                     | /scrapy-plugins/scrapy-jsonrpc>`_   |
3413+-------------------------------------+-------------------------------------+
3414
3415``scrapy.contrib_exp`` and ``scrapy.contrib`` dissolutions
3416
3417+-------------------------------------+-------------------------------------+
3418| Old location                        | New location                        |
3419+=====================================+=====================================+
3420| scrapy.contrib\_exp.downloadermidd\ | scrapy.downloadermiddlewares.decom\ |
3421| leware.decompression                | pression                            |
3422+-------------------------------------+-------------------------------------+
3423| scrapy.contrib\_exp.iterators       | scrapy.utils.iterators              |
3424+-------------------------------------+-------------------------------------+
3425| scrapy.contrib.downloadermiddleware | scrapy.downloadermiddlewares        |
3426+-------------------------------------+-------------------------------------+
3427| scrapy.contrib.exporter             | scrapy.exporters                    |
3428+-------------------------------------+-------------------------------------+
3429| scrapy.contrib.linkextractors       | scrapy.linkextractors               |
3430+-------------------------------------+-------------------------------------+
3431| scrapy.contrib.loader               | scrapy.loader                       |
3432+-------------------------------------+-------------------------------------+
3433| scrapy.contrib.loader.processor     | scrapy.loader.processors            |
3434+-------------------------------------+-------------------------------------+
3435| scrapy.contrib.pipeline             | scrapy.pipelines                    |
3436+-------------------------------------+-------------------------------------+
3437| scrapy.contrib.spidermiddleware     | scrapy.spidermiddlewares            |
3438+-------------------------------------+-------------------------------------+
3439| scrapy.contrib.spiders              | scrapy.spiders                      |
3440+-------------------------------------+-------------------------------------+
3441| * scrapy.contrib.closespider        | scrapy.extensions.\*                |
3442| * scrapy.contrib.corestats          |                                     |
3443| * scrapy.contrib.debug              |                                     |
3444| * scrapy.contrib.feedexport         |                                     |
3445| * scrapy.contrib.httpcache          |                                     |
3446| * scrapy.contrib.logstats           |                                     |
3447| * scrapy.contrib.memdebug           |                                     |
3448| * scrapy.contrib.memusage           |                                     |
3449| * scrapy.contrib.spiderstate        |                                     |
3450| * scrapy.contrib.statsmailer        |                                     |
3451| * scrapy.contrib.throttle           |                                     |
3452+-------------------------------------+-------------------------------------+
3453
3454Plural renames and Modules unification
3455
3456+-------------------------------------+-------------------------------------+
3457| Old location                        | New location                        |
3458+=====================================+=====================================+
3459| scrapy.command                      | scrapy.commands                     |
3460+-------------------------------------+-------------------------------------+
3461| scrapy.dupefilter                   | scrapy.dupefilters                  |
3462+-------------------------------------+-------------------------------------+
3463| scrapy.linkextractor                | scrapy.linkextractors               |
3464+-------------------------------------+-------------------------------------+
3465| scrapy.spider                       | scrapy.spiders                      |
3466+-------------------------------------+-------------------------------------+
3467| scrapy.squeue                       | scrapy.squeues                      |
3468+-------------------------------------+-------------------------------------+
3469| scrapy.statscol                     | scrapy.statscollectors              |
3470+-------------------------------------+-------------------------------------+
3471| scrapy.utils.decorator              | scrapy.utils.decorators             |
3472+-------------------------------------+-------------------------------------+
3473
3474Class renames
3475
3476+-------------------------------------+-------------------------------------+
3477| Old location                        | New location                        |
3478+=====================================+=====================================+
3479| scrapy.spidermanager.SpiderManager  | scrapy.spiderloader.SpiderLoader    |
3480+-------------------------------------+-------------------------------------+
3481
3482Settings renames
3483
3484+-------------------------------------+-------------------------------------+
3485| Old location                        | New location                        |
3486+=====================================+=====================================+
3487| SPIDER\_MANAGER\_CLASS              | SPIDER\_LOADER\_CLASS               |
3488+-------------------------------------+-------------------------------------+
3489
3490Changelog
3491~~~~~~~~~
3492
3493New Features and Enhancements
3494
3495- Python logging (:issue:`1060`, :issue:`1235`, :issue:`1236`, :issue:`1240`,
3496  :issue:`1259`, :issue:`1278`, :issue:`1286`)
3497- FEED_EXPORT_FIELDS option (:issue:`1159`, :issue:`1224`)
3498- Dns cache size and timeout options (:issue:`1132`)
3499- support namespace prefix in xmliter_lxml (:issue:`963`)
3500- Reactor threadpool max size setting (:issue:`1123`)
3501- Allow spiders to return dicts. (:issue:`1081`)
3502- Add Response.urljoin() helper (:issue:`1086`)
3503- look in ~/.config/scrapy.cfg for user config (:issue:`1098`)
3504- handle TLS SNI (:issue:`1101`)
3505- Selectorlist extract first (:issue:`624`, :issue:`1145`)
3506- Added JmesSelect (:issue:`1016`)
3507- add gzip compression to filesystem http cache backend (:issue:`1020`)
3508- CSS support in link extractors (:issue:`983`)
3509- httpcache dont_cache meta #19 #689 (:issue:`821`)
3510- add signal to be sent when request is dropped by the scheduler
3511  (:issue:`961`)
3512- avoid download large response (:issue:`946`)
3513- Allow to specify the quotechar in CSVFeedSpider (:issue:`882`)
3514- Add referer to "Spider error processing" log message (:issue:`795`)
3515- process robots.txt once (:issue:`896`)
3516- GSoC Per-spider settings (:issue:`854`)
3517- Add project name validation (:issue:`817`)
3518- GSoC API cleanup (:issue:`816`, :issue:`1128`, :issue:`1147`,
3519  :issue:`1148`, :issue:`1156`, :issue:`1185`, :issue:`1187`, :issue:`1258`,
3520  :issue:`1268`, :issue:`1276`, :issue:`1285`, :issue:`1284`)
3521- Be more responsive with IO operations (:issue:`1074` and :issue:`1075`)
3522- Do leveldb compaction for httpcache on closing (:issue:`1297`)
3523
3524Deprecations and Removals
3525
3526- Deprecate htmlparser link extractor (:issue:`1205`)
3527- remove deprecated code from FeedExporter (:issue:`1155`)
3528- a leftover for.15 compatibility (:issue:`925`)
3529- drop support for CONCURRENT_REQUESTS_PER_SPIDER (:issue:`895`)
3530- Drop old engine code (:issue:`911`)
3531- Deprecate SgmlLinkExtractor (:issue:`777`)
3532
3533Relocations
3534
3535- Move exporters/__init__.py to exporters.py (:issue:`1242`)
3536- Move base classes to their packages (:issue:`1218`, :issue:`1233`)
3537- Module relocation (:issue:`1181`, :issue:`1210`)
3538- rename SpiderManager to SpiderLoader (:issue:`1166`)
3539- Remove djangoitem (:issue:`1177`)
3540- remove scrapy deploy command (:issue:`1102`)
3541- dissolve contrib_exp (:issue:`1134`)
3542- Deleted bin folder from root, fixes #913 (:issue:`914`)
3543- Remove jsonrpc based webservice (:issue:`859`)
3544- Move Test cases under project root dir (:issue:`827`, :issue:`841`)
3545- Fix backward incompatibility for relocated paths in settings
3546  (:issue:`1267`)
3547
3548Documentation
3549
3550- CrawlerProcess documentation (:issue:`1190`)
3551- Favoring web scraping over screen scraping in the descriptions
3552  (:issue:`1188`)
3553- Some improvements for Scrapy tutorial (:issue:`1180`)
3554- Documenting Files Pipeline together with Images Pipeline (:issue:`1150`)
3555- deployment docs tweaks (:issue:`1164`)
3556- Added deployment section covering scrapyd-deploy and shub (:issue:`1124`)
3557- Adding more settings to project template (:issue:`1073`)
3558- some improvements to overview page (:issue:`1106`)
3559- Updated link in docs/topics/architecture.rst (:issue:`647`)
3560- DOC reorder topics (:issue:`1022`)
3561- updating list of Request.meta special keys (:issue:`1071`)
3562- DOC document download_timeout (:issue:`898`)
3563- DOC simplify extension docs (:issue:`893`)
3564- Leaks docs (:issue:`894`)
3565- DOC document from_crawler method for item pipelines (:issue:`904`)
3566- Spider_error doesn't support deferreds (:issue:`1292`)
3567- Corrections & Sphinx related fixes (:issue:`1220`, :issue:`1219`,
3568  :issue:`1196`, :issue:`1172`, :issue:`1171`, :issue:`1169`, :issue:`1160`,
3569  :issue:`1154`, :issue:`1127`, :issue:`1112`, :issue:`1105`, :issue:`1041`,
3570  :issue:`1082`, :issue:`1033`, :issue:`944`, :issue:`866`, :issue:`864`,
3571  :issue:`796`, :issue:`1260`, :issue:`1271`, :issue:`1293`, :issue:`1298`)
3572
3573Bugfixes
3574
3575- Item multi inheritance fix (:issue:`353`, :issue:`1228`)
3576- ItemLoader.load_item: iterate over copy of fields (:issue:`722`)
3577- Fix Unhandled error in Deferred (RobotsTxtMiddleware) (:issue:`1131`,
3578  :issue:`1197`)
3579- Force to read DOWNLOAD_TIMEOUT as int (:issue:`954`)
3580- scrapy.utils.misc.load_object should print full traceback (:issue:`902`)
3581- Fix bug for ".local" host name (:issue:`878`)
3582- Fix for Enabled extensions, middlewares, pipelines info not printed
3583  anymore (:issue:`879`)
3584- fix dont_merge_cookies bad behaviour when set to false on meta
3585  (:issue:`846`)
3586
3587Python 3 In Progress Support
3588
3589- disable scrapy.telnet if twisted.conch is not available (:issue:`1161`)
3590- fix Python 3 syntax errors in ajaxcrawl.py (:issue:`1162`)
3591- more python3 compatibility changes for urllib (:issue:`1121`)
3592- assertItemsEqual was renamed to assertCountEqual in Python 3.
3593  (:issue:`1070`)
3594- Import unittest.mock if available. (:issue:`1066`)
3595- updated deprecated cgi.parse_qsl to use six's parse_qsl (:issue:`909`)
3596- Prevent Python 3 port regressions (:issue:`830`)
3597- PY3: use MutableMapping for python 3 (:issue:`810`)
3598- PY3: use six.BytesIO and six.moves.cStringIO (:issue:`803`)
3599- PY3: fix xmlrpclib and email imports (:issue:`801`)
3600- PY3: use six for robotparser and urlparse (:issue:`800`)
3601- PY3: use six.iterkeys, six.iteritems, and tempfile (:issue:`799`)
3602- PY3: fix has_key and use six.moves.configparser (:issue:`798`)
3603- PY3: use six.moves.cPickle (:issue:`797`)
3604- PY3 make it possible to run some tests in Python3 (:issue:`776`)
3605
3606Tests
3607
3608- remove unnecessary lines from py3-ignores (:issue:`1243`)
3609- Fix remaining warnings from pytest while collecting tests (:issue:`1206`)
3610- Add docs build to travis (:issue:`1234`)
3611- TST don't collect tests from deprecated modules. (:issue:`1165`)
3612- install service_identity package in tests to prevent warnings
3613  (:issue:`1168`)
3614- Fix deprecated settings API in tests (:issue:`1152`)
3615- Add test for webclient with POST method and no body given (:issue:`1089`)
3616- py3-ignores.txt supports comments (:issue:`1044`)
3617- modernize some of the asserts (:issue:`835`)
3618- selector.__repr__ test (:issue:`779`)
3619
3620Code refactoring
3621
3622- CSVFeedSpider cleanup: use iterate_spider_output (:issue:`1079`)
3623- remove unnecessary check from scrapy.utils.spider.iter_spider_output
3624  (:issue:`1078`)
3625- Pydispatch pep8 (:issue:`992`)
3626- Removed unused 'load=False' parameter from walk_modules() (:issue:`871`)
3627- For consistency, use ``job_dir`` helper in ``SpiderState`` extension.
3628  (:issue:`805`)
3629- rename "sflo" local variables to less cryptic "log_observer" (:issue:`775`)
3630
3631Scrapy 0.24.6 (2015-04-20)
3632--------------------------
3633
3634- encode invalid xpath with unicode_escape under PY2 (:commit:`07cb3e5`)
3635- fix IPython shell scope issue and load IPython user config (:commit:`2c8e573`)
3636- Fix small typo in the docs (:commit:`d694019`)
3637- Fix small typo (:commit:`f92fa83`)
3638- Converted sel.xpath() calls to response.xpath() in Extracting the data (:commit:`c2c6d15`)
3639
3640
3641Scrapy 0.24.5 (2015-02-25)
3642--------------------------
3643
3644- Support new _getEndpoint Agent signatures on Twisted 15.0.0 (:commit:`540b9bc`)
3645- DOC a couple more references are fixed (:commit:`b4c454b`)
3646- DOC fix a reference (:commit:`e3c1260`)
3647- t.i.b.ThreadedResolver is now a new-style class (:commit:`9e13f42`)
3648- S3DownloadHandler: fix auth for requests with quoted paths/query params (:commit:`cdb9a0b`)
3649- fixed the variable types in mailsender documentation (:commit:`bb3a848`)
3650- Reset items_scraped instead of item_count (:commit:`edb07a4`)
3651- Tentative attention message about what document to read for contributions (:commit:`7ee6f7a`)
3652- mitmproxy 0.10.1 needs netlib 0.10.1 too (:commit:`874fcdd`)
3653- pin mitmproxy 0.10.1 as >0.11 does not work with tests (:commit:`c6b21f0`)
3654- Test the parse command locally instead of against an external url (:commit:`c3a6628`)
3655- Patches Twisted issue while closing the connection pool on HTTPDownloadHandler (:commit:`d0bf957`)
3656- Updates documentation on dynamic item classes. (:commit:`eeb589a`)
3657- Merge pull request #943 from Lazar-T/patch-3 (:commit:`5fdab02`)
3658- typo (:commit:`b0ae199`)
3659- pywin32 is required by Twisted. closes #937 (:commit:`5cb0cfb`)
3660- Update install.rst (:commit:`781286b`)
3661- Merge pull request #928 from Lazar-T/patch-1 (:commit:`b415d04`)
3662- comma instead of fullstop (:commit:`627b9ba`)
3663- Merge pull request #885 from jsma/patch-1 (:commit:`de909ad`)
3664- Update request-response.rst (:commit:`3f3263d`)
3665- SgmlLinkExtractor - fix for parsing <area> tag with Unicode present (:commit:`49b40f0`)
3666
3667Scrapy 0.24.4 (2014-08-09)
3668--------------------------
3669
3670- pem file is used by mockserver and required by scrapy bench (:commit:`5eddc68`)
3671- scrapy bench needs scrapy.tests* (:commit:`d6cb999`)
3672
3673Scrapy 0.24.3 (2014-08-09)
3674--------------------------
3675
3676- no need to waste travis-ci time on py3 for 0.24 (:commit:`8e080c1`)
3677- Update installation docs (:commit:`1d0c096`)
3678- There is a trove classifier for Scrapy framework! (:commit:`4c701d7`)
3679- update other places where w3lib version is mentioned (:commit:`d109c13`)
3680- Update w3lib requirement to 1.8.0 (:commit:`39d2ce5`)
3681- Use w3lib.html.replace_entities() (remove_entities() is deprecated) (:commit:`180d3ad`)
3682- set zip_safe=False (:commit:`a51ee8b`)
3683- do not ship tests package (:commit:`ee3b371`)
3684- scrapy.bat is not needed anymore (:commit:`c3861cf`)
3685- Modernize setup.py (:commit:`362e322`)
3686- headers can not handle non-string values (:commit:`94a5c65`)
3687- fix ftp test cases (:commit:`a274a7f`)
3688- The sum up of travis-ci builds are taking like 50min to complete (:commit:`ae1e2cc`)
3689- Update shell.rst typo (:commit:`e49c96a`)
3690- removes weird indentation in the shell results (:commit:`1ca489d`)
3691- improved explanations, clarified blog post as source, added link for XPath string functions in the spec (:commit:`65c8f05`)
3692- renamed UserTimeoutError and ServerTimeouterror #583 (:commit:`037f6ab`)
3693- adding some xpath tips to selectors docs (:commit:`2d103e0`)
3694- fix tests to account for https://github.com/scrapy/w3lib/pull/23 (:commit:`f8d366a`)
3695- get_func_args maximum recursion fix #728 (:commit:`81344ea`)
3696- Updated input/ouput processor example according to #560. (:commit:`f7c4ea8`)
3697- Fixed Python syntax in tutorial. (:commit:`db59ed9`)
3698- Add test case for tunneling proxy (:commit:`f090260`)
3699- Bugfix for leaking Proxy-Authorization header to remote host when using tunneling (:commit:`d8793af`)
3700- Extract links from XHTML documents with MIME-Type "application/xml" (:commit:`ed1f376`)
3701- Merge pull request #793 from roysc/patch-1 (:commit:`91a1106`)
3702- Fix typo in commands.rst (:commit:`743e1e2`)
3703- better testcase for settings.overrides.setdefault (:commit:`e22daaf`)
3704- Using CRLF as line marker according to http 1.1 definition (:commit:`5ec430b`)
3705
3706Scrapy 0.24.2 (2014-07-08)
3707--------------------------
3708
3709- Use a mutable mapping to proxy deprecated settings.overrides and settings.defaults attribute (:commit:`e5e8133`)
3710- there is not support for python3 yet (:commit:`3cd6146`)
3711- Update python compatible version set to Debian packages (:commit:`fa5d76b`)
3712- DOC fix formatting in release notes (:commit:`c6a9e20`)
3713
3714Scrapy 0.24.1 (2014-06-27)
3715--------------------------
3716
3717- Fix deprecated CrawlerSettings and increase backward compatibility with
3718  .defaults attribute (:commit:`8e3f20a`)
3719
3720
3721Scrapy 0.24.0 (2014-06-26)
3722--------------------------
3723
3724Enhancements
3725~~~~~~~~~~~~
3726
3727- Improve Scrapy top-level namespace (:issue:`494`, :issue:`684`)
3728- Add selector shortcuts to responses (:issue:`554`, :issue:`690`)
3729- Add new lxml based LinkExtractor to replace unmaintained SgmlLinkExtractor
3730  (:issue:`559`, :issue:`761`, :issue:`763`)
3731- Cleanup settings API - part of per-spider settings **GSoC project** (:issue:`737`)
3732- Add UTF8 encoding header to templates (:issue:`688`, :issue:`762`)
3733- Telnet console now binds to 127.0.0.1 by default (:issue:`699`)
3734- Update Debian/Ubuntu install instructions (:issue:`509`, :issue:`549`)
3735- Disable smart strings in lxml XPath evaluations (:issue:`535`)
3736- Restore filesystem based cache as default for http
3737  cache middleware (:issue:`541`, :issue:`500`, :issue:`571`)
3738- Expose current crawler in Scrapy shell (:issue:`557`)
3739- Improve testsuite comparing CSV and XML exporters (:issue:`570`)
3740- New ``offsite/filtered`` and ``offsite/domains`` stats (:issue:`566`)
3741- Support process_links as generator in CrawlSpider (:issue:`555`)
3742- Verbose logging and new stats counters for DupeFilter (:issue:`553`)
3743- Add a mimetype parameter to ``MailSender.send()`` (:issue:`602`)
3744- Generalize file pipeline log messages (:issue:`622`)
3745- Replace unencodeable codepoints with html entities in SGMLLinkExtractor (:issue:`565`)
3746- Converted SEP documents to rst format (:issue:`629`, :issue:`630`,
3747  :issue:`638`, :issue:`632`, :issue:`636`, :issue:`640`, :issue:`635`,
3748  :issue:`634`, :issue:`639`, :issue:`637`, :issue:`631`, :issue:`633`,
3749  :issue:`641`, :issue:`642`)
3750- Tests and docs for clickdata's nr index in FormRequest (:issue:`646`, :issue:`645`)
3751- Allow to disable a downloader handler just like any other component (:issue:`650`)
3752- Log when a request is discarded after too many redirections (:issue:`654`)
3753- Log error responses if they are not handled by spider callbacks
3754  (:issue:`612`, :issue:`656`)
3755- Add content-type check to http compression mw (:issue:`193`, :issue:`660`)
3756- Run pypy tests using latest pypi from ppa (:issue:`674`)
3757- Run test suite using pytest instead of trial (:issue:`679`)
3758- Build docs and check for dead links in tox environment (:issue:`687`)
3759- Make scrapy.version_info a tuple of integers (:issue:`681`, :issue:`692`)
3760- Infer exporter's output format from filename extensions
3761  (:issue:`546`, :issue:`659`, :issue:`760`)
3762- Support case-insensitive domains in ``url_is_from_any_domain()`` (:issue:`693`)
3763- Remove pep8 warnings in project and spider templates (:issue:`698`)
3764- Tests and docs for ``request_fingerprint`` function (:issue:`597`)
3765- Update SEP-19 for GSoC project ``per-spider settings`` (:issue:`705`)
3766- Set exit code to non-zero when contracts fails (:issue:`727`)
3767- Add a setting to control what class is instantiated as Downloader component
3768  (:issue:`738`)
3769- Pass response in ``item_dropped`` signal (:issue:`724`)
3770- Improve ``scrapy check`` contracts command (:issue:`733`, :issue:`752`)
3771- Document ``spider.closed()`` shortcut (:issue:`719`)
3772- Document ``request_scheduled`` signal (:issue:`746`)
3773- Add a note about reporting security issues (:issue:`697`)
3774- Add LevelDB http cache storage backend (:issue:`626`, :issue:`500`)
3775- Sort spider list output of ``scrapy list`` command (:issue:`742`)
3776- Multiple documentation enhancements and fixes
3777  (:issue:`575`, :issue:`587`, :issue:`590`, :issue:`596`, :issue:`610`,
3778  :issue:`617`, :issue:`618`, :issue:`627`, :issue:`613`, :issue:`643`,
3779  :issue:`654`, :issue:`675`, :issue:`663`, :issue:`711`, :issue:`714`)
3780
3781Bugfixes
3782~~~~~~~~
3783
3784- Encode unicode URL value when creating Links in RegexLinkExtractor (:issue:`561`)
3785- Ignore None values in ItemLoader processors (:issue:`556`)
3786- Fix link text when there is an inner tag in SGMLLinkExtractor and
3787  HtmlParserLinkExtractor (:issue:`485`, :issue:`574`)
3788- Fix wrong checks on subclassing of deprecated classes
3789  (:issue:`581`, :issue:`584`)
3790- Handle errors caused by inspect.stack() failures (:issue:`582`)
3791- Fix a reference to unexistent engine attribute (:issue:`593`, :issue:`594`)
3792- Fix dynamic itemclass example usage of type() (:issue:`603`)
3793- Use lucasdemarchi/codespell to fix typos (:issue:`628`)
3794- Fix default value of attrs argument in SgmlLinkExtractor to be tuple (:issue:`661`)
3795- Fix XXE flaw in sitemap reader (:issue:`676`)
3796- Fix engine to support filtered start requests (:issue:`707`)
3797- Fix offsite middleware case on urls with no hostnames (:issue:`745`)
3798- Testsuite doesn't require PIL anymore (:issue:`585`)
3799
3800
3801Scrapy 0.22.2 (released 2014-02-14)
3802-----------------------------------
3803
3804- fix a reference to unexistent engine.slots. closes #593 (:commit:`13c099a`)
3805- downloaderMW doc typo (spiderMW doc copy remnant) (:commit:`8ae11bf`)
3806- Correct typos (:commit:`1346037`)
3807
3808Scrapy 0.22.1 (released 2014-02-08)
3809-----------------------------------
3810
3811- localhost666 can resolve under certain circumstances (:commit:`2ec2279`)
3812- test inspect.stack failure (:commit:`cc3eda3`)
3813- Handle cases when inspect.stack() fails (:commit:`8cb44f9`)
3814- Fix wrong checks on subclassing of deprecated classes. closes #581 (:commit:`46d98d6`)
3815- Docs: 4-space indent for final spider example (:commit:`13846de`)
3816- Fix HtmlParserLinkExtractor and tests after #485 merge (:commit:`368a946`)
3817- BaseSgmlLinkExtractor: Fixed the missing space when the link has an inner tag (:commit:`b566388`)
3818- BaseSgmlLinkExtractor: Added unit test of a link with an inner tag (:commit:`c1cb418`)
3819- BaseSgmlLinkExtractor: Fixed unknown_endtag() so that it only set current_link=None when the end tag match the opening tag (:commit:`7e4d627`)
3820- Fix tests for Travis-CI build (:commit:`76c7e20`)
3821- replace unencodable codepoints with html entities. fixes #562 and #285 (:commit:`5f87b17`)
3822- RegexLinkExtractor: encode URL unicode value when creating Links (:commit:`d0ee545`)
3823- Updated the tutorial crawl output with latest output. (:commit:`8da65de`)
3824- Updated shell docs with the crawler reference and fixed the actual shell output. (:commit:`875b9ab`)
3825- PEP8 minor edits. (:commit:`f89efaf`)
3826- Expose current crawler in the Scrapy shell. (:commit:`5349cec`)
3827- Unused re import and PEP8 minor edits. (:commit:`387f414`)
3828- Ignore None's values when using the ItemLoader. (:commit:`0632546`)
3829- DOC Fixed HTTPCACHE_STORAGE typo in the default value which is now Filesystem instead Dbm. (:commit:`cde9a8c`)
3830- show Ubuntu setup instructions as literal code (:commit:`fb5c9c5`)
3831- Update Ubuntu installation instructions (:commit:`70fb105`)
3832- Merge pull request #550 from stray-leone/patch-1 (:commit:`6f70b6a`)
3833- modify the version of Scrapy Ubuntu package (:commit:`725900d`)
3834- fix 0.22.0 release date (:commit:`af0219a`)
3835- fix typos in news.rst and remove (not released yet) header (:commit:`b7f58f4`)
3836
3837Scrapy 0.22.0 (released 2014-01-17)
3838-----------------------------------
3839
3840Enhancements
3841~~~~~~~~~~~~
3842
3843- [**Backward incompatible**] Switched HTTPCacheMiddleware backend to filesystem (:issue:`541`)
3844  To restore old backend set ``HTTPCACHE_STORAGE`` to ``scrapy.contrib.httpcache.DbmCacheStorage``
3845- Proxy \https:// urls using CONNECT method (:issue:`392`, :issue:`397`)
3846- Add a middleware to crawl ajax crawleable pages as defined by google (:issue:`343`)
3847- Rename scrapy.spider.BaseSpider to scrapy.spider.Spider (:issue:`510`, :issue:`519`)
3848- Selectors register EXSLT namespaces by default (:issue:`472`)
3849- Unify item loaders similar to selectors renaming (:issue:`461`)
3850- Make ``RFPDupeFilter`` class easily subclassable (:issue:`533`)
3851- Improve test coverage and forthcoming Python 3 support (:issue:`525`)
3852- Promote startup info on settings and middleware to INFO level (:issue:`520`)
3853- Support partials in ``get_func_args`` util (:issue:`506`, issue:`504`)
3854- Allow running individual tests via tox (:issue:`503`)
3855- Update extensions ignored by link extractors (:issue:`498`)
3856- Add middleware methods to get files/images/thumbs paths (:issue:`490`)
3857- Improve offsite middleware tests (:issue:`478`)
3858- Add a way to skip default Referer header set by RefererMiddleware (:issue:`475`)
3859- Do not send ``x-gzip`` in default ``Accept-Encoding`` header (:issue:`469`)
3860- Support defining http error handling using settings (:issue:`466`)
3861- Use modern python idioms wherever you find legacies (:issue:`497`)
3862- Improve and correct documentation
3863  (:issue:`527`, :issue:`524`, :issue:`521`, :issue:`517`, :issue:`512`, :issue:`505`,
3864  :issue:`502`, :issue:`489`, :issue:`465`, :issue:`460`, :issue:`425`, :issue:`536`)
3865
3866Fixes
3867~~~~~
3868
3869- Update Selector class imports in CrawlSpider template (:issue:`484`)
3870- Fix unexistent reference to ``engine.slots`` (:issue:`464`)
3871- Do not try to call ``body_as_unicode()`` on a non-TextResponse instance (:issue:`462`)
3872- Warn when subclassing XPathItemLoader, previously it only warned on
3873  instantiation. (:issue:`523`)
3874- Warn when subclassing XPathSelector, previously it only warned on
3875  instantiation. (:issue:`537`)
3876- Multiple fixes to memory stats (:issue:`531`, :issue:`530`, :issue:`529`)
3877- Fix overriding url in ``FormRequest.from_response()`` (:issue:`507`)
3878- Fix tests runner under pip 1.5 (:issue:`513`)
3879- Fix logging error when spider name is unicode (:issue:`479`)
3880
3881Scrapy 0.20.2 (released 2013-12-09)
3882-----------------------------------
3883
3884- Update CrawlSpider Template with Selector changes (:commit:`6d1457d`)
3885- fix method name in tutorial. closes GH-480 (:commit:`b4fc359`
3886
3887Scrapy 0.20.1 (released 2013-11-28)
3888-----------------------------------
3889
3890- include_package_data is required to build wheels from published sources (:commit:`5ba1ad5`)
3891- process_parallel was leaking the failures on its internal deferreds.  closes #458 (:commit:`419a780`)
3892
3893Scrapy 0.20.0 (released 2013-11-08)
3894-----------------------------------
3895
3896Enhancements
3897~~~~~~~~~~~~
3898
3899- New Selector's API including CSS selectors (:issue:`395` and :issue:`426`),
3900- Request/Response url/body attributes are now immutable
3901  (modifying them had been deprecated for a long time)
3902- :setting:`ITEM_PIPELINES` is now defined as a dict (instead of a list)
3903- Sitemap spider can fetch alternate URLs (:issue:`360`)
3904- ``Selector.remove_namespaces()`` now remove namespaces from element's attributes. (:issue:`416`)
3905- Paved the road for Python 3.3+ (:issue:`435`, :issue:`436`, :issue:`431`, :issue:`452`)
3906- New item exporter using native python types with nesting support (:issue:`366`)
3907- Tune HTTP1.1 pool size so it matches concurrency defined by settings (:commit:`b43b5f575`)
3908- scrapy.mail.MailSender now can connect over TLS or upgrade using STARTTLS (:issue:`327`)
3909- New FilesPipeline with functionality factored out from ImagesPipeline (:issue:`370`, :issue:`409`)
3910- Recommend Pillow instead of PIL for image handling (:issue:`317`)
3911- Added Debian packages for Ubuntu Quantal and Raring (:commit:`86230c0`)
3912- Mock server (used for tests) can listen for HTTPS requests (:issue:`410`)
3913- Remove multi spider support from multiple core components
3914  (:issue:`422`, :issue:`421`, :issue:`420`, :issue:`419`, :issue:`423`, :issue:`418`)
3915- Travis-CI now tests Scrapy changes against development versions of ``w3lib`` and ``queuelib`` python packages.
3916- Add pypy 2.1 to continuous integration tests (:commit:`ecfa7431`)
3917- Pylinted, pep8 and removed old-style exceptions from source (:issue:`430`, :issue:`432`)
3918- Use importlib for parametric imports (:issue:`445`)
3919- Handle a regression introduced in Python 2.7.5 that affects XmlItemExporter (:issue:`372`)
3920- Bugfix crawling shutdown on SIGINT (:issue:`450`)
3921- Do not submit ``reset`` type inputs in FormRequest.from_response (:commit:`b326b87`)
3922- Do not silence download errors when request errback raises an exception (:commit:`684cfc0`)
3923
3924Bugfixes
3925~~~~~~~~
3926
3927- Fix tests under Django 1.6 (:commit:`b6bed44c`)
3928- Lot of bugfixes to retry middleware under disconnections using HTTP 1.1 download handler
3929- Fix inconsistencies among Twisted releases (:issue:`406`)
3930- Fix Scrapy shell bugs (:issue:`418`, :issue:`407`)
3931- Fix invalid variable name in setup.py (:issue:`429`)
3932- Fix tutorial references (:issue:`387`)
3933- Improve request-response docs (:issue:`391`)
3934- Improve best practices docs (:issue:`399`, :issue:`400`, :issue:`401`, :issue:`402`)
3935- Improve django integration docs (:issue:`404`)
3936- Document ``bindaddress`` request meta (:commit:`37c24e01d7`)
3937- Improve ``Request`` class documentation (:issue:`226`)
3938
3939Other
3940~~~~~
3941
3942- Dropped Python 2.6 support (:issue:`448`)
3943- Add :doc:`cssselect <cssselect:index>` python package as install dependency
3944- Drop libxml2 and multi selector's backend support, `lxml`_ is required from now on.
3945- Minimum Twisted version increased to 10.0.0, dropped Twisted 8.0 support.
3946- Running test suite now requires ``mock`` python library (:issue:`390`)
3947
3948
3949Thanks
3950~~~~~~
3951
3952Thanks to everyone who contribute to this release!
3953
3954List of contributors sorted by number of commits::
3955
3956     69 Daniel Graña <dangra@...>
3957     37 Pablo Hoffman <pablo@...>
3958     13 Mikhail Korobov <kmike84@...>
3959      9 Alex Cepoi <alex.cepoi@...>
3960      9 alexanderlukanin13 <alexander.lukanin.13@...>
3961      8 Rolando Espinoza La fuente <darkrho@...>
3962      8 Lukasz Biedrycki <lukasz.biedrycki@...>
3963      6 Nicolas Ramirez <nramirez.uy@...>
3964      3 Paul Tremberth <paul.tremberth@...>
3965      2 Martin Olveyra <molveyra@...>
3966      2 Stefan <misc@...>
3967      2 Rolando Espinoza <darkrho@...>
3968      2 Loren Davie <loren@...>
3969      2 irgmedeiros <irgmedeiros@...>
3970      1 Stefan Koch <taikano@...>
3971      1 Stefan <cct@...>
3972      1 scraperdragon <dragon@...>
3973      1 Kumara Tharmalingam <ktharmal@...>
3974      1 Francesco Piccinno <stack.box@...>
3975      1 Marcos Campal <duendex@...>
3976      1 Dragon Dave <dragon@...>
3977      1 Capi Etheriel <barraponto@...>
3978      1 cacovsky <amarquesferraz@...>
3979      1 Berend Iwema <berend@...>
3980
3981Scrapy 0.18.4 (released 2013-10-10)
3982-----------------------------------
3983
3984- IPython refuses to update the namespace. fix #396 (:commit:`3d32c4f`)
3985- Fix AlreadyCalledError replacing a request in shell command. closes #407 (:commit:`b1d8919`)
3986- Fix start_requests laziness and early hangs (:commit:`89faf52`)
3987
3988Scrapy 0.18.3 (released 2013-10-03)
3989-----------------------------------
3990
3991- fix regression on lazy evaluation of start requests (:commit:`12693a5`)
3992- forms: do not submit reset inputs (:commit:`e429f63`)
3993- increase unittest timeouts to decrease travis false positive failures (:commit:`912202e`)
3994- backport master fixes to json exporter (:commit:`cfc2d46`)
3995- Fix permission and set umask before generating sdist tarball (:commit:`06149e0`)
3996
3997Scrapy 0.18.2 (released 2013-09-03)
3998-----------------------------------
3999
4000- Backport ``scrapy check`` command fixes and backward compatible multi
4001  crawler process(:issue:`339`)
4002
4003Scrapy 0.18.1 (released 2013-08-27)
4004-----------------------------------
4005
4006- remove extra import added by cherry picked changes (:commit:`d20304e`)
4007- fix crawling tests under twisted pre 11.0.0 (:commit:`1994f38`)
4008- py26 can not format zero length fields {} (:commit:`abf756f`)
4009- test PotentiaDataLoss errors on unbound responses (:commit:`b15470d`)
4010- Treat responses without content-length or Transfer-Encoding as good responses (:commit:`c4bf324`)
4011- do no include ResponseFailed if http11 handler is not enabled (:commit:`6cbe684`)
4012- New HTTP client wraps connection lost in ResponseFailed exception. fix #373 (:commit:`1a20bba`)
4013- limit travis-ci build matrix (:commit:`3b01bb8`)
4014- Merge pull request #375 from peterarenot/patch-1 (:commit:`fa766d7`)
4015- Fixed so it refers to the correct folder (:commit:`3283809`)
4016- added Quantal & Raring to support Ubuntu releases (:commit:`1411923`)
4017- fix retry middleware which didn't retry certain connection errors after the upgrade to http1 client, closes GH-373 (:commit:`bb35ed0`)
4018- fix XmlItemExporter in Python 2.7.4 and 2.7.5 (:commit:`de3e451`)
4019- minor updates to 0.18 release notes (:commit:`c45e5f1`)
4020- fix contributors list format (:commit:`0b60031`)
4021
4022Scrapy 0.18.0 (released 2013-08-09)
4023-----------------------------------
4024
4025- Lot of improvements to testsuite run using Tox, including a way to test on pypi
4026- Handle GET parameters for AJAX crawleable urls (:commit:`3fe2a32`)
4027- Use lxml recover option to parse sitemaps (:issue:`347`)
4028- Bugfix cookie merging by hostname and not by netloc (:issue:`352`)
4029- Support disabling ``HttpCompressionMiddleware`` using a flag setting (:issue:`359`)
4030- Support xml namespaces using ``iternodes`` parser in ``XMLFeedSpider`` (:issue:`12`)
4031- Support ``dont_cache`` request meta flag (:issue:`19`)
4032- Bugfix ``scrapy.utils.gz.gunzip`` broken by changes in python 2.7.4 (:commit:`4dc76e`)
4033- Bugfix url encoding on ``SgmlLinkExtractor`` (:issue:`24`)
4034- Bugfix ``TakeFirst`` processor shouldn't discard zero (0) value (:issue:`59`)
4035- Support nested items in xml exporter (:issue:`66`)
4036- Improve cookies handling performance (:issue:`77`)
4037- Log dupe filtered requests once (:issue:`105`)
4038- Split redirection middleware into status and meta based middlewares (:issue:`78`)
4039- Use HTTP1.1 as default downloader handler (:issue:`109` and :issue:`318`)
4040- Support xpath form selection on ``FormRequest.from_response`` (:issue:`185`)
4041- Bugfix unicode decoding error on ``SgmlLinkExtractor`` (:issue:`199`)
4042- Bugfix signal dispatching on pypi interpreter (:issue:`205`)
4043- Improve request delay and concurrency handling (:issue:`206`)
4044- Add RFC2616 cache policy to ``HttpCacheMiddleware`` (:issue:`212`)
4045- Allow customization of messages logged by engine (:issue:`214`)
4046- Multiples improvements to ``DjangoItem`` (:issue:`217`, :issue:`218`, :issue:`221`)
4047- Extend Scrapy commands using setuptools entry points (:issue:`260`)
4048- Allow spider ``allowed_domains`` value to be set/tuple (:issue:`261`)
4049- Support ``settings.getdict`` (:issue:`269`)
4050- Simplify internal ``scrapy.core.scraper`` slot handling (:issue:`271`)
4051- Added ``Item.copy`` (:issue:`290`)
4052- Collect idle downloader slots (:issue:`297`)
4053- Add ``ftp://`` scheme downloader handler (:issue:`329`)
4054- Added downloader benchmark webserver and spider tools :ref:`benchmarking`
4055- Moved persistent (on disk) queues to a separate project (queuelib_) which Scrapy now depends on
4056- Add Scrapy commands using external libraries (:issue:`260`)
4057- Added ``--pdb`` option to ``scrapy`` command line tool
4058- Added :meth:`XPathSelector.remove_namespaces <scrapy.selector.Selector.remove_namespaces>` which allows to remove all namespaces from XML documents for convenience (to work with namespace-less XPaths). Documented in :ref:`topics-selectors`.
4059- Several improvements to spider contracts
4060- New default middleware named MetaRefreshMiddldeware that handles meta-refresh html tag redirections,
4061- MetaRefreshMiddldeware and RedirectMiddleware have different priorities to address #62
4062- added from_crawler method to spiders
4063- added system tests with mock server
4064- more improvements to macOS compatibility (thanks Alex Cepoi)
4065- several more cleanups to singletons and multi-spider support (thanks Nicolas Ramirez)
4066- support custom download slots
4067- added --spider option to "shell" command.
4068- log overridden settings when Scrapy starts
4069
4070Thanks to everyone who contribute to this release. Here is a list of
4071contributors sorted by number of commits::
4072
4073    130 Pablo Hoffman <pablo@...>
4074     97 Daniel Graña <dangra@...>
4075     20 Nicolás Ramírez <nramirez.uy@...>
4076     13 Mikhail Korobov <kmike84@...>
4077     12 Pedro Faustino <pedrobandim@...>
4078     11 Steven Almeroth <sroth77@...>
4079      5 Rolando Espinoza La fuente <darkrho@...>
4080      4 Michal Danilak <mimino.coder@...>
4081      4 Alex Cepoi <alex.cepoi@...>
4082      4 Alexandr N Zamaraev (aka tonal) <tonal@...>
4083      3 paul <paul.tremberth@...>
4084      3 Martin Olveyra <molveyra@...>
4085      3 Jordi Llonch <llonchj@...>
4086      3 arijitchakraborty <myself.arijit@...>
4087      2 Shane Evans <shane.evans@...>
4088      2 joehillen <joehillen@...>
4089      2 Hart <HartSimha@...>
4090      2 Dan <ellisd23@...>
4091      1 Zuhao Wan <wanzuhao@...>
4092      1 whodatninja <blake@...>
4093      1 vkrest <v.krestiannykov@...>
4094      1 tpeng <pengtaoo@...>
4095      1 Tom Mortimer-Jones <tom@...>
4096      1 Rocio Aramberri <roschegel@...>
4097      1 Pedro <pedro@...>
4098      1 notsobad <wangxiaohugg@...>
4099      1 Natan L <kuyanatan.nlao@...>
4100      1 Mark Grey <mark.grey@...>
4101      1 Luan <luanpab@...>
4102      1 Libor Nenadál <libor.nenadal@...>
4103      1 Juan M Uys <opyate@...>
4104      1 Jonas Brunsgaard <jonas.brunsgaard@...>
4105      1 Ilya Baryshev <baryshev@...>
4106      1 Hasnain Lakhani <m.hasnain.lakhani@...>
4107      1 Emanuel Schorsch <emschorsch@...>
4108      1 Chris Tilden <chris.tilden@...>
4109      1 Capi Etheriel <barraponto@...>
4110      1 cacovsky <amarquesferraz@...>
4111      1 Berend Iwema <berend@...>
4112
4113
4114Scrapy 0.16.5 (released 2013-05-30)
4115-----------------------------------
4116
4117- obey request method when Scrapy deploy is redirected to a new endpoint (:commit:`8c4fcee`)
4118- fix inaccurate downloader middleware documentation. refs #280 (:commit:`40667cb`)
4119- doc: remove links to diveintopython.org, which is no longer available. closes #246 (:commit:`bd58bfa`)
4120- Find form nodes in invalid html5 documents (:commit:`e3d6945`)
4121- Fix typo labeling attrs type bool instead of list (:commit:`a274276`)
4122
4123Scrapy 0.16.4 (released 2013-01-23)
4124-----------------------------------
4125
4126- fixes spelling errors in documentation (:commit:`6d2b3aa`)
4127- add doc about disabling an extension. refs #132 (:commit:`c90de33`)
4128- Fixed error message formatting. log.err() doesn't support cool formatting and when error occurred, the message was:    "ERROR: Error processing %(item)s" (:commit:`c16150c`)
4129- lint and improve images pipeline error logging (:commit:`56b45fc`)
4130- fixed doc typos (:commit:`243be84`)
4131- add documentation topics: Broad Crawls & Common Practices (:commit:`1fbb715`)
4132- fix bug in Scrapy parse command when spider is not specified explicitly. closes #209 (:commit:`c72e682`)
4133- Update docs/topics/commands.rst (:commit:`28eac7a`)
4134
4135Scrapy 0.16.3 (released 2012-12-07)
4136-----------------------------------
4137
4138- Remove concurrency limitation when using download delays and still ensure inter-request delays are enforced (:commit:`487b9b5`)
4139- add error details when image pipeline fails (:commit:`8232569`)
4140- improve macOS compatibility (:commit:`8dcf8aa`)
4141- setup.py: use README.rst to populate long_description (:commit:`7b5310d`)
4142- doc: removed obsolete references to ClientForm (:commit:`80f9bb6`)
4143- correct docs for default storage backend (:commit:`2aa491b`)
4144- doc: removed broken proxyhub link from FAQ (:commit:`bdf61c4`)
4145- Fixed docs typo in SpiderOpenCloseLogging example (:commit:`7184094`)
4146
4147
4148Scrapy 0.16.2 (released 2012-11-09)
4149-----------------------------------
4150
4151- Scrapy contracts: python2.6 compat (:commit:`a4a9199`)
4152- Scrapy contracts verbose option (:commit:`ec41673`)
4153- proper unittest-like output for Scrapy contracts (:commit:`86635e4`)
4154- added open_in_browser to debugging doc (:commit:`c9b690d`)
4155- removed reference to global Scrapy stats from settings doc (:commit:`dd55067`)
4156- Fix SpiderState bug in Windows platforms (:commit:`58998f4`)
4157
4158
4159Scrapy 0.16.1 (released 2012-10-26)
4160-----------------------------------
4161
4162- fixed LogStats extension, which got broken after a wrong merge before the 0.16 release (:commit:`8c780fd`)
4163- better backward compatibility for scrapy.conf.settings (:commit:`3403089`)
4164- extended documentation on how to access crawler stats from extensions (:commit:`c4da0b5`)
4165- removed .hgtags (no longer needed now that Scrapy uses git) (:commit:`d52c188`)
4166- fix dashes under rst headers (:commit:`fa4f7f9`)
4167- set release date for 0.16.0 in news (:commit:`e292246`)
4168
4169
4170Scrapy 0.16.0 (released 2012-10-18)
4171-----------------------------------
4172
4173Scrapy changes:
4174
4175- added :ref:`topics-contracts`, a mechanism for testing spiders in a formal/reproducible way
4176- added options ``-o`` and ``-t`` to the :command:`runspider` command
4177- documented :doc:`topics/autothrottle` and added to extensions installed by default. You still need to enable it with :setting:`AUTOTHROTTLE_ENABLED`
4178- major Stats Collection refactoring: removed separation of global/per-spider stats, removed stats-related signals (``stats_spider_opened``, etc). Stats are much simpler now, backward compatibility is kept on the Stats Collector API and signals.
4179- added :meth:`~scrapy.spidermiddlewares.SpiderMiddleware.process_start_requests` method to spider middlewares
4180- dropped Signals singleton. Signals should now be accessed through the Crawler.signals attribute. See the signals documentation for more info.
4181- dropped Stats Collector singleton. Stats can now be accessed through the Crawler.stats attribute. See the stats collection documentation for more info.
4182- documented :ref:`topics-api`
4183- ``lxml`` is now the default selectors backend instead of ``libxml2``
4184- ported FormRequest.from_response() to use `lxml`_ instead of `ClientForm`_
4185- removed modules: ``scrapy.xlib.BeautifulSoup`` and ``scrapy.xlib.ClientForm``
4186- SitemapSpider: added support for sitemap urls ending in .xml and .xml.gz, even if they advertise a wrong content type (:commit:`10ed28b`)
4187- StackTraceDump extension: also dump trackref live references (:commit:`fe2ce93`)
4188- nested items now fully supported in JSON and JSONLines exporters
4189- added :reqmeta:`cookiejar` Request meta key to support multiple cookie sessions per spider
4190- decoupled encoding detection code to `w3lib.encoding`_, and ported Scrapy code to use that module
4191- dropped support for Python 2.5. See https://blog.scrapinghub.com/2012/02/27/scrapy-0-15-dropping-support-for-python-2-5/
4192- dropped support for Twisted 2.5
4193- added :setting:`REFERER_ENABLED` setting, to control referer middleware
4194- changed default user agent to: ``Scrapy/VERSION (+http://scrapy.org)``
4195- removed (undocumented) ``HTMLImageLinkExtractor`` class from ``scrapy.contrib.linkextractors.image``
4196- removed per-spider settings (to be replaced by instantiating multiple crawler objects)
4197- ``USER_AGENT`` spider attribute will no longer work, use ``user_agent`` attribute instead
4198- ``DOWNLOAD_TIMEOUT`` spider attribute will no longer work, use ``download_timeout`` attribute instead
4199- removed ``ENCODING_ALIASES`` setting, as encoding auto-detection has been moved to the `w3lib`_ library
4200- promoted :ref:`topics-djangoitem` to main contrib
4201- LogFormatter method now return dicts(instead of strings) to support lazy formatting (:issue:`164`, :commit:`dcef7b0`)
4202- downloader handlers (:setting:`DOWNLOAD_HANDLERS` setting) now receive settings as the first argument of the ``__init__`` method
4203- replaced memory usage acounting with (more portable) `resource`_ module, removed ``scrapy.utils.memory`` module
4204- removed signal: ``scrapy.mail.mail_sent``
4205- removed ``TRACK_REFS`` setting, now :ref:`trackrefs <topics-leaks-trackrefs>` is always enabled
4206- DBM is now the default storage backend for HTTP cache middleware
4207- number of log messages (per level) are now tracked through Scrapy stats (stat name: ``log_count/LEVEL``)
4208- number received responses are now tracked through Scrapy stats (stat name: ``response_received_count``)
4209- removed ``scrapy.log.started`` attribute
4210
4211Scrapy 0.14.4
4212-------------
4213
4214- added precise to supported Ubuntu distros (:commit:`b7e46df`)
4215- fixed bug in json-rpc webservice reported in https://groups.google.com/forum/#!topic/scrapy-users/qgVBmFybNAQ/discussion. also removed no longer supported 'run' command from extras/scrapy-ws.py (:commit:`340fbdb`)
4216- meta tag attributes for content-type http equiv can be in any order. #123 (:commit:`0cb68af`)
4217- replace "import Image" by more standard "from PIL import Image". closes #88 (:commit:`4d17048`)
4218- return trial status as bin/runtests.sh exit value. #118 (:commit:`b7b2e7f`)
4219
4220Scrapy 0.14.3
4221-------------
4222
4223- forgot to include pydispatch license. #118 (:commit:`fd85f9c`)
4224- include egg files used by testsuite in source distribution. #118 (:commit:`c897793`)
4225- update docstring in project template to avoid confusion with genspider command, which may be considered as an advanced feature. refs #107 (:commit:`2548dcc`)
4226- added note to docs/topics/firebug.rst about google directory being shut down (:commit:`668e352`)
4227- don't discard slot when empty, just save in another dict in order to recycle if needed again. (:commit:`8e9f607`)
4228- do not fail handling unicode xpaths in libxml2 backed selectors (:commit:`b830e95`)
4229- fixed minor mistake in Request objects documentation (:commit:`bf3c9ee`)
4230- fixed minor defect in link extractors documentation (:commit:`ba14f38`)
4231- removed some obsolete remaining code related to sqlite support in Scrapy (:commit:`0665175`)
4232
4233Scrapy 0.14.2
4234-------------
4235
4236- move buffer pointing to start of file before computing checksum. refs #92 (:commit:`6a5bef2`)
4237- Compute image checksum before persisting images. closes #92 (:commit:`9817df1`)
4238- remove leaking references in cached failures (:commit:`673a120`)
4239- fixed bug in MemoryUsage extension: get_engine_status() takes exactly 1 argument (0 given) (:commit:`11133e9`)
4240- fixed struct.error on http compression middleware. closes #87 (:commit:`1423140`)
4241- ajax crawling wasn't expanding for unicode urls (:commit:`0de3fb4`)
4242- Catch start_requests iterator errors. refs #83 (:commit:`454a21d`)
4243- Speed-up libxml2 XPathSelector (:commit:`2fbd662`)
4244- updated versioning doc according to recent changes (:commit:`0a070f5`)
4245- scrapyd: fixed documentation link (:commit:`2b4e4c3`)
4246- extras/makedeb.py: no longer obtaining version from git (:commit:`caffe0e`)
4247
4248Scrapy 0.14.1
4249-------------
4250
4251- extras/makedeb.py: no longer obtaining version from git (:commit:`caffe0e`)
4252- bumped version to 0.14.1 (:commit:`6cb9e1c`)
4253- fixed reference to tutorial directory (:commit:`4b86bd6`)
4254- doc: removed duplicated callback argument from Request.replace() (:commit:`1aeccdd`)
4255- fixed formatting of scrapyd doc (:commit:`8bf19e6`)
4256- Dump stacks for all running threads and fix engine status dumped by StackTraceDump extension (:commit:`14a8e6e`)
4257- added comment about why we disable ssl on boto images upload (:commit:`5223575`)
4258- SSL handshaking hangs when doing too many parallel connections to S3 (:commit:`63d583d`)
4259- change tutorial to follow changes on dmoz site (:commit:`bcb3198`)
4260- Avoid _disconnectedDeferred AttributeError exception in Twisted>=11.1.0 (:commit:`98f3f87`)
4261- allow spider to set autothrottle max concurrency (:commit:`175a4b5`)
4262
4263Scrapy 0.14
4264-----------
4265
4266New features and settings
4267~~~~~~~~~~~~~~~~~~~~~~~~~
4268
4269- Support for `AJAX crawleable urls`_
4270- New persistent scheduler that stores requests on disk, allowing to suspend and resume crawls (:rev:`2737`)
4271- added ``-o`` option to ``scrapy crawl``, a shortcut for dumping scraped items into a file (or standard output using ``-``)
4272- Added support for passing custom settings to Scrapyd ``schedule.json`` api (:rev:`2779`, :rev:`2783`)
4273- New ``ChunkedTransferMiddleware`` (enabled by default) to support `chunked transfer encoding`_ (:rev:`2769`)
4274- Add boto 2.0 support for S3 downloader handler (:rev:`2763`)
4275- Added `marshal`_ to formats supported by feed exports (:rev:`2744`)
4276- In request errbacks, offending requests are now received in ``failure.request`` attribute (:rev:`2738`)
4277- Big downloader refactoring to support per domain/ip concurrency limits (:rev:`2732`)
4278   - ``CONCURRENT_REQUESTS_PER_SPIDER`` setting has been deprecated and replaced by:
4279      - :setting:`CONCURRENT_REQUESTS`, :setting:`CONCURRENT_REQUESTS_PER_DOMAIN`, :setting:`CONCURRENT_REQUESTS_PER_IP`
4280   - check the documentation for more details
4281- Added builtin caching DNS resolver (:rev:`2728`)
4282- Moved Amazon AWS-related components/extensions (SQS spider queue, SimpleDB stats collector) to a separate project: [scaws](https://github.com/scrapinghub/scaws) (:rev:`2706`, :rev:`2714`)
4283- Moved spider queues to scrapyd: ``scrapy.spiderqueue`` -> ``scrapyd.spiderqueue`` (:rev:`2708`)
4284- Moved sqlite utils to scrapyd: ``scrapy.utils.sqlite`` -> ``scrapyd.sqlite`` (:rev:`2781`)
4285- Real support for returning iterators on ``start_requests()`` method. The iterator is now consumed during the crawl when the spider is getting idle (:rev:`2704`)
4286- Added :setting:`REDIRECT_ENABLED` setting to quickly enable/disable the redirect middleware (:rev:`2697`)
4287- Added :setting:`RETRY_ENABLED` setting to quickly enable/disable the retry middleware (:rev:`2694`)
4288- Added ``CloseSpider`` exception to manually close spiders (:rev:`2691`)
4289- Improved encoding detection by adding support for HTML5 meta charset declaration (:rev:`2690`)
4290- Refactored close spider behavior to wait for all downloads to finish and be processed by spiders, before closing the spider (:rev:`2688`)
4291- Added ``SitemapSpider`` (see documentation in Spiders page) (:rev:`2658`)
4292- Added ``LogStats`` extension for periodically logging basic stats (like crawled pages and scraped items) (:rev:`2657`)
4293- Make handling of gzipped responses more robust (#319, :rev:`2643`). Now Scrapy will try and decompress as much as possible from a gzipped response, instead of failing with an ``IOError``.
4294- Simplified !MemoryDebugger extension to use stats for dumping memory debugging info (:rev:`2639`)
4295- Added new command to edit spiders: ``scrapy edit`` (:rev:`2636`) and ``-e`` flag to ``genspider`` command that uses it (:rev:`2653`)
4296- Changed default representation of items to pretty-printed dicts. (:rev:`2631`). This improves default logging by making log more readable in the default case, for both Scraped and Dropped lines.
4297- Added :signal:`spider_error` signal (:rev:`2628`)
4298- Added :setting:`COOKIES_ENABLED` setting (:rev:`2625`)
4299- Stats are now dumped to Scrapy log (default value of :setting:`STATS_DUMP` setting has been changed to ``True``). This is to make Scrapy users more aware of Scrapy stats and the data that is collected there.
4300- Added support for dynamically adjusting download delay and maximum concurrent requests (:rev:`2599`)
4301- Added new DBM HTTP cache storage backend (:rev:`2576`)
4302- Added ``listjobs.json`` API to Scrapyd (:rev:`2571`)
4303- ``CsvItemExporter``: added ``join_multivalued`` parameter (:rev:`2578`)
4304- Added namespace support to ``xmliter_lxml`` (:rev:`2552`)
4305- Improved cookies middleware by making ``COOKIES_DEBUG`` nicer and documenting it (:rev:`2579`)
4306- Several improvements to Scrapyd and Link extractors
4307
4308Code rearranged and removed
4309~~~~~~~~~~~~~~~~~~~~~~~~~~~
4310
4311- Merged item passed and item scraped concepts, as they have often proved confusing in the past. This means: (:rev:`2630`)
4312   - original item_scraped signal was removed
4313   - original item_passed signal was renamed to item_scraped
4314   - old log lines ``Scraped Item...`` were removed
4315   - old log lines ``Passed Item...`` were renamed to ``Scraped Item...`` lines and downgraded to ``DEBUG`` level
4316- Reduced Scrapy codebase by striping part of Scrapy code into two new libraries:
4317   - `w3lib`_ (several functions from ``scrapy.utils.{http,markup,multipart,response,url}``, done in :rev:`2584`)
4318   - `scrapely`_ (was ``scrapy.contrib.ibl``, done in :rev:`2586`)
4319- Removed unused function: ``scrapy.utils.request.request_info()`` (:rev:`2577`)
4320- Removed googledir project from ``examples/googledir``. There's now a new example project called ``dirbot`` available on GitHub: https://github.com/scrapy/dirbot
4321- Removed support for default field values in Scrapy items (:rev:`2616`)
4322- Removed experimental crawlspider v2 (:rev:`2632`)
4323- Removed scheduler middleware to simplify architecture. Duplicates filter is now done in the scheduler itself, using the same dupe fltering class as before (``DUPEFILTER_CLASS`` setting) (:rev:`2640`)
4324- Removed support for passing urls to ``scrapy crawl`` command (use ``scrapy parse`` instead) (:rev:`2704`)
4325- Removed deprecated Execution Queue (:rev:`2704`)
4326- Removed (undocumented) spider context extension (from scrapy.contrib.spidercontext) (:rev:`2780`)
4327- removed ``CONCURRENT_SPIDERS`` setting (use scrapyd maxproc instead) (:rev:`2789`)
4328- Renamed attributes of core components: downloader.sites -> downloader.slots, scraper.sites -> scraper.slots (:rev:`2717`, :rev:`2718`)
4329- Renamed setting ``CLOSESPIDER_ITEMPASSED`` to :setting:`CLOSESPIDER_ITEMCOUNT` (:rev:`2655`). Backward compatibility kept.
4330
4331Scrapy 0.12
4332-----------
4333
4334The numbers like #NNN reference tickets in the old issue tracker (Trac) which is no longer available.
4335
4336New features and improvements
4337~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
4338
4339- Passed item is now sent in the ``item`` argument of the :signal:`item_passed
4340  <item_scraped>` (#273)
4341- Added verbose option to ``scrapy version`` command, useful for bug reports (#298)
4342- HTTP cache now stored by default in the project data dir (#279)
4343- Added project data storage directory (#276, #277)
4344- Documented file structure of Scrapy projects (see command-line tool doc)
4345- New lxml backend for XPath selectors (#147)
4346- Per-spider settings (#245)
4347- Support exit codes to signal errors in Scrapy commands (#248)
4348- Added ``-c`` argument to ``scrapy shell`` command
4349- Made ``libxml2`` optional (#260)
4350- New ``deploy`` command (#261)
4351- Added :setting:`CLOSESPIDER_PAGECOUNT` setting (#253)
4352- Added :setting:`CLOSESPIDER_ERRORCOUNT` setting (#254)
4353
4354Scrapyd changes
4355~~~~~~~~~~~~~~~
4356
4357- Scrapyd now uses one process per spider
4358- It stores one log file per spider run, and rotate them keeping the lastest 5 logs per spider (by default)
4359- A minimal web ui was added, available at http://localhost:6800 by default
4360- There is now a ``scrapy server`` command to start a Scrapyd server of the current project
4361
4362Changes to settings
4363~~~~~~~~~~~~~~~~~~~
4364
4365- added ``HTTPCACHE_ENABLED`` setting (False by default) to enable HTTP cache middleware
4366- changed ``HTTPCACHE_EXPIRATION_SECS`` semantics: now zero means "never expire".
4367
4368Deprecated/obsoleted functionality
4369~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
4370
4371- Deprecated ``runserver`` command in favor of ``server`` command which starts a Scrapyd server. See also: Scrapyd changes
4372- Deprecated ``queue`` command in favor of using Scrapyd ``schedule.json`` API. See also: Scrapyd changes
4373- Removed the !LxmlItemLoader (experimental contrib which never graduated to main contrib)
4374
4375Scrapy 0.10
4376-----------
4377
4378The numbers like #NNN reference tickets in the old issue tracker (Trac) which is no longer available.
4379
4380New features and improvements
4381~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
4382
4383- New Scrapy service called ``scrapyd`` for deploying Scrapy crawlers in production (#218) (documentation available)
4384- Simplified Images pipeline usage which doesn't require subclassing your own images pipeline now (#217)
4385- Scrapy shell now shows the Scrapy log by default (#206)
4386- Refactored execution queue in a common base code and pluggable backends called "spider queues" (#220)
4387- New persistent spider queue (based on SQLite) (#198), available by default, which allows to start Scrapy in server mode and then schedule spiders to run.
4388- Added documentation for Scrapy command-line tool and all its available sub-commands. (documentation available)
4389- Feed exporters with pluggable backends (#197) (documentation available)
4390- Deferred signals (#193)
4391- Added two new methods to item pipeline open_spider(), close_spider() with deferred support (#195)
4392- Support for overriding default request headers per spider (#181)
4393- Replaced default Spider Manager with one with similar functionality but not depending on Twisted Plugins (#186)
4394- Splitted Debian package into two packages - the library and the service (#187)
4395- Scrapy log refactoring (#188)
4396- New extension for keeping persistent spider contexts among different runs (#203)
4397- Added ``dont_redirect`` request.meta key for avoiding redirects (#233)
4398- Added ``dont_retry`` request.meta key for avoiding retries (#234)
4399
4400Command-line tool changes
4401~~~~~~~~~~~~~~~~~~~~~~~~~
4402
4403- New ``scrapy`` command which replaces the old ``scrapy-ctl.py`` (#199)
4404  - there is only one global ``scrapy`` command now, instead of one ``scrapy-ctl.py`` per project
4405  - Added ``scrapy.bat`` script for running more conveniently from Windows
4406- Added bash completion to command-line tool (#210)
4407- Renamed command ``start`` to ``runserver`` (#209)
4408
4409API changes
4410~~~~~~~~~~~
4411
4412- ``url`` and ``body`` attributes of Request objects are now read-only (#230)
4413- ``Request.copy()`` and ``Request.replace()`` now also copies their ``callback`` and ``errback`` attributes (#231)
4414- Removed ``UrlFilterMiddleware`` from ``scrapy.contrib`` (already disabled by default)
4415- Offsite middleware doesn't filter out any request coming from a spider that doesn't have a allowed_domains attribute (#225)
4416- Removed Spider Manager ``load()`` method. Now spiders are loaded in the ``__init__`` method itself.
4417- Changes to Scrapy Manager (now called "Crawler"):
4418   - ``scrapy.core.manager.ScrapyManager`` class renamed to ``scrapy.crawler.Crawler``
4419   - ``scrapy.core.manager.scrapymanager`` singleton moved to ``scrapy.project.crawler``
4420- Moved module: ``scrapy.contrib.spidermanager`` to ``scrapy.spidermanager``
4421- Spider Manager singleton moved from ``scrapy.spider.spiders`` to the ``spiders` attribute of ``scrapy.project.crawler`` singleton.
4422- moved Stats Collector classes: (#204)
4423   - ``scrapy.stats.collector.StatsCollector`` to ``scrapy.statscol.StatsCollector``
4424   - ``scrapy.stats.collector.SimpledbStatsCollector`` to ``scrapy.contrib.statscol.SimpledbStatsCollector``
4425- default per-command settings are now specified in the ``default_settings`` attribute of command object class (#201)
4426- changed arguments of Item pipeline ``process_item()`` method from ``(spider, item)`` to ``(item, spider)``
4427   - backward compatibility kept (with deprecation warning)
4428- moved ``scrapy.core.signals`` module to ``scrapy.signals``
4429   - backward compatibility kept (with deprecation warning)
4430- moved ``scrapy.core.exceptions`` module to ``scrapy.exceptions``
4431   - backward compatibility kept (with deprecation warning)
4432- added ``handles_request()`` class method to ``BaseSpider``
4433- dropped ``scrapy.log.exc()`` function (use ``scrapy.log.err()`` instead)
4434- dropped ``component`` argument of ``scrapy.log.msg()`` function
4435- dropped ``scrapy.log.log_level`` attribute
4436- Added ``from_settings()`` class methods to Spider Manager, and Item Pipeline Manager
4437
4438Changes to settings
4439~~~~~~~~~~~~~~~~~~~
4440
4441- Added ``HTTPCACHE_IGNORE_SCHEMES`` setting to ignore certain schemes on !HttpCacheMiddleware (#225)
4442- Added ``SPIDER_QUEUE_CLASS`` setting which defines the spider queue to use (#220)
4443- Added ``KEEP_ALIVE`` setting (#220)
4444- Removed ``SERVICE_QUEUE`` setting (#220)
4445- Removed ``COMMANDS_SETTINGS_MODULE`` setting (#201)
4446- Renamed ``REQUEST_HANDLERS`` to ``DOWNLOAD_HANDLERS`` and make download handlers classes (instead of functions)
4447
4448Scrapy 0.9
4449----------
4450
4451The numbers like #NNN reference tickets in the old issue tracker (Trac) which is no longer available.
4452
4453New features and improvements
4454~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
4455
4456- Added SMTP-AUTH support to scrapy.mail
4457- New settings added: ``MAIL_USER``, ``MAIL_PASS`` (:rev:`2065` | #149)
4458- Added new scrapy-ctl view command - To view URL in the browser, as seen by Scrapy (:rev:`2039`)
4459- Added web service for controlling Scrapy process (this also deprecates the web console. (:rev:`2053` | #167)
4460- Support for running Scrapy as a service, for production systems (:rev:`1988`, :rev:`2054`, :rev:`2055`, :rev:`2056`, :rev:`2057` | #168)
4461- Added wrapper induction library (documentation only available in source code for now). (:rev:`2011`)
4462- Simplified and improved response encoding support (:rev:`1961`, :rev:`1969`)
4463- Added ``LOG_ENCODING`` setting (:rev:`1956`, documentation available)
4464- Added ``RANDOMIZE_DOWNLOAD_DELAY`` setting (enabled by default) (:rev:`1923`, doc available)
4465- ``MailSender`` is no longer IO-blocking (:rev:`1955` | #146)
4466- Linkextractors and new Crawlspider now handle relative base tag urls (:rev:`1960` | #148)
4467- Several improvements to Item Loaders and processors (:rev:`2022`, :rev:`2023`, :rev:`2024`, :rev:`2025`, :rev:`2026`, :rev:`2027`, :rev:`2028`, :rev:`2029`, :rev:`2030`)
4468- Added support for adding variables to telnet console (:rev:`2047` | #165)
4469- Support for requests without callbacks (:rev:`2050` | #166)
4470
4471API changes
4472~~~~~~~~~~~
4473
4474- Change ``Spider.domain_name`` to ``Spider.name`` (SEP-012, :rev:`1975`)
4475- ``Response.encoding`` is now the detected encoding (:rev:`1961`)
4476- ``HttpErrorMiddleware`` now returns None or raises an exception (:rev:`2006` | #157)
4477- ``scrapy.command`` modules relocation (:rev:`2035`, :rev:`2036`, :rev:`2037`)
4478- Added ``ExecutionQueue`` for feeding spiders to scrape (:rev:`2034`)
4479- Removed ``ExecutionEngine`` singleton (:rev:`2039`)
4480- Ported ``S3ImagesStore`` (images pipeline) to use boto and threads (:rev:`2033`)
4481- Moved module: ``scrapy.management.telnet`` to ``scrapy.telnet`` (:rev:`2047`)
4482
4483Changes to default settings
4484~~~~~~~~~~~~~~~~~~~~~~~~~~~
4485
4486- Changed default ``SCHEDULER_ORDER`` to ``DFO`` (:rev:`1939`)
4487
4488Scrapy 0.8
4489----------
4490
4491The numbers like #NNN reference tickets in the old issue tracker (Trac) which is no longer available.
4492
4493New features
4494~~~~~~~~~~~~
4495
4496- Added DEFAULT_RESPONSE_ENCODING setting (:rev:`1809`)
4497- Added ``dont_click`` argument to ``FormRequest.from_response()`` method (:rev:`1813`, :rev:`1816`)
4498- Added ``clickdata`` argument to ``FormRequest.from_response()`` method (:rev:`1802`, :rev:`1803`)
4499- Added support for HTTP proxies (``HttpProxyMiddleware``) (:rev:`1781`, :rev:`1785`)
4500- Offsite spider middleware now logs messages when filtering out requests (:rev:`1841`)
4501
4502Backward-incompatible changes
4503~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
4504
4505- Changed ``scrapy.utils.response.get_meta_refresh()`` signature (:rev:`1804`)
4506- Removed deprecated ``scrapy.item.ScrapedItem`` class - use ``scrapy.item.Item instead`` (:rev:`1838`)
4507- Removed deprecated ``scrapy.xpath`` module - use ``scrapy.selector`` instead. (:rev:`1836`)
4508- Removed deprecated ``core.signals.domain_open`` signal - use ``core.signals.domain_opened`` instead (:rev:`1822`)
4509- ``log.msg()`` now receives a ``spider`` argument (:rev:`1822`)
4510   - Old domain argument has been deprecated and will be removed in 0.9. For spiders, you should always use the ``spider`` argument and pass spider references. If you really want to pass a string, use the ``component`` argument instead.
4511- Changed core signals ``domain_opened``, ``domain_closed``, ``domain_idle``
4512- Changed Item pipeline to use spiders instead of domains
4513   -  The ``domain`` argument of  ``process_item()`` item pipeline method was changed to  ``spider``, the new signature is: ``process_item(spider, item)`` (:rev:`1827` | #105)
4514   - To quickly port your code (to work with Scrapy 0.8) just use ``spider.domain_name`` where you previously used ``domain``.
4515- Changed Stats API to use spiders instead of domains (:rev:`1849` | #113)
4516   - ``StatsCollector`` was changed to receive spider references (instead of domains) in its methods (``set_value``, ``inc_value``, etc).
4517   - added ``StatsCollector.iter_spider_stats()`` method
4518   - removed ``StatsCollector.list_domains()`` method
4519   - Also, Stats signals were renamed and now pass around spider references (instead of domains). Here's a summary of the changes:
4520   - To quickly port your code (to work with Scrapy 0.8) just use ``spider.domain_name`` where you previously used ``domain``. ``spider_stats`` contains exactly the same data as ``domain_stats``.
4521- ``CloseDomain`` extension moved to ``scrapy.contrib.closespider.CloseSpider`` (:rev:`1833`)
4522   - Its settings were also renamed:
4523      - ``CLOSEDOMAIN_TIMEOUT`` to ``CLOSESPIDER_TIMEOUT``
4524      - ``CLOSEDOMAIN_ITEMCOUNT`` to ``CLOSESPIDER_ITEMCOUNT``
4525- Removed deprecated ``SCRAPYSETTINGS_MODULE`` environment variable - use ``SCRAPY_SETTINGS_MODULE`` instead (:rev:`1840`)
4526- Renamed setting: ``REQUESTS_PER_DOMAIN`` to ``CONCURRENT_REQUESTS_PER_SPIDER`` (:rev:`1830`, :rev:`1844`)
4527- Renamed setting: ``CONCURRENT_DOMAINS`` to ``CONCURRENT_SPIDERS`` (:rev:`1830`)
4528- Refactored HTTP Cache middleware
4529- HTTP Cache middleware has been heavilty refactored, retaining the same functionality except for the domain sectorization which was removed. (:rev:`1843` )
4530- Renamed exception: ``DontCloseDomain`` to ``DontCloseSpider`` (:rev:`1859` | #120)
4531- Renamed extension: ``DelayedCloseDomain`` to ``SpiderCloseDelay`` (:rev:`1861` | #121)
4532- Removed obsolete ``scrapy.utils.markup.remove_escape_chars`` function - use ``scrapy.utils.markup.replace_escape_chars`` instead (:rev:`1865`)
4533
4534Scrapy 0.7
4535----------
4536
4537First release of Scrapy.
4538
4539
4540.. _AJAX crawleable urls: https://developers.google.com/search/docs/ajax-crawling/docs/getting-started?csw=1
4541.. _botocore: https://github.com/boto/botocore
4542.. _chunked transfer encoding: https://en.wikipedia.org/wiki/Chunked_transfer_encoding
4543.. _ClientForm: http://wwwsearch.sourceforge.net/old/ClientForm/
4544.. _Creating a pull request: https://help.github.com/en/articles/creating-a-pull-request
4545.. _cryptography: https://cryptography.io/en/latest/
4546.. _docstrings: https://docs.python.org/3/glossary.html#term-docstring
4547.. _KeyboardInterrupt: https://docs.python.org/3/library/exceptions.html#KeyboardInterrupt
4548.. _LevelDB: https://github.com/google/leveldb
4549.. _lxml: https://lxml.de/
4550.. _marshal: https://docs.python.org/2/library/marshal.html
4551.. _parsel.csstranslator.GenericTranslator: https://parsel.readthedocs.io/en/latest/parsel.html#parsel.csstranslator.GenericTranslator
4552.. _parsel.csstranslator.HTMLTranslator: https://parsel.readthedocs.io/en/latest/parsel.html#parsel.csstranslator.HTMLTranslator
4553.. _parsel.csstranslator.XPathExpr: https://parsel.readthedocs.io/en/latest/parsel.html#parsel.csstranslator.XPathExpr
4554.. _PEP 257: https://www.python.org/dev/peps/pep-0257/
4555.. _Pillow: https://python-pillow.org/
4556.. _pyOpenSSL: https://www.pyopenssl.org/en/stable/
4557.. _queuelib: https://github.com/scrapy/queuelib
4558.. _registered with IANA: https://www.iana.org/assignments/media-types/media-types.xhtml
4559.. _resource: https://docs.python.org/2/library/resource.html
4560.. _robots.txt: https://www.robotstxt.org/
4561.. _scrapely: https://github.com/scrapy/scrapely
4562.. _scrapy-bench: https://github.com/scrapy/scrapy-bench
4563.. _service_identity: https://service-identity.readthedocs.io/en/stable/
4564.. _six: https://six.readthedocs.io/
4565.. _tox: https://pypi.org/project/tox/
4566.. _Twisted: https://twistedmatrix.com/trac/
4567.. _w3lib: https://github.com/scrapy/w3lib
4568.. _w3lib.encoding: https://github.com/scrapy/w3lib/blob/master/w3lib/encoding.py
4569.. _What is cacheable: https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.1
4570.. _zope.interface: https://zopeinterface.readthedocs.io/en/latest/
4571.. _Zsh: https://www.zsh.org/
4572.. _zstandard: https://pypi.org/project/zstandard/
4573