.. _topics-exporters:

==============
Item Exporters
==============

.. module:: scrapy.exporters
   :synopsis: Item Exporters

Once you have scraped your items, you often want to persist or export those
items, to use the data in some other application. That is, after all, the whole
purpose of the scraping process.

For this purpose Scrapy provides a collection of Item Exporters for different
output formats, such as XML, CSV or JSON.

Using Item Exporters
====================

If you are in a hurry and just want to use an Item Exporter to output scraped
data, see the :ref:`topics-feed-exports`. Otherwise, if you want to know how
Item Exporters work or need more custom functionality (not covered by the
default exports), continue reading below.

In order to use an Item Exporter, you must instantiate it with its required
arguments. Each Item Exporter requires different arguments, so check each
exporter's documentation in :ref:`topics-exporters-reference`. After you have
instantiated your exporter, you have to:

1. call the method :meth:`~BaseItemExporter.start_exporting` in order to
   signal the beginning of the exporting process

2. call the :meth:`~BaseItemExporter.export_item` method for each item you want
   to export

3. and finally call the :meth:`~BaseItemExporter.finish_exporting` method to
   signal the end of the exporting process
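
As a quick, standalone illustration of these three calls, the sketch below
writes two plain :class:`dict` items to an in-memory :class:`io.BytesIO`
buffer with :class:`JsonLinesItemExporter`. The buffer and the item values are
assumptions made only for this example. The same start, export, finish
sequence applies to the other exporters and to any binary file-like object.

.. code-block:: python

    import io
    import json

    from scrapy.exporters import JsonLinesItemExporter

    items = [
        {'name': 'Color TV', 'price': '1200'},
        {'name': 'DVD player', 'price': '200'},
    ]

    buffer = io.BytesIO()  # exporters write bytes, so use a binary buffer
    exporter = JsonLinesItemExporter(buffer)

    exporter.start_exporting()  # 1. signal the beginning of the export
    for item in items:
        exporter.export_item(item)  # 2. export each item
    exporter.finish_exporting()  # 3. signal the end of the export

    output = buffer.getvalue().decode('utf-8')
    decoded = [json.loads(line) for line in output.splitlines()]
    print(decoded)
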

Here you can see an :doc:`Item Pipeline <item-pipeline>` which uses multiple
Item Exporters to distribute scraped items into different files according to
the value of one of their fields::

    from itemadapter import ItemAdapter
    from scrapy.exporters import XmlItemExporter

    class PerYearXmlExportPipeline:
        """Distribute items across multiple XML files according to their 'year' field"""

        def open_spider(self, spider):
            # Maps each year to its (exporter, file) pair
            self.year_to_exporter = {}

        def close_spider(self, spider):
            # Finish every export before closing its file
            for exporter, xml_file in self.year_to_exporter.values():
                exporter.finish_exporting()
                xml_file.close()

        def _exporter_for_item(self, item):
            adapter = ItemAdapter(item)
            year = adapter['year']
            if year not in self.year_to_exporter:
                # Exporters write bytes, so open the file in binary mode
                xml_file = open(f'{year}.xml', 'wb')
                exporter = XmlItemExporter(xml_file)
                exporter.start_exporting()
                self.year_to_exporter[year] = (exporter, xml_file)
            return self.year_to_exporter[year][0]

        def process_item(self, item, spider):
            exporter = self._exporter_for_item(item)
            exporter.export_item(item)
            return item
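
Like any item pipeline, this one only runs after it is enabled through the
``ITEM_PIPELINES`` setting. A minimal sketch, assuming the class lives in a
hypothetical ``myproject/pipelines.py`` module (adjust the dotted path to your
own project). The integer sets the order of this pipeline relative to other
enabled pipelines, with lower values running first:

.. code-block:: python

    # settings.py of a hypothetical project named "myproject"
    ITEM_PIPELINES = {
        'myproject.pipelines.PerYearXmlExportPipeline': 300,
    }
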


.. _topics-exporters-field-serialization:

Serialization of item fields
============================

By default, the field values are passed unmodified to the underlying
serialization library, and the decision of how to serialize them is delegated
to each particular serialization library.

However, you can customize how each field value is serialized *before it is
passed to the serialization library*.

There are two ways to customize how a field will be serialized, which are
described next.

.. _topics-exporters-serializers:

1. Declaring a serializer in the field
--------------------------------------

If you use :class:`~.Item`, you can declare a serializer in the
:ref:`field metadata <topics-items-fields>`. The serializer must be
a callable which receives a value and returns its serialized form.

Example::

    import scrapy

    def serialize_price(value):
        return f'$ {str(value)}'

    class Product(scrapy.Item):
        name = scrapy.Field()
        price = scrapy.Field(serializer=serialize_price)
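
To see the serializer take effect, the sketch below exports a ``Product``
(the class from the example above, redefined so the snippet runs on its own)
with :class:`JsonLinesItemExporter` into an in-memory buffer. The choice of
exporter and buffer is an assumption for illustration. The serializer comes
from the field metadata, and :meth:`BaseItemExporter.serialize_field` looks it
up, so the exporter class itself does not matter here.

.. code-block:: python

    import io
    import json

    import scrapy
    from scrapy.exporters import JsonLinesItemExporter

    def serialize_price(value):
        return f'$ {str(value)}'

    class Product(scrapy.Item):
        name = scrapy.Field()
        price = scrapy.Field(serializer=serialize_price)

    buffer = io.BytesIO()
    exporter = JsonLinesItemExporter(buffer)
    exporter.start_exporting()
    exporter.export_item(Product(name='Color TV', price=1200))
    exporter.finish_exporting()

    # The serializer runs before JSON encoding, so the price becomes a string
    exported = json.loads(buffer.getvalue())
    print(exported)
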


2. Overriding the serialize_field() method
------------------------------------------

You can also override the :meth:`~BaseItemExporter.serialize_field()` method to
customize how your field values are exported.

Make sure you call the base class :meth:`~BaseItemExporter.serialize_field()`
method after your custom code, so that serializers declared in the field
metadata are still applied to the values your code does not handle.

Example::

    from scrapy.exporters import XmlItemExporter

    class ProductXmlExporter(XmlItemExporter):

        def serialize_field(self, field, name, value):
            # ``field`` holds the field metadata; match on the field name
            if name == 'price':
                return f'$ {str(value)}'
            return super().serialize_field(field, name, value)
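
The same hook also works for items without field metadata. A minimal sketch,
assuming plain :class:`dict` items and a hypothetical
``PriceJsonLinesExporter`` subclass: ``price`` values are formatted with two
decimals, and every other field falls through to the base class, which returns
it unchanged because a :class:`dict` item declares no serializers.

.. code-block:: python

    import io
    import json

    from scrapy.exporters import JsonLinesItemExporter

    class PriceJsonLinesExporter(JsonLinesItemExporter):

        def serialize_field(self, field, name, value):
            # Compare the field name; ``field`` is the (empty) metadata dict
            if name == 'price':
                return f'{float(value):.2f}'
            return super().serialize_field(field, name, value)

    buffer = io.BytesIO()
    exporter = PriceJsonLinesExporter(buffer)
    exporter.start_exporting()
    exporter.export_item({'name': 'Color TV', 'price': 1200})
    exporter.finish_exporting()

    exported = json.loads(buffer.getvalue())
    print(exported)
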

.. _topics-exporters-reference:

Built-in Item Exporters reference
=================================

Here is a list of the Item Exporters bundled with Scrapy. Some of them contain
output examples, which assume you're exporting these two items::

    Item(name='Color TV', price='1200')
    Item(name='DVD player', price='200')

BaseItemExporter
----------------

.. class:: BaseItemExporter(fields_to_export=None, export_empty_fields=False, encoding='utf-8', indent=0, dont_fail=False)

   This is the (abstract) base class for all Item Exporters. It provides
   support for common features used by all (concrete) Item Exporters, such as
   defining what fields to export, whether to export empty fields, or which
   encoding to use.

   These features can be configured through the ``__init__`` method arguments, which
   populate their respective instance attributes: :attr:`fields_to_export`,
   :attr:`export_empty_fields`, :attr:`encoding`, :attr:`indent`.

   .. versionadded:: 2.0
      The *dont_fail* parameter.

   .. method:: export_item(item)

      Export the given item. This method must be implemented in subclasses.

   .. method:: serialize_field(field, name, value)

      Return the serialized value for the given field. You can override this
      method (in your custom Item Exporters) if you want to control how a
      particular field or value will be serialized/exported.

      By default, this method looks for a serializer :ref:`declared in the item
      field <topics-exporters-serializers>` and returns the result of applying
      that serializer to the value. If no serializer is found, it returns the
      value unchanged.

      :param field: the field being serialized. If the source :ref:`item object
          <item-types>` does not define field metadata, *field* is an empty
          :class:`dict`.
      :type field: :class:`~scrapy.item.Field` object or a :class:`dict` instance

      :param name: the name of the field being serialized
      :type name: str

      :param value: the value being serialized

   .. method:: start_exporting()

      Signal the beginning of the exporting process. Some exporters may use
      this to generate a required header (for example, the
      :class:`XmlItemExporter`). You must call this method before exporting any
      items.

   .. method:: finish_exporting()

      Signal the end of the exporting process. Some exporters may use this to
      generate a required footer (for example, the
      :class:`XmlItemExporter`). You must always call this method after you
      have no more items to export.

   .. attribute:: fields_to_export

      A list with the names of the fields that will be exported, or ``None`` if
      you want to export all fields. Defaults to ``None``.

      Some exporters (like :class:`CsvItemExporter`) respect the order of the
      fields defined in this attribute.

      When using :ref:`item objects <item-types>` that do not expose all their
      possible fields, exporters that do not support exporting a different
      subset of fields per item will only export the fields found in the first
      item exported. Use ``fields_to_export`` to define all the fields to be
      exported.
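
      For example, with :class:`CsvItemExporter` the order of
      ``fields_to_export`` becomes the column order, and fields that are not
      listed are left out. A minimal sketch, assuming plain :class:`dict` items
      (the ``sku`` field is a made-up extra) and an in-memory buffer:

      .. code-block:: python

         import io

         from scrapy.exporters import CsvItemExporter

         buffer = io.BytesIO()
         # 'price' is listed first, so it becomes the first CSV column
         exporter = CsvItemExporter(buffer, fields_to_export=['price', 'name'])
         exporter.start_exporting()
         exporter.export_item({'name': 'Color TV', 'price': '1200', 'sku': 'TV-1'})
         exporter.export_item({'name': 'DVD player', 'price': '200', 'sku': 'DVD-2'})
         exporter.finish_exporting()

         # 'sku' is not in fields_to_export, so it does not appear in the output
         rows = buffer.getvalue().decode('utf-8').splitlines()
         print(rows)
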

   .. attribute:: export_empty_fields

      Whether to include empty/unpopulated item fields in the exported data.
      Defaults to ``False``. Some exporters (like :class:`CsvItemExporter`)
      ignore this attribute and always export all empty fields.

      This option is ignored for dict items.
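
      A minimal sketch of the difference, assuming a hypothetical ``Product``
      item whose ``price`` field is left unpopulated, exported with
      :class:`JsonLinesItemExporter` (which honors this attribute) into an
      in-memory buffer. The last call shows that a :class:`dict` item is not
      affected:

      .. code-block:: python

         import io
         import json

         import scrapy
         from scrapy.exporters import JsonLinesItemExporter

         class Product(scrapy.Item):
             name = scrapy.Field()
             price = scrapy.Field()

         def export(item, **kwargs):
             """Export a single item and return it decoded from JSON."""
             buffer = io.BytesIO()
             exporter = JsonLinesItemExporter(buffer, **kwargs)
             exporter.start_exporting()
             exporter.export_item(item)
             exporter.finish_exporting()
             return json.loads(buffer.getvalue())

         item = Product(name='Color TV')  # 'price' is never set
         print(export(item))  # 'price' is omitted
         print(export(item, export_empty_fields=True))  # 'price' exported as null
         print(export({'name': 'Color TV'}, export_empty_fields=True))  # dict: unchanged
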

   .. attribute:: encoding

      The output character encoding.

   .. attribute:: indent

      Number of spaces used to indent the output on each level. Defaults to ``0``.

      * ``indent=None``: the most compact representation, with all items on
        the same line and no indentation
      * ``indent<=0``: each item on its own line, with no indentation
      * ``indent>0``: each item on its own line, indented by the given number
        of spaces
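
      For instance, :class:`JsonItemExporter` honors this attribute. A minimal
      sketch comparing the compact form with a two-space indentation, assuming
      an in-memory buffer. The exact whitespace of the indented form may
      differ between Scrapy versions, but both forms decode to the same data:

      .. code-block:: python

         import io
         import json

         from scrapy.exporters import JsonItemExporter

         items = [
             {'name': 'Color TV', 'price': '1200'},
             {'name': 'DVD player', 'price': '200'},
         ]

         def export(indent):
             """Export all items with the given indent and return the raw text."""
             buffer = io.BytesIO()
             exporter = JsonItemExporter(buffer, indent=indent)
             exporter.start_exporting()
             for item in items:
                 exporter.export_item(item)
             exporter.finish_exporting()
             return buffer.getvalue().decode('utf-8')

         compact = export(None)  # everything on a single line
         indented = export(2)  # spread over several lines, 2 spaces per level
         print(compact)
         print(indented)
         print(json.loads(compact) == json.loads(indented))
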

PythonItemExporter
------------------

.. autoclass:: PythonItemExporter


.. highlight:: none

XmlItemExporter
---------------

.. class:: XmlItemExporter(file, item_element='item', root_element='items', **kwargs)

   Exports items in XML format to the specified file object.

   :param file: the file-like object to use for exporting the data. Its ``write`` method should
                accept ``bytes`` (a disk file opened in binary mode, an ``io.BytesIO`` object, etc.)

   :param root_element: The name of the root element in the exported XML.
   :type root_element: str

   :param item_element: The name of each item element in the exported XML.
   :type item_element: str

   The additional keyword arguments of this ``__init__`` method are passed to the
   :class:`BaseItemExporter` ``__init__`` method.
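
   A minimal sketch of the *root_element* and *item_element* parameters,
   assuming plain :class:`dict` items and an in-memory buffer (the element
   names ``products`` and ``product`` are arbitrary examples):

   .. code-block:: python

      import io

      from scrapy.exporters import XmlItemExporter

      buffer = io.BytesIO()
      exporter = XmlItemExporter(
          buffer, root_element='products', item_element='product'
      )
      exporter.start_exporting()  # writes the XML declaration and <products>
      exporter.export_item({'name': 'Color TV', 'price': '1200'})
      exporter.export_item({'name': 'DVD player', 'price': '200'})
      exporter.finish_exporting()  # closes </products>

      output = buffer.getvalue().decode('utf-8')
      print(output)
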

   A typical output of this exporter would be::

       <?xml version="1.0" encoding="utf-8"?>
       <items>
         <item>
           <name>Color TV</name>
           <price>1200</price>
         </item>
         <item>
           <name>DVD player</name>
           <price>200</price>
         </item>
       </items>

   Unless overridden in the :meth:`serialize_field` method, multi-valued fields are
   exported by serializing each value inside a ``<value>`` element. This is for
   convenience, as multi-valued fields are very common.

   For example, the item::

       Item(name=['John', 'Doe'], age='23')

   would be serialized as::

       <?xml version="1.0" encoding="utf-8"?>
       <items>
         <item>
           <name>
             <value>John</value>
             <value>Doe</value>
           </name>
           <age>23</age>
         </item>
       </items>

CsvItemExporter
---------------

.. class:: CsvItemExporter(file, include_headers_line=True, join_multivalued=',', errors=None, **kwargs)

   Exports items in CSV format to the given file-like object. If the
   :attr:`fields_to_export` attribute is set, it will be used to define the
   CSV columns and their order. The :attr:`export_empty_fields` attribute has
   no effect on this exporter.

   :param file: the file-like object to use for exporting the data. Its ``write`` method should
                accept ``bytes`` (a disk file opened in binary mode, an ``io.BytesIO`` object, etc.)

   :param include_headers_line: If enabled, makes the exporter output a header
      line with the field names taken from
      :attr:`BaseItemExporter.fields_to_export` or from the fields of the first
      exported item.
   :type include_headers_line: bool

   :param join_multivalued: The char (or chars) that will be used for joining
      multi-valued fields, if found.
   :type join_multivalued: str

   :param errors: The optional string that specifies how encoding and decoding
      errors are to be handled. For more information see
      :class:`io.TextIOWrapper`.
   :type errors: str

   The additional keyword arguments of this ``__init__`` method are passed to the
   :class:`BaseItemExporter` ``__init__`` method, and the leftover arguments to the
   :func:`csv.writer` function, so you can use any :func:`csv.writer` function
   argument to customize this exporter.
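
   As an example of those leftover arguments, the sketch below passes
   ``delimiter`` through to :func:`csv.writer` and sets *join_multivalued* for
   a list-valued field. The ``tags`` field and the item values are
   assumptions for illustration:

   .. code-block:: python

      import io

      from scrapy.exporters import CsvItemExporter

      buffer = io.BytesIO()
      exporter = CsvItemExporter(
          buffer,
          join_multivalued='|',  # joins list values into a single cell
          delimiter=';',  # not a CsvItemExporter option: forwarded to csv.writer
      )
      exporter.start_exporting()
      exporter.export_item({'name': 'Color TV', 'tags': ['tv', 'hd']})
      exporter.finish_exporting()

      rows = buffer.getvalue().decode('utf-8').splitlines()
      print(rows)
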

   A typical output of this exporter would be::

      name,price
      Color TV,1200
      DVD player,200

PickleItemExporter
------------------

.. class:: PickleItemExporter(file, protocol=0, **kwargs)

   Exports items in pickle format to the given file-like object.

   :param file: the file-like object to use for exporting the data. Its ``write`` method should
                accept ``bytes`` (a disk file opened in binary mode, an ``io.BytesIO`` object, etc.)

   :param protocol: The pickle protocol to use.
   :type protocol: int

   For more information, see :mod:`pickle`.

   The additional keyword arguments of this ``__init__`` method are passed to the
   :class:`BaseItemExporter` ``__init__`` method.

   Pickle isn't a human-readable format, so no output examples are provided.
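
   The output can still be read back with the standard :mod:`pickle` module.
   The exporter pickles each item separately as a :class:`dict`, so a minimal
   sketch (assuming an in-memory buffer) calls :func:`pickle.load` repeatedly
   until the end of the stream:

   .. code-block:: python

      import io
      import pickle

      from scrapy.exporters import PickleItemExporter

      items = [
          {'name': 'Color TV', 'price': '1200'},
          {'name': 'DVD player', 'price': '200'},
      ]

      buffer = io.BytesIO()
      exporter = PickleItemExporter(buffer)
      exporter.start_exporting()
      for item in items:
          exporter.export_item(item)
      exporter.finish_exporting()

      # Each exported item is a separate pickle, so load until the stream ends
      buffer.seek(0)
      loaded = []
      while True:
          try:
              loaded.append(pickle.load(buffer))
          except EOFError:
              break
      print(loaded)
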

PprintItemExporter
------------------

.. class:: PprintItemExporter(file, **kwargs)

   Exports items in pretty print format to the specified file object.

   :param file: the file-like object to use for exporting the data. Its ``write`` method should
                accept ``bytes`` (a disk file opened in binary mode, an ``io.BytesIO`` object, etc.)

   The additional keyword arguments of this ``__init__`` method are passed to the
   :class:`BaseItemExporter` ``__init__`` method.

   A typical output of this exporter would be::

        {'name': 'Color TV', 'price': '1200'}
        {'name': 'DVD player', 'price': '200'}

   Longer lines (when present) are pretty-formatted.

JsonItemExporter
----------------

.. class:: JsonItemExporter(file, **kwargs)

   Exports items in JSON format to the specified file-like object, writing all
   items as a single JSON list of objects. The additional ``__init__`` method arguments are
   passed to the :class:`BaseItemExporter` ``__init__`` method, and the leftover
   arguments to the :class:`~json.JSONEncoder` ``__init__`` method, so you can use any
   :class:`~json.JSONEncoder` ``__init__`` method argument to customize this exporter.

   :param file: the file-like object to use for exporting the data. Its ``write`` method should
                accept ``bytes`` (a disk file opened in binary mode, an ``io.BytesIO`` object, etc.)

   A typical output of this exporter would be::

        [{"name": "Color TV", "price": "1200"},
        {"name": "DVD player", "price": "200"}]

   .. _json-with-large-data:

   .. warning:: JSON is a very simple and flexible serialization format, but it
      doesn't scale well for large amounts of data since incremental (a.k.a.
      stream-mode) parsing is not well supported (if at all) among JSON parsers
      (in any language), and most of them just parse the entire object in
      memory. If you want the power and simplicity of JSON with a more
      stream-friendly format, consider using :class:`JsonLinesItemExporter`
      instead, or splitting the output into multiple chunks.

JsonLinesItemExporter
---------------------

.. class:: JsonLinesItemExporter(file, **kwargs)

   Exports items in JSON format to the specified file-like object, writing one
   JSON-encoded item per line. The additional ``__init__`` method arguments are passed
   to the :class:`BaseItemExporter` ``__init__`` method, and the leftover arguments to
   the :class:`~json.JSONEncoder` ``__init__`` method, so you can use any
   :class:`~json.JSONEncoder` ``__init__`` method argument to customize this exporter.

   :param file: the file-like object to use for exporting the data. Its ``write`` method should
                accept ``bytes`` (a disk file opened in binary mode, an ``io.BytesIO`` object, etc.)

   A typical output of this exporter would be::

        {"name": "Color TV", "price": "1200"}
        {"name": "DVD player", "price": "200"}

   Unlike the format produced by :class:`JsonItemExporter`, the format produced
   by this exporter is well suited for serializing large amounts of data.
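
   Every line is a complete JSON document, so a consumer can decode one item
   at a time instead of loading the whole file into memory. A minimal sketch
   using only the standard library. ``iter_json_lines`` is a hypothetical
   helper, and the in-memory buffer stands in for a file opened in binary
   mode:

   .. code-block:: python

      import io
      import json

      def iter_json_lines(fileobj):
          """Yield one decoded item per non-empty line of a JSON Lines stream."""
          for line in fileobj:  # reads the stream lazily, line by line
              if line.strip():
                  yield json.loads(line)

      data = io.BytesIO(
          b'{"name": "Color TV", "price": "1200"}\n'
          b'{"name": "DVD player", "price": "200"}\n'
      )
      for item in iter_json_lines(data):
          print(item['name'], item['price'])
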

MarshalItemExporter
-------------------

.. autoclass:: MarshalItemExporter
426