1:mod:`csv` --- CSV File Reading and Writing
2===========================================
3
4.. module:: csv
5   :synopsis: Write and read tabular data to and from delimited files.
6
7.. sectionauthor:: Skip Montanaro <skip@pobox.com>
8
9**Source code:** :source:`Lib/csv.py`
10
11.. index::
12   single: csv
13   pair: data; tabular
14
15--------------
16
17The so-called CSV (Comma Separated Values) format is the most common import and
18export format for spreadsheets and databases.  CSV format was used for many
19years prior to attempts to describe the format in a standardized way in
20:rfc:`4180`.  The lack of a well-defined standard means that subtle differences
21often exist in the data produced and consumed by different applications.  These
22differences can make it annoying to process CSV files from multiple sources.
23Still, while the delimiters and quoting characters vary, the overall format is
24similar enough that it is possible to write a single module which can
25efficiently manipulate such data, hiding the details of reading and writing the
26data from the programmer.
27
28The :mod:`csv` module implements classes to read and write tabular data in CSV
29format.  It allows programmers to say, "write this data in the format preferred
30by Excel," or "read data from this file which was generated by Excel," without
31knowing the precise details of the CSV format used by Excel.  Programmers can
32also describe the CSV formats understood by other applications or define their
33own special-purpose CSV formats.
34
35The :mod:`csv` module's :class:`reader` and :class:`writer` objects read and
36write sequences.  Programmers can also read and write data in dictionary form
37using the :class:`DictReader` and :class:`DictWriter` classes.
38
39.. seealso::
40
41   :pep:`305` - CSV File API
42      The Python Enhancement Proposal which proposed this addition to Python.
43
44
45.. _csv-contents:
46
47Module Contents
48---------------
49
50The :mod:`csv` module defines the following functions:
51
52
53.. index::
54   single: universal newlines; csv.reader function
55
56.. function:: reader(csvfile, dialect='excel', **fmtparams)
57
58   Return a reader object which will iterate over lines in the given *csvfile*.
59   *csvfile* can be any object which supports the :term:`iterator` protocol and returns a
60   string each time its :meth:`!__next__` method is called --- :term:`file objects
61   <file object>` and list objects are both suitable.   If *csvfile* is a file object,
62   it should be opened with ``newline=''``. [1]_  An optional
63   *dialect* parameter can be given which is used to define a set of parameters
64   specific to a particular CSV dialect.  It may be an instance of a subclass of
65   the :class:`Dialect` class or one of the strings returned by the
66   :func:`list_dialects` function.  The other optional *fmtparams* keyword arguments
67   can be given to override individual formatting parameters in the current
68   dialect.  For full details about the dialect and formatting parameters, see
69   section :ref:`csv-fmt-params`.
70
71   Each row read from the csv file is returned as a list of strings.  No
72   automatic data type conversion is performed unless the ``QUOTE_NONNUMERIC`` format
73   option is specified (in which case unquoted fields are transformed into floats).
74
75   A short usage example::
76
77      >>> import csv
78      >>> with open('eggs.csv', newline='') as csvfile:
79      ...     spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
80      ...     for row in spamreader:
81      ...         print(', '.join(row))
82      Spam, Spam, Spam, Spam, Spam, Baked Beans
83      Spam, Lovely Spam, Wonderful Spam
84
85
86.. function:: writer(csvfile, dialect='excel', **fmtparams)
87
88   Return a writer object responsible for converting the user's data into delimited
89   strings on the given file-like object.  *csvfile* can be any object with a
90   :func:`write` method.  If *csvfile* is a file object, it should be opened with
91   ``newline=''`` [1]_.  An optional *dialect*
92   parameter can be given which is used to define a set of parameters specific to a
93   particular CSV dialect.  It may be an instance of a subclass of the
94   :class:`Dialect` class or one of the strings returned by the
95   :func:`list_dialects` function.  The other optional *fmtparams* keyword arguments
96   can be given to override individual formatting parameters in the current
97   dialect.  For full details about the dialect and formatting parameters, see
98   section :ref:`csv-fmt-params`. To make it
99   as easy as possible to interface with modules which implement the DB API, the
100   value :const:`None` is written as the empty string.  While this isn't a
101   reversible transformation, it makes it easier to dump SQL NULL data values to
102   CSV files without preprocessing the data returned from a ``cursor.fetch*`` call.
103   All other non-string data are stringified with :func:`str` before being written.
104
105   A short usage example::
106
107      import csv
108      with open('eggs.csv', 'w', newline='') as csvfile:
109          spamwriter = csv.writer(csvfile, delimiter=' ',
110                                  quotechar='|', quoting=csv.QUOTE_MINIMAL)
111          spamwriter.writerow(['Spam'] * 5 + ['Baked Beans'])
112          spamwriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])
113
114
115.. function:: register_dialect(name[, dialect[, **fmtparams]])
116
117   Associate *dialect* with *name*.  *name* must be a string. The
118   dialect can be specified either by passing a sub-class of :class:`Dialect`, or
119   by *fmtparams* keyword arguments, or both, with keyword arguments overriding
120   parameters of the dialect. For full details about the dialect and formatting
121   parameters, see section :ref:`csv-fmt-params`.
122
123
124.. function:: unregister_dialect(name)
125
126   Delete the dialect associated with *name* from the dialect registry.  An
127   :exc:`Error` is raised if *name* is not a registered dialect name.
128
129
130.. function:: get_dialect(name)
131
132   Return the dialect associated with *name*.  An :exc:`Error` is raised if
133   *name* is not a registered dialect name.  This function returns an immutable
134   :class:`Dialect`.
135
136.. function:: list_dialects()
137
138   Return the names of all registered dialects.
139
140
141.. function:: field_size_limit([new_limit])
142
143   Returns the current maximum field size allowed by the parser. If *new_limit* is
144   given, this becomes the new limit.
145
146
147The :mod:`csv` module defines the following classes:
148
149.. class:: DictReader(f, fieldnames=None, restkey=None, restval=None, \
150                      dialect='excel', *args, **kwds)
151
152   Create an object that operates like a regular reader but maps the
153   information in each row to a :class:`dict` whose keys are given by the
154   optional *fieldnames* parameter.
155
156   The *fieldnames* parameter is a :term:`sequence`.  If *fieldnames* is
157   omitted, the values in the first row of file *f* will be used as the
158   fieldnames.  Regardless of how the fieldnames are determined, the
159   dictionary preserves their original ordering.
160
161   If a row has more fields than fieldnames, the remaining data is put in a
162   list and stored with the fieldname specified by *restkey* (which defaults
163   to ``None``).  If a non-blank row has fewer fields than fieldnames, the
164   missing values are filled-in with the value of *restval* (which defaults
165   to ``None``).
166
167   All other optional or keyword arguments are passed to the underlying
168   :class:`reader` instance.
169
170   .. versionchanged:: 3.6
171      Returned rows are now of type :class:`OrderedDict`.
172
173   .. versionchanged:: 3.8
174      Returned rows are now of type :class:`dict`.
175
176   A short usage example::
177
178       >>> import csv
179       >>> with open('names.csv', newline='') as csvfile:
180       ...     reader = csv.DictReader(csvfile)
181       ...     for row in reader:
182       ...         print(row['first_name'], row['last_name'])
183       ...
184       Eric Idle
185       John Cleese
186
187       >>> print(row)
188       {'first_name': 'John', 'last_name': 'Cleese'}
189
190
191.. class:: DictWriter(f, fieldnames, restval='', extrasaction='raise', \
192                      dialect='excel', *args, **kwds)
193
194   Create an object which operates like a regular writer but maps dictionaries
195   onto output rows.  The *fieldnames* parameter is a :mod:`sequence
196   <collections.abc>` of keys that identify the order in which values in the
197   dictionary passed to the :meth:`writerow` method are written to file
198   *f*.  The optional *restval* parameter specifies the value to be
199   written if the dictionary is missing a key in *fieldnames*.  If the
200   dictionary passed to the :meth:`writerow` method contains a key not found in
201   *fieldnames*, the optional *extrasaction* parameter indicates what action to
202   take.
203   If it is set to ``'raise'``, the default value, a :exc:`ValueError`
204   is raised.
205   If it is set to ``'ignore'``, extra values in the dictionary are ignored.
206   Any other optional or keyword arguments are passed to the underlying
207   :class:`writer` instance.
208
209   Note that unlike the :class:`DictReader` class, the *fieldnames* parameter
210   of the :class:`DictWriter` class is not optional.
211
212   A short usage example::
213
214       import csv
215
216       with open('names.csv', 'w', newline='') as csvfile:
217           fieldnames = ['first_name', 'last_name']
218           writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
219
220           writer.writeheader()
221           writer.writerow({'first_name': 'Baked', 'last_name': 'Beans'})
222           writer.writerow({'first_name': 'Lovely', 'last_name': 'Spam'})
223           writer.writerow({'first_name': 'Wonderful', 'last_name': 'Spam'})
224
225
226.. class:: Dialect
227
228   The :class:`Dialect` class is a container class relied on primarily for its
229   attributes, which are used to define the parameters for a specific
230   :class:`reader` or :class:`writer` instance.
231
232
233.. class:: excel()
234
235   The :class:`excel` class defines the usual properties of an Excel-generated CSV
236   file.  It is registered with the dialect name ``'excel'``.
237
238
239.. class:: excel_tab()
240
241   The :class:`excel_tab` class defines the usual properties of an Excel-generated
242   TAB-delimited file.  It is registered with the dialect name ``'excel-tab'``.
243
244
245.. class:: unix_dialect()
246
247   The :class:`unix_dialect` class defines the usual properties of a CSV file
248   generated on UNIX systems, i.e. using ``'\n'`` as line terminator and quoting
249   all fields.  It is registered with the dialect name ``'unix'``.
250
251   .. versionadded:: 3.2
252
253
254.. class:: Sniffer()
255
256   The :class:`Sniffer` class is used to deduce the format of a CSV file.
257
258   The :class:`Sniffer` class provides two methods:
259
260   .. method:: sniff(sample, delimiters=None)
261
262      Analyze the given *sample* and return a :class:`Dialect` subclass
263      reflecting the parameters found.  If the optional *delimiters* parameter
264      is given, it is interpreted as a string containing possible valid
265      delimiter characters.
266
267
268   .. method:: has_header(sample)
269
270      Analyze the sample text (presumed to be in CSV format) and return
271      :const:`True` if the first row appears to be a series of column headers.
272
273An example for :class:`Sniffer` use::
274
275   with open('example.csv', newline='') as csvfile:
276       dialect = csv.Sniffer().sniff(csvfile.read(1024))
277       csvfile.seek(0)
278       reader = csv.reader(csvfile, dialect)
279       # ... process CSV file contents here ...
280
281
282The :mod:`csv` module defines the following constants:
283
284.. data:: QUOTE_ALL
285
286   Instructs :class:`writer` objects to quote all fields.
287
288
289.. data:: QUOTE_MINIMAL
290
291   Instructs :class:`writer` objects to only quote those fields which contain
292   special characters such as *delimiter*, *quotechar* or any of the characters in
293   *lineterminator*.
294
295
296.. data:: QUOTE_NONNUMERIC
297
298   Instructs :class:`writer` objects to quote all non-numeric fields.
299
300   Instructs the reader to convert all non-quoted fields to type *float*.
301
302
303.. data:: QUOTE_NONE
304
305   Instructs :class:`writer` objects to never quote fields.  When the current
306   *delimiter* occurs in output data it is preceded by the current *escapechar*
307   character.  If *escapechar* is not set, the writer will raise :exc:`Error` if
308   any characters that require escaping are encountered.
309
310   Instructs :class:`reader` to perform no special processing of quote characters.
311
312The :mod:`csv` module defines the following exception:
313
314
315.. exception:: Error
316
317   Raised by any of the functions when an error is detected.
318
319.. _csv-fmt-params:
320
321Dialects and Formatting Parameters
322----------------------------------
323
324To make it easier to specify the format of input and output records, specific
325formatting parameters are grouped together into dialects.  A dialect is a
326subclass of the :class:`Dialect` class having a set of specific methods and a
327single :meth:`validate` method.  When creating :class:`reader` or
328:class:`writer` objects, the programmer can specify a string or a subclass of
329the :class:`Dialect` class as the dialect parameter.  In addition to, or instead
330of, the *dialect* parameter, the programmer can also specify individual
331formatting parameters, which have the same names as the attributes defined below
332for the :class:`Dialect` class.
333
334Dialects support the following attributes:
335
336
337.. attribute:: Dialect.delimiter
338
339   A one-character string used to separate fields.  It defaults to ``','``.
340
341
342.. attribute:: Dialect.doublequote
343
344   Controls how instances of *quotechar* appearing inside a field should
345   themselves be quoted.  When :const:`True`, the character is doubled. When
346   :const:`False`, the *escapechar* is used as a prefix to the *quotechar*.  It
347   defaults to :const:`True`.
348
349   On output, if *doublequote* is :const:`False` and no *escapechar* is set,
350   :exc:`Error` is raised if a *quotechar* is found in a field.
351
352
353.. attribute:: Dialect.escapechar
354
355   A one-character string used by the writer to escape the *delimiter* if *quoting*
356   is set to :const:`QUOTE_NONE` and the *quotechar* if *doublequote* is
357   :const:`False`. On reading, the *escapechar* removes any special meaning from
358   the following character. It defaults to :const:`None`, which disables escaping.
359
360
361.. attribute:: Dialect.lineterminator
362
363   The string used to terminate lines produced by the :class:`writer`. It defaults
364   to ``'\r\n'``.
365
366   .. note::
367
368      The :class:`reader` is hard-coded to recognise either ``'\r'`` or ``'\n'`` as
369      end-of-line, and ignores *lineterminator*. This behavior may change in the
370      future.
371
372
373.. attribute:: Dialect.quotechar
374
375   A one-character string used to quote fields containing special characters, such
376   as the *delimiter* or *quotechar*, or which contain new-line characters.  It
377   defaults to ``'"'``.
378
379
380.. attribute:: Dialect.quoting
381
382   Controls when quotes should be generated by the writer and recognised by the
383   reader.  It can take on any of the :const:`QUOTE_\*` constants (see section
384   :ref:`csv-contents`) and defaults to :const:`QUOTE_MINIMAL`.
385
386
387.. attribute:: Dialect.skipinitialspace
388
389   When :const:`True`, whitespace immediately following the *delimiter* is ignored.
390   The default is :const:`False`.
391
392
393.. attribute:: Dialect.strict
394
395   When ``True``, raise exception :exc:`Error` on bad CSV input.
396   The default is ``False``.
397
398Reader Objects
399--------------
400
401Reader objects (:class:`DictReader` instances and objects returned by the
402:func:`reader` function) have the following public methods:
403
404.. method:: csvreader.__next__()
405
406   Return the next row of the reader's iterable object as a list (if the object
407   was returned from :func:`reader`) or a dict (if it is a :class:`DictReader`
408   instance), parsed according to the current dialect.  Usually you should call
409   this as ``next(reader)``.
410
411
412Reader objects have the following public attributes:
413
414.. attribute:: csvreader.dialect
415
416   A read-only description of the dialect in use by the parser.
417
418
419.. attribute:: csvreader.line_num
420
421   The number of lines read from the source iterator. This is not the same as the
422   number of records returned, as records can span multiple lines.
423
424
425DictReader objects have the following public attribute:
426
427.. attribute:: csvreader.fieldnames
428
429   If not passed as a parameter when creating the object, this attribute is
430   initialized upon first access or when the first record is read from the
431   file.
432
433
434
435Writer Objects
436--------------
437
438:class:`Writer` objects (:class:`DictWriter` instances and objects returned by
439the :func:`writer` function) have the following public methods.  A *row* must be
440an iterable of strings or numbers for :class:`Writer` objects and a dictionary
441mapping fieldnames to strings or numbers (by passing them through :func:`str`
442first) for :class:`DictWriter` objects.  Note that complex numbers are written
443out surrounded by parens. This may cause some problems for other programs which
444read CSV files (assuming they support complex numbers at all).
445
446
447.. method:: csvwriter.writerow(row)
448
449   Write the *row* parameter to the writer's file object, formatted according to
450   the current dialect. Return the return value of the call to the *write* method
451   of the underlying file object.
452
453   .. versionchanged:: 3.5
454      Added support of arbitrary iterables.
455
456.. method:: csvwriter.writerows(rows)
457
458   Write all elements in *rows* (an iterable of *row* objects as described
459   above) to the writer's file object, formatted according to the current
460   dialect.
461
462Writer objects have the following public attribute:
463
464
465.. attribute:: csvwriter.dialect
466
467   A read-only description of the dialect in use by the writer.
468
469
470DictWriter objects have the following public method:
471
472
473.. method:: DictWriter.writeheader()
474
475   Write a row with the field names (as specified in the constructor) to
476   the writer's file object, formatted according to the current dialect. Return
477   the return value of the :meth:`csvwriter.writerow` call used internally.
478
479   .. versionadded:: 3.2
480   .. versionchanged:: 3.8
481      :meth:`writeheader` now also returns the value returned by
482      the :meth:`csvwriter.writerow` method it uses internally.
483
484
485.. _csv-examples:
486
487Examples
488--------
489
490The simplest example of reading a CSV file::
491
492   import csv
493   with open('some.csv', newline='') as f:
494       reader = csv.reader(f)
495       for row in reader:
496           print(row)
497
498Reading a file with an alternate format::
499
500   import csv
501   with open('passwd', newline='') as f:
502       reader = csv.reader(f, delimiter=':', quoting=csv.QUOTE_NONE)
503       for row in reader:
504           print(row)
505
506The corresponding simplest possible writing example is::
507
508   import csv
509   with open('some.csv', 'w', newline='') as f:
510       writer = csv.writer(f)
511       writer.writerows(someiterable)
512
513Since :func:`open` is used to open a CSV file for reading, the file
514will by default be decoded into unicode using the system default
515encoding (see :func:`locale.getpreferredencoding`).  To decode a file
516using a different encoding, use the ``encoding`` argument of open::
517
518   import csv
519   with open('some.csv', newline='', encoding='utf-8') as f:
520       reader = csv.reader(f)
521       for row in reader:
522           print(row)
523
524The same applies to writing in something other than the system default
525encoding: specify the encoding argument when opening the output file.
526
527Registering a new dialect::
528
529   import csv
530   csv.register_dialect('unixpwd', delimiter=':', quoting=csv.QUOTE_NONE)
531   with open('passwd', newline='') as f:
532       reader = csv.reader(f, 'unixpwd')
533
534A slightly more advanced use of the reader --- catching and reporting errors::
535
536   import csv, sys
537   filename = 'some.csv'
538   with open(filename, newline='') as f:
539       reader = csv.reader(f)
540       try:
541           for row in reader:
542               print(row)
543       except csv.Error as e:
544           sys.exit('file {}, line {}: {}'.format(filename, reader.line_num, e))
545
546And while the module doesn't directly support parsing strings, it can easily be
547done::
548
549   import csv
550   for row in csv.reader(['one,two,three']):
551       print(row)
552
553
554.. rubric:: Footnotes
555
556.. [1] If ``newline=''`` is not specified, newlines embedded inside quoted fields
557   will not be interpreted correctly, and on platforms that use ``\r\n`` linendings
558   on write an extra ``\r`` will be added.  It should always be safe to specify
559   ``newline=''``, since the csv module does its own
560   (:term:`universal <universal newlines>`) newline handling.
561