1:mod:`tarfile` --- Read and write tar archive files
2===================================================
3
4.. module:: tarfile
5   :synopsis: Read and write tar-format archive files.
6
7
8.. versionadded:: 2.3
9
10.. moduleauthor:: Lars Gustäbel <lars@gustaebel.de>
11.. sectionauthor:: Lars Gustäbel <lars@gustaebel.de>
12
13**Source code:** :source:`Lib/tarfile.py`
14
15--------------
16
17The :mod:`tarfile` module makes it possible to read and write tar
18archives, including those using gzip or bz2 compression.
19Use the :mod:`zipfile` module to read or write :file:`.zip` files, or the
20higher-level functions in :ref:`shutil <archiving-operations>`.
21
22Some facts and figures:
23
24* reads and writes :mod:`gzip` and :mod:`bz2` compressed archives
25  if the respective modules are available.
26
27* read/write support for the POSIX.1-1988 (ustar) format.
28
29* read/write support for the GNU tar format including *longname* and *longlink*
30  extensions, read-only support for the *sparse* extension.
31
32* read/write support for the POSIX.1-2001 (pax) format.
33
34  .. versionadded:: 2.6
35
36* handles directories, regular files, hardlinks, symbolic links, fifos,
37  character devices and block devices and is able to acquire and restore file
38  information like timestamp, access permissions and owner.
39
40.. note::
41   Handling of multi-stream bzip2 files is not supported.  Modules such as
42   `bz2file <https://github.com/nvawda/bz2file>`_ let you overcome this.
43
44
45.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, \*\*kwargs)
46
47   Return a :class:`TarFile` object for the pathname *name*. For detailed
48   information on :class:`TarFile` objects and the keyword arguments that are
49   allowed, see :ref:`tarfile-objects`.
50
51   *mode* has to be a string of the form ``'filemode[:compression]'``, it defaults
52   to ``'r'``. Here is a full list of mode combinations:
53
54   +------------------+---------------------------------------------+
55   | mode             | action                                      |
56   +==================+=============================================+
57   | ``'r' or 'r:*'`` | Open for reading with transparent           |
58   |                  | compression (recommended).                  |
59   +------------------+---------------------------------------------+
60   | ``'r:'``         | Open for reading exclusively without        |
61   |                  | compression.                                |
62   +------------------+---------------------------------------------+
63   | ``'r:gz'``       | Open for reading with gzip compression.     |
64   +------------------+---------------------------------------------+
65   | ``'r:bz2'``      | Open for reading with bzip2 compression.    |
66   +------------------+---------------------------------------------+
67   | ``'a' or 'a:'``  | Open for appending with no compression. The |
68   |                  | file is created if it does not exist.       |
69   +------------------+---------------------------------------------+
70   | ``'w' or 'w:'``  | Open for uncompressed writing.              |
71   +------------------+---------------------------------------------+
72   | ``'w:gz'``       | Open for gzip compressed writing.           |
73   +------------------+---------------------------------------------+
74   | ``'w:bz2'``      | Open for bzip2 compressed writing.          |
75   +------------------+---------------------------------------------+
76
77   Note that ``'a:gz'`` or ``'a:bz2'`` is not possible. If *mode* is not suitable
78   to open a certain (compressed) file for reading, :exc:`ReadError` is raised. Use
79   *mode* ``'r'`` to avoid this.  If a compression method is not supported,
80   :exc:`CompressionError` is raised.
81
82   If *fileobj* is specified, it is used as an alternative to a file object opened
83   for *name*. It is supposed to be at position 0.
84
85   For modes ``'w:gz'``, ``'r:gz'``, ``'w:bz2'``, ``'r:bz2'``, :func:`tarfile.open`
86   accepts the keyword argument *compresslevel* (default ``9``) to
87   specify the compression level of the file.
88
89   For special purposes, there is a second format for *mode*:
90   ``'filemode|[compression]'``.  :func:`tarfile.open` will return a :class:`TarFile`
91   object that processes its data as a stream of blocks.  No random seeking will
92   be done on the file. If given, *fileobj* may be any object that has a
93   :meth:`read` or :meth:`write` method (depending on the *mode*). *bufsize*
94   specifies the blocksize and defaults to ``20 * 512`` bytes. Use this variant
95   in combination with e.g. ``sys.stdin``, a socket file object or a tape
96   device. However, such a :class:`TarFile` object is limited in that it does
97   not allow random access, see :ref:`tar-examples`.  The currently
98   possible modes:
99
100   +-------------+--------------------------------------------+
101   | Mode        | Action                                     |
102   +=============+============================================+
103   | ``'r|*'``   | Open a *stream* of tar blocks for reading  |
104   |             | with transparent compression.              |
105   +-------------+--------------------------------------------+
106   | ``'r|'``    | Open a *stream* of uncompressed tar blocks |
107   |             | for reading.                               |
108   +-------------+--------------------------------------------+
109   | ``'r|gz'``  | Open a gzip compressed *stream* for        |
110   |             | reading.                                   |
111   +-------------+--------------------------------------------+
112   | ``'r|bz2'`` | Open a bzip2 compressed *stream* for       |
113   |             | reading.                                   |
114   +-------------+--------------------------------------------+
115   | ``'w|'``    | Open an uncompressed *stream* for writing. |
116   +-------------+--------------------------------------------+
117   | ``'w|gz'``  | Open a gzip compressed *stream* for        |
118   |             | writing.                                   |
119   +-------------+--------------------------------------------+
120   | ``'w|bz2'`` | Open a bzip2 compressed *stream* for       |
121   |             | writing.                                   |
122   +-------------+--------------------------------------------+
123
124
125.. class:: TarFile
126
127   Class for reading and writing tar archives. Do not use this class directly,
128   better use :func:`tarfile.open` instead. See :ref:`tarfile-objects`.
129
130
131.. function:: is_tarfile(name)
132
133   Return :const:`True` if *name* is a tar archive file, that the :mod:`tarfile`
134   module can read.
135
136
137.. class:: TarFileCompat(filename, mode='r', compression=TAR_PLAIN)
138
139   Class for limited access to tar archives with a :mod:`zipfile`\ -like interface.
140   Please consult the documentation of the :mod:`zipfile` module for more details.
141   *compression* must be one of the following constants:
142
143
144   .. data:: TAR_PLAIN
145
146      Constant for an uncompressed tar archive.
147
148
149   .. data:: TAR_GZIPPED
150
151      Constant for a :mod:`gzip` compressed tar archive.
152
153
154   .. deprecated:: 2.6
155      The :class:`TarFileCompat` class has been removed in Python 3.
156
157
158.. exception:: TarError
159
160   Base class for all :mod:`tarfile` exceptions.
161
162
163.. exception:: ReadError
164
165   Is raised when a tar archive is opened, that either cannot be handled by the
166   :mod:`tarfile` module or is somehow invalid.
167
168
169.. exception:: CompressionError
170
171   Is raised when a compression method is not supported or when the data cannot be
172   decoded properly.
173
174
175.. exception:: StreamError
176
177   Is raised for the limitations that are typical for stream-like :class:`TarFile`
178   objects.
179
180
181.. exception:: ExtractError
182
183   Is raised for *non-fatal* errors when using :meth:`TarFile.extract`, but only if
184   :attr:`TarFile.errorlevel`\ ``== 2``.
185
186
187The following constants are available at the module level:
188
189.. data:: ENCODING
190
191   The default character encoding: ``'utf-8'`` on Windows, the value returned by
192   :func:`sys.getfilesystemencoding` otherwise.
193
194
195.. exception:: HeaderError
196
197   Is raised by :meth:`TarInfo.frombuf` if the buffer it gets is invalid.
198
199   .. versionadded:: 2.6
200
201
202Each of the following constants defines a tar archive format that the
203:mod:`tarfile` module is able to create. See section :ref:`tar-formats` for
204details.
205
206
207.. data:: USTAR_FORMAT
208
209   POSIX.1-1988 (ustar) format.
210
211
212.. data:: GNU_FORMAT
213
214   GNU tar format.
215
216
217.. data:: PAX_FORMAT
218
219   POSIX.1-2001 (pax) format.
220
221
222.. data:: DEFAULT_FORMAT
223
224   The default format for creating archives. This is currently :const:`GNU_FORMAT`.
225
226
227.. seealso::
228
229   Module :mod:`zipfile`
230      Documentation of the :mod:`zipfile` standard module.
231
232   :ref:`archiving-operations`
233      Documentation of the higher-level archiving facilities provided by the
234      standard :mod:`shutil` module.
235
236   `GNU tar manual, Basic Tar Format <https://www.gnu.org/software/tar/manual/html_node/Standard.html>`_
237      Documentation for tar archive files, including GNU tar extensions.
238
239
240.. _tarfile-objects:
241
242TarFile Objects
243---------------
244
245The :class:`TarFile` object provides an interface to a tar archive. A tar
246archive is a sequence of blocks. An archive member (a stored file) is made up of
247a header block followed by data blocks. It is possible to store a file in a tar
248archive several times. Each archive member is represented by a :class:`TarInfo`
249object, see :ref:`tarinfo-objects` for details.
250
251A :class:`TarFile` object can be used as a context manager in a :keyword:`with`
252statement. It will automatically be closed when the block is completed. Please
253note that in the event of an exception an archive opened for writing will not
254be finalized; only the internally used file object will be closed. See the
255:ref:`tar-examples` section for a use case.
256
257.. versionadded:: 2.7
258   Added support for the context management protocol.
259
260.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors=None, pax_headers=None, debug=0, errorlevel=0)
261
262   All following arguments are optional and can be accessed as instance attributes
263   as well.
264
265   *name* is the pathname of the archive. It can be omitted if *fileobj* is given.
266   In this case, the file object's :attr:`name` attribute is used if it exists.
267
268   *mode* is either ``'r'`` to read from an existing archive, ``'a'`` to append
269   data to an existing file or ``'w'`` to create a new file overwriting an existing
270   one.
271
272   If *fileobj* is given, it is used for reading or writing data. If it can be
273   determined, *mode* is overridden by *fileobj*'s mode. *fileobj* will be used
274   from position 0.
275
276   .. note::
277
278      *fileobj* is not closed, when :class:`TarFile` is closed.
279
280   *format* controls the archive format. It must be one of the constants
281   :const:`USTAR_FORMAT`, :const:`GNU_FORMAT` or :const:`PAX_FORMAT` that are
282   defined at module level.
283
284   .. versionadded:: 2.6
285
286   The *tarinfo* argument can be used to replace the default :class:`TarInfo` class
287   with a different one.
288
289   .. versionadded:: 2.6
290
291   If *dereference* is :const:`False`, add symbolic and hard links to the archive. If it
292   is :const:`True`, add the content of the target files to the archive. This has no
293   effect on systems that do not support symbolic links.
294
295   If *ignore_zeros* is :const:`False`, treat an empty block as the end of the archive.
296   If it is :const:`True`, skip empty (and invalid) blocks and try to get as many members
297   as possible. This is only useful for reading concatenated or damaged archives.
298
299   *debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
300   messages). The messages are written to ``sys.stderr``.
301
302   If *errorlevel* is ``0``, all errors are ignored when using :meth:`TarFile.extract`.
303   Nevertheless, they appear as error messages in the debug output, when debugging
304   is enabled.  If ``1``, all *fatal* errors are raised as :exc:`OSError` or
305   :exc:`IOError` exceptions. If ``2``, all *non-fatal* errors are raised as
306   :exc:`TarError` exceptions as well.
307
308   The *encoding* and *errors* arguments control the way strings are converted to
309   unicode objects and vice versa. The default settings will work for most users.
310   See section :ref:`tar-unicode` for in-depth information.
311
312   .. versionadded:: 2.6
313
314   The *pax_headers* argument is an optional dictionary of unicode strings which
315   will be added as a pax global header if *format* is :const:`PAX_FORMAT`.
316
317   .. versionadded:: 2.6
318
319
320.. classmethod:: TarFile.open(...)
321
322   Alternative constructor. The :func:`tarfile.open` function is actually a
323   shortcut to this classmethod.
324
325
326.. method:: TarFile.getmember(name)
327
328   Return a :class:`TarInfo` object for member *name*. If *name* can not be found
329   in the archive, :exc:`KeyError` is raised.
330
331   .. note::
332
333      If a member occurs more than once in the archive, its last occurrence is assumed
334      to be the most up-to-date version.
335
336
337.. method:: TarFile.getmembers()
338
339   Return the members of the archive as a list of :class:`TarInfo` objects. The
340   list has the same order as the members in the archive.
341
342
343.. method:: TarFile.getnames()
344
345   Return the members as a list of their names. It has the same order as the list
346   returned by :meth:`getmembers`.
347
348
349.. method:: TarFile.list(verbose=True)
350
351   Print a table of contents to ``sys.stdout``. If *verbose* is :const:`False`,
352   only the names of the members are printed. If it is :const:`True`, output
353   similar to that of :program:`ls -l` is produced.
354
355
356.. method:: TarFile.next()
357
358   Return the next member of the archive as a :class:`TarInfo` object, when
359   :class:`TarFile` is opened for reading. Return :const:`None` if there is no more
360   available.
361
362
363.. method:: TarFile.extractall(path=".", members=None)
364
365   Extract all members from the archive to the current working directory or
366   directory *path*. If optional *members* is given, it must be a subset of the
367   list returned by :meth:`getmembers`. Directory information like owner,
368   modification time and permissions are set after all members have been extracted.
369   This is done to work around two problems: A directory's modification time is
370   reset each time a file is created in it. And, if a directory's permissions do
371   not allow writing, extracting files to it will fail.
372
373   .. warning::
374
375      Never extract archives from untrusted sources without prior inspection.
376      It is possible that files are created outside of *path*, e.g. members
377      that have absolute filenames starting with ``"/"`` or filenames with two
378      dots ``".."``.
379
380   .. versionadded:: 2.5
381
382
383.. method:: TarFile.extract(member, path="")
384
385   Extract a member from the archive to the current working directory, using its
386   full name. Its file information is extracted as accurately as possible. *member*
387   may be a filename or a :class:`TarInfo` object. You can specify a different
388   directory using *path*.
389
390   .. note::
391
392      The :meth:`extract` method does not take care of several extraction issues.
393      In most cases you should consider using the :meth:`extractall` method.
394
395   .. warning::
396
397      See the warning for :meth:`extractall`.
398
399
400.. method:: TarFile.extractfile(member)
401
402   Extract a member from the archive as a file object. *member* may be a filename
403   or a :class:`TarInfo` object. If *member* is a regular file, a file-like object
404   is returned. If *member* is a link, a file-like object is constructed from the
405   link's target. If *member* is none of the above, :const:`None` is returned.
406
407   .. note::
408
409      The file-like object is read-only.  It provides the methods
410      :meth:`read`, :meth:`readline`, :meth:`readlines`, :meth:`seek`, :meth:`tell`,
411      and :meth:`close`, and also supports iteration over its lines.
412
413
414.. method:: TarFile.add(name, arcname=None, recursive=True, exclude=None, filter=None)
415
416   Add the file *name* to the archive. *name* may be any type of file (directory,
417   fifo, symbolic link, etc.). If given, *arcname* specifies an alternative name
418   for the file in the archive. Directories are added recursively by default. This
419   can be avoided by setting *recursive* to :const:`False`. If *exclude* is given
420   it must be a function that takes one filename argument and returns a boolean
421   value. Depending on this value the respective file is either excluded
422   (:const:`True`) or added (:const:`False`). If *filter* is specified it must
423   be a function that takes a :class:`TarInfo` object argument and returns the
424   changed :class:`TarInfo` object. If it instead returns :const:`None` the :class:`TarInfo`
425   object will be excluded from the archive. See :ref:`tar-examples` for an
426   example.
427
428   .. versionchanged:: 2.6
429      Added the *exclude* parameter.
430
431   .. versionchanged:: 2.7
432      Added the *filter* parameter.
433
434   .. deprecated:: 2.7
435      The *exclude* parameter is deprecated, please use the *filter* parameter
436      instead.  For maximum portability, *filter* should be used as a keyword
437      argument rather than as a positional argument so that code won't be
438      affected when *exclude* is ultimately removed.
439
440
441.. method:: TarFile.addfile(tarinfo, fileobj=None)
442
443   Add the :class:`TarInfo` object *tarinfo* to the archive. If *fileobj* is given,
444   ``tarinfo.size`` bytes are read from it and added to the archive.  You can
445   create :class:`TarInfo` objects directly, or by using :meth:`gettarinfo`.
446
447   .. note::
448      On Windows platforms, *fileobj* should always be opened with mode ``'rb'`` to
449      avoid irritation about the file size.
450
451
452.. method:: TarFile.gettarinfo(name=None, arcname=None, fileobj=None)
453
454   Create a :class:`TarInfo` object from the result of :func:`os.stat` or
455   equivalent on an existing file.  The file is either named by *name*, or
456   specified as a file object *fileobj* with a file descriptor.  If
457   given, *arcname* specifies an alternative name for the file in the
458   archive, otherwise, the name is taken from *fileobj*’s
459   :attr:`~file.name` attribute, or the *name* argument.
460
461   You can modify some
462   of the :class:`TarInfo`’s attributes before you add it using :meth:`addfile`.
463   If the file object is not an ordinary file object positioned at the
464   beginning of the file, attributes such as :attr:`~TarInfo.size` may need
465   modifying.  This is the case for objects such as :class:`~gzip.GzipFile`.
466   The :attr:`~TarInfo.name` may also be modified, in which case *arcname*
467   could be a dummy string.
468
469
470.. method:: TarFile.close()
471
472   Close the :class:`TarFile`. In write mode, two finishing zero blocks are
473   appended to the archive.
474
475
476.. attribute:: TarFile.posix
477
478   Setting this to :const:`True` is equivalent to setting the :attr:`format`
479   attribute to :const:`USTAR_FORMAT`, :const:`False` is equivalent to
480   :const:`GNU_FORMAT`.
481
482   .. versionchanged:: 2.4
483      *posix* defaults to :const:`False`.
484
485   .. deprecated:: 2.6
486      Use the :attr:`format` attribute instead.
487
488
489.. attribute:: TarFile.pax_headers
490
491   A dictionary containing key-value pairs of pax global headers.
492
493   .. versionadded:: 2.6
494
495
496.. _tarinfo-objects:
497
498TarInfo Objects
499---------------
500
501A :class:`TarInfo` object represents one member in a :class:`TarFile`. Aside
502from storing all required attributes of a file (like file type, size, time,
503permissions, owner etc.), it provides some useful methods to determine its type.
504It does *not* contain the file's data itself.
505
506:class:`TarInfo` objects are returned by :class:`TarFile`'s methods
507:meth:`getmember`, :meth:`getmembers` and :meth:`gettarinfo`.
508
509
510.. class:: TarInfo(name="")
511
512   Create a :class:`TarInfo` object.
513
514
515.. method:: TarInfo.frombuf(buf)
516
517   Create and return a :class:`TarInfo` object from string buffer *buf*.
518
519   .. versionadded:: 2.6
520      Raises :exc:`HeaderError` if the buffer is invalid..
521
522
523.. method:: TarInfo.fromtarfile(tarfile)
524
525   Read the next member from the :class:`TarFile` object *tarfile* and return it as
526   a :class:`TarInfo` object.
527
528   .. versionadded:: 2.6
529
530
531.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='strict')
532
533   Create a string buffer from a :class:`TarInfo` object. For information on the
534   arguments see the constructor of the :class:`TarFile` class.
535
536   .. versionchanged:: 2.6
537      The arguments were added.
538
539A ``TarInfo`` object has the following public data attributes:
540
541
542.. attribute:: TarInfo.name
543
544   Name of the archive member.
545
546
547.. attribute:: TarInfo.size
548
549   Size in bytes.
550
551
552.. attribute:: TarInfo.mtime
553
554   Time of last modification.
555
556
557.. attribute:: TarInfo.mode
558
559   Permission bits.
560
561
562.. attribute:: TarInfo.type
563
564   File type.  *type* is usually one of these constants: :const:`REGTYPE`,
565   :const:`AREGTYPE`, :const:`LNKTYPE`, :const:`SYMTYPE`, :const:`DIRTYPE`,
566   :const:`FIFOTYPE`, :const:`CONTTYPE`, :const:`CHRTYPE`, :const:`BLKTYPE`,
567   :const:`GNUTYPE_SPARSE`.  To determine the type of a :class:`TarInfo` object
568   more conveniently, use the ``is*()`` methods below.
569
570
571.. attribute:: TarInfo.linkname
572
573   Name of the target file name, which is only present in :class:`TarInfo` objects
574   of type :const:`LNKTYPE` and :const:`SYMTYPE`.
575
576
577.. attribute:: TarInfo.uid
578
579   User ID of the user who originally stored this member.
580
581
582.. attribute:: TarInfo.gid
583
584   Group ID of the user who originally stored this member.
585
586
587.. attribute:: TarInfo.uname
588
589   User name.
590
591
592.. attribute:: TarInfo.gname
593
594   Group name.
595
596
597.. attribute:: TarInfo.pax_headers
598
599   A dictionary containing key-value pairs of an associated pax extended header.
600
601   .. versionadded:: 2.6
602
603A :class:`TarInfo` object also provides some convenient query methods:
604
605
606.. method:: TarInfo.isfile()
607
608   Return :const:`True` if the :class:`Tarinfo` object is a regular file.
609
610
611.. method:: TarInfo.isreg()
612
613   Same as :meth:`isfile`.
614
615
616.. method:: TarInfo.isdir()
617
618   Return :const:`True` if it is a directory.
619
620
621.. method:: TarInfo.issym()
622
623   Return :const:`True` if it is a symbolic link.
624
625
626.. method:: TarInfo.islnk()
627
628   Return :const:`True` if it is a hard link.
629
630
631.. method:: TarInfo.ischr()
632
633   Return :const:`True` if it is a character device.
634
635
636.. method:: TarInfo.isblk()
637
638   Return :const:`True` if it is a block device.
639
640
641.. method:: TarInfo.isfifo()
642
643   Return :const:`True` if it is a FIFO.
644
645
646.. method:: TarInfo.isdev()
647
648   Return :const:`True` if it is one of character device, block device or FIFO.
649
650
651.. _tar-examples:
652
653Examples
654--------
655
656How to extract an entire tar archive to the current working directory::
657
658   import tarfile
659   tar = tarfile.open("sample.tar.gz")
660   tar.extractall()
661   tar.close()
662
663How to extract a subset of a tar archive with :meth:`TarFile.extractall` using
664a generator function instead of a list::
665
666   import os
667   import tarfile
668
669   def py_files(members):
670       for tarinfo in members:
671           if os.path.splitext(tarinfo.name)[1] == ".py":
672               yield tarinfo
673
674   tar = tarfile.open("sample.tar.gz")
675   tar.extractall(members=py_files(tar))
676   tar.close()
677
678How to create an uncompressed tar archive from a list of filenames::
679
680   import tarfile
681   tar = tarfile.open("sample.tar", "w")
682   for name in ["foo", "bar", "quux"]:
683       tar.add(name)
684   tar.close()
685
686The same example using the :keyword:`with` statement::
687
688    import tarfile
689    with tarfile.open("sample.tar", "w") as tar:
690        for name in ["foo", "bar", "quux"]:
691            tar.add(name)
692
693How to read a gzip compressed tar archive and display some member information::
694
695   import tarfile
696   tar = tarfile.open("sample.tar.gz", "r:gz")
697   for tarinfo in tar:
698       print tarinfo.name, "is", tarinfo.size, "bytes in size and is",
699       if tarinfo.isreg():
700           print "a regular file."
701       elif tarinfo.isdir():
702           print "a directory."
703       else:
704           print "something else."
705   tar.close()
706
707How to create an archive and reset the user information using the *filter*
708parameter in :meth:`TarFile.add`::
709
710    import tarfile
711    def reset(tarinfo):
712        tarinfo.uid = tarinfo.gid = 0
713        tarinfo.uname = tarinfo.gname = "root"
714        return tarinfo
715    tar = tarfile.open("sample.tar.gz", "w:gz")
716    tar.add("foo", filter=reset)
717    tar.close()
718
719
720.. _tar-formats:
721
722Supported tar formats
723---------------------
724
725There are three tar formats that can be created with the :mod:`tarfile` module:
726
727* The POSIX.1-1988 ustar format (:const:`USTAR_FORMAT`). It supports filenames
728  up to a length of at best 256 characters and linknames up to 100 characters. The
729  maximum file size is 8 gigabytes. This is an old and limited but widely
730  supported format.
731
732* The GNU tar format (:const:`GNU_FORMAT`). It supports long filenames and
733  linknames, files bigger than 8 gigabytes and sparse files. It is the de facto
734  standard on GNU/Linux systems. :mod:`tarfile` fully supports the GNU tar
735  extensions for long names, sparse file support is read-only.
736
737* The POSIX.1-2001 pax format (:const:`PAX_FORMAT`). It is the most flexible
738  format with virtually no limits. It supports long filenames and linknames, large
739  files and stores pathnames in a portable way. However, not all tar
740  implementations today are able to handle pax archives properly.
741
742  The *pax* format is an extension to the existing *ustar* format. It uses extra
743  headers for information that cannot be stored otherwise. There are two flavours
744  of pax headers: Extended headers only affect the subsequent file header, global
745  headers are valid for the complete archive and affect all following files. All
746  the data in a pax header is encoded in *UTF-8* for portability reasons.
747
748There are some more variants of the tar format which can be read, but not
749created:
750
751* The ancient V7 format. This is the first tar format from Unix Seventh Edition,
752  storing only regular files and directories. Names must not be longer than 100
753  characters, there is no user/group name information. Some archives have
754  miscalculated header checksums in case of fields with non-ASCII characters.
755
756* The SunOS tar extended format. This format is a variant of the POSIX.1-2001
757  pax format, but is not compatible.
758
759.. _tar-unicode:
760
761Unicode issues
762--------------
763
764The tar format was originally conceived to make backups on tape drives with the
765main focus on preserving file system information. Nowadays tar archives are
766commonly used for file distribution and exchanging archives over networks. One
767problem of the original format (that all other formats are merely variants of)
768is that there is no concept of supporting different character encodings. For
769example, an ordinary tar archive created on a *UTF-8* system cannot be read
770correctly on a *Latin-1* system if it contains non-ASCII characters. Names (i.e.
771filenames, linknames, user/group names) containing these characters will appear
772damaged.  Unfortunately, there is no way to autodetect the encoding of an
773archive.
774
775The pax format was designed to solve this problem. It stores non-ASCII names
776using the universal character encoding *UTF-8*. When a pax archive is read,
777these *UTF-8* names are converted to the encoding of the local file system.
778
779The details of unicode conversion are controlled by the *encoding* and *errors*
780keyword arguments of the :class:`TarFile` class.
781
782The default value for *encoding* is the local character encoding. It is deduced
783from :func:`sys.getfilesystemencoding` and :func:`sys.getdefaultencoding`. In
784read mode, *encoding* is used exclusively to convert unicode names from a pax
785archive to strings in the local character encoding. In write mode, the use of
786*encoding* depends on the chosen archive format. In case of :const:`PAX_FORMAT`,
787input names that contain non-ASCII characters need to be decoded before being
788stored as *UTF-8* strings. The other formats do not make use of *encoding*
789unless unicode objects are used as input names. These are converted to 8-bit
790character strings before they are added to the archive.
791
792The *errors* argument defines how characters are treated that cannot be
793converted to or from *encoding*. Possible values are listed in section
794:ref:`codec-base-classes`. In read mode, there is an additional scheme
795``'utf-8'`` which means that bad characters are replaced by their *UTF-8*
796representation. This is the default scheme. In write mode the default value for
797*errors* is ``'strict'`` to ensure that name information is not altered
798unnoticed.
799
800