:mod:`tarfile` --- Read and write tar archive files
===================================================

.. module:: tarfile
   :synopsis: Read and write tar-format archive files.


.. versionadded:: 2.3

.. moduleauthor:: Lars Gustäbel <lars@gustaebel.de>
.. sectionauthor:: Lars Gustäbel <lars@gustaebel.de>

**Source code:** :source:`Lib/tarfile.py`

--------------

The :mod:`tarfile` module makes it possible to read and write tar
archives, including those using gzip or bz2 compression.
Use the :mod:`zipfile` module to read or write :file:`.zip` files, or the
higher-level functions in :ref:`shutil <archiving-operations>`.

Some facts and figures:

* reads and writes :mod:`gzip` and :mod:`bz2` compressed archives
  if the respective modules are available.

* read/write support for the POSIX.1-1988 (ustar) format.

* read/write support for the GNU tar format including *longname* and *longlink*
  extensions, read-only support for the *sparse* extension.

* read/write support for the POSIX.1-2001 (pax) format.

  .. versionadded:: 2.6

* handles directories, regular files, hardlinks, symbolic links, fifos,
  character devices and block devices and is able to acquire and restore file
  information like timestamp, access permissions and owner.

.. note::
   Handling of multi-stream bzip2 files is not supported.  Modules such as
   `bz2file <https://github.com/nvawda/bz2file>`_ let you overcome this.


.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, \*\*kwargs)

   Return a :class:`TarFile` object for the pathname *name*. For detailed
   information on :class:`TarFile` objects and the keyword arguments that are
   allowed, see :ref:`tarfile-objects`.

   *mode* has to be a string of the form ``'filemode[:compression]'``, it defaults
   to ``'r'``.
   Here is a full list of mode combinations:

   +------------------+---------------------------------------------+
   | mode             | action                                      |
   +==================+=============================================+
   | ``'r' or 'r:*'`` | Open for reading with transparent           |
   |                  | compression (recommended).                  |
   +------------------+---------------------------------------------+
   | ``'r:'``         | Open for reading exclusively without        |
   |                  | compression.                                |
   +------------------+---------------------------------------------+
   | ``'r:gz'``       | Open for reading with gzip compression.     |
   +------------------+---------------------------------------------+
   | ``'r:bz2'``      | Open for reading with bzip2 compression.    |
   +------------------+---------------------------------------------+
   | ``'a' or 'a:'``  | Open for appending with no compression. The |
   |                  | file is created if it does not exist.       |
   +------------------+---------------------------------------------+
   | ``'w' or 'w:'``  | Open for uncompressed writing.              |
   +------------------+---------------------------------------------+
   | ``'w:gz'``       | Open for gzip compressed writing.           |
   +------------------+---------------------------------------------+
   | ``'w:bz2'``      | Open for bzip2 compressed writing.          |
   +------------------+---------------------------------------------+

   Note that ``'a:gz'`` or ``'a:bz2'`` is not possible. If *mode* is not suitable
   to open a certain (compressed) file for reading, :exc:`ReadError` is raised. Use
   *mode* ``'r'`` to avoid this.  If a compression method is not supported,
   :exc:`CompressionError` is raised.

   If *fileobj* is specified, it is used as an alternative to a file object opened
   for *name*. It is supposed to be at position 0.

   For modes ``'w:gz'``, ``'r:gz'``, ``'w:bz2'``, ``'r:bz2'``, :func:`tarfile.open`
   accepts the keyword argument *compresslevel* (default ``9``) to
   specify the compression level of the file.
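
   As a hedged illustration of the *compresslevel* keyword described above, the
   following sketch writes a small gzip-compressed archive and reads it back
   with transparent compression detection. The temporary directory and the
   file name ``hello.txt`` are assumptions made up for this example::

      import os
      import tarfile
      import tempfile

      # Hypothetical scratch area and input file, created only for this sketch.
      workdir = tempfile.mkdtemp()
      source = os.path.join(workdir, "hello.txt")
      with open(source, "w") as f:
          f.write("hello\n")

      archive = os.path.join(workdir, "sample.tar.gz")

      # Trade compression ratio for speed by lowering compresslevel from 9 to 1.
      tar = tarfile.open(archive, "w:gz", compresslevel=1)
      tar.add(source, arcname="hello.txt")
      tar.close()

      # Mode 'r' (the same as 'r:*') detects the gzip compression by itself.
      tar = tarfile.open(archive, "r")
      names = tar.getnames()
      tar.close()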

   For special purposes, there is a second format for *mode*:
   ``'filemode|[compression]'``.  :func:`tarfile.open` will return a :class:`TarFile`
   object that processes its data as a stream of blocks.  No random seeking will
   be done on the file. If given, *fileobj* may be any object that has a
   :meth:`read` or :meth:`write` method (depending on the *mode*). *bufsize*
   specifies the blocksize and defaults to ``20 * 512`` bytes. Use this variant
   in combination with e.g. ``sys.stdin``, a socket file object or a tape
   device. However, such a :class:`TarFile` object is limited in that it does
   not allow random access, see :ref:`tar-examples`.  The currently
   possible modes:

   +-------------+--------------------------------------------+
   | Mode        | Action                                     |
   +=============+============================================+
   | ``'r|*'``   | Open a *stream* of tar blocks for reading  |
   |             | with transparent compression.              |
   +-------------+--------------------------------------------+
   | ``'r|'``    | Open a *stream* of uncompressed tar blocks |
   |             | for reading.                               |
   +-------------+--------------------------------------------+
   | ``'r|gz'``  | Open a gzip compressed *stream* for        |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'w|'``    | Open an uncompressed *stream* for writing. |
   +-------------+--------------------------------------------+
   | ``'w|gz'``  | Open a gzip compressed *stream* for        |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+


.. class:: TarFile

   Class for reading and writing tar archives.
   Do not use this class directly;
   use :func:`tarfile.open` instead. See :ref:`tarfile-objects`.


.. function:: is_tarfile(name)

   Return :const:`True` if *name* is a tar archive file that the :mod:`tarfile`
   module can read.


.. class:: TarFileCompat(filename, mode='r', compression=TAR_PLAIN)

   Class for limited access to tar archives with a :mod:`zipfile`\ -like interface.
   Please consult the documentation of the :mod:`zipfile` module for more details.
   *compression* must be one of the following constants:


   .. data:: TAR_PLAIN

      Constant for an uncompressed tar archive.


   .. data:: TAR_GZIPPED

      Constant for a :mod:`gzip` compressed tar archive.


   .. deprecated:: 2.6
      The :class:`TarFileCompat` class has been removed in Python 3.


.. exception:: TarError

   Base class for all :mod:`tarfile` exceptions.


.. exception:: ReadError

   Is raised when a tar archive is opened that either cannot be handled by the
   :mod:`tarfile` module or is somehow invalid.


.. exception:: CompressionError

   Is raised when a compression method is not supported or when the data cannot be
   decoded properly.


.. exception:: StreamError

   Is raised for the limitations that are typical for stream-like :class:`TarFile`
   objects.


.. exception:: ExtractError

   Is raised for *non-fatal* errors when using :meth:`TarFile.extract`, but only if
   :attr:`TarFile.errorlevel`\ ``== 2``.


The following constants are available at the module level:

.. data:: ENCODING

   The default character encoding: ``'utf-8'`` on Windows, the value returned by
   :func:`sys.getfilesystemencoding` otherwise.


.. exception:: HeaderError

   Is raised by :meth:`TarInfo.frombuf` if the buffer it gets is invalid.

   .. 
versionadded:: 2.6


Each of the following constants defines a tar archive format that the
:mod:`tarfile` module is able to create. See section :ref:`tar-formats` for
details.


.. data:: USTAR_FORMAT

   POSIX.1-1988 (ustar) format.


.. data:: GNU_FORMAT

   GNU tar format.


.. data:: PAX_FORMAT

   POSIX.1-2001 (pax) format.


.. data:: DEFAULT_FORMAT

   The default format for creating archives. This is currently :const:`GNU_FORMAT`.


.. seealso::

   Module :mod:`zipfile`
      Documentation of the :mod:`zipfile` standard module.

   :ref:`archiving-operations`
      Documentation of the higher-level archiving facilities provided by the
      standard :mod:`shutil` module.

   `GNU tar manual, Basic Tar Format <https://www.gnu.org/software/tar/manual/html_node/Standard.html>`_
      Documentation for tar archive files, including GNU tar extensions.


.. _tarfile-objects:

TarFile Objects
---------------

The :class:`TarFile` object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up of
a header block followed by data blocks. It is possible to store a file in a tar
archive several times. Each archive member is represented by a :class:`TarInfo`
object, see :ref:`tarinfo-objects` for details.

A :class:`TarFile` object can be used as a context manager in a :keyword:`with`
statement. It will automatically be closed when the block is completed. Please
note that in the event of an exception an archive opened for writing will not
be finalized; only the internally used file object will be closed. See the
:ref:`tar-examples` section for a use case.

.. versionadded:: 2.7
   Added support for the context management protocol.

.. 
class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors=None, pax_headers=None, debug=0, errorlevel=0)

   All following arguments are optional and can be accessed as instance attributes
   as well.

   *name* is the pathname of the archive. It can be omitted if *fileobj* is given.
   In this case, the file object's :attr:`name` attribute is used if it exists.

   *mode* is either ``'r'`` to read from an existing archive, ``'a'`` to append
   data to an existing file or ``'w'`` to create a new file overwriting an existing
   one.

   If *fileobj* is given, it is used for reading or writing data. If it can be
   determined, *mode* is overridden by *fileobj*'s mode. *fileobj* will be used
   from position 0.

   .. note::

      *fileobj* is not closed when :class:`TarFile` is closed.

   *format* controls the archive format. It must be one of the constants
   :const:`USTAR_FORMAT`, :const:`GNU_FORMAT` or :const:`PAX_FORMAT` that are
   defined at module level.

   .. versionadded:: 2.6

   The *tarinfo* argument can be used to replace the default :class:`TarInfo` class
   with a different one.

   .. versionadded:: 2.6

   If *dereference* is :const:`False`, add symbolic and hard links to the archive. If it
   is :const:`True`, add the content of the target files to the archive. This has no
   effect on systems that do not support symbolic links.

   If *ignore_zeros* is :const:`False`, treat an empty block as the end of the archive.
   If it is :const:`True`, skip empty (and invalid) blocks and try to get as many members
   as possible. This is only useful for reading concatenated or damaged archives.

   *debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
   messages). The messages are written to ``sys.stderr``.
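
   The effect of *ignore_zeros* on concatenated archives can be sketched as
   follows. This is only an illustration under stated assumptions: the helper
   ``make_archive`` and the member names are hypothetical, and the archives are
   built in memory so the example does not touch the file system::

      import io
      import tarfile

      def make_archive(name, payload):
          # Hypothetical helper: build a one-member tar archive in memory.
          buf = io.BytesIO()
          tar = tarfile.open(fileobj=buf, mode="w")
          info = tarfile.TarInfo(name)
          info.size = len(payload)
          tar.addfile(info, io.BytesIO(payload))
          tar.close()
          return buf.getvalue()

      # Two complete archives glued together, as "cat a.tar b.tar" would do.
      data = make_archive("a.txt", b"first") + make_archive("b.txt", b"second")

      # Default: reading stops at the end-of-archive blocks of the first part.
      tar = tarfile.open(fileobj=io.BytesIO(data), mode="r:")
      names_default = tar.getnames()
      tar.close()

      # ignore_zeros=True skips the empty blocks and keeps looking for members.
      tar = tarfile.open(fileobj=io.BytesIO(data), mode="r:", ignore_zeros=True)
      names_all = tar.getnames()
      tar.close()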

   If *errorlevel* is ``0``, all errors are ignored when using :meth:`TarFile.extract`.
   Nevertheless, they appear as error messages in the debug output when debugging
   is enabled.  If ``1``, all *fatal* errors are raised as :exc:`OSError` or
   :exc:`IOError` exceptions. If ``2``, all *non-fatal* errors are raised as
   :exc:`TarError` exceptions as well.

   The *encoding* and *errors* arguments control the way strings are converted to
   unicode objects and vice versa. The default settings will work for most users.
   See section :ref:`tar-unicode` for in-depth information.

   .. versionadded:: 2.6

   The *pax_headers* argument is an optional dictionary of unicode strings which
   will be added as a pax global header if *format* is :const:`PAX_FORMAT`.

   .. versionadded:: 2.6


.. classmethod:: TarFile.open(...)

   Alternative constructor. The :func:`tarfile.open` function is actually a
   shortcut to this classmethod.


.. method:: TarFile.getmember(name)

   Return a :class:`TarInfo` object for member *name*. If *name* cannot be found
   in the archive, :exc:`KeyError` is raised.

   .. note::

      If a member occurs more than once in the archive, its last occurrence is assumed
      to be the most up-to-date version.


.. method:: TarFile.getmembers()

   Return the members of the archive as a list of :class:`TarInfo` objects. The
   list has the same order as the members in the archive.


.. method:: TarFile.getnames()

   Return the members as a list of their names. It has the same order as the list
   returned by :meth:`getmembers`.


.. method:: TarFile.list(verbose=True)

   Print a table of contents to ``sys.stdout``. If *verbose* is :const:`False`,
   only the names of the members are printed. If it is :const:`True`, output
   similar to that of :program:`ls -l` is produced.


.. 
method:: TarFile.next()

   Return the next member of the archive as a :class:`TarInfo` object, when
   :class:`TarFile` is opened for reading. Return :const:`None` if there is no more
   available.


.. method:: TarFile.extractall(path=".", members=None)

   Extract all members from the archive to the current working directory or
   directory *path*. If optional *members* is given, it must be a subset of the
   list returned by :meth:`getmembers`. Directory information like owner,
   modification time and permissions are set after all members have been extracted.
   This is done to work around two problems: A directory's modification time is
   reset each time a file is created in it. And, if a directory's permissions do
   not allow writing, extracting files to it will fail.

   .. warning::

      Never extract archives from untrusted sources without prior inspection.
      It is possible that files are created outside of *path*, e.g. members
      that have absolute filenames starting with ``"/"`` or filenames with two
      dots ``".."``.

   .. versionadded:: 2.5


.. method:: TarFile.extract(member, path="")

   Extract a member from the archive to the current working directory, using its
   full name. Its file information is extracted as accurately as possible. *member*
   may be a filename or a :class:`TarInfo` object. You can specify a different
   directory using *path*.

   .. note::

      The :meth:`extract` method does not take care of several extraction issues.
      In most cases you should consider using the :meth:`extractall` method.

   .. warning::

      See the warning for :meth:`extractall`.


.. method:: TarFile.extractfile(member)

   Extract a member from the archive as a file object. *member* may be a filename
   or a :class:`TarInfo` object. If *member* is a regular file, a file-like object
   is returned.
   If *member* is a link, a file-like object is constructed from the
   link's target. If *member* is none of the above, :const:`None` is returned.

   .. note::

      The file-like object is read-only.  It provides the methods
      :meth:`read`, :meth:`readline`, :meth:`readlines`, :meth:`seek`, :meth:`tell`,
      and :meth:`close`, and also supports iteration over its lines.


.. method:: TarFile.add(name, arcname=None, recursive=True, exclude=None, filter=None)

   Add the file *name* to the archive. *name* may be any type of file (directory,
   fifo, symbolic link, etc.). If given, *arcname* specifies an alternative name
   for the file in the archive. Directories are added recursively by default. This
   can be avoided by setting *recursive* to :const:`False`. If *exclude* is given
   it must be a function that takes one filename argument and returns a boolean
   value. Depending on this value the respective file is either excluded
   (:const:`True`) or added (:const:`False`). If *filter* is specified it must
   be a function that takes a :class:`TarInfo` object argument and returns the
   changed :class:`TarInfo` object. If it instead returns :const:`None` the :class:`TarInfo`
   object will be excluded from the archive. See :ref:`tar-examples` for an
   example.

   .. versionchanged:: 2.6
      Added the *exclude* parameter.

   .. versionchanged:: 2.7
      Added the *filter* parameter.

   .. deprecated:: 2.7
      The *exclude* parameter is deprecated, please use the *filter* parameter
      instead.  For maximum portability, *filter* should be used as a keyword
      argument rather than as a positional argument so that code won't be
      affected when *exclude* is ultimately removed.


.. method:: TarFile.addfile(tarinfo, fileobj=None)

   Add the :class:`TarInfo` object *tarinfo* to the archive. If *fileobj* is given,
   ``tarinfo.size`` bytes are read from it and added to the archive.
   You can create :class:`TarInfo` objects directly, or by using
   :meth:`gettarinfo`.

   .. note::
      On Windows platforms, *fileobj* should always be opened with mode ``'rb'`` to
      avoid irritation about the file size.


.. method:: TarFile.gettarinfo(name=None, arcname=None, fileobj=None)

   Create a :class:`TarInfo` object from the result of :func:`os.stat` or
   equivalent on an existing file.  The file is either named by *name*, or
   specified as a file object *fileobj* with a file descriptor.  If
   given, *arcname* specifies an alternative name for the file in the
   archive, otherwise, the name is taken from *fileobj*’s
   :attr:`~file.name` attribute, or the *name* argument.

   You can modify some
   of the :class:`TarInfo`’s attributes before you add it using :meth:`addfile`.
   If the file object is not an ordinary file object positioned at the
   beginning of the file, attributes such as :attr:`~TarInfo.size` may need
   modifying.  This is the case for objects such as :class:`~gzip.GzipFile`.
   The :attr:`~TarInfo.name` may also be modified, in which case *arcname*
   could be a dummy string.


.. method:: TarFile.close()

   Close the :class:`TarFile`. In write mode, two finishing zero blocks are
   appended to the archive.


.. attribute:: TarFile.posix

   Setting this to :const:`True` is equivalent to setting the :attr:`format`
   attribute to :const:`USTAR_FORMAT`, :const:`False` is equivalent to
   :const:`GNU_FORMAT`.

   .. versionchanged:: 2.4
      *posix* defaults to :const:`False`.

   .. deprecated:: 2.6
      Use the :attr:`format` attribute instead.


.. attribute:: TarFile.pax_headers

   A dictionary containing key-value pairs of pax global headers.

   .. versionadded:: 2.6


.. _tarinfo-objects:

TarInfo Objects
---------------

A :class:`TarInfo` object represents one member in a :class:`TarFile`.
Aside from storing all required attributes of a file (like file type, size, time,
permissions, owner etc.), it provides some useful methods to determine its type.
It does *not* contain the file's data itself.

:class:`TarInfo` objects are returned by :class:`TarFile`'s methods
:meth:`getmember`, :meth:`getmembers` and :meth:`gettarinfo`.


.. class:: TarInfo(name="")

   Create a :class:`TarInfo` object.


.. method:: TarInfo.frombuf(buf)

   Create and return a :class:`TarInfo` object from string buffer *buf*.

   .. versionadded:: 2.6
      Raises :exc:`HeaderError` if the buffer is invalid.


.. method:: TarInfo.fromtarfile(tarfile)

   Read the next member from the :class:`TarFile` object *tarfile* and return it as
   a :class:`TarInfo` object.

   .. versionadded:: 2.6


.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='strict')

   Create a string buffer from a :class:`TarInfo` object. For information on the
   arguments see the constructor of the :class:`TarFile` class.

   .. versionchanged:: 2.6
      The arguments were added.

A ``TarInfo`` object has the following public data attributes:


.. attribute:: TarInfo.name

   Name of the archive member.


.. attribute:: TarInfo.size

   Size in bytes.


.. attribute:: TarInfo.mtime

   Time of last modification.


.. attribute:: TarInfo.mode

   Permission bits.


.. attribute:: TarInfo.type

   File type.  *type* is usually one of these constants: :const:`REGTYPE`,
   :const:`AREGTYPE`, :const:`LNKTYPE`, :const:`SYMTYPE`, :const:`DIRTYPE`,
   :const:`FIFOTYPE`, :const:`CONTTYPE`, :const:`CHRTYPE`, :const:`BLKTYPE`,
   :const:`GNUTYPE_SPARSE`.  To determine the type of a :class:`TarInfo` object
   more conveniently, use the ``is*()`` methods below.


.. 
attribute:: TarInfo.linkname

   Name of the target file, which is only present in :class:`TarInfo` objects
   of type :const:`LNKTYPE` and :const:`SYMTYPE`.


.. attribute:: TarInfo.uid

   User ID of the user who originally stored this member.


.. attribute:: TarInfo.gid

   Group ID of the user who originally stored this member.


.. attribute:: TarInfo.uname

   User name.


.. attribute:: TarInfo.gname

   Group name.


.. attribute:: TarInfo.pax_headers

   A dictionary containing key-value pairs of an associated pax extended header.

   .. versionadded:: 2.6

A :class:`TarInfo` object also provides some convenient query methods:


.. method:: TarInfo.isfile()

   Return :const:`True` if the :class:`TarInfo` object is a regular file.


.. method:: TarInfo.isreg()

   Same as :meth:`isfile`.


.. method:: TarInfo.isdir()

   Return :const:`True` if it is a directory.


.. method:: TarInfo.issym()

   Return :const:`True` if it is a symbolic link.


.. method:: TarInfo.islnk()

   Return :const:`True` if it is a hard link.


.. method:: TarInfo.ischr()

   Return :const:`True` if it is a character device.


.. method:: TarInfo.isblk()

   Return :const:`True` if it is a block device.


.. method:: TarInfo.isfifo()

   Return :const:`True` if it is a FIFO.


.. method:: TarInfo.isdev()

   Return :const:`True` if it is one of character device, block device or FIFO.


.. 
_tar-examples:

Examples
--------

How to extract an entire tar archive to the current working directory::

   import tarfile
   tar = tarfile.open("sample.tar.gz")
   tar.extractall()
   tar.close()

How to extract a subset of a tar archive with :meth:`TarFile.extractall` using
a generator function instead of a list::

   import os
   import tarfile

   def py_files(members):
       for tarinfo in members:
           if os.path.splitext(tarinfo.name)[1] == ".py":
               yield tarinfo

   tar = tarfile.open("sample.tar.gz")
   tar.extractall(members=py_files(tar))
   tar.close()

How to create an uncompressed tar archive from a list of filenames::

   import tarfile
   tar = tarfile.open("sample.tar", "w")
   for name in ["foo", "bar", "quux"]:
       tar.add(name)
   tar.close()

The same example using the :keyword:`with` statement::

    import tarfile
    with tarfile.open("sample.tar", "w") as tar:
        for name in ["foo", "bar", "quux"]:
            tar.add(name)

How to read a gzip compressed tar archive and display some member information::

   import tarfile
   tar = tarfile.open("sample.tar.gz", "r:gz")
   for tarinfo in tar:
       print tarinfo.name, "is", tarinfo.size, "bytes in size and is",
       if tarinfo.isreg():
           print "a regular file."
       elif tarinfo.isdir():
           print "a directory."
       else:
           print "something else."
   tar.close()

How to create an archive and reset the user information using the *filter*
parameter in :meth:`TarFile.add`::

    import tarfile
    def reset(tarinfo):
        tarinfo.uid = tarinfo.gid = 0
        tarinfo.uname = tarinfo.gname = "root"
        return tarinfo
    tar = tarfile.open("sample.tar.gz", "w:gz")
    tar.add("foo", filter=reset)
    tar.close()


.. 
_tar-formats:

Supported tar formats
---------------------

There are three tar formats that can be created with the :mod:`tarfile` module:

* The POSIX.1-1988 ustar format (:const:`USTAR_FORMAT`). It supports filenames
  up to a length of at best 256 characters and linknames up to 100 characters. The
  maximum file size is 8 gigabytes. This is an old and limited but widely
  supported format.

* The GNU tar format (:const:`GNU_FORMAT`). It supports long filenames and
  linknames, files bigger than 8 gigabytes and sparse files. It is the de facto
  standard on GNU/Linux systems. :mod:`tarfile` fully supports the GNU tar
  extensions for long names, sparse file support is read-only.

* The POSIX.1-2001 pax format (:const:`PAX_FORMAT`). It is the most flexible
  format with virtually no limits. It supports long filenames and linknames, large
  files and stores pathnames in a portable way. However, not all tar
  implementations today are able to handle pax archives properly.

  The *pax* format is an extension to the existing *ustar* format. It uses extra
  headers for information that cannot be stored otherwise. There are two flavours
  of pax headers: Extended headers only affect the subsequent file header, global
  headers are valid for the complete archive and affect all following files. All
  the data in a pax header is encoded in *UTF-8* for portability reasons.

There are some more variants of the tar format which can be read, but not
created:

* The ancient V7 format. This is the first tar format from Unix Seventh Edition,
  storing only regular files and directories. Names must not be longer than 100
  characters, there is no user/group name information. Some archives have
  miscalculated header checksums in case of fields with non-ASCII characters.

* The SunOS tar extended format. This format is a variant of the POSIX.1-2001
  pax format, but is not compatible.
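
The filename limit of the ustar format, and the way the pax format lifts it,
can be sketched as follows. This is a minimal illustration, not a definitive
recipe: the helper ``write_member`` and the 300-character member name are
assumptions invented for the example, and the archives are kept in memory::

   import io
   import tarfile

   # Made-up name without any "/" and far beyond what ustar can store.
   long_name = "n" * 300

   def write_member(fmt, name):
       # Hypothetical helper: write one empty member using archive format fmt.
       buf = io.BytesIO()
       tar = tarfile.open(fileobj=buf, mode="w", format=fmt)
       try:
           tar.addfile(tarfile.TarInfo(name))
       finally:
           tar.close()
       return buf.getvalue()

   # The ustar header has no room for such a name, so ValueError is raised.
   try:
       write_member(tarfile.USTAR_FORMAT, long_name)
       ustar_ok = True
   except ValueError:
       ustar_ok = False

   # The pax format stores the name in an extended header instead.
   data = write_member(tarfile.PAX_FORMAT, long_name)
   tar = tarfile.open(fileobj=io.BytesIO(data), mode="r:")
   pax_names = tar.getnames()
   tar.close()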

.. _tar-unicode:

Unicode issues
--------------

The tar format was originally conceived to make backups on tape drives with the
main focus on preserving file system information. Nowadays tar archives are
commonly used for file distribution and exchanging archives over networks. One
problem of the original format (that all other formats are merely variants of)
is that there is no concept of supporting different character encodings. For
example, an ordinary tar archive created on a *UTF-8* system cannot be read
correctly on a *Latin-1* system if it contains non-ASCII characters. Names (i.e.
filenames, linknames, user/group names) containing these characters will appear
damaged.  Unfortunately, there is no way to autodetect the encoding of an
archive.

The pax format was designed to solve this problem. It stores non-ASCII names
using the universal character encoding *UTF-8*. When a pax archive is read,
these *UTF-8* names are converted to the encoding of the local file system.

The details of unicode conversion are controlled by the *encoding* and *errors*
keyword arguments of the :class:`TarFile` class.

The default value for *encoding* is the local character encoding. It is deduced
from :func:`sys.getfilesystemencoding` and :func:`sys.getdefaultencoding`. In
read mode, *encoding* is used exclusively to convert unicode names from a pax
archive to strings in the local character encoding. In write mode, the use of
*encoding* depends on the chosen archive format. In case of :const:`PAX_FORMAT`,
input names that contain non-ASCII characters need to be decoded before being
stored as *UTF-8* strings. The other formats do not make use of *encoding*
unless unicode objects are used as input names. These are converted to 8-bit
character strings before they are added to the archive.
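
A round trip of a non-ASCII name through a pax archive might look like the
sketch below. It assumes an explicit ``encoding="utf-8"`` instead of the
platform default, uses a made-up member name, and keeps the archive in memory.
The final ``isinstance`` check is only there so that the same snippet compares
equal on interpreters that return byte strings for member names::

   import io
   import tarfile

   name = u"caf\xe9.txt"  # made-up member name with one non-ASCII character

   # Write the name into a pax archive, stating the encoding explicitly.
   buf = io.BytesIO()
   tar = tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT,
                      encoding="utf-8")
   tar.addfile(tarfile.TarInfo(name))
   tar.close()

   # Read it back using the same encoding for the conversion.
   tar = tarfile.open(fileobj=io.BytesIO(buf.getvalue()), mode="r:",
                      encoding="utf-8")
   stored = tar.getnames()[0]
   tar.close()

   # A byte string result is in *encoding*; decode it for the comparison.
   if isinstance(stored, bytes):
       stored = stored.decode("utf-8")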

The *errors* argument defines how characters are treated that cannot be
converted to or from *encoding*. Possible values are listed in section
:ref:`codec-base-classes`. In read mode, there is an additional scheme
``'utf-8'`` which means that bad characters are replaced by their *UTF-8*
representation. This is the default scheme. In write mode the default value for
*errors* is ``'strict'`` to ensure that name information is not altered
unnoticed.
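
The difference between ``'strict'`` and a more lenient scheme in write mode
can be sketched like this. The helper ``write_ustar``, the member name and the
deliberately narrow ``encoding="ascii"`` are assumptions chosen only to
provoke a conversion failure; this is an illustration, not a recommended
configuration::

   import io
   import tarfile

   name = u"caf\xe9.txt"  # made-up name that cannot be encoded as ASCII

   def write_ustar(errors):
       # Hypothetical helper: store *name* in a ustar archive limited to ASCII.
       buf = io.BytesIO()
       tar = tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT,
                          encoding="ascii", errors=errors)
       try:
           tar.addfile(tarfile.TarInfo(name))
       finally:
           tar.close()
       return buf.getvalue()

   # 'strict' refuses to alter the name and raises an error instead.
   try:
       write_ustar("strict")
       strict_failed = False
   except UnicodeError:
       strict_failed = True

   # 'replace' stores the name with a substitute for the offending character.
   data = write_ustar("replace")
   tar = tarfile.open(fileobj=io.BytesIO(data), mode="r:", encoding="ascii")
   stored = tar.getnames()[0]
   tar.close()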