.. image:: https://travis-ci.org/marcelm/xopen.svg?branch=master
  :target: https://travis-ci.org/marcelm/xopen
  :alt:

.. image:: https://img.shields.io/pypi/v/xopen.svg?branch=master
  :target: https://pypi.python.org/pypi/xopen

.. image:: https://img.shields.io/conda/v/conda-forge/xopen.svg
  :target: https://anaconda.org/conda-forge/xopen
  :alt:

.. image:: https://codecov.io/gh/marcelm/xopen/branch/master/graph/badge.svg
  :target: https://codecov.io/gh/marcelm/xopen
  :alt:

=====
xopen
=====

This small Python module provides an ``xopen`` function that works like the
built-in ``open`` function, but can also deal with compressed files.
Supported compression formats are gzip, bzip2 and xz. They are automatically
recognized by their file extensions ``.gz``, ``.bz2`` or ``.xz``.

The focus is on being as efficient as possible on all supported Python versions.
For example, ``xopen`` uses ``pigz``, which is a parallel version of ``gzip``,
to open ``.gz`` files, which is faster than using the built-in ``gzip.open``
function. ``pigz`` can use multiple threads when compressing, but is also faster
when reading ``.gz`` files, so it is used both for reading and writing if it is
available. For gzip compression levels 1 to 3,
`igzip <https://github.com/intel/isa-l/>`_ is used for an even greater speedup.

For use cases where only the main thread should be used, ``xopen`` can be
called with ``threads=0``. This will use `python-isal
<https://github.com/pycompression/python-isal>`_ (which binds isa-l) if
python-isal is installed (automatic on Linux systems, as it is a requirement).
For installation instructions for python-isal, see the `python-isal homepage
<https://github.com/pycompression/python-isal>`_. If python-isal is not
available, ``gzip.open`` is used.

This module was originally developed as part of the `Cutadapt
tool <https://cutadapt.readthedocs.io/>`_ that is used in bioinformatics to
manipulate sequencing data. It has been in successful use within that software
for a few years.

``xopen`` is compatible with Python versions 3.6 and later.


Usage
-----

Open a file for reading::

    from xopen import xopen

    # The .xz extension selects xz decompression automatically.
    with xopen('file.txt.xz') as f:
        content = f.read()

Or without a context manager::

    from xopen import xopen

    f = xopen('file.txt.xz')
    content = f.read()
    f.close()  # must be closed explicitly in this case

Open a file in binary mode for writing::

    from xopen import xopen

    # The .gz extension selects gzip compression automatically.
    with xopen('file.txt.gz', mode='wb') as f:
        f.write(b'Hello')


Credits
-------

The name ``xopen`` was taken from the C function of the same name in the
`utils.h file which is part of
BWA <https://github.com/lh3/bwa/blob/83662032a2192d5712996f36069ab02db82acf67/utils.h>`_.

`Kyle Beauchamp <https://github.com/kyleabeauchamp/>`_ contributed support for
appending to files.

`Ruben Vorderman <https://github.com/rhpvorderman/>`_ contributed improvements to
make reading and writing gzipped files faster.

`Benjamin Vaisvil <https://github.com/bvaisvil>`_ contributed support for
format detection from content.

Some ideas were taken from the `canopener project <https://github.com/selassid/canopener>`_.
If you also want to open S3 files, you may want to use that module instead.


Changes
-------

v1.1.0
~~~~~~
* Python 3.5 support is dropped.
* On Linux systems, `python-isal <https://github.com/pycompression/python-isal>`_
  is now added as a requirement. This will speed up the reading of gzip files
  significantly when no external processes are used.

v1.0.0
~~~~~~
* If installed, the ``igzip`` program (part of
  `Intel ISA-L <https://github.com/intel/isa-l/>`_) is now used for reading
  and writing gzip-compressed files at compression levels 1-3, which results
  in a significant speedup.

v0.9.0
~~~~~~
* When the file name extension of a file to be opened for reading is not
  available, the content is inspected (if possible) and used to determine
  which compression format applies.
* This release drops Python 2.7 and 3.4 support. Python 3.5 or later is
  now required.

v0.8.4
~~~~~~
* When reading gzipped files, force ``pigz`` to use only a single process.
  ``pigz`` cannot use multiple cores anyway when decompressing. By default,
  it would use extra I/O processes, which slightly reduces wall-clock time,
  but increases CPU time. Single-core decompression with ``pigz`` is still
  about twice as fast as regular ``gzip``.
* Allow ``threads=0`` for specifying that no external ``pigz``/``gzip``
  process should be used (then regular ``gzip.open()`` is used instead).

v0.8.3
~~~~~~
* When reading gzipped files, let ``pigz`` use at most four threads by default.
  This limit previously only applied when writing to a file.
* Support Python 3.8

v0.8.0
~~~~~~
* Speed improvements when iterating over gzipped files.

v0.6.0
~~~~~~
* For reading from gzipped files, xopen will now use a ``pigz`` subprocess.
  This is faster than using ``gzip.open``.
* Python 2 support will be dropped in one of the next releases.

v0.5.0
~~~~~~
* By default, pigz is now only allowed to use at most four threads. This hopefully reduces
  problems some users had with too many threads when opening many files at the same time.
* xopen now accepts pathlib.Path objects.
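
The content inspection mentioned under v0.9.0 can be pictured as a check of
the leading "magic" bytes of a file. The sketch below is an assumption about
how such a check could look, not the code ``xopen`` actually uses; the names
``detect_format`` and ``_MAGIC`` are hypothetical. The byte signatures
themselves are the standard ones for gzip, bzip2 and xz.

```python
import bz2
import gzip
import lzma

# Leading "magic" bytes of each supported format.
_MAGIC = (
    (b"\x1f\x8b", "gz"),
    (b"BZh", "bz2"),
    (b"\xfd7zXZ\x00", "xz"),
)


def detect_format(header):
    """Return 'gz', 'bz2' or 'xz' for a known signature, else None."""
    for magic, name in _MAGIC:
        if header.startswith(magic):
            return name
    return None


# Small in-memory samples; only the first six bytes are needed.
samples = {
    "gz": gzip.compress(b"data"),
    "bz2": bz2.compress(b"data"),
    "xz": lzma.compress(b"data"),
    "plain": b"plain data",
}
detected = {label: detect_format(blob[:6]) for label, blob in samples.items()}
```

Reading only a short header keeps the check cheap, which matters when the
input cannot be rewound, for example when it is a pipe.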


Contributors
------------

* Marcel Martin
* Ruben Vorderman
* For more contributors, see <https://github.com/marcelm/xopen/graphs/contributors>


Links
-----

* `Source code <https://github.com/marcelm/xopen/>`_
* `Report an issue <https://github.com/marcelm/xopen/issues>`_
* `Project page on PyPI (Python package index) <https://pypi.python.org/pypi/xopen/>`_