Name                Date         Size     #Lines  LOC
..                  03-May-2022  -
.github/workflows/  20-Jan-2021  -        87      82
src/                20-Jan-2021  -        809     626
tests/              03-May-2022  -        458     326
.codecov.yml        20-Jan-2021  180      17      12
.editorconfig       20-Jan-2021  95       7       6
.gitignore          20-Jan-2021  66       8       7
LICENSE             20-Jan-2021  1.1 KiB  20      16
PKG-INFO            20-Jan-2021  7.2 KiB  182     138
README.rst          20-Jan-2021  5.4 KiB  167     124
pyproject.toml      20-Jan-2021  68       3       2
setup.cfg           20-Jan-2021  197      19      14
setup.py            20-Jan-2021  980      31      28
tox.ini             20-Jan-2021  683      42      35
README.rst

.. image:: https://travis-ci.org/marcelm/xopen.svg?branch=master
  :target: https://travis-ci.org/marcelm/xopen
  :alt:

.. image:: https://img.shields.io/pypi/v/xopen.svg?branch=master
  :target: https://pypi.python.org/pypi/xopen

.. image:: https://img.shields.io/conda/v/conda-forge/xopen.svg
  :target: https://anaconda.org/conda-forge/xopen
  :alt:

.. image:: https://codecov.io/gh/marcelm/xopen/branch/master/graph/badge.svg
  :target: https://codecov.io/gh/marcelm/xopen
  :alt:

=====
xopen
=====

This small Python module provides an ``xopen`` function that works like the
built-in ``open`` function, but can also deal with compressed files.
Supported compression formats are gzip, bzip2 and xz. They are automatically
recognized by their file extensions ``.gz``, ``.bz2`` or ``.xz``.

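The extension-based selection described above can be sketched with the
standard library alone. This is only an illustration of the idea, not the code
that ``xopen`` actually runs, and the helper name ``open_by_extension`` is
hypothetical::

    import bz2
    import gzip
    import lzma

    # Map each supported extension to the stdlib function that opens it.
    _OPENERS = {
        ".gz": gzip.open,
        ".bz2": bz2.open,
        ".xz": lzma.open,
    }


    def open_by_extension(path, mode="rt"):
        """Open *path*, choosing a (de)compressor from its file extension."""
        for extension, opener in _OPENERS.items():
            if str(path).endswith(extension):
                return opener(path, mode)
        # No known compression extension: treat it as an ordinary file.
        return open(path, mode)
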
The focus is on being as efficient as possible on all supported Python versions.
For example, ``xopen`` opens ``.gz`` files with ``pigz``, a parallel version of
``gzip``, which is faster than using the built-in ``gzip.open`` function.
``pigz`` can use multiple threads when compressing, but is also faster
when reading ``.gz`` files, so it is used both for reading and writing if it is
available. For gzip compression levels 1 to 3,
`igzip <https://github.com/intel/isa-l/>`_ is used for an even greater speedup.

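As a rough sketch of the idea, the following shows how a ``.gz`` file could be
decompressed through an external ``pigz`` process, with a fallback to
``gzip.open`` when ``pigz`` is not installed. The function name
``read_gzip_bytes`` is hypothetical, and for brevity the sketch reads the whole
file into memory instead of streaming it; it is not the code used by
``xopen``::

    import gzip
    import shutil
    import subprocess


    def read_gzip_bytes(path, threads=1):
        """Return the decompressed content of a gzip file as bytes."""
        pigz = shutil.which("pigz")
        if pigz is None:
            # pigz is not installed: fall back to the built-in gzip module.
            with gzip.open(path, "rb") as f:
                return f.read()
        # -d: decompress, -c: write to stdout, -p: number of processes.
        result = subprocess.run(
            [pigz, "-d", "-c", "-p", str(threads), str(path)],
            stdout=subprocess.PIPE,
            check=True,
        )
        return result.stdout
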
For use cases where only the main thread should be used, xopen can be called
with ``threads=0``. This will use `python-isal
<https://github.com/pycompression/python-isal>`_ (which binds isa-l) if
python-isal is installed (automatic on Linux systems, as it is a requirement).
For installation instructions for python-isal, please
check out the `python-isal homepage
<https://github.com/pycompression/python-isal>`_. If python-isal is not
available, ``gzip.open`` is used.

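The fallback order described above can be illustrated as follows. This is a
minimal sketch, assuming that python-isal exposes a ``gzip``-compatible
``isal.igzip`` module; the helper name ``open_gzip_in_process`` is
hypothetical and not part of ``xopen``::

    import gzip

    try:
        # python-isal offers a gzip-compatible module backed by ISA-L.
        from isal import igzip as gzip_module
    except ImportError:
        # python-isal is not installed: use the standard library instead.
        gzip_module = gzip


    def open_gzip_in_process(path, mode="rb"):
        """Open a gzip file without starting an external process."""
        return gzip_module.open(path, mode)

With ``xopen`` itself, the equivalent request is made by passing ``threads=0``.
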
This module was originally developed as part of the `Cutadapt
tool <https://cutadapt.readthedocs.io/>`_, which is used in bioinformatics to
manipulate sequencing data. It has been in successful use within that software
for a few years.

``xopen`` is compatible with Python versions 3.6 and later.


Usage
-----

Open a file for reading::

    from xopen import xopen

    with xopen('file.txt.xz') as f:
        content = f.read()

Or without a context manager::

    from xopen import xopen

    f = xopen('file.txt.xz')
    content = f.read()
    f.close()

Open a file in binary mode for writing::

    from xopen import xopen

    with xopen('file.txt.gz', mode='wb') as f:
        f.write(b'Hello')


Credits
-------

The name ``xopen`` was taken from the C function of the same name in the
`utils.h file which is part of
BWA <https://github.com/lh3/bwa/blob/83662032a2192d5712996f36069ab02db82acf67/utils.h>`_.

Kyle Beauchamp <https://github.com/kyleabeauchamp/> has contributed support for
appending to files.

Ruben Vorderman <https://github.com/rhpvorderman/> contributed improvements to
make reading and writing gzipped files faster.

Benjamin Vaisvil <https://github.com/bvaisvil> contributed support for
format detection from content.

Some ideas were taken from the `canopener project <https://github.com/selassid/canopener>`_.
If you also want to open S3 files, you may want to use that module instead.


Changes
-------
v1.1.0
~~~~~~
* Python 3.5 support is dropped.
* On Linux systems, `python-isal <https://github.com/pycompression/python-isal>`_
  is now added as a requirement. This will speed up the reading of gzip files
  significantly when no external processes are used.

v1.0.0
~~~~~~
* If installed, the ``igzip`` program (part of
  `Intel ISA-L <https://github.com/intel/isa-l/>`_) is now used for reading
  and writing gzip-compressed files at compression levels 1-3, which results
  in a significant speedup.

v0.9.0
~~~~~~
* When the file name extension of a file to be opened for reading is not
  available, the content is inspected (if possible) and used to determine
  which compression format applies.
* This release drops Python 2.7 and 3.4 support. Python 3.5 or later is
  now required.

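The content inspection introduced in v0.9.0 can be pictured as a check of the
first few bytes of the file against the well-known signatures of the three
formats. The following is a sketch under that assumption, with a hypothetical
helper name, and not necessarily how ``xopen`` performs the check::

    def detect_compression(path):
        """Return 'gz', 'bz2', 'xz' or None based on the file's first bytes."""
        with open(path, "rb") as f:
            header = f.read(6)
        if header[:2] == b"\x1f\x8b":  # gzip signature
            return "gz"
        if header[:3] == b"BZh":  # bzip2 signature
            return "bz2"
        if header[:6] == b"\xfd7zXZ\x00":  # xz signature
            return "xz"
        # Unknown or too-short header: assume no compression.
        return None
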
v0.8.4
~~~~~~
* When reading gzipped files, force ``pigz`` to use only a single process.
  ``pigz`` cannot use multiple cores anyway when decompressing. By default,
  it would use extra I/O processes, which slightly reduces wall-clock time,
  but increases CPU time. Single-core decompression with ``pigz`` is still
  about twice as fast as regular ``gzip``.
* Allow ``threads=0`` for specifying that no external ``pigz``/``gzip``
  process should be used (then regular ``gzip.open()`` is used instead).

v0.8.3
~~~~~~
* When reading gzipped files, let ``pigz`` use at most four threads by default.
  This limit previously only applied when writing to a file.
* Support Python 3.8

v0.8.0
~~~~~~
* Speed improvements when iterating over gzipped files.

v0.6.0
~~~~~~
* For reading from gzipped files, xopen will now use a ``pigz`` subprocess.
  This is faster than using ``gzip.open``.
* Python 2 support will be dropped in one of the next releases.

v0.5.0
~~~~~~
* By default, pigz is now only allowed to use at most four threads. This hopefully reduces
  problems some users had with too many threads when opening many files at the same time.
* xopen now accepts pathlib.Path objects.


Contributors
------------

* Marcel Martin
* Ruben Vorderman
* For more contributors, see <https://github.com/marcelm/xopen/graphs/contributors>


Links
-----

* `Source code <https://github.com/marcelm/xopen/>`_
* `Report an issue <https://github.com/marcelm/xopen/issues>`_
* `Project page on PyPI (Python package index) <https://pypi.python.org/pypi/xopen/>`_
