1.. _data-package-design:
2
3Design of data packages for the nibabel and the nipy suite
4==========================================================
5
6See :ref:`data-package-discuss` for a more general discussion of design
7issues.
8
When developing or using nipy, many data files can be useful. We divide the
data files nipy uses into at least 3 categories:
11
12#. *test data* - data files required for routine code testing
13#. *template data* - data files required for algorithms to function,
14   such as templates or atlases
15#. *example data* - data files for running examples, or optional tests
16
17Files used for routine testing are typically very small data files. They are
18shipped with the software, and live in the code repository. For example, in
19the case of ``nipy`` itself, there are some test files that live in the module
20path ``nipy.testing.data``.  Nibabel ships data files in
21``nibabel.tests.data``.  See :doc:`add_test_data` for discussion.
22
*template data* and *example data* are examples of *data packages*.  What
follows is a discussion of the design and use of data packages.
25
26.. testsetup::
27
28    # Make fake data and template directories
29    import os
30    from os.path import join as pjoin
31    import tempfile
32    tmpdir = tempfile.mkdtemp()
33    os.environ['NIPY_USER_DIR'] = tmpdir
34    for subdir in ('data', 'templates'):
35        files_dir = pjoin(tmpdir, 'nipy', subdir)
36        os.makedirs(files_dir)
37        with open(pjoin(files_dir, 'config.ini'), 'wt') as fobj:
38            fobj.write(
39    """[DEFAULT]
40    version = 0.2
41    """)
42
43Use cases for data packages
44+++++++++++++++++++++++++++
45
46Using the data package
47``````````````````````
48
49The programmer can use the data like this:
50
51.. testcode::
52
53   from nibabel.data import make_datasource
54
55   templates = make_datasource(dict(relpath='nipy/templates'))
56   fname = templates.get_filename('ICBM152', '2mm', 'T1.nii.gz')
57
58where ``fname`` will be the absolute path to the template image
59``ICBM152/2mm/T1.nii.gz``.
60
61The programmer can insist on a particular version of a ``datasource``:
62
63>>> if templates.version < '0.4':
64...     raise ValueError('Need datasource version at least 0.4')
65Traceback (most recent call last):
66...
67ValueError: Need datasource version at least 0.4
68
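Note that ``<`` on strings compares character by character, so the check above
only behaves as intended while version components stay single-digit.  The
following is a minimal sketch of a numeric comparison, assuming versions are
plain dotted integers such as ``0.2``; ``version_tuple`` is a hypothetical
helper for illustration and is not part of nibabel:

```python
def version_tuple(version):
    """Convert a dotted version string such as '0.10' to a tuple of ints."""
    return tuple(int(part) for part in version.split('.'))


# String comparison is lexicographic, so '0.10' sorts before '0.4'
assert '0.10' < '0.4'
# Tuple comparison is numeric, so 0.10 counts as newer than 0.4
assert version_tuple('0.10') > version_tuple('0.4')
```
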
If ``make_datasource`` cannot find the data, it raises an error:
70
71>>> make_datasource(dict(relpath='nipy/implausible'))
72Traceback (most recent call last):
73 ...
74nibabel.data.DataError: ...
75
76where ``DataError`` gives a helpful warning about why the data was not
77found, and how it should be installed.
78
79Warnings during installation
80````````````````````````````
81
82The example data and template data may be important, and so we want to warn
83the user if NIPY cannot find either of the two sets of data when installing
84the package.  Thus::
85
86   python setup.py install
87
88will import nipy after installation to check whether these raise an error:
89
90>>> from nibabel.data import make_datasource
91>>> templates = make_datasource(dict(relpath='nipy/templates'))
92>>> example_data = make_datasource(dict(relpath='nipy/data'))
93
94and warn the user accordingly, with some basic instructions for how to
95install the data.
96
97.. _find-data:
98
99Finding the data
100````````````````
101
102The routine ``make_datasource`` will look for data packages that have been
103installed.  For the following call:
104
105>>> templates = make_datasource(dict(relpath='nipy/templates'))
106
107the code will:
108
109#. Get a list of paths where data is known to be stored with
110   ``nibabel.data.get_data_path()``
111#. For each of these paths, search for directory ``nipy/templates``.  If
112   found, and of the correct format (see below), return a datasource,
113   otherwise raise an Exception
114
The paths collected by ``nibabel.data.get_data_path()`` are constructed from
':' (Unix) or ';' (Windows) separated strings.  The sources of the strings (in
the order in which they will be used in the search above) are:
118
119#. The value of the ``NIPY_DATA_PATH`` environment variable, if set
120#. A section = ``DATA``, parameter = ``path`` entry in a
121   ``config.ini`` file in ``nipy_dir`` where ``nipy_dir`` is
122   ``$HOME/.nipy`` or equivalent.
#. Section = ``DATA``, parameter = ``path`` entries in configuration
   ``.ini`` files, where the ``.ini`` files are found by
   ``glob.glob(os.path.join(etc_dir, '*.ini'))`` and ``etc_dir`` is
   ``/etc/nipy`` on Unix, and some suitable equivalent on Windows.
127#. The result of ``os.path.join(sys.prefix, 'share', 'nipy')``
128#. If ``sys.prefix`` is ``/usr``, we add ``/usr/local/share/nipy``. We
129   need this because Python >= 2.6 in Debian / Ubuntu does default installs to
130   ``/usr/local``.
131#. The result of ``get_nipy_user_dir()``
132
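The search order above can be sketched with the standard library alone.  This
is an illustration of the listed rules, not nibabel's actual implementation;
the names ``read_ini_paths`` and ``sketch_data_path`` are hypothetical, and the
user and system configuration directories are passed in as arguments rather
than discovered:

```python
import configparser
import glob
import os
import sys


def read_ini_paths(ini_file):
    """Return the [DATA] ``path`` entries of one ini file as a list."""
    parser = configparser.ConfigParser()
    parser.read(ini_file)  # a missing file is silently ignored
    value = parser.get('DATA', 'path', fallback='')
    return [p for p in value.split(os.pathsep) if p]


def sketch_data_path(environ, user_dir, etc_dir, prefix=sys.prefix):
    """Collect candidate data directories in the search order listed above."""
    paths = []
    # 1. Environment variable, split on ':' (Unix) or ';' (Windows)
    env_value = environ.get('NIPY_DATA_PATH', '')
    paths += [p for p in env_value.split(os.pathsep) if p]
    # 2. config.ini in the user directory
    paths += read_ini_paths(os.path.join(user_dir, 'config.ini'))
    # 3. Every .ini file in the system configuration directory
    for ini_file in sorted(glob.glob(os.path.join(etc_dir, '*.ini'))):
        paths += read_ini_paths(ini_file)
    # 4. Share directory under the Python prefix
    paths.append(os.path.join(prefix, 'share', 'nipy'))
    # 5. Debian / Ubuntu install to /usr/local by default
    if prefix == '/usr':
        paths.append('/usr/local/share/nipy')
    # 6. The user directory itself
    paths.append(user_dir)
    return paths
```

Calling ``sketch_data_path(os.environ, user_dir, etc_dir)`` returns the list
in the order in which the directories would be searched.
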
133Requirements for a data package
134```````````````````````````````
135
136To be a valid NIPY project data package, you need to satisfy:
137
138#. The installer installs the data in some place that can be found using
139   the method defined in :ref:`find-data`.
140
141We recommend that:
142
#. By default, you install data in a standard location such as
   ``<prefix>/share/nipy`` where ``<prefix>`` is the standard Python
   prefix obtained by ``>>> import sys; print(sys.prefix)``
146
147Remember that there is a distinction between the NIPY project - the
umbrella of neuroimaging in Python - and the NIPY package - the main
149code package in the NIPY project.  Thus, if you want to install data
150under the NIPY *package* umbrella, your data might go to
151``/usr/share/nipy/nipy/packagename`` (on Unix).  Note ``nipy`` twice -
152once for the project, once for the package.  If you want to install data
153under - say - the ``pbrain`` package umbrella, that would go in
154``/usr/share/nipy/pbrain/packagename``.
155
156Data package format
157```````````````````
158
159The following tree is an example of the kind of pattern we would expect
160in a data directory, where the ``nipy-data`` and ``nipy-templates``
161packages have been installed::
162
163  <ROOT>
164  `-- nipy
165      |-- data
166      |   |-- config.ini
167      |   `-- placeholder.txt
168      `-- templates
169          |-- ICBM152
170          |   `-- 2mm
171          |       `-- T1.nii.gz
172          |-- colin27
173          |   `-- 2mm
174          |       `-- T1.nii.gz
175          `-- config.ini
176
177The ``<ROOT>`` directory is the directory that will appear somewhere in
178the list from ``nibabel.data.get_data_path()``.  The ``nipy`` subdirectory
179signifies data for the ``nipy`` package (as opposed to other
180NIPY-related packages such as ``pbrain``).  The ``data`` subdirectory of
181``nipy`` contains files from the ``nipy-data`` package.  In the
182``nipy/data`` or ``nipy/templates`` directories, there is a
183``config.ini`` file, that has at least an entry like this::
184
185  [DEFAULT]
186  version = 0.2
187
188giving the version of the data package.
189
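As a rough sketch of how this layout can be checked, the function below looks
for ``<root>/<relpath>/config.ini`` in each candidate root and reads the
``version`` entry.  ``find_package`` is a hypothetical name, and skipping
directories without a valid ``config.ini`` is an assumption of this sketch
rather than documented nibabel behavior:

```python
import configparser
import os


def find_package(data_paths, relpath):
    """Return (directory, version) for the first valid package found.

    A directory counts as valid here if it contains a ``config.ini`` with a
    ``version`` entry in its ``[DEFAULT]`` section.
    """
    for root in data_paths:
        pkg_dir = os.path.join(root, *relpath.split('/'))
        parser = configparser.ConfigParser()
        parser.read(os.path.join(pkg_dir, 'config.ini'))
        version = parser.defaults().get('version')
        if version is not None:
            return pkg_dir, version
    raise LookupError(
        'No valid data package for %r in %r' % (relpath, data_paths))
```
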
190.. _data-package-design-install:
191
192Installing the data
193```````````````````
194
We use Python distutils to install data packages, and the ``data_files``
mechanism to install the data.  On Unix, with the following command::
197
198   python setup.py install --prefix=/my/prefix
199
200data will go to::
201
202   /my/prefix/share/nipy
203
204For the example above this will result in these subdirectories::
205
206   /my/prefix/share/nipy/nipy/data
207   /my/prefix/share/nipy/nipy/templates
208
209because ``nipy`` is both the project, and the package to which the data
210relates.
211
212If you install to a particular location, you will need to add that location to
213the output of ``nibabel.data.get_data_path()`` using one of the mechanisms
214above, for example, in your system configuration::
215
216   export NIPY_DATA_PATH=/my/prefix/share/nipy
217
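The ``data_files`` argument takes a list of ``(install_directory,
[source_files])`` pairs, and a relative install directory is interpreted
relative to the installation prefix.  A minimal sketch of building that list
from a source tree follows; ``collect_data_files`` is a hypothetical helper,
not taken from the nipy data packages:

```python
import os


def collect_data_files(source_dir, target_root):
    """Build a ``data_files`` list mirroring ``source_dir`` under ``target_root``.

    Each entry is ``(install_directory, [source_files])``.
    """
    data_files = []
    for dirpath, dirnames, filenames in os.walk(source_dir):
        dirnames.sort()  # make the walk order deterministic
        if not filenames:
            continue  # directories holding only subdirectories need no entry
        rel_dir = os.path.relpath(dirpath, source_dir)
        target = os.path.normpath(os.path.join(target_root, rel_dir))
        sources = [os.path.join(dirpath, name) for name in sorted(filenames)]
        data_files.append((target, sources))
    return data_files
```

In a ``setup.py`` this might be used as
``setup(..., data_files=collect_data_files('templates', 'share/nipy/nipy/templates'))``.
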
218Packaging for distributions
219```````````````````````````
220
221For a particular data package - say ``nipy-templates`` - distributions
222will want to:
223
#. Install the data in a set location.  The default from ``python setup.py
   install`` for the data packages will be ``/usr/share/nipy`` on Unix.
226#. Point a system installation of NIPY to these data.
227
228For the latter, the most obvious route is to copy an ``.ini`` file named for
229the data package into the NIPY ``etc_dir``.  In this case, on Unix, we will
230want a file called ``/etc/nipy/nipy_templates.ini`` with contents::
231
232   [DATA]
233   path = /usr/share/nipy
234
235Current implementation
236``````````````````````
237
238This section describes how we (the nipy community) implement data packages at
239the moment.
240
241The data in the data packages will not usually be under source control.  This
242is because images don't compress very well, and any change in the data will
243result in a large extra storage cost in the repository.  If you're pretty
244clear that the data files aren't going to change, then a repository could work
245OK.
246
247The data packages will be available at a central release location.  For now
248this will be: http://nipy.org/data-packages/ .
249
250A package, such as ``nipy-templates-0.2.tar.gz`` will have the following sort
251of structure::
252
253
254  <ROOT>
255    |-- setup.py
256    |-- README.txt
257    |-- MANIFEST.in
258    `-- templates
259        |-- ICBM152
260        |   |-- 1mm
261        |   |   `-- T1_brain.nii.gz
262        |   `-- 2mm
263        |       `-- T1.nii.gz
264        |-- colin27
265        |   `-- 2mm
266        |       `-- T1.nii.gz
267        `-- config.ini
268
269
270There should be only one ``nipy/packagename`` directory delivered by a
271particular package.  For example, this package installs ``nipy/templates``,
272but does not contain ``nipy/data``.
273
274Making a new package tarball is simply:
275
276#. Downloading and unpacking e.g. ``nipy-templates-0.1.tar.gz`` to form the
277   directory structure above;
278#. Making any changes to the directory;
279#. Running ``setup.py sdist`` to recreate the package.
280
281The process of making a release should be:
282
283#. Increment the major or minor version number in the ``config.ini`` file;
284#. Make a package tarball as above;
285#. Upload to distribution site.
286
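The first release step can be scripted.  The sketch below increments the minor
version, assuming the ``config.ini`` version has exactly the form
``major.minor``; ``bump_minor_version`` is a hypothetical helper and is not
part of the data packaging machinery:

```python
import configparser


def bump_minor_version(config_file):
    """Increment the minor part of ``version`` in ``[DEFAULT]`` and save."""
    parser = configparser.ConfigParser()
    parser.read(config_file)
    major, minor = parser.defaults()['version'].split('.')
    new_version = '%s.%d' % (major, int(minor) + 1)
    parser['DEFAULT']['version'] = new_version
    # Rewrite the file with the updated entry
    with open(config_file, 'w') as fobj:
        parser.write(fobj)
    return new_version
```
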
287There is an example nipy data package ``nipy-examplepkg`` in the
288``examples`` directory of the NIPY repository.
289
290The machinery for creating and maintaining data packages is available at
291https://github.com/nipy/data-packaging.
292
293See the ``README.txt`` file there for more information.
294
295.. testcleanup::
296
297    import shutil
298    shutil.rmtree(tmpdir)
299