1# emacs: -*- mode: python-mode; py-indent-offset: 4; indent-tabs-mode: nil -*-
2# vi: set ft=python sts=4 ts=4 sw=4 et:
3### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
4#
5#   See COPYING file distributed along with the NiBabel package for the
6#   copyright and license terms.
7#
8### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
9"""
10NetCDF reader/writer module.
11
12This module is used to read and create NetCDF files. NetCDF files are
13accessed through the `netcdf_file` object. Data written to and from NetCDF
14files are contained in `netcdf_variable` objects. Attributes are given
15as member variables of the `netcdf_file` and `netcdf_variable` objects.
16
17Notes
18-----
19NetCDF files are a self-describing binary data format. The file contains
20metadata that describes the dimensions and variables in the file. More
21details about NetCDF files can be found `here
22<http://www.unidata.ucar.edu/software/netcdf/docs/netcdf.html>`_. There
23are three main sections to a NetCDF data structure:
24
251. Dimensions
262. Variables
273. Attributes
28
29The dimensions section records the name and length of each dimension used
30by the variables. The variables would then indicate which dimensions it
31uses and any attributes such as data units, along with containing the data
32values for the variable. It is good practice to include a
33variable that is the same name as a dimension to provide the values for
34that axes. Lastly, the attributes section would contain additional
35information such as the name of the file creator or the instrument used to
36collect the data.
37
38When writing data to a NetCDF file, there is often the need to indicate the
39'record dimension'. A record dimension is the unbounded dimension for a
40variable. For example, a temperature variable may have dimensions of
41latitude, longitude and time. If one wants to add more temperature data to
42the NetCDF file as time progresses, then the temperature variable should
43have the time dimension flagged as the record dimension.
44
45This module implements the Scientific.IO.NetCDF API to read and create
46NetCDF files. The same API is also used in the PyNIO and pynetcdf
47modules, allowing these modules to be used interchangeably when working
48with NetCDF files. The major advantage of this module over other
49modules is that it doesn't require the code to be linked to the NetCDF
50C libraries.
51
52The code is based on the `NetCDF file format specification
53<http://www.unidata.ucar.edu/software/netcdf/docs/netcdf.html>`_. A
54NetCDF file is a self-describing binary format, with a header followed
55by data. The header contains metadata describing dimensions, variables
56and the position of the data in the file, so access can be done in an
57efficient manner without loading unnecessary data into memory. We use
58the ``mmap`` module to create Numpy arrays mapped to the data on disk,
59for the same purpose.
60
61The structure of a NetCDF file is as follows:
62
63    C D F <VERSION BYTE> <NUMBER OF RECORDS>
64    <DIMENSIONS> <GLOBAL ATTRIBUTES> <VARIABLES METADATA>
65    <NON-RECORD DATA> <RECORD DATA>
66
67Record data refers to data where the first axis can be expanded at
68will. All record variables share a same dimension at the first axis,
69and they are stored at the end of the file per record, ie
70
71    A[0], B[0], ..., A[1], B[1], ..., etc,
72
73so that new data can be appended to the file without changing its original
74structure. Non-record data are padded to a 4n bytes boundary. Record data
75are also padded, unless there is exactly one record variable in the file,
76in which case the padding is dropped.  All data is stored in big endian
77byte order.
78
79The Scientific.IO.NetCDF API allows attributes to be added directly to
80instances of ``netcdf_file`` and ``netcdf_variable``. To differentiate
81between user-set attributes and instance attributes, user-set attributes
82are automatically stored in the ``_attributes`` attribute by overloading
83``__setattr__``. This is the reason why the code sometimes uses
84``obj.__dict__['key'] = value``, instead of simply ``obj.key = value``;
85otherwise the key would be inserted into userspace attributes.
86
87In addition, the NetCDF file header contains the position of the data in
88the file, so access can be done in an efficient manner without loading
89unnecessary data into memory. It uses the ``mmap`` module to create
90Numpy arrays mapped to the data on disk, for the same purpose.
91
92Examples
93--------
94To create a NetCDF file:
95
96Make a temporary file for testing:
97
98    >>> import os
99    >>> from tempfile import mkdtemp
100    >>> tmp_pth = mkdtemp()
101    >>> fname = os.path.join(tmp_pth, 'test.nc')
102
103Then:
104
105    >>> f = netcdf_file(fname, 'w')
106    >>> f.history = 'Created for a test'
107    >>> f.createDimension('time', 10)
108    >>> time = f.createVariable('time', 'i', ('time',))
109    >>> time[:] = range(10)
110    >>> time.units = 'days since 2008-01-01'
111    >>> f.close()
112
113Note the assignment of ``range(10)`` to ``time[:]``.  Exposing the slice
114of the time variable allows for the data to be set in the object, rather
115than letting ``range(10)`` overwrite the ``time`` variable.
116
117To read the NetCDF file we just created:
118
119    >>> f = netcdf_file(fname, 'r')
120    >>> f.history  #23dt next : bytes
121    'Created for a test'
122    >>> time = f.variables['time']
123    >>> time.units  #23dt next : bytes
124    'days since 2008-01-01'
125    >>> time.shape == (10,)
126    True
127    >>> print time[-1]
128    9
129    >>> f.close()
130
131 Delete our temporary directory and file:
132
133    >>> del f, time # needed for windows unlink
134    >>> os.unlink(fname)
135    >>> os.rmdir(tmp_pth)
136"""
137
138
139
140#TODO:
141# * properly implement ``_FillValue``.
142# * implement Jeff Whitaker's patch for masked variables.
143# * fix character variables.
144# * implement PAGESIZE for Python 2.6?
145
146#The Scientific.IO.NetCDF API allows attributes to be added directly to
147#instances of ``netcdf_file`` and ``netcdf_variable``. To differentiate
148#between user-set attributes and instance attributes, user-set attributes
149#are automatically stored in the ``_attributes`` attribute by overloading
150#``__setattr__``. This is the reason why the code sometimes uses
151#``obj.__dict__['key'] = value``, instead of simply ``obj.key = value``;
152#otherwise the key would be inserted into userspace attributes.
153
154
155__all__ = ['netcdf_file']
156
157
158from operator import mul
159from mmap import mmap, ACCESS_READ
160
161import numpy as np
162from ..py3k import asbytes, asstr
163from numpy import fromstring, ndarray, dtype, empty, array, asarray
164from numpy import little_endian as LITTLE_ENDIAN
165
166
167ABSENT       = asbytes('\x00\x00\x00\x00\x00\x00\x00\x00')
168ZERO         = asbytes('\x00\x00\x00\x00')
169NC_BYTE      = asbytes('\x00\x00\x00\x01')
170NC_CHAR      = asbytes('\x00\x00\x00\x02')
171NC_SHORT     = asbytes('\x00\x00\x00\x03')
172NC_INT       = asbytes('\x00\x00\x00\x04')
173NC_FLOAT     = asbytes('\x00\x00\x00\x05')
174NC_DOUBLE    = asbytes('\x00\x00\x00\x06')
175NC_DIMENSION = asbytes('\x00\x00\x00\n')
176NC_VARIABLE  = asbytes('\x00\x00\x00\x0b')
177NC_ATTRIBUTE = asbytes('\x00\x00\x00\x0c')
178
179
180TYPEMAP = { NC_BYTE:   ('b', 1),
181            NC_CHAR:   ('c', 1),
182            NC_SHORT:  ('h', 2),
183            NC_INT:    ('i', 4),
184            NC_FLOAT:  ('f', 4),
185            NC_DOUBLE: ('d', 8) }
186
187REVERSE = { ('b', 1): NC_BYTE,
188            ('B', 1): NC_CHAR,
189            ('c', 1): NC_CHAR,
190            ('h', 2): NC_SHORT,
191            ('i', 4): NC_INT,
192            ('f', 4): NC_FLOAT,
193            ('d', 8): NC_DOUBLE,
194
195            # these come from asarray(1).dtype.char and asarray('foo').dtype.char,
196            # used when getting the types from generic attributes.
197            ('l', 4): NC_INT,
198            ('S', 1): NC_CHAR }
199
200
201class netcdf_file(object):
202    """
203    A file object for NetCDF data.
204
205    A `netcdf_file` object has two standard attributes: `dimensions` and
206    `variables`. The values of both are dictionaries, mapping dimension
207    names to their associated lengths and variable names to variables,
208    respectively. Application programs should never modify these
209    dictionaries.
210
211    All other attributes correspond to global attributes defined in the
212    NetCDF file. Global file attributes are created by assigning to an
213    attribute of the `netcdf_file` object.
214
215    Parameters
216    ----------
217    filename : string or file-like
218        string -> filename
219    mode : {'r', 'w'}, optional
220        read-write mode, default is 'r'
221    mmap : None or bool, optional
222        Whether to mmap `filename` when reading.  Default is True
223        when `filename` is a file name, False when `filename` is a
224        file-like object
225    version : {1, 2}, optional
226        version of netcdf to read / write, where 1 means *Classic
227        format* and 2 means *64-bit offset format*.  Default is 1.  See
228        `here <http://www.unidata.ucar.edu/software/netcdf/docs/netcdf/Which-Format.html>`_
229        for more info.
230
231    """
232    def __init__(self, filename, mode='r', mmap=None, version=1):
233        """Initialize netcdf_file from fileobj (str or file-like).
234
235        Parameters
236        ----------
237        filename : string or file-like
238           string -> filename
239        mode : {'r', 'w'}, optional
240           read-write mode, default is 'r'
241        mmap : None or bool, optional
242           Whether to mmap `filename` when reading.  Default is True
243           when `filename` is a file name, False when `filename` is a
244           file-like object
245        version : {1, 2}, optional
246           version of netcdf to read / write, where 1 means *Classic
247           format* and 2 means *64-bit offset format*.  Default is 1.  See
248           http://www.unidata.ucar.edu/software/netcdf/docs/netcdf/Which-Format.html#Which-Format
249        """
250        if hasattr(filename, 'seek'): # file-like
251            self.fp = filename
252            self.filename = 'None'
253            if mmap is None:
254                mmap = False
255            elif mmap and not hasattr(filename, 'fileno'):
256                raise ValueError('Cannot use file object for mmap')
257        else: # maybe it's a string
258            self.filename = filename
259            self.fp = open(self.filename, '%sb' % mode)
260            if mmap is None:
261                mmap  = True
262        self.use_mmap = mmap
263        self.version_byte = version
264
265        if not mode in 'rw':
266            raise ValueError("Mode must be either 'r' or 'w'.")
267        self.mode = mode
268
269        self.dimensions = {}
270        self.variables = {}
271
272        self._dims = []
273        self._recs = 0
274        self._recsize = 0
275
276        self._attributes = {}
277
278        if mode == 'r':
279            self._read()
280
281    def __setattr__(self, attr, value):
282        # Store user defined attributes in a separate dict,
283        # so we can save them to file later.
284        try:
285            self._attributes[attr] = value
286        except AttributeError:
287            pass
288        self.__dict__[attr] = value
289
290    def close(self):
291        """Closes the NetCDF file."""
292        try:
293            if self.fp.closed:
294                return
295        except AttributeError: # gzip files don't have closed attr
296            pass
297        try:
298            self.flush()
299        finally:
300            self.fp.close()
301    __del__ = close
302
303    def createDimension(self, name, length):
304        """
305        Adds a dimension to the Dimension section of the NetCDF data structure.
306
307        Note that this function merely adds a new dimension that the variables can
308        reference.  The values for the dimension, if desired, should be added as
309        a variable using `createVariable`, referring to this dimension.
310
311        Parameters
312        ----------
313        name : str
314            Name of the dimension (Eg, 'lat' or 'time').
315        length : int
316            Length of the dimension.
317
318        See Also
319        --------
320        createVariable
321
322        """
323        self.dimensions[name] = length
324        self._dims.append(name)
325
326    def createVariable(self, name, type, dimensions):
327        """
328        Create an empty variable for the `netcdf_file` object, specifying its data
329        type and the dimensions it uses.
330
331        Parameters
332        ----------
333        name : str
334            Name of the new variable.
335        type : dtype or str
336            Data type of the variable.
337        dimensions : sequence of str
338            List of the dimension names used by the variable, in the desired order.
339
340        Returns
341        -------
342        variable : netcdf_variable
343            The newly created ``netcdf_variable`` object.
344            This object has also been added to the `netcdf_file` object as well.
345
346        See Also
347        --------
348        createDimension
349
350        Notes
351        -----
352        Any dimensions to be used by the variable should already exist in the
353        NetCDF data structure or should be created by `createDimension` prior to
354        creating the NetCDF variable.
355
356        """
357        shape = tuple([self.dimensions[dim] for dim in dimensions])
358        shape_ = tuple([dim or 0 for dim in shape])  # replace None with 0 for numpy
359
360        if isinstance(type, basestring): type = dtype(type)
361        typecode, size = type.char, type.itemsize
362        if (typecode, size) not in REVERSE:
363            raise ValueError("NetCDF 3 does not support type %s" % type)
364        dtype_ = '>%s' % typecode
365        if size > 1: dtype_ += str(size)
366
367        data = empty(shape_, dtype=dtype_)
368        self.variables[name] = netcdf_variable(data, typecode, size, shape, dimensions)
369        return self.variables[name]
370
371    def flush(self):
372        """
373        Perform a sync-to-disk flush if the `netcdf_file` object is in write mode.
374
375        See Also
376        --------
377        sync : Identical function
378
379        """
380        if hasattr(self, 'mode') and self.mode is 'w':
381            self._write()
382    sync = flush
383
384    def _write(self):
385        self.fp.write(asbytes('CDF'))
386        self.fp.write(array(self.version_byte, '>b').tostring())
387
388        # Write headers and data.
389        self._write_numrecs()
390        self._write_dim_array()
391        self._write_gatt_array()
392        self._write_var_array()
393
394    def _write_numrecs(self):
395        # Get highest record count from all record variables.
396        for var in self.variables.values():
397            if var.isrec and len(var.data) > self._recs:
398                self.__dict__['_recs'] = len(var.data)
399        self._pack_int(self._recs)
400
401    def _write_dim_array(self):
402        if self.dimensions:
403            self.fp.write(NC_DIMENSION)
404            self._pack_int(len(self.dimensions))
405            for name in self._dims:
406                self._pack_string(name)
407                length = self.dimensions[name]
408                self._pack_int(length or 0)  # replace None with 0 for record dimension
409        else:
410            self.fp.write(ABSENT)
411
412    def _write_gatt_array(self):
413        self._write_att_array(self._attributes)
414
415    def _write_att_array(self, attributes):
416        if attributes:
417            self.fp.write(NC_ATTRIBUTE)
418            self._pack_int(len(attributes))
419            for name, values in attributes.items():
420                self._pack_string(name)
421                self._write_values(values)
422        else:
423            self.fp.write(ABSENT)
424
425    def _write_var_array(self):
426        if self.variables:
427            self.fp.write(NC_VARIABLE)
428            self._pack_int(len(self.variables))
429
430            # Sort variables non-recs first, then recs. We use a DSU
431            # since some people use pupynere with Python 2.3.x.
432            deco = [ (v._shape and not v.isrec, k) for (k, v) in self.variables.items() ]
433            deco.sort()
434            variables = [ k for (unused, k) in deco ][::-1]
435
436            # Set the metadata for all variables.
437            for name in variables:
438                self._write_var_metadata(name)
439            # Now that we have the metadata, we know the vsize of
440            # each record variable, so we can calculate recsize.
441            self.__dict__['_recsize'] = sum([
442                    var._vsize for var in self.variables.values()
443                    if var.isrec])
444            # Set the data for all variables.
445            for name in variables:
446                self._write_var_data(name)
447        else:
448            self.fp.write(ABSENT)
449
450    def _write_var_metadata(self, name):
451        var = self.variables[name]
452
453        self._pack_string(name)
454        self._pack_int(len(var.dimensions))
455        for dimname in var.dimensions:
456            dimid = self._dims.index(dimname)
457            self._pack_int(dimid)
458
459        self._write_att_array(var._attributes)
460
461        nc_type = REVERSE[var.typecode(), var.itemsize()]
462        self.fp.write(asbytes(nc_type))
463
464        if not var.isrec:
465            vsize = var.data.size * var.data.itemsize
466            vsize += -vsize % 4
467        else:  # record variable
468            try:
469                vsize = var.data[0].size * var.data.itemsize
470            except IndexError:
471                vsize = 0
472            rec_vars = len([var for var in self.variables.values()
473                    if var.isrec])
474            if rec_vars > 1:
475                vsize += -vsize % 4
476        self.variables[name].__dict__['_vsize'] = vsize
477        self._pack_int(vsize)
478
479        # Pack a bogus begin, and set the real value later.
480        self.variables[name].__dict__['_begin'] = self.fp.tell()
481        self._pack_begin(0)
482
483    def _write_var_data(self, name):
484        var = self.variables[name]
485
486        # Set begin in file header.
487        the_beguine = self.fp.tell()
488        self.fp.seek(var._begin)
489        self._pack_begin(the_beguine)
490        self.fp.seek(the_beguine)
491
492        # Write data.
493        if not var.isrec:
494            self.fp.write(var.data.tostring())
495            count = var.data.size * var.data.itemsize
496            self.fp.write(asbytes('0') * (var._vsize - count))
497        else:  # record variable
498            # Handle rec vars with shape[0] < nrecs.
499            if self._recs > len(var.data):
500                shape = (self._recs,) + var.data.shape[1:]
501                var.data.resize(shape)
502
503            pos0 = pos = self.fp.tell()
504            for rec in var.data:
505                # Apparently scalars cannot be converted to big endian. If we
506                # try to convert a ``=i4`` scalar to, say, '>i4' the dtype
507                # will remain as ``=i4``.
508                if not rec.shape and (rec.dtype.byteorder == '<' or
509                        (rec.dtype.byteorder == '=' and LITTLE_ENDIAN)):
510                    rec = rec.byteswap()
511                self.fp.write(rec.tostring())
512                # Padding
513                count = rec.size * rec.itemsize
514                self.fp.write(asbytes('0') * (var._vsize - count))
515                pos += self._recsize
516                self.fp.seek(pos)
517            self.fp.seek(pos0 + var._vsize)
518
519    def _write_values(self, values):
520        if hasattr(values, 'dtype'):
521            nc_type = REVERSE[values.dtype.char, values.dtype.itemsize]
522        else:
523            types = [
524                    (int, NC_INT),
525                    (long, NC_INT),
526                    (float, NC_FLOAT),
527                    (basestring, NC_CHAR),
528                    ]
529            try:
530                sample = values[0]
531            except TypeError:
532                sample = values
533            for class_, nc_type in types:
534                if isinstance(sample, class_): break
535
536        typecode, size = TYPEMAP[nc_type]
537        dtype_ = '>%s' % typecode
538
539        values = asarray(values, dtype=dtype_)
540
541        self.fp.write(asbytes(nc_type))
542
543        if values.dtype.char == 'S':
544            nelems = values.itemsize
545        else:
546            nelems = values.size
547        self._pack_int(nelems)
548
549        if not values.shape and (values.dtype.byteorder == '<' or
550                (values.dtype.byteorder == '=' and LITTLE_ENDIAN)):
551            values = values.byteswap()
552        self.fp.write(values.tostring())
553        count = values.size * values.itemsize
554        self.fp.write(asbytes('0') * (-count % 4))  # pad
555
556    def _read(self):
557        # Check magic bytes and version
558        magic = self.fp.read(3)
559        if not magic == asbytes('CDF'):
560            raise TypeError("Error: %s is not a valid NetCDF 3 file" %
561                            self.filename)
562        self.__dict__['version_byte'] = fromstring(self.fp.read(1), '>b')[0]
563
564        # Read file headers and set data.
565        self._read_numrecs()
566        self._read_dim_array()
567        self._read_gatt_array()
568        self._read_var_array()
569
570    def _read_numrecs(self):
571        self.__dict__['_recs'] = self._unpack_int()
572
573    def _read_dim_array(self):
574        header = self.fp.read(4)
575        if not header in [ZERO, NC_DIMENSION]:
576            raise ValueError("Unexpected header.")
577        count = self._unpack_int()
578
579        for dim in range(count):
580            name = asstr(self._unpack_string())
581            length = self._unpack_int() or None  # None for record dimension
582            self.dimensions[name] = length
583            self._dims.append(name)  # preserve order
584
585    def _read_gatt_array(self):
586        for k, v in self._read_att_array().items():
587            self.__setattr__(k, v)
588
589    def _read_att_array(self):
590        header = self.fp.read(4)
591        if not header in [ZERO, NC_ATTRIBUTE]:
592            raise ValueError("Unexpected header.")
593        count = self._unpack_int()
594
595        attributes = {}
596        for attr in range(count):
597            name = asstr(self._unpack_string())
598            attributes[name] = self._read_values()
599        return attributes
600
601    def _read_var_array(self):
602        header = self.fp.read(4)
603        if not header in [ZERO, NC_VARIABLE]:
604            raise ValueError("Unexpected header.")
605
606        begin = 0
607        dtypes = {'names': [], 'formats': []}
608        rec_vars = []
609        count = self._unpack_int()
610        for var in range(count):
611            (name, dimensions, shape, attributes,
612             typecode, size, dtype_, begin_, vsize) = self._read_var()
613            # http://www.unidata.ucar.edu/software/netcdf/docs/netcdf.html
614            # Note that vsize is the product of the dimension lengths
615            # (omitting the record dimension) and the number of bytes
616            # per value (determined from the type), increased to the
617            # next multiple of 4, for each variable. If a record
618            # variable, this is the amount of space per record. The
619            # netCDF "record size" is calculated as the sum of the
620            # vsize's of all the record variables.
621            #
622            # The vsize field is actually redundant, because its value
623            # may be computed from other information in the header. The
624            # 32-bit vsize field is not large enough to contain the size
625            # of variables that require more than 2^32 - 4 bytes, so
626            # 2^32 - 1 is used in the vsize field for such variables.
627            if shape and shape[0] is None: # record variable
628                rec_vars.append(name)
629                # The netCDF "record size" is calculated as the sum of
630                # the vsize's of all the record variables.
631                self.__dict__['_recsize'] += vsize
632                if begin == 0: begin = begin_
633                dtypes['names'].append(name)
634                dtypes['formats'].append(str(shape[1:]) + dtype_)
635
636                # Handle padding with a virtual variable.
637                if typecode in 'bch':
638                    actual_size = reduce(mul, (1,) + shape[1:]) * size
639                    padding = -actual_size % 4
640                    if padding:
641                        dtypes['names'].append('_padding_%d' % var)
642                        dtypes['formats'].append('(%d,)>b' % padding)
643
644                # Data will be set later.
645                data = None
646            else: # not a record variable
647                # Calculate size to avoid problems with vsize (above)
648                a_size = reduce(mul, shape, 1) * size
649                if self.use_mmap:
650                    mm = mmap(self.fp.fileno(), begin_+a_size, access=ACCESS_READ)
651                    data = ndarray.__new__(ndarray, shape, dtype=dtype_,
652                            buffer=mm, offset=begin_, order=0)
653                else:
654                    pos = self.fp.tell()
655                    self.fp.seek(begin_)
656                    data = fromstring(self.fp.read(a_size), dtype=dtype_)
657                    data.shape = shape
658                    self.fp.seek(pos)
659
660            # Add variable.
661            self.variables[name] = netcdf_variable(
662                    data, typecode, size, shape, dimensions, attributes)
663
664        if rec_vars:
665            # Remove padding when only one record variable.
666            if len(rec_vars) == 1:
667                dtypes['names'] = dtypes['names'][:1]
668                dtypes['formats'] = dtypes['formats'][:1]
669
670            # Build rec array.
671            if self.use_mmap:
672                mm = mmap(self.fp.fileno(), begin+self._recs*self._recsize, access=ACCESS_READ)
673                rec_array = ndarray.__new__(ndarray, (self._recs,), dtype=dtypes,
674                        buffer=mm, offset=begin, order=0)
675            else:
676                pos = self.fp.tell()
677                self.fp.seek(begin)
678                rec_array = fromstring(self.fp.read(self._recs*self._recsize), dtype=dtypes)
679                rec_array.shape = (self._recs,)
680                self.fp.seek(pos)
681
682            for var in rec_vars:
683                self.variables[var].__dict__['data'] = rec_array[var]
684
685    def _read_var(self):
686        name = asstr(self._unpack_string())
687        dimensions = []
688        shape = []
689        dims = self._unpack_int()
690
691        for i in range(dims):
692            dimid = self._unpack_int()
693            dimname = self._dims[dimid]
694            dimensions.append(dimname)
695            dim = self.dimensions[dimname]
696            shape.append(dim)
697        dimensions = tuple(dimensions)
698        shape = tuple(shape)
699
700        attributes = self._read_att_array()
701        nc_type = self.fp.read(4)
702        vsize = self._unpack_int()
703        begin = [self._unpack_int, self._unpack_int64][self.version_byte-1]()
704
705        typecode, size = TYPEMAP[nc_type]
706        dtype_ = '>%s' % typecode
707
708        return name, dimensions, shape, attributes, typecode, size, dtype_, begin, vsize
709
710    def _read_values(self):
711        nc_type = self.fp.read(4)
712        n = self._unpack_int()
713
714        typecode, size = TYPEMAP[nc_type]
715
716        count = n*size
717        values = self.fp.read(int(count))
718        self.fp.read(-count % 4)  # read padding
719
720        if typecode is not 'c':
721            values = fromstring(values, dtype='>%s' % typecode)
722            if values.shape == (1,): values = values[0]
723        else:
724            values = values.rstrip(asbytes('\x00'))
725        return values
726
727    def _pack_begin(self, begin):
728        if self.version_byte == 1:
729            self._pack_int(begin)
730        elif self.version_byte == 2:
731            self._pack_int64(begin)
732
733    def _pack_int(self, value):
734        self.fp.write(array(value, '>i').tostring())
735    _pack_int32 = _pack_int
736
737    def _unpack_int(self):
738        return int(fromstring(self.fp.read(4), '>i')[0])
739    _unpack_int32 = _unpack_int
740
741    def _pack_int64(self, value):
742        self.fp.write(array(value, '>q').tostring())
743
744    def _unpack_int64(self):
745        return fromstring(self.fp.read(8), '>q')[0]
746
747    def _pack_string(self, s):
748        count = len(s)
749        self._pack_int(count)
750        self.fp.write(asbytes(s))
751        self.fp.write(asbytes('0') * (-count % 4))  # pad
752
753    def _unpack_string(self):
754        count = self._unpack_int()
755        s = self.fp.read(count).rstrip(asbytes('\x00'))
756        self.fp.read(-count % 4)  # read padding
757        return s
758
759
760class netcdf_variable(object):
761    """
762    A data object for the `netcdf` module.
763
764    `netcdf_variable` objects are constructed by calling the method
765    `netcdf_file.createVariable` on the `netcdf_file` object. `netcdf_variable`
766    objects behave much like array objects defined in numpy, except that their
767    data resides in a file. Data is read by indexing and written by assigning
768    to an indexed subset; the entire array can be accessed by the index ``[:]``
769    or (for scalars) by using the methods `getValue` and `assignValue`.
770    `netcdf_variable` objects also have attribute `shape` with the same meaning
771    as for arrays, but the shape cannot be modified. There is another read-only
772    attribute `dimensions`, whose value is the tuple of dimension names.
773
774    All other attributes correspond to variable attributes defined in
775    the NetCDF file. Variable attributes are created by assigning to an
776    attribute of the `netcdf_variable` object.
777
778    Parameters
779    ----------
780    data : array_like
781        The data array that holds the values for the variable.
782        Typically, this is initialized as empty, but with the proper shape.
783    typecode : dtype character code
784        Desired data-type for the data array.
785    size : int
786        Desired element size for the data array.
787    shape : sequence of ints
788        The shape of the array.  This should match the lengths of the
789        variable's dimensions.
790    dimensions : sequence of strings
791        The names of the dimensions used by the variable.  Must be in the
792        same order of the dimension lengths given by `shape`.
793    attributes : dict, optional
794        Attribute values (any type) keyed by string names.  These attributes
795        become attributes for the netcdf_variable object.
796
797
798    Attributes
799    ----------
800    dimensions : list of str
801        List of names of dimensions used by the variable object.
802    isrec, shape
803        Properties
804
805    See also
806    --------
807    isrec, shape
808
809    """
810    def __init__(self, data, typecode, size, shape, dimensions, attributes=None):
811        self.data = data
812        self._typecode = typecode
813        self._size = size
814        self._shape = shape
815        self.dimensions = dimensions
816
817        self._attributes = attributes or {}
818        for k, v in self._attributes.items():
819            self.__dict__[k] = v
820
821    def __setattr__(self, attr, value):
822        # Store user defined attributes in a separate dict,
823        # so we can save them to file later.
824        try:
825            self._attributes[attr] = value
826        except AttributeError:
827            pass
828        self.__dict__[attr] = value
829
830    def isrec(self):
831        """Returns whether the variable has a record dimension or not.
832
833        A record dimension is a dimension along which additional data could be
834        easily appended in the netcdf data structure without much rewriting of
835        the data file. This attribute is a read-only property of the
836        `netcdf_variable`.
837
838        """
839        return self.data.shape and not self._shape[0]
840    isrec = property(isrec)
841
842    def shape(self):
843        """Returns the shape tuple of the data variable.
844
845        This is a read-only attribute and can not be modified in the
846        same manner of other numpy arrays.
847        """
848        return self.data.shape
849    shape = property(shape)
850
851    def getValue(self):
852        """
853        Retrieve a scalar value from a `netcdf_variable` of length one.
854
855        Raises
856        ------
857        ValueError
858            If the netcdf variable is an array of length greater than one,
859            this exception will be raised.
860
861        """
862        return self.data.item()
863
864    def assignValue(self, value):
865        """
866        Assign a scalar value to a `netcdf_variable` of length one.
867
868        Parameters
869        ----------
870        value : scalar
871            Scalar value (of compatible type) to assign to a length-one netcdf
872            variable. This value will be written to file.
873
874        Raises
875        ------
876        ValueError
877            If the input is not a scalar, or if the destination is not a length-one
878            netcdf variable.
879
880        """
881        self.data.itemset(value)
882
883    def typecode(self):
884        """
885        Return the typecode of the variable.
886
887        Returns
888        -------
889        typecode : char
890            The character typecode of the variable (eg, 'i' for int).
891
892        """
893        return self._typecode
894
895    def itemsize(self):
896        """
897        Return the itemsize of the variable.
898
899        Returns
900        -------
901        itemsize : int
902            The element size of the variable (eg, 8 for float64).
903
904        """
905        return self._size
906
907    def __getitem__(self, index):
908        return self.data[index]
909
910    def __setitem__(self, index, data):
911        # Expand data for record vars?
912        if self.isrec:
913            if isinstance(index, tuple):
914                rec_index = index[0]
915            else:
916                rec_index = index
917            if isinstance(rec_index, slice):
918                recs = (rec_index.start or 0) + len(data)
919            else:
920                recs = rec_index + 1
921            if recs > len(self.data):
922                shape = (recs,) + self._shape[1:]
923                self.data.resize(shape)
924        self.data[index] = data
925
926
927NetCDFFile = netcdf_file
928NetCDFVariable = netcdf_variable
929