# gridData --- python modules to read and write gridded data
# Copyright (c) 2009-2014 Oliver Beckstein <orbeckst@gmail.com>
# Released under the GNU Lesser General Public License, version 3 or later.

r"""
:mod:`~gridData.OpenDX` --- routines to read and write simple OpenDX files
==========================================================================

This module handles the OpenDX format for multi-dimensional grid
data. OpenDX is free visualization software, see http://www.opendx.org.

.. Note:: This module only implements a primitive subset, sufficient
          to represent n-dimensional regular grids.

The OpenDX scalar file format is specified in Appendix `B.2 Data
Explorer Native Files`_ [#OpenDXformat]_.

If you want to build a dx object from your data, you can either use the
convenient :class:`~gridData.core.Grid` class from the top level
module (:class:`gridData.Grid`) or the lower-level methods
described below.


.. _opendx-read-write:

Reading and writing OpenDX files
--------------------------------

If you have OpenDX files from other software and just want to
**read** them into a Python array then you do not need the
interface in :mod:`gridData.OpenDX`: just use
:class:`~gridData.core.Grid` and load the file::

  from gridData import Grid
  g = Grid("data.dx")

This should work for files produced by common visualization programs
(VMD_, PyMOL_, Chimera_). The documentation for :mod:`gridData` tells
you more about what to do with the :class:`~gridData.core.Grid`
object.

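
For instance, once a file has been loaded, the data and the grid
geometry are available as attributes of the
:class:`~gridData.core.Grid` instance. A brief sketch of commonly
used attributes (see the :mod:`gridData` documentation for the
authoritative interface)::

  g = Grid("data.dx")
  g.grid.shape   # the data itself, a numpy array
  g.origin       # Cartesian coordinates of the centre of cell (0, 0, ..., 0)
  g.delta        # grid spacing along each axis (for regular grids)
  g.edges        # cell edges, as used by numpy.histogramdd
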
If you want to **write** an OpenDX file then you just use the
:meth:`gridData.core.Grid.export` method with ``file_format="dx"`` (or
just use a filename with extension ".dx")::

  g.export("data.dx")

However, some visualization programs do not implement the full OpenDX
specification and only read very specific, "OpenDX-like"
files. :mod:`gridData.OpenDX` tries to be compatible with these
programs. Sometimes, though, additional help is needed to write an
OpenDX file that can be read by a specific program, as described
below:

Known issues for writing OpenDX files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* APBS requires the delta to be written to the seventh significant figure.
  The delta is now written with this increased precision.

  .. versionchanged:: 0.6.0

* PyMOL_ requires OpenDX files with the type specification "double" in
  the `class array` section (see issue `#35`_). By default (since
  release 0.4.0), the type is set to the one that most closely
  approximates the dtype of the numpy array :attr:`Grid.grid`, which
  holds all data. This is often :class:`numpy.float64`, which
  creates an OpenDX type "double" that PyMOL will read.

  However, if you want to *force* a specific OpenDX type (such as
  "float" or "double", see :attr:`gridData.OpenDX.array.dx_types` for
  available values) then you can use the ``type`` keyword argument::

    g.export("for_pymol.dx", type="double")

  If you want to be certain that PyMOL can read your OpenDX files, it
  is suggested to always export with ``type="double"``.

  .. versionadded:: 0.4.0


.. _VMD: http://www.ks.uiuc.edu/Research/vmd/
.. _PyMOL: http://www.pymol.org/
.. _Chimera: https://www.cgl.ucsf.edu/chimera/
.. _`#35`: https://github.com/MDAnalysis/GridDataFormats/issues/35



Building a dx object from a numpy array ``A``
---------------------------------------------

If you have a numpy array ``A`` that represents a density in Cartesian
space then you can construct a dx object (called a *field* in OpenDX
parlance) by providing some additional information that fixes the
coordinate system in space and defines the units along the axes.

The following data are required:

grid
    numpy nD array (typically an nD histogram)
grid.shape
    the shape of the array
origin
    the Cartesian coordinates of the center of the (0,0,..,0) grid cell
delta
    :math:`n \times n` array with the length of a grid cell along
    each axis; for regular rectangular grids the off-diagonal
    elements are 0 and the diagonal ones correspond to the
    'bin width' of the histogram, e.g. ``delta[0,0] = 1.0`` (Angstrom)

The DX data type ("type" in the DX file) is determined from the
:class:`numpy.dtype` of the :class:`numpy.ndarray` that is provided as
the *grid* (or with the *type* keyword argument to
:class:`gridData.OpenDX.array`).

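
If the density was obtained as a histogram with
:func:`numpy.histogramdd` then suitable values for *origin* and
*delta* can be derived from the bin edges. The following is a minimal
sketch (``xyz`` stands for an ``(N, 3)`` array of coordinates and
equally sized bins are assumed along each axis; a simple 1-d array of
bin widths is sufficient for *delta* because it is expanded to the
diagonal :math:`n \times n` form internally)::

  import numpy

  grid, edges = numpy.histogramdd(xyz, bins=(100, 100, 100))
  delta = numpy.array([e[1] - e[0] for e in edges])           # bin widths
  origin = numpy.array([0.5 * (e[0] + e[1]) for e in edges])  # centre of cell (0,0,0)
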
For example, to build a :class:`field`::

  dx = OpenDX.field('density')
  dx.add('positions', OpenDX.gridpositions(1, grid.shape, origin, delta))
  dx.add('connections', OpenDX.gridconnections(2, grid.shape))
  dx.add('data', OpenDX.array(3, grid))

or all with the constructor::

  dx = OpenDX.field('density', components=dict(
            positions=OpenDX.gridpositions(1, grid.shape, origin, delta),
            connections=OpenDX.gridconnections(2, grid.shape),
            data=OpenDX.array(3, grid)))

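The assembled *field* can then be written to a file with its
:meth:`~OpenDX.field.write` method (a short sketch; in practice the
higher-level :meth:`gridData.core.Grid.export` is usually more
convenient)::

  dx.write('density.dx')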

Building a dx object from a dx file
-----------------------------------

One can also read data from an existing dx file::

  dx = OpenDX.field(0)
  dx.read('file.dx')

Only simple arrays are read. The data are initially stored as a 1-d
:class:`numpy.ndarray` in ``dx.components['data'].array``, with the
:class:`numpy.dtype` determined by the DX type in the file.

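Gzip-compressed files are handled transparently by both the reader
and the writer in this module: any filename ending in ``.gz`` is
opened with :mod:`gzip`. A small sketch::

  dx = OpenDX.field(0)
  dx.read('file.dx.gz')
  dx.write('copy.dx.gz')
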
The dx :class:`field` object has a method
:meth:`~OpenDX.field.histogramdd` that produces output identical to
the :func:`numpy.histogramdd` function by taking the stored dimension
and deltas into account. In this way, one can store nD histograms in a
portable and universal manner::

  histogram, edges = dx.histogramdd()

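As a round-trip sketch (using the :class:`~gridData.core.Grid` class
for writing, as shown earlier; ``xyz`` again stands for an ``(N, 3)``
array of coordinates)::

  import numpy
  from gridData import Grid, OpenDX

  counts, edges = numpy.histogramdd(xyz, bins=(100, 100, 100))
  Grid(counts, edges=edges).export('histogram.dx')

  dx = OpenDX.field(0)
  dx.read('histogram.dx')
  histogram, hist_edges = dx.histogramdd()
  assert numpy.allclose(histogram, counts)
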
.. rubric:: Footnotes

.. [#OpenDXformat] The original link to the OpenDX file format specs
   http://opendx.sdsc.edu/docs/html/pages/usrgu068.htm#HDREDF is dead, so this
   links to an archived copy at the Internet Archive, `B.2 Data Explorer Native Files`_.

.. _`B.2 Data Explorer Native Files`:
   https://web.archive.org/web/20080808140524/http://opendx.sdsc.edu/docs/html/pages/usrgu068.htm
.. http://opendx.sdsc.edu/docs/html/pages/usrgu068.htm#HDREDF

Classes and functions
---------------------

"""
from __future__ import absolute_import, division, with_statement

import numpy
import re
from six import next
from six.moves import range
import gzip

import warnings

# Python 2/3 compatibility (see issue #99)
# and https://bugs.python.org/issue30012
import sys
if sys.version_info >= (3, ):
    def _gzip_open(filename, mode="rt"):
        return gzip.open(filename, mode)
else:
    def _gzip_open(filename, mode="rt"):
        return gzip.open(filename)
del sys

class DXclass(object):
    """'class' object as defined by OpenDX"""
    def __init__(self,classid):
        """classid is the object number"""
        self.id = classid  # serial number of the object
        self.name = None   # name of the DXclass
        self.component = None   # component type
        self.D = None      # dimensions

    def write(self, stream, optstring="", quote=False):
        """write the 'object' line; additional args are packed in string"""
        classid = str(self.id)
        if quote: classid = '"'+classid+'"'
        # Only use a *single* space between tokens; neither Chimera's nor PyMOL's
        # DX parser properly implements the OpenDX specs, and both produce garbage
        # with multiple spaces. (Chimera 1.4.1, PyMOL 1.3)
        to_write = 'object '+classid+' class '+str(self.name)+' '+optstring+'\n'
        self._write_line(stream, to_write)

    @staticmethod
    def _write_line(stream, line="", quote=False):
        """write a line to the file"""
        if isinstance(stream, gzip.GzipFile):
            line = line.encode()
        stream.write(line)

    def read(self, stream):
        raise NotImplementedError('Reading is currently not supported.')

    def ndformat(self,s):
        """Returns a string with as many repetitions of s as self
        has dimensions (derived from shape)"""
        return s * len(self.shape)

    def __repr__(self):
        return '<OpenDX.'+str(self.name)+' object, id='+str(self.id)+'>'


class gridpositions(DXclass):
    """OpenDX gridpositions class.

    shape     D-tuple describing the size in each dimension
    origin    coordinates of the centre of the grid cell with index 0,0,...,0
    delta     DxD array describing the deltas
    """
    def __init__(self,classid,shape=None,origin=None,delta=None,**kwargs):
        if shape is None or origin is None or delta is None:
            raise ValueError('all keyword arguments are required')
        self.id = classid
        self.name = 'gridpositions'
        self.component = 'positions'
        self.shape = numpy.asarray(shape)      # D dimensional shape
        self.origin = numpy.asarray(origin)    # D vector
        self.rank = len(self.shape)            # D === rank

        self.delta = numpy.asarray(delta)      # DxD array of grid spacings
        # gridDataFormats actually provides a simple 1D array with the deltas because only
        # regular grids are used, but the following is a reminder that OpenDX should be able
        # to handle more complicated volume elements
        if len(self.delta.shape) == 1:
            self.delta = numpy.diag(delta)
        if self.delta.shape != (self.rank, self.rank):
            # check OpenDX specs for irregular spacing if we want to implement
            # anything more complicated
            raise NotImplementedError('Only regularly spaced grids allowed, '
                                      'not delta={}'.format(self.delta))

    def write(self, stream):
        super(gridpositions, self).write(
            stream, ('counts '+self.ndformat(' %d')) % tuple(self.shape))
        self._write_line(stream, 'origin %f %f %f\n' % tuple(self.origin))
        for delta in self.delta:
            self._write_line(
                stream, ('delta ' +
                         self.ndformat(' {:.7g}').format(*delta) +
                         '\n'))

    def edges(self):
        """Edges of the grid cells, origin at centre of 0,0,..,0 grid cell.

        Only works for regular, orthonormal grids.
        """
        return [self.delta[d, d] * numpy.arange(self.shape[d] + 1)
                + self.origin[d] - 0.5 * self.delta[d, d]
                for d in range(self.rank)]


class gridconnections(DXclass):
    """OpenDX gridconnections class"""
    def __init__(self,classid,shape=None,**kwargs):
        if shape is None:
            raise ValueError('all keyword arguments are required')
        self.id = classid
        self.name = 'gridconnections'
        self.component = 'connections'
        self.shape = numpy.asarray(shape)      # D dimensional shape

    def write(self, stream):
        super(gridconnections, self).write(
            stream, ('counts '+self.ndformat(' %d')) % tuple(self.shape))


class array(DXclass):
    """OpenDX array class.

    See `Array Objects`_ for details.

    .. _Array Objects:
       https://web.archive.org/web/20080808140524/http://opendx.sdsc.edu/docs/html/pages/usrgu068.htm#Header_440
    """
    #: conversion from :attr:`numpy.dtype.name` to closest OpenDX array type
    #: (round-tripping is not guaranteed to produce identical types); not all
    #: types are supported (e.g., strings are missing)
    np_types = {
        "uint8": "byte",         # DX "unsigned byte" equivalent
        "int8": "signed byte",
        "uint16": "unsigned short",
        "int16": "short",         # DX "signed short" equivalent
        "uint32": "unsigned int",
        "int32": "int",           # DX "signed int"   equivalent
        "uint64": "unsigned int", # not explicit in DX, for compatibility
        "int64": "int",           # not explicit in DX, for compatibility
        # "hyper",                # ?
        "float32": "float",       # default
        "float64": "double",
        "float16": "float",       # float16 not available in DX, use float
        # numpy float128 not available, raise error
        # "string" not automatically supported
    }
    #: conversion from OpenDX type to closest :class:`numpy.dtype`
    #: (round-tripping is not guaranteed to produce identical types); not all
    #: types are supported (e.g., strings and conversion to int64 are missing)
    dx_types = {
        "byte": "uint8",
        "unsigned byte": "uint8",
        "signed byte": "int8",
        "unsigned short": "uint16",
        "short": "int16",
        "signed short": "int16",
        "unsigned int": "uint32",
        "int": "int32",
        "signed int": "int32",
        # "hyper",                # ?
        "float": "float32",       # default
        "double": "float64",
        # "string" not automatically supported
    }


    def __init__(self, classid, array=None, type=None, typequote='"',
                 **kwargs):
        """
        Parameters
        ----------
        classid : int
        array : array_like
        type : str (optional)
             Set the DX type in the output file and cast `array` to
             the closest numpy dtype.  `type` must be one of the
             allowed types in DX files as defined under `Array
             Objects`_.  The default ``None`` tries to set the type
             from the :class:`numpy.dtype` of `array`.

             .. versionadded:: 0.4.0

        Raises
        ------
        ValueError
             if `array` is not provided; or if `type` is not of the correct
             DX type
        """
        if array is None:
            raise ValueError('array keyword argument is required')
        self.id = classid
        self.name = 'array'
        self.component = 'data'
        # detect type https://github.com/MDAnalysis/GridDataFormats/issues/35
        if type is None:
            self.array = numpy.asarray(array)
            try:
                self.type = self.np_types[self.array.dtype.name]
            except KeyError:
                warnings.warn(("array dtype.name = {0} cannot be automatically "
                               "converted to a DX array type. Use the 'type' keyword "
                               "to manually specify the correct type.").format(
                                   self.array.dtype.name))
                self.type = self.array.dtype.name  # will raise ValueError on writing
        else:
            try:
                self.array = numpy.asarray(array, dtype=self.dx_types[type])
            except KeyError:
                raise ValueError(("DX type {0} cannot be converted to an "
                                  "appropriate numpy dtype. Available "
                                  "types are: {1}".format(type,
                                                          list(self.dx_types.keys()))))
            self.type = type
        self.typequote = typequote

    def write(self, stream):
        """Write the *class array* section.

        Parameters
        ----------
        stream : stream

        Raises
        ------
        ValueError
             If the `dxtype` is not a valid type, :exc:`ValueError` is
             raised.

        """
        if self.type not in self.dx_types:
            raise ValueError(("DX type {} is not supported in the DX format. \n"
                              "Supported values are: {}\n"
                              "Use the type=<type> keyword argument.").format(
                                  self.type, list(self.dx_types.keys())))
        typelabel = (self.typequote+self.type+self.typequote)
        super(array, self).write(stream, 'type {0} rank 0 items {1} data follows'.format(
            typelabel, self.array.size))

        # grid data, serialized as a C array (z fastest varying)
        # (flat iterator is equivalent to: for x: for y: for z: grid[x,y,z])
        # VMD's DX reader requires exactly 3 values per line
        fmt_string = "{:d}"
        if (self.array.dtype.kind == 'f' or self.array.dtype.kind == 'c'):
            precision = numpy.finfo(self.array.dtype).precision
            fmt_string = "{:."+"{:d}".format(precision)+"f}"
        values_per_line = 3
        values = self.array.flat
        while 1:
            try:
                for i in range(values_per_line):
                    self._write_line(stream, fmt_string.format(next(values)) + "\t")
                self._write_line(stream, '\n')
            except StopIteration:
                self._write_line(stream, '\n')
                break
        self._write_line(stream, 'attribute "dep" string "positions"\n')

class field(DXclass):
    """OpenDX container class

    The *field* is the top-level object and represents the whole
    OpenDX file. It contains a number of other objects.

    Instantiate a DX object from this class and add subclasses with
    :meth:`add`.

    """
    # perhaps this should not derive from DXclass as those are
    # objects in field but a field cannot contain itself
    def __init__(self,classid='0',components=None,comments=None):
        """OpenDX object, which is built from a dictionary of components.

        Parameters
        ----------

        classid : str
               arbitrary string
        components : dict
               dictionary of DXclass instances (no sanity check on the
               individual ids!) which correspond to

               * positions
               * connections
               * data

        comments : list
               list of strings; each string becomes a comment line
               prefixed with '#'. Avoid newlines.


        A field must have at least the components 'positions',
        'connections', and 'data'. Those components are associated
        with objects belonging to the field. When writing a dx file
        from the field, only the required objects are dumped to the file.

        (For a more general class that can use field:
        Because there could be more objects than components, we keep a
        separate object list. When dumping the dx file, first all
        objects are written and then the field object describes its
        components. Objects are referenced by their unique id.)

        .. Note:: uniqueness of the *id* is not checked.


        Example
        -------
        Create a new dx object::

           dx = OpenDX.field('density', components=dict(
                    positions=gridpositions(1, grid.shape, origin, delta),
                    connections=gridconnections(2, grid.shape),
                    data=array(3, grid)))

        """
        if components is None:
            components = dict(positions=None,connections=None,data=None)
        if comments is None:
            comments = ['OpenDX written by gridData.OpenDX',
                        'from https://github.com/MDAnalysis/GridDataFormats']
        elif type(comments) is not list:
            comments = [str(comments)]
        self.id = classid       # can be an arbitrary string
        self.name = 'field'
        self.component = None   # cannot be a component of a field
        self.components = components
        self.comments = comments

    def _openfile_writing(self, filename):
        """Returns a regular or gz file stream for writing"""
        if filename.endswith('.gz'):
            return gzip.open(filename, 'wb')
        else:
            return open(filename, 'w')

    def write(self, filename):
        """Write the complete dx object to the file.

        This is the simple OpenDX format which includes the data in
        the header via the 'object array ... data follows' statement.

        Only simple regular arrays are supported.

        The format should be compatible with VMD's dx reader plugin.
        """
        # comments (VMD chokes on lines of len > 80, so truncate)
        maxcol = 80
        with self._openfile_writing(str(filename)) as outfile:
            for line in self.comments:
                comment = '# '+str(line)
                self._write_line(outfile, comment[:maxcol]+'\n')
            # each individual object
            for component, object in self.sorted_components():
                object.write(outfile)
            # the field object itself
            super(field, self).write(outfile, quote=True)
            for component, object in self.sorted_components():
                self._write_line(outfile, 'component "%s" value %s\n' % (
                    component, str(object.id)))

    def read(self, stream):
        """Read DX field from file.

            dx = OpenDX.field(0)
            dx.read(dxfile)

        The classid is discarded and replaced with the one from the file.
        """
        DXfield = self
        p = DXParser(stream)
        p.parse(DXfield)

    def add(self,component,DXobj):
        """add a component to the field"""
        self[component] = DXobj

    def add_comment(self,comment):
        """add comments"""
        self.comments.append(comment)

    def sorted_components(self):
        """iterator that returns (component,object) in id order"""
        for component, object in \
                sorted(self.components.items(),
                       key=lambda comp_obj: comp_obj[1].id):
            yield component, object

    def histogramdd(self):
        """Return array data as (histogram, edges), i.e. a numpy nD histogram."""
        shape = self.components['positions'].shape
        edges = self.components['positions'].edges()
        hist = self.components['data'].array.reshape(shape)
        return (hist,edges)

    def __getitem__(self,key):
        return self.components[key]

    def __setitem__(self,key,value):
        self.components[key] = value

    def __repr__(self):
        return '<OpenDX.field object, id='+str(self.id)+', with '+\
               str(len(self.components))+' components and '+\
               str(len(self.components))+' objects>'


#------------------------------------------------------------
# DX file parsing
#------------------------------------------------------------

class DXParseError(Exception):
    """general exception for parsing errors in DX files"""
    pass

class DXParserNoTokens(DXParseError):
    """raised when the token buffer is exhausted"""
    pass


class Token:
    # token categories (values of dx_regex must match up with these categories)
    category = {'COMMENT': ['COMMENT'],
                'WORD': ['WORD'],
                'STRING': ['QUOTEDSTRING','BARESTRING','STRING'],
                'WHITESPACE': ['WHITESPACE'],
                'INTEGER': ['INTEGER'],
                'REAL': ['REAL'],
                'NUMBER': ['INTEGER','REAL']}
    # cast functions
    cast = {'COMMENT': lambda s:re.sub(r'#\s*','',s),
            'WORD': str,
            'STRING': str, 'QUOTEDSTRING': str, 'BARESTRING': str,
            'WHITESPACE': None,
            'NUMBER': float, 'INTEGER': int, 'REAL': float}

    def __init__(self,code,text):
        self.code = code    # store raw code
        self.text = text

    def equals(self,v):
        return self.text == v

    def iscode(self,code):
        return self.code in self.category[code]  # use many -> 1 mappings

    def value(self,ascode=None):
        """Return text cast to the correct type or the selected type"""
        if ascode is None:
            ascode = self.code
        return self.cast[ascode](self.text)

    def __repr__(self):
        return '<token '+str(self.code)+','+str(self.value())+'>'

class DXInitObject(object):
    """Storage class that holds data to initialize one of the 'real'
    classes such as OpenDX.array, OpenDX.gridconnections, ...

    All variables are stored in args which will be turned into the
    arguments for the DX class.
    """
    DXclasses = {'gridpositions':gridpositions,
                 'gridconnections':gridconnections,
                 'array':array, 'field':field,
                 }

    def __init__(self,classtype,classid):
        self.type = classtype
        self.id = classid
        self.args = dict()

    def initialize(self):
        """Initialize the corresponding DXclass from the data.

        class = DXInitObject.initialize()
        """
        return self.DXclasses[self.type](self.id,**self.args)

    def __getitem__(self,k):
        return self.args[k]

    def __setitem__(self,k,v):
        self.args[k] = v

    def __repr__(self):
        return '<DXInitObject instance type='+str(self.type)+', id='+str(self.id)+'>'

class DXParser(object):
    """Brain-dead baroque implementation to read a simple (VMD) dx file.

    Requires an OpenDX.field instance.

    1) scan for 'object' lines:
       'object' id 'class' class  [data]
       [data ...]
    2) parse data according to class
    3) construct dx field from classes
    """

    # the regexes must match with the categories defined in the Token class
    # the REAL regular expression catches both integers and floats
    # Taken from
    # https://docs.python.org/3/library/re.html#simulating-scanf
    dx_regex = re.compile(r"""
    (?P<COMMENT>\#.*$)            # comment (until end of line)
    |(?P<WORD>(object|class|counts|origin|delta|type|rank|items|data))
    |"(?P<QUOTEDSTRING>[^\"]*)"   # string in double quotes  (quotes removed)
    |(?P<WHITESPACE>\s+)          # white space
    |(?P<REAL>[-+]?               # true real number (decimal point or
    (\d+(\.\d*)?|\.\d+)           # scientific notation) and integers
    ([eE][-+]?\d+)?)
    |(?P<BARESTRING>[a-zA-Z_][^\s\#\"]+) # unquoted strings, starting with non-numeric
    """, re.VERBOSE)


    def __init__(self, filename):
        """Set up a parser for a simple DX file (from VMD).

        >>> DXfield_object = OpenDX.field(id)
        >>> p = DXParser('bulk.dx')
        >>> p.parse(DXfield_object)

        The field object will be completely rewritten (including the
        id if one is found in the input file). The input file's
        component layout is currently ignored.

        Note that quotes are removed from quoted strings.
        """
        self.filename = str(filename)
        self.field = field('grid data',comments=['filename: {0}'.format(self.filename)])
        # other variables are initialised every time parse() is called

        self.parsers = {'general':self.__general,
                        'comment':self.__comment, 'object':self.__object,
                        'gridpositions':self.__gridpositions,
                        'gridconnections':self.__gridconnections,
                        'array':self.__array, 'field':self.__field,
                        }


    def parse(self, DXfield):
        """Parse the dx file and construct a DX field object with component classes.

        A :class:`field` instance *DXfield* must be provided to be
        filled by the parser::

           DXfield_object = OpenDX.field(*args)
           parse(DXfield_object)

        A tokenizer turns the dx file into a stream of tokens. A
        hierarchy of parsers examines the stream. The level-0 parser
        ('general') distinguishes comments and objects (level-1). The
        object parser calls level-2 parsers depending on the object
        found. The basic idea is that of a 'state machine'. There is
        one parser active at any time. The main loop is the general
        parser.

        * Constructing the dx objects with classtype and classid is
          not implemented yet.
        * Unknown tokens raise an exception.
        """

        self.DXfield = DXfield              # OpenDX.field (used by comment parser)
        self.currentobject = None           # containers for data
        self.objects = []                   # |
        self.tokens = []                    # token buffer

        if self.filename.endswith('.gz'):
            with _gzip_open(self.filename, 'rt') as self.dxfile:
                self.use_parser('general')
        else:
            with open(self.filename, 'r') as self.dxfile:
                self.use_parser('general')      # parse the whole file and populate self.objects

        # assemble field from objects
        for o in self.objects:
            if o.type == 'field':
                # Almost ignore the field object; VMD, for instance,
                # does not write components. To make this work
                # seamlessly I have to think harder how to organize
                # and use the data, e.g. prepping the field object
                # properly and then initializing it. Probably should also
                # check uniqueness of ids etc.
                DXfield.id = o.id
                continue
            c = o.initialize()
            self.DXfield.add(c.component,c)

        # free space
        del self.currentobject, self.objects



    def __general(self):
        """Level-0 parser and main loop.

        Look for a token that matches a level-1 parser and hand over control."""
        while 1:                            # main loop
            try:
                tok = self.__peek()         # only peek, apply_parser() will consume
            except DXParserNoTokens:
                # save previous DXInitObject
                # (kludge in here as the last level-2 parser usually does not return
                # via the object parser)
                if self.currentobject and self.currentobject not in self.objects:
                    self.objects.append(self.currentobject)
                return                      # stop parsing and finish
            # decision branches for all level-1 parsers:
            # (the only way to get out of the lower level parsers!)
            if tok.iscode('COMMENT'):
                self.set_parser('comment')  # switch the state
            elif tok.iscode('WORD') and tok.equals('object'):
                self.set_parser('object')   # switch the state
            elif self.__parser is self.__general:
                # Either a level-2 parser screwed up or some level-1
                # construct is not implemented.  (Note: this elif can
                # only be reached at the beginning or after comments;
                # later we never formally switch back to __general,
                # which would create an infinite loop.)
                raise DXParseError('Unknown level-1 construct at '+str(tok))

            self.apply_parser()     # hand over to new parser
                                    # (possibly been set further down the hierarchy!)

    # Level-1 parser
    def __comment(self):
        """Level-1 parser for comments.

        pattern: #.*
        Append comment (with initial '# ' stripped) to all comments.
        """
        tok = self.__consume()
        self.DXfield.add_comment(tok.value())
        self.set_parser('general')   # switch back to general parser

    def __object(self):
        """Level-1 parser for objects.

        pattern: 'object' id 'class' type ...

        id ::=   integer|string|'"'white space string'"'
        type ::= string
        """
        self.__consume()                    # 'object'
        classid = self.__consume().text
        word = self.__consume().text
        if word != "class":
            raise DXParseError("reserved word %s should have been 'class'." % word)
        # save previous DXInitObject
        if self.currentobject:
            self.objects.append(self.currentobject)
        # setup new DXInitObject
        classtype = self.__consume().text
        self.currentobject = DXInitObject(classtype=classtype,classid=classid)

        self.use_parser(classtype)

    # Level-2 parser (object parsers)
    def __gridpositions(self):
        """Level-2 parser for gridpositions.

        pattern:
        object 1 class gridpositions counts 97 93 99
        origin -46.5 -45.5 -48.5
        delta 1 0 0
        delta 0 1 0
        delta 0 0 1
        """
        try:
            tok = self.__consume()
        except DXParserNoTokens:
            return

        if tok.equals('counts'):
            shape = []
            try:
                while True:
                    # raises exception if not an int
                    self.__peek().value('INTEGER')
                    tok = self.__consume()
                    shape.append(tok.value('INTEGER'))
            except (DXParserNoTokens, ValueError):
                pass
            if len(shape) == 0:
                raise DXParseError('gridpositions: no shape parameters')
            self.currentobject['shape'] = shape
        elif tok.equals('origin'):
            origin = []
            try:
                while (self.__peek().iscode('INTEGER') or
                       self.__peek().iscode('REAL')):
                    tok = self.__consume()
                    origin.append(tok.value())
            except DXParserNoTokens:
                pass
            if len(origin) == 0:
                raise DXParseError('gridpositions: no origin parameters')
            self.currentobject['origin'] = origin
        elif tok.equals('delta'):
            d = []
            try:
                while (self.__peek().iscode('INTEGER') or
                       self.__peek().iscode('REAL')):
                    tok = self.__consume()
                    d.append(tok.value())
            except DXParserNoTokens:
                pass
            if len(d) == 0:
                raise DXParseError('gridpositions: missing delta parameters')
            try:
                self.currentobject['delta'].append(d)
            except KeyError:
                self.currentobject['delta'] = [d]
        else:
            raise DXParseError('gridpositions: '+str(tok)+' not recognized.')


    def __gridconnections(self):
        """Level-2 parser for gridconnections.

        pattern:
        object 2 class gridconnections counts 97 93 99
        """
        try:
            tok = self.__consume()
        except DXParserNoTokens:
            return

        if tok.equals('counts'):
            shape = []
            try:
                while True:
                    # raises exception if not an int
                    self.__peek().value('INTEGER')
                    tok = self.__consume()
                    shape.append(tok.value('INTEGER'))
            except (DXParserNoTokens, ValueError):
                pass
            if len(shape) == 0:
                raise DXParseError('gridconnections: no shape parameters')
            self.currentobject['shape'] = shape
        else:
            raise DXParseError('gridconnections: '+str(tok)+' not recognized.')


    def __array(self):
        """Level-2 parser for arrays.

        pattern:
        object 3 class array type double rank 0 items 12 data follows
        0 2 0
        0 0 3.6
        0 -2.0 1e-12
        +4.534e+01 .34534 0.43654
        attribute "dep" string "positions"
        """
        try:
            tok = self.__consume()
        except DXParserNoTokens:
            return

        if tok.equals('type'):
            tok = self.__consume()
            if not tok.iscode('STRING'):
                raise DXParseError('array: type was "%s", not a string.'%\
                                   tok.text)
            self.currentobject['type'] = tok.value()
        elif tok.equals('rank'):
            tok = self.__consume()
            try:
                self.currentobject['rank'] = tok.value('INTEGER')
            except ValueError:
                raise DXParseError('array: rank was "%s", not an integer.'%\
                                   tok.text)
        elif tok.equals('items'):
            tok = self.__consume()
            try:
                self.currentobject['size'] = tok.value('INTEGER')
            except ValueError:
                raise DXParseError('array: items was "%s", not an integer.'%\
                                   tok.text)
        elif tok.equals('data'):
            tok = self.__consume()
            if not tok.iscode('STRING'):
                raise DXParseError('array: data was "%s", not a string.'%\
                                   tok.text)
            if tok.text != 'follows':
                raise NotImplementedError(\
                            'array: Only the "data follows header" format is supported.')
            if not self.currentobject['size']:
                raise DXParseError("array: missing number of items")
            # This is the slow part.  Once we get here, we are just
            # reading in a long list of numbers.  Conversion to floats
            # will be done later when the numpy array is created.

            # Don't assume anything about whitespace or the number of elements per row
            self.currentobject['array'] = []
            while len(self.currentobject['array']) < self.currentobject['size']:
                self.currentobject['array'].extend(self.dxfile.readline().strip().split())

            # If you assume that there are three elements per row
            # (except the last) the following version works and is a little faster.
            # for i in range(int(numpy.ceil(self.currentobject['size']/3))):
            #     self.currentobject['array'].append(self.dxfile.readline())
            # self.currentobject['array'] = ' '.join(self.currentobject['array']).split()
        elif tok.equals('attribute'):
            # not used at the moment
            attribute = self.__consume().value()
            if not self.__consume().equals('string'):
                raise DXParseError('array: "string" expected.')
            value = self.__consume().value()
        else:
            raise DXParseError('array: '+str(tok)+' not recognized.')

    def __field(self):
        """Level-2 parser for a DX field object.

        pattern:
        object "site map 1" class field
        component "positions" value 1
        component "connections" value 2
        component "data" value 3
        """
        try:
            tok = self.__consume()
        except DXParserNoTokens:
            return

        if tok.equals('component'):
            component = self.__consume().value()
            if not self.__consume().equals('value'):
                raise DXParseError('field: "value" expected')
            classid = self.__consume().value()
            try:
                self.currentobject['components'][component] = classid
            except KeyError:
                self.currentobject['components'] = {component:classid}
        else:
            raise DXParseError('field: '+str(tok)+' not recognized.')

    # parser routines independent of the dx classes
    # (with ideas from MDAnalysis.Selection and
    # http://effbot.org/zone/xml-scanner.htm)

    def use_parser(self,parsername):
        """Set parsername as the current parser and apply it."""
        self.__parser = self.parsers[parsername]
        self.__parser()

    def set_parser(self,parsername):
        """Set parsername as the current parser."""
        self.__parser = self.parsers[parsername]

    def apply_parser(self):
        """Apply the current parser to the token stream."""
        self.__parser()

    def __tokenize(self,string):
        """Split string into tokens and update the token buffer.

        __tokenize(string)

        New tokens are appended to the token buffer, discarding white
        space.  Based on http://effbot.org/zone/xml-scanner.htm
        """
        for m in self.dx_regex.finditer(string.strip()):
            code = m.lastgroup
            text = m.group(m.lastgroup)
            tok = Token(code,text)
            if not tok.iscode('WHITESPACE'):
                self.tokens.append(tok)
                # print "DEBUG tokenize: "+str(tok)

    def __refill_tokenbuffer(self):
        """Add a new tokenized line from the file to the token buffer.

        __refill_tokenbuffer()

        Only reads a new line if the buffer is empty. It is safe to
        call it repeatedly.

        At end of file, method returns empty strings and it is up to
        __peek and __consume to flag the end of the stream.
        """
        if len(self.tokens) == 0:
            self.__tokenize(self.dxfile.readline())

    def __peek(self):
        self.__refill_tokenbuffer()
        try:
            return self.tokens[0]
        except IndexError:
            raise DXParserNoTokens

    def __consume(self):
        """Get the next token from the buffer and remove it.

        try:
          while 1:
             token = __consume()
        except DXParserNoTokens:
          pass
        """
        self.__refill_tokenbuffer()
        #print "DEBUG consume: "+str(self.__parser)+' '+str(self.__peek())
        try:
            return self.tokens.pop(0)  # singlet
        except IndexError:
            raise DXParserNoTokens