1# -*- Mode: python; tab-width: 4; indent-tabs-mode:nil; coding:utf-8 -*-
2# vim: tabstop=4 expandtab shiftwidth=4 softtabstop=4
3#
4# MDAnalysis --- https://www.mdanalysis.org
5# Copyright (c) 2006-2017 The MDAnalysis Development Team and contributors
6# (see the file AUTHORS for the full list of names)
7#
8# Released under the GNU Public Licence, v2 or any higher version
9#
10# Please cite your use of MDAnalysis in published work:
11#
12# R. J. Gowers, M. Linke, J. Barnoud, T. J. E. Reddy, M. N. Melo, S. L. Seyler,
13# D. L. Dotson, J. Domanski, S. Buchoux, I. M. Kenney, and O. Beckstein.
14# MDAnalysis: A Python package for the rapid analysis of molecular dynamics
15# simulations. In S. Benthall and S. Rostrup editors, Proceedings of the 15th
16# Python in Science Conference, pages 102-109, Austin, TX, 2016. SciPy.
17#
18# N. Michaud-Agrawal, E. J. Denning, T. B. Woolf, and O. Beckstein.
19# MDAnalysis: A Toolkit for the Analysis of Molecular Dynamics Simulations.
20# J. Comput. Chem. 32 (2011), 2319--2327, doi:10.1002/jcc.21787
21#
22
23# Hydrogen Bonding Analysis
24r"""Hydrogen Bond analysis --- :mod:`MDAnalysis.analysis.hbonds.hbond_analysis`
25===========================================================================
26
27:Author: David Caplan, Lukas Grossar, Oliver Beckstein
28:Year: 2010-2017
29:Copyright: GNU Public License v3
30
31
32Given a :class:`~MDAnalysis.core.universe.Universe` (simulation
33trajectory with 1 or more frames) measure all hydrogen bonds for each
34frame between selections 1 and 2.
35
36The :class:`HydrogenBondAnalysis` class is modeled after the `VMD
37HBONDS plugin`_.
38
39.. _`VMD HBONDS plugin`: http://www.ks.uiuc.edu/Research/vmd/plugins/hbonds/
40
41Options:
42  - *update_selections* (``True``): update selections at each frame?
43  - *selection_1_type* ("both"): selection 1 is the: "donor", "acceptor", "both"
44  - donor-acceptor *distance* (Å): 3.0
45  - Angle *cutoff* (degrees): 120.0
46  - *forcefield* to switch between default values for different force fields
47  - *donors* and *acceptors* atom types (to add additional atom names)
48
49.. _Analysis Output:
50
51Output
52------
53
54The results are
55  - the **identities** of donor and acceptor heavy-atoms,
56  - the **distance** between the heavy atom acceptor atom and the hydrogen atom
57    that is bonded to the heavy atom donor,
58  - the **angle** donor-hydrogen-acceptor angle (180º is linear).
59
60Hydrogen bond data are returned per frame, which is stored in
61:attr:`HydrogenBondAnalysis.timeseries` (In the following description, ``#``
62indicates comments that are not part of the output.)::
63
64    results = [
65        [ # frame 1
66           [ # hbond 1
67              <donor index (0-based)>,
68              <acceptor index (0-based)>, <donor string>, <acceptor string>,
69              <distance>, <angle>
70           ],
71           [ # hbond 2
72              <donor index (0-based)>,
73              <acceptor index (0-based)>, <donor string>, <acceptor string>,
74              <distance>, <angle>
75           ],
76           ....
77        ],
78        [ # frame 2
79          [ ... ], [ ... ], ...
80        ],
81        ...
82    ]
83
84Using the :meth:`HydrogenBondAnalysis.generate_table` method one can reformat
85the results as a flat "normalised" table that is easier to import into a
86database or dataframe for further processing. The table itself is a
87:class:`numpy.recarray`.
88
89.. _Detection-of-hydrogen-bonds:
90
91Detection of hydrogen bonds
92---------------------------
93
94Hydrogen bonds are recorded based on a geometric criterion:
95
961. The distance between acceptor and hydrogen is less than or equal to
97   `distance` (default is 3 Å).
98
992. The angle between donor-hydrogen-acceptor is greater than or equal to
100   `angle` (default is 120º).
101
102The cut-off values `angle` and `distance` can be set as keywords to
103:class:`HydrogenBondAnalysis`.
104
105Donor and acceptor heavy atoms are detected from atom names. The current
106defaults are appropriate for the CHARMM27 and GLYCAM06 force fields as defined
107in Table `Default atom names for hydrogen bonding analysis`_.
108
109Hydrogen atoms bonded to a donor are searched with one of two algorithms,
110selected with the `detect_hydrogens` keyword.
111
112"distance"
113   Searches for all hydrogens (name "H*" or name "[123]H" or type "H") in the
114   same residue as the donor atom within a cut-off distance of 1.2 Å.
115
116"heuristic"
117   Looks at the next three atoms in the list of atoms following the donor and
118   selects any atom whose name matches (name "H*" or name "[123]H"). For
119
120The "distance" search is more rigorous but slower and is set as the
121default. Until release 0.7.6, only the heuristic search was implemented.
122
123.. versionchanged:: 0.7.6
124   Distance search added (see
125   :meth:`HydrogenBondAnalysis._get_bonded_hydrogens_dist`) and heuristic
126   search improved (:meth:`HydrogenBondAnalysis._get_bonded_hydrogens_list`)
127
128.. _Default atom names for hydrogen bonding analysis:
129
130.. table:: Default heavy atom names for CHARMM27 force field.
131
132   =========== ==============  =========== ====================================
133   group       donor           acceptor    comments
134   =========== ==============  =========== ====================================
135   main chain  N               O, OC1, OC2 OC1, OC2 from amber99sb-ildn
136                                           (Gromacs)
137   water       OH2, OW         OH2, OW     SPC, TIP3P, TIP4P (CHARMM27,Gromacs)
138
139   ARG         NE, NH1, NH2
140   ASN         ND2             OD1
141   ASP                         OD1, OD2
142   CYS         SG
143   CYH                         SG          possible false positives for CYS
144   GLN         NE2             OE1
145   GLU                         OE1, OE2
146   HIS         ND1, NE2        ND1, NE2    presence of H determines if donor
147   HSD         ND1             NE2
148   HSE         NE2             ND1
149   HSP         ND1, NE2
150   LYS         NZ
151   MET                         SD          see e.g. [Gregoret1991]_
152   SER         OG              OG
153   THR         OG1             OG1
154   TRP         NE1
155   TYR         OH              OH
156   =========== ==============  =========== ====================================
157
158.. table:: Heavy atom types for GLYCAM06 force field.
159
160   =========== =========== ==================
161   element     donor       acceptor
162   =========== =========== ==================
163   N           N,NT,N3     N,NT
164   O           OH,OW       O,O2,OH,OS,OW,OY
165   S                       SM
166   =========== =========== ==================
167
168Donor and acceptor names for the CHARMM27 force field will also work for e.g.
169OPLS/AA (tested in Gromacs). Residue names in the table are for information
170only and are not taken into account when determining acceptors and donors.
171This can potentially lead to some ambiguity in the assignment of
172donors/acceptors for residues such as histidine or cytosine.
173
174For more information about the naming convention in GLYCAM06 have a look at the
175`Carbohydrate Naming Convention in Glycam`_.
176
177.. _`Carbohydrate Naming Convention in Glycam`:
178   http://glycam.ccrc.uga.edu/documents/FutureNomenclature.htm
179
180The lists of donor and acceptor names can be extended by providing lists of
181atom names in the `donors` and `acceptors` keywords to
182:class:`HydrogenBondAnalysis`. If the lists are entirely inappropriate
183(e.g. when analysing simulations done with a force field that uses very
184different atom names) then one should either use the value "other" for `forcefield`
185to set no default values, or derive a new class and set the default list oneself::
186
187 class HydrogenBondAnalysis_OtherFF(HydrogenBondAnalysis):
188       DEFAULT_DONORS = {"OtherFF": tuple(set([...]))}
189       DEFAULT_ACCEPTORS = {"OtherFF": tuple(set([...]))}
190
191Then simply use the new class instead of the parent class and call it with
192`forcefield` = "OtherFF". Please also consider to contribute the list of heavy
193atom names to MDAnalysis.
194
195.. rubric:: References
196
197.. [Gregoret1991] L.M. Gregoret, S.D. Rader, R.J. Fletterick, and
198   F.E. Cohen. Hydrogen bonds involving sulfur atoms in proteins. Proteins,
199   9(2):99–107, 1991. `10.1002/prot.340090204`_.
200
201.. _`10.1002/prot.340090204`: http://dx.doi.org/10.1002/prot.340090204
202
203
204How to perform HydrogenBondAnalysis
205-----------------------------------
206
207All protein-water hydrogen bonds can be analysed with ::
208
209  import MDAnalysis
210  import MDAnalysis.analysis.hbonds
211
212  u = MDAnalysis.Universe('topology', 'trajectory')
213  h = MDAnalysis.analysis.hbonds.HydrogenBondAnalysis(u, 'protein', 'resname HOH', distance=3.0, angle=120.0)
214  h.run()
215
216.. Note::
217
218   Due to the way :class:`HydrogenBondAnalysis` is implemented, it is
219   more efficient to have the second selection (`selection2`) be the
220   *larger* group, e.g. the water when looking at water-protein
221   H-bonds or the whole protein when looking at ligand-protein
222   interactions.
223
224
225The results are stored as the attribute
226:attr:`HydrogenBondAnalysis.timeseries`; see :ref:`Analysis Output` for the
227format and further options.
228
229A number of convenience functions are provided to process the
230:attr:`~HydrogenBondAnalysis.timeseries` according to varying criteria:
231
232:meth:`~HydrogenBondAnalysis.count_by_time`
233   time series of the number of hydrogen bonds per time step
234:meth:`~HydrogenBondAnalysis.count_by_type`
235   data structure with the frequency of each observed hydrogen bond
236:meth:`~HydrogenBondAnalysis.timesteps_by_type`
237   data structure with a list of time steps for each observed hydrogen bond
238
239For further data analysis it is convenient to process the
240:attr:`~HydrogenBondAnalysis.timeseries` data into a normalized table with the
241:meth:`~HydrogenBondAnalysis.generate_table` method, which creates a new data
242structure :attr:`HydrogenBondAnalysis.table` that contains one row for each
243observation of a hydrogen bond::
244
245  h.generate_table()
246
247This table can then be easily turned into, e.g., a `pandas.DataFrame`_, and
248further analyzed::
249
250  import pandas as pd
251  df = pd.DataFrame.from_records(h.table)
252
253For example, plotting a histogram of the hydrogen bond angles and lengths is as
254simple as ::
255
256  df.hist(column=["angle", "distance"])
257
258.. _pandas.DataFrame: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html
259
260
261.. TODO: notes on selection updating
262
263
264Classes
265-------
266
267.. autoclass:: HydrogenBondAnalysis
268   :members:
269
270   .. attribute:: timesteps
271
272      List of the times of each timestep. This can be used together with
273      :attr:`~HydrogenBondAnalysis.timeseries` to find the specific time point
274      of a hydrogen bond existence, or see :attr:`~HydrogenBondAnalysis.table`.
275
276   .. attribute:: table
277
278      A normalised table of the data in
279      :attr:`HydrogenBondAnalysis.timeseries`, generated by
280      :meth:`HydrogenBondAnalysis.generate_table`. It is a
281      :class:`numpy.recarray` with the following columns:
282
283          0. "time"
284          1. "donor_index"
285          2. "acceptor_index"
286          3. "donor_resnm"
287          4. "donor_resid"
288          5. "donor_atom"
289          6. "acceptor_resnm"
290          7. "acceptor_resid"
291          8. "acceptor_atom"
292          9. "distance"
293          10. "angle"
294
295      It takes up more space than :attr:`~HydrogenBondAnalysis.timeseries` but
296      it is easier to analyze and to import into databases or dataframes.
297
298
299      .. rubric:: Example
300
301      For example, to create a `pandas.DataFrame`_ from ``h.table``::
302
303         import pandas as pd
304         df = pd.DataFrame.from_records(h.table)
305
306
307      .. versionchanged:: 0.17.0
308         The 1-based donor and acceptor indices (*donor_idx* and
309         *acceptor_idx*) are deprecated in favor of 0-based indices.
310
311   .. automethod:: _get_bonded_hydrogens
312
313   .. automethod:: _get_bonded_hydrogens_dist
314
315   .. automethod:: _get_bonded_hydrogens_list
316
317"""
318from __future__ import division, absolute_import
319import six
320from six.moves import range, zip, map, cPickle
321
322import warnings
323import logging
324from collections import defaultdict
325
326import numpy as np
327
328from MDAnalysis import MissingDataWarning, NoDataError, SelectionError, SelectionWarning
329from .. import base
330from MDAnalysis.lib.log import ProgressMeter
331from MDAnalysis.lib.NeighborSearch import AtomNeighborSearch
332from MDAnalysis.lib import distances
333from MDAnalysis.lib.util import deprecate
334
335
336logger = logging.getLogger('MDAnalysis.analysis.hbonds')
337
338
339class HydrogenBondAnalysis(base.AnalysisBase):
340    """Perform a hydrogen bond analysis
341
342    The analysis of the trajectory is performed with the
343    :meth:`HydrogenBondAnalysis.run` method. The result is stored in
344    :attr:`HydrogenBondAnalysis.timeseries`. See
345    :meth:`~HydrogenBondAnalysis.run` for the format.
346
347    The default atom names are taken from the CHARMM 27 force field files, which
348    will also work for e.g. OPLS/AA in Gromacs, and GLYCAM06.
349
350    *Donors* (associated hydrogens are deduced from topology)
351      *CHARMM 27*
352        N of the main chain, water OH2/OW, ARG NE/NH1/NH2, ASN ND2, HIS ND1/NE2,
353        SER OG, TYR OH, CYS SG, THR OG1, GLN NE2, LYS NZ, TRP NE1
354      *GLYCAM06*
355        N,NT,N3,OH,OW
356
357    *Acceptors*
358      *CHARMM 27*
359        O of the main chain, water OH2/OW, ASN OD1, ASP OD1/OD2, CYH SG, GLN OE1,
360        GLU OE1/OE2, HIS ND1/NE2, MET SD, SER OG, THR OG1, TYR OH
361      *GLYCAM06*
362        N,NT,O,O2,OH,OS,OW,OY,P,S,SM
363      *amber99sb-ildn(Gromacs)*
364        OC1, OC2 of the main chain
365
366    See Also
367    --------
368    :ref:`Default atom names for hydrogen bonding analysis`
369
370
371    .. versionchanged:: 0.7.6
372       DEFAULT_DONORS/ACCEPTORS is now embedded in a dict to switch between
373       default values for different force fields.
374    """
375
376    # use tuple(set()) here so that one can just copy&paste names from the
377    # table; set() takes care for removing duplicates. At the end the
378    # DEFAULT_DONORS and DEFAULT_ACCEPTORS should simply be tuples.
379
380    #: default heavy atom names whose hydrogens are treated as *donors*
381    #: (see :ref:`Default atom names for hydrogen bonding analysis`);
382    #: use the keyword `donors` to add a list of additional donor names.
383    DEFAULT_DONORS = {
384        'CHARMM27': tuple(set([
385            'N', 'OH2', 'OW', 'NE', 'NH1', 'NH2', 'ND2', 'SG', 'NE2', 'ND1', 'NZ', 'OG', 'OG1', 'NE1', 'OH'])),
386        'GLYCAM06': tuple(set(['N', 'NT', 'N3', 'OH', 'OW'])),
387        'other': tuple(set([]))}
388
389    #: default atom names that are treated as hydrogen *acceptors*
390    #: (see :ref:`Default atom names for hydrogen bonding analysis`);
391    #: use the keyword `acceptors` to add a list of additional acceptor names.
392    DEFAULT_ACCEPTORS = {
393        'CHARMM27': tuple(set([
394            'O', 'OC1', 'OC2', 'OH2', 'OW', 'OD1', 'OD2', 'SG', 'OE1', 'OE1', 'OE2', 'ND1', 'NE2', 'SD', 'OG', 'OG1', 'OH'])),
395        'GLYCAM06': tuple(set(['N', 'NT', 'O', 'O2', 'OH', 'OS', 'OW', 'OY', 'SM'])),
396        'other': tuple(set([]))}
397
398    #: A :class:`collections.defaultdict` of covalent radii of common donors
399    #: (used in :meth`_get_bonded_hydrogens_list` to check if a hydrogen is
400    #: sufficiently close to its donor heavy atom). Values are stored for
401    #: N, O, P, and S. Any other heavy atoms are assumed to have hydrogens
402    #: covalently bound at a maximum distance of 1.5 Å.
403    r_cov = defaultdict(lambda: 1.5,  # default value
404                        N=1.31, O=1.31, P=1.58, S=1.55)
405
406    def __init__(self, universe, selection1='protein', selection2='all', selection1_type='both',
407                 update_selection1=True, update_selection2=True, filter_first=True, distance_type='hydrogen',
408                 distance=3.0, angle=120.0,
409                 forcefield='CHARMM27', donors=None, acceptors=None,
410                 debug=False, detect_hydrogens='distance', verbose=False, pbc=False, **kwargs):
411        """Set up calculation of hydrogen bonds between two selections in a universe.
412
413        The timeseries is accessible as the attribute :attr:`HydrogenBondAnalysis.timeseries`.
414
415        Some initial checks are performed. If there are no atoms selected by
416        `selection1` or `selection2` or if no donor hydrogens or acceptor atoms
417        are found then a :exc:`SelectionError` is raised for any selection that
418        does *not* update (`update_selection1` and `update_selection2`
419        keywords). For selections that are set to update, only a warning is
420        logged because it is assumed that the selection might contain atoms at
421        a later frame (e.g. for distance based selections).
422
423        If no hydrogen bonds are detected or if the initial check fails, look
424        at the log output (enable with :func:`MDAnalysis.start_logging` and set
425        `verbose` ``=True``). It is likely that the default names for donors
426        and acceptors are not suitable (especially for non-standard
427        ligands). In this case, either change the `forcefield` or use
428        customized `donors` and/or `acceptors`.
429
430        Parameters
431        ----------
432        universe : Universe
433            Universe object
434        selection1 : str (optional)
435            Selection string for first selection ['protein']
436        selection2 : str (optional)
437            Selection string for second selection ['all']
438        selection1_type : {"donor", "acceptor", "both"} (optional)
439            Selection 1 can be 'donor', 'acceptor' or 'both'. Note that the
440            value for `selection1_type` automatically determines how
441            `selection2` handles donors and acceptors: If `selection1` contains
442            'both' then `selection2` will also contain 'both'. If `selection1`
443            is set to 'donor' then `selection2` is 'acceptor' (and vice versa).
444            ['both'].
445        update_selection1 : bool (optional)
446            Update selection 1 at each frame? Setting to ``False`` is recommended
447            for any static selection to increase performance. [``True``]
448        update_selection2 : bool (optional)
449            Update selection 2 at each frame? Setting to ``False`` is recommended
450            for any static selection to increase performance. [``True``]
451        filter_first : bool (optional)
452            Filter selection 2 first to only atoms 3 * `distance` away [``True``]
453        distance : float (optional)
454            Distance cutoff for hydrogen bonds; only interactions with a H-A distance
455            <= `distance` (and the appropriate D-H-A angle, see `angle`) are
456            recorded. (Note: `distance_type` can change this to the D-A distance.) [3.0]
457        angle : float (optional)
458            Angle cutoff for hydrogen bonds; an ideal H-bond has an angle of
459            180º.  A hydrogen bond is only recorded if the D-H-A angle is
460            >=  `angle`. The default of 120º also finds fairly non-specific
461            hydrogen interactions and a possibly better value is 150º. [120.0]
462        forcefield : {"CHARMM27", "GLYCAM06", "other"} (optional)
463            Name of the forcefield used. Switches between different
464            :attr:`~HydrogenBondAnalysis.DEFAULT_DONORS` and
465            :attr:`~HydrogenBondAnalysis.DEFAULT_ACCEPTORS` values.
466            ["CHARMM27"]
467        donors : sequence (optional)
468            Extra H donor atom types (in addition to those in
469            :attr:`~HydrogenBondAnalysis.DEFAULT_DONORS`), must be a sequence.
470        acceptors : sequence (optional)
471            Extra H acceptor atom types (in addition to those in
472            :attr:`~HydrogenBondAnalysis.DEFAULT_ACCEPTORS`), must be a sequence.
473        detect_hydrogens : {"distance", "heuristic"} (optional)
474            Determine the algorithm to find hydrogens connected to donor
475            atoms. Can be "distance" (default; finds all hydrogens in the
476            donor's residue within a cutoff of the donor) or "heuristic"
477            (looks for the next few atoms in the atom list). "distance" should
478            always give the correct answer but "heuristic" is faster,
479            especially when the donor list is updated each
480            for each frame. ["distance"]
481        distance_type : {"hydrogen", "heavy"} (optional)
482            Measure hydrogen bond lengths between donor and acceptor heavy
483            attoms ("heavy") or between donor hydrogen and acceptor heavy
484            atom ("hydrogen"). If using "heavy" then one should set the *distance*
485            cutoff to a higher value such as 3.5 Å. ["hydrogen"]
486        debug : bool (optional)
487            If set to ``True`` enables per-frame debug logging. This is disabled
488            by default because it generates a very large amount of output in
489            the log file. (Note that a logger must have been started to see
490            the output, e.g. using :func:`MDAnalysis.start_logging`.) [``False``]
491        verbose : bool (optional)
492            Toggle progress output. (Can also be given as keyword argument to
493            :meth:`run`.)
494        pbc : bool (optional)
495            Whether to consider periodic boundaries in calculations [``False``]
496
497        Raises
498        ------
499        :exc:`SelectionError`
500            is raised for each static selection without the required
501            donors and/or acceptors.
502
503        Notes
504        -----
505        In order to speed up processing, atoms are filtered by a coarse
506        distance criterion before a detailed hydrogen bonding analysis is
507        performed (`filter_first` = ``True``). If one of your selections is
508        e.g. the solvent then `update_selection1` (or `update_selection2`) must
509        also be ``True`` so that the list of candidate atoms is updated at each
510        step: this is now the default.
511
512        If your selections will essentially remain the same for all time steps
513        (i.e. residues are not moving farther than 3 x `distance`), for
514        instance, if neither water nor large conformational changes are
515        involved or if the optimization is disabled (`filter_first` =
516        ``False``) then you can improve performance by setting the
517        `update_selection1` and/or `update_selection2` keywords to ``False``.
518
519
520        .. versionchanged:: 0.7.6
521           New `verbose` keyword (and per-frame debug logging disabled by
522           default).
523
524           New `detect_hydrogens` keyword to switch between two different
525           algorithms to detect hydrogens bonded to donor. "distance" is a new,
526           rigorous distance search within the residue of the donor atom,
527           "heuristic" is the previous list scan (improved with an additional
528           distance check).
529
530           New `forcefield` keyword to switch between different values of
531           DEFAULT_DONORS/ACCEPTORS to accomodate different force fields.
532           Also has an option "other" for no default values.
533
534        .. versionchanged:: 0.8
535           The new default for `update_selection1` and `update_selection2` is now
536           ``True`` (see `Issue 138`_). Set to ``False`` if your selections only
537           need to be determined once (will increase performance).
538
539        .. versionchanged:: 0.9.0
540           New keyword `distance_type` to select between calculation between
541           heavy atoms or hydrogen-acceptor. It defaults to the previous
542           behavior (i.e. "hydrogen").
543
544        .. versionchanged:: 0.11.0
545           Initial checks for selections that potentially raise :exc:`SelectionError`.
546
547        .. versionchanged:: 0.17.0
548           use 0-based indexing
549
550        .. deprecated:: 0.16
551           The previous `verbose` keyword argument was replaced by
552           `debug`. Note that the `verbose` keyword argument is now
553           consistently used to toggle progress meters throughout the library.
554
555        .. _`Issue 138`: https://github.com/MDAnalysis/mdanalysis/issues/138
556
557        """
558        super(HydrogenBondAnalysis, self).__init__(universe.trajectory, **kwargs)
559        # per-frame debugging output?
560        self.debug = debug
561
562        self._get_bonded_hydrogens_algorithms = {
563            "distance": self._get_bonded_hydrogens_dist,  # 0.7.6 default
564            "heuristic": self._get_bonded_hydrogens_list,  # pre 0.7.6
565        }
566        if not detect_hydrogens in self._get_bonded_hydrogens_algorithms:
567            raise ValueError("detect_hydrogens must be one of {0!r}".format(
568                             self._get_bonded_hydrogens_algorithms.keys()))
569        self.detect_hydrogens = detect_hydrogens
570
571        self.u = universe
572        self.selection1 = selection1
573        self.selection2 = selection2
574        self.selection1_type = selection1_type
575        self.update_selection1 = update_selection1
576        self.update_selection2 = update_selection2
577        self.filter_first = filter_first
578        self.distance = distance
579        self.distance_type = distance_type  # note: everything except 'heavy' will give the default behavior
580        self.angle = angle
581        self.pbc = pbc and all(self.u.dimensions[:3])
582
583        # set up the donors/acceptors lists
584        if donors is None:
585            donors = []
586        if acceptors is None:
587            acceptors = []
588        self.forcefield = forcefield
589        self.donors = tuple(set(self.DEFAULT_DONORS[forcefield]).union(donors))
590        self.acceptors = tuple(set(self.DEFAULT_ACCEPTORS[forcefield]).union(acceptors))
591
592        if not (self.selection1 and self.selection2):
593            raise ValueError('HydrogenBondAnalysis: invalid selections')
594        elif self.selection1_type not in ('both', 'donor', 'acceptor'):
595            raise ValueError('HydrogenBondAnalysis: Invalid selection type {0!s}'.format(self.selection1_type))
596
597        self._timeseries = None  # final result accessed as self.timeseries
598        self.timesteps = None  # time for each frame
599
600        self.table = None  # placeholder for output table
601
602        self._update_selection_1()
603        self._update_selection_2()
604
605        self._log_parameters()
606
607        if self.selection1_type == 'donor':
608            self._sanity_check(1, 'donors')
609            self._sanity_check(2, 'acceptors')
610        elif self.selection1_type == 'acceptor':
611            self._sanity_check(1, 'acceptors')
612            self._sanity_check(2, 'donors')
613        else:  # both
614            self._sanity_check(1, 'donors')
615            self._sanity_check(1, 'acceptors')
616            self._sanity_check(2, 'acceptors')
617            self._sanity_check(2, 'donors')
618        logger.info("HBond analysis: initial checks passed.")
619
620    def _sanity_check(self, selection, htype):
621        """sanity check the selections 1 and 2
622
623        *selection* is 1 or 2, *htype* is "donors" or "acceptors"
624
625        If selections do not update and the required donor and acceptor
626        selections are empty then a :exc:`SelectionError` is immediately
627        raised.
628
629        If selections update dynamically then it is possible that the selection
630        will yield donors/acceptors at a later step and we only issue a
631        warning.
632
633        .. versionadded:: 0.11.0
634        """
635        assert selection in (1, 2)
636        assert htype in ("donors", "acceptors")
637        # horrible data organization:  _s1_donors, _s2_acceptors, etc, update_selection1, ...
638        atoms = getattr(self, "_s{0}_{1}".format(selection, htype))
639        update = getattr(self, "update_selection{0}".format(selection))
640        if not atoms:
641            errmsg = "No {1} found in selection {0}. " \
642                "You might have to specify a custom '{1}' keyword.".format(
643                selection, htype)
644            if not update:
645                logger.error(errmsg)
646                raise SelectionError(errmsg)
647            else:
648                errmsg += " Selection will update so continuing with fingers crossed."
649                warnings.warn(errmsg, category=SelectionWarning)
650                logger.warning(errmsg)
651
652    def _log_parameters(self):
653        """Log important parameters to the logfile."""
654        logger.info("HBond analysis: selection1 = %r (update: %r)", self.selection1, self.update_selection1)
655        logger.info("HBond analysis: selection2 = %r (update: %r)", self.selection2, self.update_selection2)
656        logger.info("HBond analysis: criterion: donor %s atom and acceptor atom distance <= %.3f A", self.distance_type,
657                    self.distance)
658        logger.info("HBond analysis: criterion: angle D-H-A >= %.3f degrees", self.angle)
659        logger.info("HBond analysis: force field %s to guess donor and acceptor names", self.forcefield)
660        logger.info("HBond analysis: bonded hydrogen detection algorithm: %r", self.detect_hydrogens)
661
662    def _get_bonded_hydrogens(self, atom, **kwargs):
663        """Find hydrogens bonded to `atom`.
664
665        This method is typically not called by a user but it is documented to
666        facilitate understanding of the internals of
667        :class:`HydrogenBondAnalysis`.
668
669        Parameters
670        ----------
671        atom : groups.Atom
672             heavy atom
673        **kwargs
674             passed through to the calculation method that was selected with
675             the `detect_hydrogens` kwarg of :class:`HydrogenBondAnalysis`.
676
677        Returns
678        -------
679        hydrogen_atoms : AtomGroup or []
680            list of hydrogens (can be a :class:`~MDAnalysis.core.groups.AtomGroup`)
681            or empty list ``[]`` if none were found.
682
683        See Also
684        --------
685        :meth:`_get_bonded_hydrogens_dist`
686        :meth:`_get_bonded_hydrogens_list`
687
688
689        .. versionchanged:: 0.7.6
690           Can switch algorithm by using the `detect_hydrogens` keyword to the
691           constructor. *kwargs* can be used to supply arguments for algorithm.
692
693        """
694        return self._get_bonded_hydrogens_algorithms[self.detect_hydrogens](atom, **kwargs)
695
696    def _get_bonded_hydrogens_dist(self, atom):
697        """Find hydrogens bonded within cutoff to `atom`.
698
699        Hydrogens are detected by either name ("H*", "[123]H*") or type ("H");
700        this is not fool-proof as the atom type is not always a character but
701        the name pattern should catch most typical occurrences.
702
703        The distance from `atom` is calculated for all hydrogens in the residue
704        and only those within a cutoff are kept. The cutoff depends on the
705        heavy atom (more precisely, on its element, which is taken as the first
706        letter of its name ``atom.name[0]``) and is parameterized in
707        :attr:`HydrogenBondAnalysis.r_cov`. If no match is found then the
708        default of 1.5 Å is used.
709
710
711        Parameters
712        ----------
713        atom : groups.Atom
714             heavy atom
715
716        Returns
717        -------
718        hydrogen_atoms : AtomGroup or []
719            list of hydrogens (can be a :class:`~MDAnalysis.core.groups.AtomGroup`)
720            or empty list ``[]`` if none were found.
721
722        Notes
723        -----
724        The performance of this implementation could be improved once the
725        topology always contains bonded information; it currently uses the
726        selection parser with an "around" selection.
727
728
729        .. versionadded:: 0.7.6
730
731        """
732        try:
733            return atom.residue.atoms.select_atoms(
734                "(name H* 1H* 2H* 3H* or type H) and around {0:f} name {1!s}"
735                "".format(self.r_cov[atom.name[0]], atom.name))
736        except NoDataError:
737            return []
738
739    def _get_bonded_hydrogens_list(self, atom, **kwargs):
740        """Find "bonded" hydrogens to the donor *atom*.
741
742        At the moment this relies on the **assumption** that the
743        hydrogens are listed directly after the heavy atom in the
744        topology. If this is not the case then this function will
745        fail.
746
747        Hydrogens are detected by name ``H*``, ``[123]H*`` and they have to be
748        within a maximum distance from the heavy atom. The cutoff distance
749        depends on the heavy atom and is parameterized in
750        :attr:`HydrogenBondAnalysis.r_cov`.
751
752        Parameters
753        ----------
754        atom : groups.Atom
755             heavy atom
756        **kwargs
757             ignored
758
759        Returns
760        -------
761        hydrogen_atoms : AtomGroup or []
762            list of hydrogens (can be a :class:`~MDAnalysis.core.groups.AtomGroup`)
763            or empty list ``[]`` if none were found.
764
765
766        .. versionchanged:: 0.7.6
767
768           Added detection of ``[123]H`` and additional check that a
769           selected hydrogen is bonded to the donor atom (i.e. its
770           distance to the donor is less than the covalent radius
771           stored in :attr:`HydrogenBondAnalysis.r_cov` or the default
772           1.5 Å).
773
774           Changed name to
775           :meth:`~HydrogenBondAnalysis._get_bonded_hydrogens_list`
776           and added *kwargs* so that it can be used instead of
777           :meth:`~HydrogenBondAnalysis._get_bonded_hydrogens_dist`.
778
779        """
780        warnings.warn("_get_bonded_hydrogens_list() (heuristic detection) does "
781                      "not always find "
782                      "all hydrogens; Using detect_hydrogens='distance', when "
783                      "constructing the HydrogenBondAnalysis class is safer. "
784                      "Removal of this feature is targeted for 1.0",
785                      category=DeprecationWarning)
786        box = self.u.dimensions if self.pbc else None
787        try:
788            hydrogens = [
789                a for a in self.u.atoms[atom.index + 1:atom.index + 4]
790                if (a.name.startswith(('H', '1H', '2H', '3H')) and
791                    distances.calc_bonds(atom.position, a.position, box=box) < self.r_cov[atom.name[0]])
792                ]
793        except IndexError:
794            hydrogens = []  # weird corner case that atom is the last one in universe
795        return hydrogens
796
797    def _update_selection_1(self):
798        self._s1 = self.u.select_atoms(self.selection1)
799        self.logger_debug("Size of selection 1: {0} atoms".format(len(self._s1)))
800        if not self._s1:
801            logger.warning("Selection 1 '{0}' did not select any atoms."
802                           .format(str(self.selection1)[:80]))
803        self._s1_donors = {}
804        self._s1_donors_h = {}
805        self._s1_acceptors = {}
806        if self.selection1_type in ('donor', 'both'):
807            self._s1_donors = self._s1.select_atoms(
808                'name {0}'.format(' '.join(self.donors)))
809            self._s1_donors_h = {}
810            for i, d in enumerate(self._s1_donors):
811                tmp = self._get_bonded_hydrogens(d)
812                if tmp:
813                    self._s1_donors_h[i] = tmp
814            self.logger_debug("Selection 1 donors: {0}".format(len(self._s1_donors)))
815            self.logger_debug("Selection 1 donor hydrogens: {0}".format(len(self._s1_donors_h)))
816        if self.selection1_type in ('acceptor', 'both'):
817            self._s1_acceptors = self._s1.select_atoms(
818                'name {0}'.format(' '.join(self.acceptors)))
819            self.logger_debug("Selection 1 acceptors: {0}".format(len(self._s1_acceptors)))
820
821    def _update_selection_2(self):
822        box = self.u.dimensions if self.pbc else None
823        self._s2 = self.u.select_atoms(self.selection2)
824        if self.filter_first and self._s2:
825            self.logger_debug('Size of selection 2 before filtering:'
826                              ' {} atoms'.format(len(self._s2)))
827            ns_selection_2 = AtomNeighborSearch(self._s2, box)
828            self._s2 = ns_selection_2.search(self._s1, 3. * self.distance)
829        self.logger_debug('Size of selection 2: {0} atoms'.format(len(self._s2)))
830        if not self._s2:
831            logger.warning('Selection 2 "{0}" did not select any atoms.'
832                           .format(str(self.selection2)[:80]))
833        self._s2_donors = {}
834        self._s2_donors_h = {}
835        self._s2_acceptors = {}
836        if not self._s2:
837            return None
838        if self.selection1_type in ('donor', 'both'):
839            self._s2_acceptors = self._s2.select_atoms(
840                'name {0}'.format(' '.join(self.acceptors)))
841            self.logger_debug("Selection 2 acceptors: {0:d}".format(len(self._s2_acceptors)))
842        if self.selection1_type in ('acceptor', 'both'):
843            self._s2_donors = self._s2.select_atoms(
844                'name {0}'.format(' '.join(self.donors)))
845            self._s2_donors_h = {}
846            for i, d in enumerate(self._s2_donors):
847                tmp = self._get_bonded_hydrogens(d)
848                if tmp:
849                    self._s2_donors_h[i] = tmp
850            self.logger_debug("Selection 2 donors: {0:d}".format(len(self._s2_donors)))
851            self.logger_debug("Selection 2 donor hydrogens: {0:d}".format(len(self._s2_donors_h)))
852
853    def logger_debug(self, *args):
854        if self.debug:
855            logger.debug(*args)
856
857    def run(self, start=None, stop=None, step=None, verbose=None, **kwargs):
858        """Analyze trajectory and produce timeseries.
859
860        Stores the hydrogen bond data per frame as
861        :attr:`HydrogenBondAnalysis.timeseries` (see there for output
862        format).
863
864        Parameters
865        ----------
866        start : int (optional)
867            starting frame-index for analysis, ``None`` is the first one, 0.
868            `start` and `stop` are 0-based frame indices and are used to slice
869            the trajectory (if supported) [``None``]
870        stop : int (optional)
871            last trajectory frame for analysis, ``None`` is the last one [``None``]
872        step : int (optional)
873            read every `step` between `start` (included) and `stop` (excluded),
874            ``None`` selects 1. [``None``]
875        verbose : bool (optional)
876             toggle progress meter output :class:`~MDAnalysis.lib.log.ProgressMeter`
877             [``True``]
878        debug : bool (optional)
879             enable detailed logging of debugging information; this can create
880             *very big* log files so it is disabled (``False``) by default; setting
881             `debug` toggles the debug status for :class:`HydrogenBondAnalysis`,
882             namely the value of :attr:`HydrogenBondAnalysis.debug`.
883
884        Other Parameters
885        ----------------
886        remove_duplicates : bool (optional)
887             duplicate hydrogen bonds are removed from output if set to the
888             default value ``True``; normally, this should not be changed.
889
890        See Also
891        --------
892        :meth:`HydrogenBondAnalysis.generate_table` :
893               processing the data into a different format.
894
895
896        .. versionchanged:: 0.7.6
897           Results are not returned, only stored in
898           :attr:`~HydrogenBondAnalysis.timeseries` and duplicate hydrogen bonds
899           are removed from output (can be suppressed with `remove_duplicates` =
900           ``False``)
901
902        .. versionchanged:: 0.11.0
903           Accept `quiet` keyword. Analysis will now proceed through frames even if
904           no donors or acceptors were found in a particular frame.
905
906        .. deprecated:: 0.16
907           The `quiet` keyword argument is deprecated in favor of the `verbose`
908           one. Previous use of `verbose` now corresponds to the new keyword
909           argument `debug`.
910
911        """
912        # sets self.start/stop/step and _pm
913        self._setup_frames(self._trajectory, start, stop, step)
914
915        logger.info("HBond analysis: starting")
916        logger.debug("HBond analysis: donors    %r", self.donors)
917        logger.debug("HBond analysis: acceptors %r", self.acceptors)
918
919        remove_duplicates = kwargs.pop('remove_duplicates', True)  # False: old behaviour
920        if not remove_duplicates:
921            logger.warning("Hidden feature remove_duplicates=False activated: you will probably get duplicate H-bonds.")
922
923        debug = kwargs.pop('debug', None)
924        if debug is not None and debug != self.debug:
925            self.debug = debug
926            logger.debug("Toggling debug to %r", self.debug)
927        if not self.debug:
928            logger.debug("HBond analysis: For full step-by-step debugging output use debug=True")
929
930        self._timeseries = []
931        self.timesteps = []
932
933        pm = ProgressMeter(self.n_frames,
934                           format="HBonds frame {current_step:5d}: {step:5d}/{numsteps} [{percentage:5.1f}%]\r",
935                           verbose=kwargs.get('verbose', False))
936
937        try:
938            self.u.trajectory.time
939            def _get_timestep():
940                return self.u.trajectory.time
941            logger.debug("HBond analysis is recording time step")
942        except NotImplementedError:
943            # chained reader or xyz(?) cannot do time yet
944            def _get_timestep():
945                return self.u.trajectory.frame
946            logger.warning("HBond analysis is recording frame number instead of time step")
947
948        logger.info("Starting analysis (frame index start=%d stop=%d, step=%d)",
949                    self.start, self.stop, self.step)
950
951        for progress, ts in enumerate(self.u.trajectory[self.start:self.stop:self.step]):
952            # all bonds for this timestep
953            frame_results = []
954            # dict of tuples (atom.index, atom.index) for quick check if
955            # we already have the bond (to avoid duplicates)
956            already_found = {}
957
958            frame = ts.frame
959            timestep = _get_timestep()
960            self.timesteps.append(timestep)
961
962            pm.echo(progress, current_step=frame)
963            self.logger_debug("Analyzing frame %(frame)d, timestep %(timestep)f ps", vars())
964            if self.update_selection1:
965                self._update_selection_1()
966            if self.update_selection2:
967                self._update_selection_2()
968
969            box = self.u.dimensions if self.pbc else None
970            if self.selection1_type in ('donor', 'both') and self._s2_acceptors:
971                self.logger_debug("Selection 1 Donors <-> Acceptors")
972                ns_acceptors = AtomNeighborSearch(self._s2_acceptors, box)
973                for i, donor_h_set in self._s1_donors_h.items():
974                    d = self._s1_donors[i]
975                    for h in donor_h_set:
976                        res = ns_acceptors.search(h, self.distance)
977                        for a in res:
978                            angle = distances.calc_angles(d.position,
979                                                          h.position,
980                                                          a.position, box=box)
981                            angle = np.rad2deg(angle)
982                            donor_atom = h if self.distance_type != 'heavy' else d
983                            dist = distances.calc_bonds(donor_atom.position, a.position, box=box)
984                            if angle >= self.angle and dist <= self.distance:
985                                self.logger_debug(
986                                    "S1-D: {0!s} <-> S2-A: {1!s} {2:f} A, {3:f} DEG".format(h.index, a.index, dist, angle))
987                                frame_results.append(
988                                    [h.index, a.index,
989                                    (h.resname, h.resid, h.name),
990                                    (a.resname, a.resid, a.name),
991                                    dist, angle])
992
993                                already_found[(h.index, a.index)] = True
994            if self.selection1_type in ('acceptor', 'both') and self._s1_acceptors:
995                self.logger_debug("Selection 1 Acceptors <-> Donors")
996                ns_acceptors = AtomNeighborSearch(self._s1_acceptors, box)
997                for i, donor_h_set in self._s2_donors_h.items():
998                    d = self._s2_donors[i]
999                    for h in donor_h_set:
1000                        res = ns_acceptors.search(h, self.distance)
1001                        for a in res:
1002                            if remove_duplicates and (
1003                                    (h.index, a.index) in already_found or
1004                                    (a.index, h.index) in already_found):
1005                                continue
1006                            angle = distances.calc_angles(d.position,
1007                                                          h.position,
1008                                                          a.position, box=box)
1009                            angle = np.rad2deg(angle)
1010                            donor_atom = h if self.distance_type != 'heavy' else d
1011                            dist = distances.calc_bonds(donor_atom.position, a.position, box=box)
1012                            if angle >= self.angle and dist <= self.distance:
1013                                self.logger_debug(
1014                                    "S1-A: {0!s} <-> S2-D: {1!s} {2:f} A, {3:f} DEG".format(a.index, h.index, dist, angle))
1015                                frame_results.append(
1016                                    [h.index, a.index,
1017                                     (h.resname, h.resid, h.name),
1018                                     (a.resname, a.resid, a.name),
1019                                    dist, angle])
1020
1021            self._timeseries.append(frame_results)
1022
1023        logger.info("HBond analysis: complete; timeseries  %s.timeseries",
1024                    self.__class__.__name__)
1025
1026    @property
1027    def timeseries(self):
1028        """Time series of hydrogen bonds.
1029
1030        The results of the hydrogen bond analysis can be accessed as a `list` of `list` of `list`:
1031
1032        1. `timeseries[i]`: data for the i-th trajectory frame (at time
1033           `timesteps[i]`, see :attr:`timesteps`)
1034        2. `timeseries[i][j]`: j-th hydrogen bond that was detected at the i-th
1035           frame.
1036        3. ``donor_index, acceptor_index,
1037           donor_name_str, acceptor_name_str, distance, angle =
1038           timeseries[i][j]``: structure of one hydrogen bond data item
1039
1040
1041        In the following description, ``#`` indicates comments that are not
1042        part of the output::
1043
1044          results = [
1045              [ # frame 1
1046                 [ # hbond 1
1047                    <donor index (0-based)>, <acceptor index (0-based)>,
1048                    <donor string>, <acceptor string>, <distance>, <angle>
1049                 ],
1050                 [ # hbond 2
1051                    <donor index (0-based)>, <acceptor index (0-based)>,
1052                    <donor string>, <acceptor string>, <distance>, <angle>
1053                 ],
1054                 ....
1055              ],
1056              [ # frame 2
1057                [ ... ], [ ... ], ...
1058              ],
1059              ...
1060          ]
1061
1062        The time of each step is not stored with each hydrogen bond frame but in
1063        :attr:`~HydrogenBondAnalysis.timesteps`.
1064
1065
1066        Note
1067        ----
1068        For instance, to find an acceptor atom in :attr:`Universe.atoms` by
1069        *index* one would use ``u.atoms[acceptor_index]``.
1070
1071        The :attr:`timeseries` is a managed attribute and it is generated
1072        from the underlying data in :attr:`_timeseries` every time the
1073        attribute is accessed. It is therefore costly to call and if
1074        :attr:`timeseries` is needed repeatedly it is recommended that you
1075        assign to a variable::
1076
1077           h = HydrogenBondAnalysis(u)
1078           h.run()
1079           timeseries = h.timeseries
1080
1081
1082        See Also
1083        --------
1084        :attr:`table` : structured array of the data
1085
1086
1087        .. versionchanged:: 0.16.1
1088           :attr:`timeseries` has become a managed attribute and is generated from the stored
1089           :attr:`_timeseries` when needed. :attr:`_timeseries` contains the donor atom and
1090           acceptor atom specifiers as tuples `(resname, resid, atomid)` instead of strings.
1091
1092        .. versionchanged:: 0.17.0
1093           The 1-based indices "donor_idx" and "acceptor_idx" are being
1094           removed in favor of the 0-based indices "donor_index" and
1095           "acceptor_index".
1096
1097        """
1098        return [[self._reformat_hb(hb) for hb in hframe] for hframe in self._timeseries]
1099
1100    @staticmethod
1101    def _reformat_hb(hb, atomformat="{0[0]!s}{0[1]!s}:{0[2]!s}"):
1102        """Convert 0.16.1 _timeseries hbond item to 0.16.0 hbond item.
1103
1104        In 0.16.1, donor and acceptor are stored as a tuple(resname,
1105        resid, atomid). In 0.16.0 and earlier they were stored as a string.
1106
1107        .. deprecated:: 1.0
1108           This is a compatibility layer so that we can provide the same output
1109           in timeseries as before. However, for 1.0 we should make timeseries
1110           just return _timeseries, i.e., change the format of timeseries to
1111           the un-ambiguous representation provided in _timeseries.
1112
1113        """
1114        return (hb[:2]
1115                + [atomformat.format(hb[2]), atomformat.format(hb[3])]
1116                + hb[4:])
1117
1118    def generate_table(self):
1119        """Generate a normalised table of the results.
1120
1121        The table is stored as a :class:`numpy.recarray` in the
1122        attribute :attr:`~HydrogenBondAnalysis.table`.
1123
1124        See Also
1125        --------
1126        HydrogenBondAnalysis.table
1127
1128        """
1129        if self._timeseries is None:
1130            msg = "No timeseries computed, do run() first."
1131            warnings.warn(msg, category=MissingDataWarning)
1132            logger.warning(msg)
1133            return
1134
1135        num_records = np.sum([len(hframe) for hframe in self._timeseries])
1136        # build empty output table
1137        dtype = [
1138            ("time", float),
1139            ("donor_index", int),  ("acceptor_index", int),
1140            ("donor_resnm", "|U4"), ("donor_resid", int), ("donor_atom", "|U4"),
1141            ("acceptor_resnm", "|U4"), ("acceptor_resid", int), ("acceptor_atom", "|U4"),
1142            ("distance", float), ("angle", float)]
1143        # according to Lukas' notes below, using a recarray at this stage is ineffective
1144        # and speedups of ~x10 can be achieved by filling a standard array, like this:
1145        out = np.empty((num_records,), dtype=dtype)
1146        cursor = 0  # current row
1147        for t, hframe in zip(self.timesteps, self._timeseries):
1148            for (donor_index, acceptor_index, donor,
1149                 acceptor, distance, angle) in hframe:
1150                # donor|acceptor = (resname, resid, atomid)
1151                out[cursor] = (t, donor_index, acceptor_index) + \
1152                donor + acceptor + (distance, angle)
1153                cursor += 1
1154        assert cursor == num_records, "Internal Error: Not all HB records stored"
1155        self.table = out.view(np.recarray)
1156        logger.debug("HBond: Stored results as table with %(num_records)d entries.", vars())
1157
1158    @deprecate(release="0.19.0", remove="1.0.0",
1159               message="You can instead use ``np.save(filename, "
1160               "HydrogendBondAnalysis.table)``.")
1161    def save_table(self, filename="hbond_table.pickle"):
1162        """Saves :attr:`~HydrogenBondAnalysis.table` to a pickled file.
1163
1164        If :attr:`~HydrogenBondAnalysis.table` does not exist yet,
1165        :meth:`generate_table` is called first.
1166
1167        Parameters
1168        ----------
1169        filename : str (optional)
1170             path to the filename
1171
1172        Example
1173        -------
1174        Load with ::
1175
1176           import cPickle
1177           table = cPickle.load(open(filename))
1178
1179        """
1180        if self.table is None:
1181            self.generate_table()
1182        with open(filename, 'w') as f:
1183            cPickle.dump(self.table, f, protocol=cPickle.HIGHEST_PROTOCOL)
1184
1185    def _has_timeseries(self):
1186        has_timeseries = self._timeseries is not None
1187        if not has_timeseries:
1188            msg = "No timeseries computed, do run() first."
1189            warnings.warn(msg, category=MissingDataWarning)
1190            logger.warning(msg)
1191        return has_timeseries
1192
1193    def count_by_time(self):
1194        """Counts the number of hydrogen bonds per timestep.
1195
1196        Processes :attr:`HydrogenBondAnalysis._timeseries` into the time series
1197        ``N(t)`` where ``N`` is the total number of observed hydrogen bonds at
1198        time ``t``.
1199
1200        Returns
1201        -------
1202        counts : numpy.recarray
1203             The resulting array can be thought of as rows ``(time, N)`` where
1204             ``time`` is the time (in ps) of the time step and ``N`` is the
1205             total number of hydrogen bonds.
1206
1207        """
1208        if not self._has_timeseries():
1209            return
1210
1211        out = np.empty((len(self.timesteps),), dtype=[('time', float), ('count', int)])
1212        for cursor, time_count in enumerate(zip(self.timesteps,
1213                                               (len(series) for series in self._timeseries))):
1214            out[cursor] = time_count
1215        return out.view(np.recarray)
1216
1217    def count_by_type(self):
1218        """Counts the frequency of hydrogen bonds of a specific type.
1219
1220        Processes :attr:`HydrogenBondAnalysis._timeseries` and returns a
1221        :class:`numpy.recarray` containing atom indices, residue names, residue
1222        numbers (for donors and acceptors) and the fraction of the total time
1223        during which the hydrogen bond was detected.
1224
1225        Returns
1226        -------
1227        counts : numpy.recarray
1228             Each row of the array contains data to define a unique hydrogen
1229             bond together with the frequency (fraction of the total time) that
1230             it has been observed.
1231
1232
1233        .. versionchanged:: 0.17.0
1234           The 1-based indices "donor_idx" and "acceptor_idx" are being
1235           deprecated in favor of zero-based indices.
1236        """
1237        if not self._has_timeseries():
1238            return
1239
1240        hbonds = defaultdict(int)
1241        for hframe in self._timeseries:
1242            for (donor_index, acceptor_index, donor,
1243                 acceptor, distance, angle) in hframe:
1244                donor_resnm, donor_resid, donor_atom = donor
1245                acceptor_resnm, acceptor_resid, acceptor_atom = acceptor
1246                # generate unambigous key for current hbond \
1247                # (the donor_heavy_atom placeholder '?' is added later)
1248                # idx_zero is redundant for an unambigous key, but included for
1249                # consistency.
1250                hb_key = (
1251                    donor_index, acceptor_index,
1252                    donor_resnm, donor_resid, "?", donor_atom,
1253                    acceptor_resnm, acceptor_resid, acceptor_atom)
1254
1255                hbonds[hb_key] += 1
1256
1257        # build empty output table
1258        dtype = [
1259            ("donor_index", int), ("acceptor_index", int), ('donor_resnm', 'U4'),
1260            ('donor_resid', int), ('donor_heavy_atom', 'U4'), ('donor_atom', 'U4'),
1261            ('acceptor_resnm', 'U4'), ('acceptor_resid', int), ('acceptor_atom', 'U4'),
1262            ('frequency', float)
1263        ]
1264        out = np.empty((len(hbonds),), dtype=dtype)
1265
1266        # float because of division later
1267        tsteps = float(len(self.timesteps))
1268        for cursor, (key, count) in enumerate(six.iteritems(hbonds)):
1269            out[cursor] = key + (count / tsteps,)
1270
1271        # return array as recarray
1272        # The recarray has not been used within the function, because accessing the
1273        # the elements of a recarray (3.65 us) is much slower then accessing those
1274        # of a ndarray (287 ns).
1275        r = out.view(np.recarray)
1276
1277        # patch in donor heavy atom names (replaces '?' in the key)
1278        h2donor = self._donor_lookup_table_byindex()
1279        r.donor_heavy_atom[:] = [h2donor[idx] for idx in r.donor_index]
1280
1281        return r
1282
1283    def timesteps_by_type(self):
1284        """Frames during which each hydrogen bond existed, sorted by hydrogen bond.
1285
1286        Processes :attr:`HydrogenBondAnalysis._timeseries` and returns a
1287        :class:`numpy.recarray` containing atom indices, residue names, residue
1288        numbers (for donors and acceptors) and each timestep at which the
1289        hydrogen bond was detected.
1290
1291        In principle, this is the same as :attr:`~HydrogenBondAnalysis.table`
1292        but sorted by hydrogen bond and with additional data for the
1293        *donor_heavy_atom* and angle and distance omitted.
1294
1295
1296        Returns
1297        -------
1298        data : numpy.recarray
1299
1300
1301        .. versionchanged:: 0.17.0
1302           The 1-based indices "donor_idx" and "acceptor_idx" are being
1303           replaced in favor of zero-based indices.
1304
1305        """
1306        if not self._has_timeseries():
1307            return
1308
1309        hbonds = defaultdict(list)
1310        for (t, hframe) in zip(self.timesteps, self._timeseries):
1311            for (donor_index, acceptor_index, donor,
1312                 acceptor, distance, angle) in hframe:
1313                donor_resnm, donor_resid, donor_atom = donor
1314                acceptor_resnm, acceptor_resid, acceptor_atom = acceptor
1315                # generate unambigous key for current hbond
1316                # (the donor_heavy_atom placeholder '?' is added later)
1317                # idx_zero is redundant for key but added for consistency
1318                hb_key = (
1319                    donor_index, acceptor_index,
1320                    donor_resnm, donor_resid, "?", donor_atom,
1321                    acceptor_resnm, acceptor_resid, acceptor_atom)
1322                hbonds[hb_key].append(t)
1323
1324        out_nrows = 0
1325        # count number of timesteps per key to get length of output table
1326        for ts_list in six.itervalues(hbonds):
1327            out_nrows += len(ts_list)
1328
1329        # build empty output table
1330        dtype = [
1331            ('donor_index', int),
1332            ('acceptor_index', int), ('donor_resnm', 'U4'), ('donor_resid', int),
1333            ('donor_heavy_atom', 'U4'), ('donor_atom', 'U4'),('acceptor_resnm', 'U4'),
1334            ('acceptor_resid', int), ('acceptor_atom', 'U4'), ('time', float)]
1335        out = np.empty((out_nrows,), dtype=dtype)
1336
1337        out_row = 0
1338        for (key, times) in six.iteritems(hbonds):
1339            for tstep in times:
1340                out[out_row] = key + (tstep,)
1341                out_row += 1
1342
1343        # return array as recarray
1344        # The recarray has not been used within the function, because accessing the
1345        # the elements of a recarray (3.65 us) is much slower then accessing those
1346        # of a ndarray (287 ns).
1347        r = out.view(np.recarray)
1348
1349        # patch in donor heavy atom names (replaces '?' in the key)
1350        h2donor = self._donor_lookup_table_byindex()
1351        r.donor_heavy_atom[:] = [h2donor[idx] for idx in r.donor_index]
1352
1353        return r
1354
1355    def _donor_lookup_table_byres(self):
1356        """Look-up table to identify the donor heavy atom from resid and hydrogen name.
1357
1358        Assumptions:
1359        * resids are unique
1360        * hydrogen atom names are unique within a residue
1361        * selections have not changed (because we are simply looking at the last content
1362          of the donors and donor hydrogen lists)
1363
1364        Donors from `selection1` and `selection2` are merged.
1365
1366        Output dictionary ``h2donor`` can be used as::
1367
1368           heavy_atom_name = h2donor[resid][hydrogen_name]
1369
1370        """
1371        s1d = self._s1_donors  # list of donor Atom instances
1372        s1h = self._s1_donors_h  # dict indexed by donor position in donor list, containg AtomGroups of H
1373        s2d = self._s2_donors
1374        s2h = self._s2_donors_h
1375
1376        def _make_dict(donors, hydrogens):
1377            # two steps so that entry for one residue can be UPDATED for multiple donors
1378            d = dict((donors[k].resid, {}) for k in range(len(donors)) if k in hydrogens)
1379            for k in range(len(donors)):
1380                if k in hydrogens:
1381                    d[donors[k].resid].update(dict((atom.name, donors[k].name) for atom in hydrogens[k]))
1382            return d
1383
1384        h2donor = _make_dict(s2d, s2h)  # 2 is typically the larger group
1385        # merge (in principle h2donor.update(_make_dict(s1d, s1h) should be sufficient
1386        # with our assumptions but the following should be really safe)
1387        for resid, names in _make_dict(s1d, s1h).items():
1388            if resid in h2donor:
1389                h2donor[resid].update(names)
1390            else:
1391                h2donor[resid] = names
1392
1393        return h2donor
1394
1395    def _donor_lookup_table_byindex(self):
1396        """Look-up table to identify the donor heavy atom from hydrogen atom index.
1397
1398        Assumptions:
1399        * selections have not changed (because we are simply looking at the last content
1400          of the donors and donor hydrogen lists)
1401
1402        Donors from `selection1` and `selection2` are merged.
1403
1404        Output dictionary ``h2donor`` can be used as::
1405
1406           heavy_atom_name = h2donor[index]
1407
1408        """
1409        s1d = self._s1_donors  # list of donor Atom instances
1410        s1h = self._s1_donors_h  # dict indexed by donor position in donor list, containg AtomGroups of H
1411        s2d = self._s2_donors
1412        s2h = self._s2_donors_h
1413
1414        def _make_dict(donors, hydrogens):
1415            #return dict(flatten_1([(atom.id, donors[k].name) for atom in hydrogens[k]] for k in range(len(donors))
1416            # if k in hydrogens))
1417            x = []
1418            for k in range(len(donors)):
1419                if k in hydrogens:
1420                    x.extend([(atom.index, donors[k].name) for atom in hydrogens[k]])
1421            return dict(x)
1422
1423        h2donor = _make_dict(s2d, s2h)  # 2 is typically the larger group
1424        h2donor.update(_make_dict(s1d, s1h))
1425
1426        return h2donor
1427