1# -*- Mode: python; tab-width: 4; indent-tabs-mode:nil; coding:utf-8 -*- 2# vim: tabstop=4 expandtab shiftwidth=4 softtabstop=4 3# 4# MDAnalysis --- https://www.mdanalysis.org 5# Copyright (c) 2006-2017 The MDAnalysis Development Team and contributors 6# (see the file AUTHORS for the full list of names) 7# 8# Released under the GNU Public Licence, v2 or any higher version 9# 10# Please cite your use of MDAnalysis in published work: 11# 12# R. J. Gowers, M. Linke, J. Barnoud, T. J. E. Reddy, M. N. Melo, S. L. Seyler, 13# D. L. Dotson, J. Domanski, S. Buchoux, I. M. Kenney, and O. Beckstein. 14# MDAnalysis: A Python package for the rapid analysis of molecular dynamics 15# simulations. In S. Benthall and S. Rostrup editors, Proceedings of the 15th 16# Python in Science Conference, pages 102-109, Austin, TX, 2016. SciPy. 17# 18# N. Michaud-Agrawal, E. J. Denning, T. B. Woolf, and O. Beckstein. 19# MDAnalysis: A Toolkit for the Analysis of Molecular Dynamics Simulations. 20# J. Comput. Chem. 32 (2011), 2319--2327, doi:10.1002/jcc.21787 21# 22 23# Hydrogen Bonding Analysis 24r"""Hydrogen Bond analysis --- :mod:`MDAnalysis.analysis.hbonds.hbond_analysis` 25=========================================================================== 26 27:Author: David Caplan, Lukas Grossar, Oliver Beckstein 28:Year: 2010-2017 29:Copyright: GNU Public License v3 30 31 32Given a :class:`~MDAnalysis.core.universe.Universe` (simulation 33trajectory with 1 or more frames) measure all hydrogen bonds for each 34frame between selections 1 and 2. 35 36The :class:`HydrogenBondAnalysis` class is modeled after the `VMD 37HBONDS plugin`_. 38 39.. _`VMD HBONDS plugin`: http://www.ks.uiuc.edu/Research/vmd/plugins/hbonds/ 40 41Options: 42 - *update_selections* (``True``): update selections at each frame? 43 - *selection_1_type* ("both"): selection 1 is the: "donor", "acceptor", "both" 44 - donor-acceptor *distance* (Å): 3.0 45 - Angle *cutoff* (degrees): 120.0 46 - *forcefield* to switch between default values for different force fields 47 - *donors* and *acceptors* atom types (to add additional atom names) 48 49.. _Analysis Output: 50 51Output 52------ 53 54The results are 55 - the **identities** of donor and acceptor heavy-atoms, 56 - the **distance** between the heavy atom acceptor atom and the hydrogen atom 57 that is bonded to the heavy atom donor, 58 - the **angle** donor-hydrogen-acceptor angle (180º is linear). 59 60Hydrogen bond data are returned per frame, which is stored in 61:attr:`HydrogenBondAnalysis.timeseries` (In the following description, ``#`` 62indicates comments that are not part of the output.):: 63 64 results = [ 65 [ # frame 1 66 [ # hbond 1 67 <donor index (0-based)>, 68 <acceptor index (0-based)>, <donor string>, <acceptor string>, 69 <distance>, <angle> 70 ], 71 [ # hbond 2 72 <donor index (0-based)>, 73 <acceptor index (0-based)>, <donor string>, <acceptor string>, 74 <distance>, <angle> 75 ], 76 .... 77 ], 78 [ # frame 2 79 [ ... ], [ ... ], ... 80 ], 81 ... 82 ] 83 84Using the :meth:`HydrogenBondAnalysis.generate_table` method one can reformat 85the results as a flat "normalised" table that is easier to import into a 86database or dataframe for further processing. The table itself is a 87:class:`numpy.recarray`. 88 89.. _Detection-of-hydrogen-bonds: 90 91Detection of hydrogen bonds 92--------------------------- 93 94Hydrogen bonds are recorded based on a geometric criterion: 95 961. The distance between acceptor and hydrogen is less than or equal to 97 `distance` (default is 3 Å). 98 992. The angle between donor-hydrogen-acceptor is greater than or equal to 100 `angle` (default is 120º). 101 102The cut-off values `angle` and `distance` can be set as keywords to 103:class:`HydrogenBondAnalysis`. 104 105Donor and acceptor heavy atoms are detected from atom names. The current 106defaults are appropriate for the CHARMM27 and GLYCAM06 force fields as defined 107in Table `Default atom names for hydrogen bonding analysis`_. 108 109Hydrogen atoms bonded to a donor are searched with one of two algorithms, 110selected with the `detect_hydrogens` keyword. 111 112"distance" 113 Searches for all hydrogens (name "H*" or name "[123]H" or type "H") in the 114 same residue as the donor atom within a cut-off distance of 1.2 Å. 115 116"heuristic" 117 Looks at the next three atoms in the list of atoms following the donor and 118 selects any atom whose name matches (name "H*" or name "[123]H"). For 119 120The "distance" search is more rigorous but slower and is set as the 121default. Until release 0.7.6, only the heuristic search was implemented. 122 123.. versionchanged:: 0.7.6 124 Distance search added (see 125 :meth:`HydrogenBondAnalysis._get_bonded_hydrogens_dist`) and heuristic 126 search improved (:meth:`HydrogenBondAnalysis._get_bonded_hydrogens_list`) 127 128.. _Default atom names for hydrogen bonding analysis: 129 130.. table:: Default heavy atom names for CHARMM27 force field. 131 132 =========== ============== =========== ==================================== 133 group donor acceptor comments 134 =========== ============== =========== ==================================== 135 main chain N O, OC1, OC2 OC1, OC2 from amber99sb-ildn 136 (Gromacs) 137 water OH2, OW OH2, OW SPC, TIP3P, TIP4P (CHARMM27,Gromacs) 138 139 ARG NE, NH1, NH2 140 ASN ND2 OD1 141 ASP OD1, OD2 142 CYS SG 143 CYH SG possible false positives for CYS 144 GLN NE2 OE1 145 GLU OE1, OE2 146 HIS ND1, NE2 ND1, NE2 presence of H determines if donor 147 HSD ND1 NE2 148 HSE NE2 ND1 149 HSP ND1, NE2 150 LYS NZ 151 MET SD see e.g. [Gregoret1991]_ 152 SER OG OG 153 THR OG1 OG1 154 TRP NE1 155 TYR OH OH 156 =========== ============== =========== ==================================== 157 158.. table:: Heavy atom types for GLYCAM06 force field. 159 160 =========== =========== ================== 161 element donor acceptor 162 =========== =========== ================== 163 N N,NT,N3 N,NT 164 O OH,OW O,O2,OH,OS,OW,OY 165 S SM 166 =========== =========== ================== 167 168Donor and acceptor names for the CHARMM27 force field will also work for e.g. 169OPLS/AA (tested in Gromacs). Residue names in the table are for information 170only and are not taken into account when determining acceptors and donors. 171This can potentially lead to some ambiguity in the assignment of 172donors/acceptors for residues such as histidine or cytosine. 173 174For more information about the naming convention in GLYCAM06 have a look at the 175`Carbohydrate Naming Convention in Glycam`_. 176 177.. _`Carbohydrate Naming Convention in Glycam`: 178 http://glycam.ccrc.uga.edu/documents/FutureNomenclature.htm 179 180The lists of donor and acceptor names can be extended by providing lists of 181atom names in the `donors` and `acceptors` keywords to 182:class:`HydrogenBondAnalysis`. If the lists are entirely inappropriate 183(e.g. when analysing simulations done with a force field that uses very 184different atom names) then one should either use the value "other" for `forcefield` 185to set no default values, or derive a new class and set the default list oneself:: 186 187 class HydrogenBondAnalysis_OtherFF(HydrogenBondAnalysis): 188 DEFAULT_DONORS = {"OtherFF": tuple(set([...]))} 189 DEFAULT_ACCEPTORS = {"OtherFF": tuple(set([...]))} 190 191Then simply use the new class instead of the parent class and call it with 192`forcefield` = "OtherFF". Please also consider to contribute the list of heavy 193atom names to MDAnalysis. 194 195.. rubric:: References 196 197.. [Gregoret1991] L.M. Gregoret, S.D. Rader, R.J. Fletterick, and 198 F.E. Cohen. Hydrogen bonds involving sulfur atoms in proteins. Proteins, 199 9(2):99–107, 1991. `10.1002/prot.340090204`_. 200 201.. _`10.1002/prot.340090204`: http://dx.doi.org/10.1002/prot.340090204 202 203 204How to perform HydrogenBondAnalysis 205----------------------------------- 206 207All protein-water hydrogen bonds can be analysed with :: 208 209 import MDAnalysis 210 import MDAnalysis.analysis.hbonds 211 212 u = MDAnalysis.Universe('topology', 'trajectory') 213 h = MDAnalysis.analysis.hbonds.HydrogenBondAnalysis(u, 'protein', 'resname HOH', distance=3.0, angle=120.0) 214 h.run() 215 216.. Note:: 217 218 Due to the way :class:`HydrogenBondAnalysis` is implemented, it is 219 more efficient to have the second selection (`selection2`) be the 220 *larger* group, e.g. the water when looking at water-protein 221 H-bonds or the whole protein when looking at ligand-protein 222 interactions. 223 224 225The results are stored as the attribute 226:attr:`HydrogenBondAnalysis.timeseries`; see :ref:`Analysis Output` for the 227format and further options. 228 229A number of convenience functions are provided to process the 230:attr:`~HydrogenBondAnalysis.timeseries` according to varying criteria: 231 232:meth:`~HydrogenBondAnalysis.count_by_time` 233 time series of the number of hydrogen bonds per time step 234:meth:`~HydrogenBondAnalysis.count_by_type` 235 data structure with the frequency of each observed hydrogen bond 236:meth:`~HydrogenBondAnalysis.timesteps_by_type` 237 data structure with a list of time steps for each observed hydrogen bond 238 239For further data analysis it is convenient to process the 240:attr:`~HydrogenBondAnalysis.timeseries` data into a normalized table with the 241:meth:`~HydrogenBondAnalysis.generate_table` method, which creates a new data 242structure :attr:`HydrogenBondAnalysis.table` that contains one row for each 243observation of a hydrogen bond:: 244 245 h.generate_table() 246 247This table can then be easily turned into, e.g., a `pandas.DataFrame`_, and 248further analyzed:: 249 250 import pandas as pd 251 df = pd.DataFrame.from_records(h.table) 252 253For example, plotting a histogram of the hydrogen bond angles and lengths is as 254simple as :: 255 256 df.hist(column=["angle", "distance"]) 257 258.. _pandas.DataFrame: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html 259 260 261.. TODO: notes on selection updating 262 263 264Classes 265------- 266 267.. autoclass:: HydrogenBondAnalysis 268 :members: 269 270 .. attribute:: timesteps 271 272 List of the times of each timestep. This can be used together with 273 :attr:`~HydrogenBondAnalysis.timeseries` to find the specific time point 274 of a hydrogen bond existence, or see :attr:`~HydrogenBondAnalysis.table`. 275 276 .. attribute:: table 277 278 A normalised table of the data in 279 :attr:`HydrogenBondAnalysis.timeseries`, generated by 280 :meth:`HydrogenBondAnalysis.generate_table`. It is a 281 :class:`numpy.recarray` with the following columns: 282 283 0. "time" 284 1. "donor_index" 285 2. "acceptor_index" 286 3. "donor_resnm" 287 4. "donor_resid" 288 5. "donor_atom" 289 6. "acceptor_resnm" 290 7. "acceptor_resid" 291 8. "acceptor_atom" 292 9. "distance" 293 10. "angle" 294 295 It takes up more space than :attr:`~HydrogenBondAnalysis.timeseries` but 296 it is easier to analyze and to import into databases or dataframes. 297 298 299 .. rubric:: Example 300 301 For example, to create a `pandas.DataFrame`_ from ``h.table``:: 302 303 import pandas as pd 304 df = pd.DataFrame.from_records(h.table) 305 306 307 .. versionchanged:: 0.17.0 308 The 1-based donor and acceptor indices (*donor_idx* and 309 *acceptor_idx*) are deprecated in favor of 0-based indices. 310 311 .. automethod:: _get_bonded_hydrogens 312 313 .. automethod:: _get_bonded_hydrogens_dist 314 315 .. automethod:: _get_bonded_hydrogens_list 316 317""" 318from __future__ import division, absolute_import 319import six 320from six.moves import range, zip, map, cPickle 321 322import warnings 323import logging 324from collections import defaultdict 325 326import numpy as np 327 328from MDAnalysis import MissingDataWarning, NoDataError, SelectionError, SelectionWarning 329from .. import base 330from MDAnalysis.lib.log import ProgressMeter 331from MDAnalysis.lib.NeighborSearch import AtomNeighborSearch 332from MDAnalysis.lib import distances 333from MDAnalysis.lib.util import deprecate 334 335 336logger = logging.getLogger('MDAnalysis.analysis.hbonds') 337 338 339class HydrogenBondAnalysis(base.AnalysisBase): 340 """Perform a hydrogen bond analysis 341 342 The analysis of the trajectory is performed with the 343 :meth:`HydrogenBondAnalysis.run` method. The result is stored in 344 :attr:`HydrogenBondAnalysis.timeseries`. See 345 :meth:`~HydrogenBondAnalysis.run` for the format. 346 347 The default atom names are taken from the CHARMM 27 force field files, which 348 will also work for e.g. OPLS/AA in Gromacs, and GLYCAM06. 349 350 *Donors* (associated hydrogens are deduced from topology) 351 *CHARMM 27* 352 N of the main chain, water OH2/OW, ARG NE/NH1/NH2, ASN ND2, HIS ND1/NE2, 353 SER OG, TYR OH, CYS SG, THR OG1, GLN NE2, LYS NZ, TRP NE1 354 *GLYCAM06* 355 N,NT,N3,OH,OW 356 357 *Acceptors* 358 *CHARMM 27* 359 O of the main chain, water OH2/OW, ASN OD1, ASP OD1/OD2, CYH SG, GLN OE1, 360 GLU OE1/OE2, HIS ND1/NE2, MET SD, SER OG, THR OG1, TYR OH 361 *GLYCAM06* 362 N,NT,O,O2,OH,OS,OW,OY,P,S,SM 363 *amber99sb-ildn(Gromacs)* 364 OC1, OC2 of the main chain 365 366 See Also 367 -------- 368 :ref:`Default atom names for hydrogen bonding analysis` 369 370 371 .. versionchanged:: 0.7.6 372 DEFAULT_DONORS/ACCEPTORS is now embedded in a dict to switch between 373 default values for different force fields. 374 """ 375 376 # use tuple(set()) here so that one can just copy&paste names from the 377 # table; set() takes care for removing duplicates. At the end the 378 # DEFAULT_DONORS and DEFAULT_ACCEPTORS should simply be tuples. 379 380 #: default heavy atom names whose hydrogens are treated as *donors* 381 #: (see :ref:`Default atom names for hydrogen bonding analysis`); 382 #: use the keyword `donors` to add a list of additional donor names. 383 DEFAULT_DONORS = { 384 'CHARMM27': tuple(set([ 385 'N', 'OH2', 'OW', 'NE', 'NH1', 'NH2', 'ND2', 'SG', 'NE2', 'ND1', 'NZ', 'OG', 'OG1', 'NE1', 'OH'])), 386 'GLYCAM06': tuple(set(['N', 'NT', 'N3', 'OH', 'OW'])), 387 'other': tuple(set([]))} 388 389 #: default atom names that are treated as hydrogen *acceptors* 390 #: (see :ref:`Default atom names for hydrogen bonding analysis`); 391 #: use the keyword `acceptors` to add a list of additional acceptor names. 392 DEFAULT_ACCEPTORS = { 393 'CHARMM27': tuple(set([ 394 'O', 'OC1', 'OC2', 'OH2', 'OW', 'OD1', 'OD2', 'SG', 'OE1', 'OE1', 'OE2', 'ND1', 'NE2', 'SD', 'OG', 'OG1', 'OH'])), 395 'GLYCAM06': tuple(set(['N', 'NT', 'O', 'O2', 'OH', 'OS', 'OW', 'OY', 'SM'])), 396 'other': tuple(set([]))} 397 398 #: A :class:`collections.defaultdict` of covalent radii of common donors 399 #: (used in :meth`_get_bonded_hydrogens_list` to check if a hydrogen is 400 #: sufficiently close to its donor heavy atom). Values are stored for 401 #: N, O, P, and S. Any other heavy atoms are assumed to have hydrogens 402 #: covalently bound at a maximum distance of 1.5 Å. 403 r_cov = defaultdict(lambda: 1.5, # default value 404 N=1.31, O=1.31, P=1.58, S=1.55) 405 406 def __init__(self, universe, selection1='protein', selection2='all', selection1_type='both', 407 update_selection1=True, update_selection2=True, filter_first=True, distance_type='hydrogen', 408 distance=3.0, angle=120.0, 409 forcefield='CHARMM27', donors=None, acceptors=None, 410 debug=False, detect_hydrogens='distance', verbose=False, pbc=False, **kwargs): 411 """Set up calculation of hydrogen bonds between two selections in a universe. 412 413 The timeseries is accessible as the attribute :attr:`HydrogenBondAnalysis.timeseries`. 414 415 Some initial checks are performed. If there are no atoms selected by 416 `selection1` or `selection2` or if no donor hydrogens or acceptor atoms 417 are found then a :exc:`SelectionError` is raised for any selection that 418 does *not* update (`update_selection1` and `update_selection2` 419 keywords). For selections that are set to update, only a warning is 420 logged because it is assumed that the selection might contain atoms at 421 a later frame (e.g. for distance based selections). 422 423 If no hydrogen bonds are detected or if the initial check fails, look 424 at the log output (enable with :func:`MDAnalysis.start_logging` and set 425 `verbose` ``=True``). It is likely that the default names for donors 426 and acceptors are not suitable (especially for non-standard 427 ligands). In this case, either change the `forcefield` or use 428 customized `donors` and/or `acceptors`. 429 430 Parameters 431 ---------- 432 universe : Universe 433 Universe object 434 selection1 : str (optional) 435 Selection string for first selection ['protein'] 436 selection2 : str (optional) 437 Selection string for second selection ['all'] 438 selection1_type : {"donor", "acceptor", "both"} (optional) 439 Selection 1 can be 'donor', 'acceptor' or 'both'. Note that the 440 value for `selection1_type` automatically determines how 441 `selection2` handles donors and acceptors: If `selection1` contains 442 'both' then `selection2` will also contain 'both'. If `selection1` 443 is set to 'donor' then `selection2` is 'acceptor' (and vice versa). 444 ['both']. 445 update_selection1 : bool (optional) 446 Update selection 1 at each frame? Setting to ``False`` is recommended 447 for any static selection to increase performance. [``True``] 448 update_selection2 : bool (optional) 449 Update selection 2 at each frame? Setting to ``False`` is recommended 450 for any static selection to increase performance. [``True``] 451 filter_first : bool (optional) 452 Filter selection 2 first to only atoms 3 * `distance` away [``True``] 453 distance : float (optional) 454 Distance cutoff for hydrogen bonds; only interactions with a H-A distance 455 <= `distance` (and the appropriate D-H-A angle, see `angle`) are 456 recorded. (Note: `distance_type` can change this to the D-A distance.) [3.0] 457 angle : float (optional) 458 Angle cutoff for hydrogen bonds; an ideal H-bond has an angle of 459 180º. A hydrogen bond is only recorded if the D-H-A angle is 460 >= `angle`. The default of 120º also finds fairly non-specific 461 hydrogen interactions and a possibly better value is 150º. [120.0] 462 forcefield : {"CHARMM27", "GLYCAM06", "other"} (optional) 463 Name of the forcefield used. Switches between different 464 :attr:`~HydrogenBondAnalysis.DEFAULT_DONORS` and 465 :attr:`~HydrogenBondAnalysis.DEFAULT_ACCEPTORS` values. 466 ["CHARMM27"] 467 donors : sequence (optional) 468 Extra H donor atom types (in addition to those in 469 :attr:`~HydrogenBondAnalysis.DEFAULT_DONORS`), must be a sequence. 470 acceptors : sequence (optional) 471 Extra H acceptor atom types (in addition to those in 472 :attr:`~HydrogenBondAnalysis.DEFAULT_ACCEPTORS`), must be a sequence. 473 detect_hydrogens : {"distance", "heuristic"} (optional) 474 Determine the algorithm to find hydrogens connected to donor 475 atoms. Can be "distance" (default; finds all hydrogens in the 476 donor's residue within a cutoff of the donor) or "heuristic" 477 (looks for the next few atoms in the atom list). "distance" should 478 always give the correct answer but "heuristic" is faster, 479 especially when the donor list is updated each 480 for each frame. ["distance"] 481 distance_type : {"hydrogen", "heavy"} (optional) 482 Measure hydrogen bond lengths between donor and acceptor heavy 483 attoms ("heavy") or between donor hydrogen and acceptor heavy 484 atom ("hydrogen"). If using "heavy" then one should set the *distance* 485 cutoff to a higher value such as 3.5 Å. ["hydrogen"] 486 debug : bool (optional) 487 If set to ``True`` enables per-frame debug logging. This is disabled 488 by default because it generates a very large amount of output in 489 the log file. (Note that a logger must have been started to see 490 the output, e.g. using :func:`MDAnalysis.start_logging`.) [``False``] 491 verbose : bool (optional) 492 Toggle progress output. (Can also be given as keyword argument to 493 :meth:`run`.) 494 pbc : bool (optional) 495 Whether to consider periodic boundaries in calculations [``False``] 496 497 Raises 498 ------ 499 :exc:`SelectionError` 500 is raised for each static selection without the required 501 donors and/or acceptors. 502 503 Notes 504 ----- 505 In order to speed up processing, atoms are filtered by a coarse 506 distance criterion before a detailed hydrogen bonding analysis is 507 performed (`filter_first` = ``True``). If one of your selections is 508 e.g. the solvent then `update_selection1` (or `update_selection2`) must 509 also be ``True`` so that the list of candidate atoms is updated at each 510 step: this is now the default. 511 512 If your selections will essentially remain the same for all time steps 513 (i.e. residues are not moving farther than 3 x `distance`), for 514 instance, if neither water nor large conformational changes are 515 involved or if the optimization is disabled (`filter_first` = 516 ``False``) then you can improve performance by setting the 517 `update_selection1` and/or `update_selection2` keywords to ``False``. 518 519 520 .. versionchanged:: 0.7.6 521 New `verbose` keyword (and per-frame debug logging disabled by 522 default). 523 524 New `detect_hydrogens` keyword to switch between two different 525 algorithms to detect hydrogens bonded to donor. "distance" is a new, 526 rigorous distance search within the residue of the donor atom, 527 "heuristic" is the previous list scan (improved with an additional 528 distance check). 529 530 New `forcefield` keyword to switch between different values of 531 DEFAULT_DONORS/ACCEPTORS to accomodate different force fields. 532 Also has an option "other" for no default values. 533 534 .. versionchanged:: 0.8 535 The new default for `update_selection1` and `update_selection2` is now 536 ``True`` (see `Issue 138`_). Set to ``False`` if your selections only 537 need to be determined once (will increase performance). 538 539 .. versionchanged:: 0.9.0 540 New keyword `distance_type` to select between calculation between 541 heavy atoms or hydrogen-acceptor. It defaults to the previous 542 behavior (i.e. "hydrogen"). 543 544 .. versionchanged:: 0.11.0 545 Initial checks for selections that potentially raise :exc:`SelectionError`. 546 547 .. versionchanged:: 0.17.0 548 use 0-based indexing 549 550 .. deprecated:: 0.16 551 The previous `verbose` keyword argument was replaced by 552 `debug`. Note that the `verbose` keyword argument is now 553 consistently used to toggle progress meters throughout the library. 554 555 .. _`Issue 138`: https://github.com/MDAnalysis/mdanalysis/issues/138 556 557 """ 558 super(HydrogenBondAnalysis, self).__init__(universe.trajectory, **kwargs) 559 # per-frame debugging output? 560 self.debug = debug 561 562 self._get_bonded_hydrogens_algorithms = { 563 "distance": self._get_bonded_hydrogens_dist, # 0.7.6 default 564 "heuristic": self._get_bonded_hydrogens_list, # pre 0.7.6 565 } 566 if not detect_hydrogens in self._get_bonded_hydrogens_algorithms: 567 raise ValueError("detect_hydrogens must be one of {0!r}".format( 568 self._get_bonded_hydrogens_algorithms.keys())) 569 self.detect_hydrogens = detect_hydrogens 570 571 self.u = universe 572 self.selection1 = selection1 573 self.selection2 = selection2 574 self.selection1_type = selection1_type 575 self.update_selection1 = update_selection1 576 self.update_selection2 = update_selection2 577 self.filter_first = filter_first 578 self.distance = distance 579 self.distance_type = distance_type # note: everything except 'heavy' will give the default behavior 580 self.angle = angle 581 self.pbc = pbc and all(self.u.dimensions[:3]) 582 583 # set up the donors/acceptors lists 584 if donors is None: 585 donors = [] 586 if acceptors is None: 587 acceptors = [] 588 self.forcefield = forcefield 589 self.donors = tuple(set(self.DEFAULT_DONORS[forcefield]).union(donors)) 590 self.acceptors = tuple(set(self.DEFAULT_ACCEPTORS[forcefield]).union(acceptors)) 591 592 if not (self.selection1 and self.selection2): 593 raise ValueError('HydrogenBondAnalysis: invalid selections') 594 elif self.selection1_type not in ('both', 'donor', 'acceptor'): 595 raise ValueError('HydrogenBondAnalysis: Invalid selection type {0!s}'.format(self.selection1_type)) 596 597 self._timeseries = None # final result accessed as self.timeseries 598 self.timesteps = None # time for each frame 599 600 self.table = None # placeholder for output table 601 602 self._update_selection_1() 603 self._update_selection_2() 604 605 self._log_parameters() 606 607 if self.selection1_type == 'donor': 608 self._sanity_check(1, 'donors') 609 self._sanity_check(2, 'acceptors') 610 elif self.selection1_type == 'acceptor': 611 self._sanity_check(1, 'acceptors') 612 self._sanity_check(2, 'donors') 613 else: # both 614 self._sanity_check(1, 'donors') 615 self._sanity_check(1, 'acceptors') 616 self._sanity_check(2, 'acceptors') 617 self._sanity_check(2, 'donors') 618 logger.info("HBond analysis: initial checks passed.") 619 620 def _sanity_check(self, selection, htype): 621 """sanity check the selections 1 and 2 622 623 *selection* is 1 or 2, *htype* is "donors" or "acceptors" 624 625 If selections do not update and the required donor and acceptor 626 selections are empty then a :exc:`SelectionError` is immediately 627 raised. 628 629 If selections update dynamically then it is possible that the selection 630 will yield donors/acceptors at a later step and we only issue a 631 warning. 632 633 .. versionadded:: 0.11.0 634 """ 635 assert selection in (1, 2) 636 assert htype in ("donors", "acceptors") 637 # horrible data organization: _s1_donors, _s2_acceptors, etc, update_selection1, ... 638 atoms = getattr(self, "_s{0}_{1}".format(selection, htype)) 639 update = getattr(self, "update_selection{0}".format(selection)) 640 if not atoms: 641 errmsg = "No {1} found in selection {0}. " \ 642 "You might have to specify a custom '{1}' keyword.".format( 643 selection, htype) 644 if not update: 645 logger.error(errmsg) 646 raise SelectionError(errmsg) 647 else: 648 errmsg += " Selection will update so continuing with fingers crossed." 649 warnings.warn(errmsg, category=SelectionWarning) 650 logger.warning(errmsg) 651 652 def _log_parameters(self): 653 """Log important parameters to the logfile.""" 654 logger.info("HBond analysis: selection1 = %r (update: %r)", self.selection1, self.update_selection1) 655 logger.info("HBond analysis: selection2 = %r (update: %r)", self.selection2, self.update_selection2) 656 logger.info("HBond analysis: criterion: donor %s atom and acceptor atom distance <= %.3f A", self.distance_type, 657 self.distance) 658 logger.info("HBond analysis: criterion: angle D-H-A >= %.3f degrees", self.angle) 659 logger.info("HBond analysis: force field %s to guess donor and acceptor names", self.forcefield) 660 logger.info("HBond analysis: bonded hydrogen detection algorithm: %r", self.detect_hydrogens) 661 662 def _get_bonded_hydrogens(self, atom, **kwargs): 663 """Find hydrogens bonded to `atom`. 664 665 This method is typically not called by a user but it is documented to 666 facilitate understanding of the internals of 667 :class:`HydrogenBondAnalysis`. 668 669 Parameters 670 ---------- 671 atom : groups.Atom 672 heavy atom 673 **kwargs 674 passed through to the calculation method that was selected with 675 the `detect_hydrogens` kwarg of :class:`HydrogenBondAnalysis`. 676 677 Returns 678 ------- 679 hydrogen_atoms : AtomGroup or [] 680 list of hydrogens (can be a :class:`~MDAnalysis.core.groups.AtomGroup`) 681 or empty list ``[]`` if none were found. 682 683 See Also 684 -------- 685 :meth:`_get_bonded_hydrogens_dist` 686 :meth:`_get_bonded_hydrogens_list` 687 688 689 .. versionchanged:: 0.7.6 690 Can switch algorithm by using the `detect_hydrogens` keyword to the 691 constructor. *kwargs* can be used to supply arguments for algorithm. 692 693 """ 694 return self._get_bonded_hydrogens_algorithms[self.detect_hydrogens](atom, **kwargs) 695 696 def _get_bonded_hydrogens_dist(self, atom): 697 """Find hydrogens bonded within cutoff to `atom`. 698 699 Hydrogens are detected by either name ("H*", "[123]H*") or type ("H"); 700 this is not fool-proof as the atom type is not always a character but 701 the name pattern should catch most typical occurrences. 702 703 The distance from `atom` is calculated for all hydrogens in the residue 704 and only those within a cutoff are kept. The cutoff depends on the 705 heavy atom (more precisely, on its element, which is taken as the first 706 letter of its name ``atom.name[0]``) and is parameterized in 707 :attr:`HydrogenBondAnalysis.r_cov`. If no match is found then the 708 default of 1.5 Å is used. 709 710 711 Parameters 712 ---------- 713 atom : groups.Atom 714 heavy atom 715 716 Returns 717 ------- 718 hydrogen_atoms : AtomGroup or [] 719 list of hydrogens (can be a :class:`~MDAnalysis.core.groups.AtomGroup`) 720 or empty list ``[]`` if none were found. 721 722 Notes 723 ----- 724 The performance of this implementation could be improved once the 725 topology always contains bonded information; it currently uses the 726 selection parser with an "around" selection. 727 728 729 .. versionadded:: 0.7.6 730 731 """ 732 try: 733 return atom.residue.atoms.select_atoms( 734 "(name H* 1H* 2H* 3H* or type H) and around {0:f} name {1!s}" 735 "".format(self.r_cov[atom.name[0]], atom.name)) 736 except NoDataError: 737 return [] 738 739 def _get_bonded_hydrogens_list(self, atom, **kwargs): 740 """Find "bonded" hydrogens to the donor *atom*. 741 742 At the moment this relies on the **assumption** that the 743 hydrogens are listed directly after the heavy atom in the 744 topology. If this is not the case then this function will 745 fail. 746 747 Hydrogens are detected by name ``H*``, ``[123]H*`` and they have to be 748 within a maximum distance from the heavy atom. The cutoff distance 749 depends on the heavy atom and is parameterized in 750 :attr:`HydrogenBondAnalysis.r_cov`. 751 752 Parameters 753 ---------- 754 atom : groups.Atom 755 heavy atom 756 **kwargs 757 ignored 758 759 Returns 760 ------- 761 hydrogen_atoms : AtomGroup or [] 762 list of hydrogens (can be a :class:`~MDAnalysis.core.groups.AtomGroup`) 763 or empty list ``[]`` if none were found. 764 765 766 .. versionchanged:: 0.7.6 767 768 Added detection of ``[123]H`` and additional check that a 769 selected hydrogen is bonded to the donor atom (i.e. its 770 distance to the donor is less than the covalent radius 771 stored in :attr:`HydrogenBondAnalysis.r_cov` or the default 772 1.5 Å). 773 774 Changed name to 775 :meth:`~HydrogenBondAnalysis._get_bonded_hydrogens_list` 776 and added *kwargs* so that it can be used instead of 777 :meth:`~HydrogenBondAnalysis._get_bonded_hydrogens_dist`. 778 779 """ 780 warnings.warn("_get_bonded_hydrogens_list() (heuristic detection) does " 781 "not always find " 782 "all hydrogens; Using detect_hydrogens='distance', when " 783 "constructing the HydrogenBondAnalysis class is safer. " 784 "Removal of this feature is targeted for 1.0", 785 category=DeprecationWarning) 786 box = self.u.dimensions if self.pbc else None 787 try: 788 hydrogens = [ 789 a for a in self.u.atoms[atom.index + 1:atom.index + 4] 790 if (a.name.startswith(('H', '1H', '2H', '3H')) and 791 distances.calc_bonds(atom.position, a.position, box=box) < self.r_cov[atom.name[0]]) 792 ] 793 except IndexError: 794 hydrogens = [] # weird corner case that atom is the last one in universe 795 return hydrogens 796 797 def _update_selection_1(self): 798 self._s1 = self.u.select_atoms(self.selection1) 799 self.logger_debug("Size of selection 1: {0} atoms".format(len(self._s1))) 800 if not self._s1: 801 logger.warning("Selection 1 '{0}' did not select any atoms." 802 .format(str(self.selection1)[:80])) 803 self._s1_donors = {} 804 self._s1_donors_h = {} 805 self._s1_acceptors = {} 806 if self.selection1_type in ('donor', 'both'): 807 self._s1_donors = self._s1.select_atoms( 808 'name {0}'.format(' '.join(self.donors))) 809 self._s1_donors_h = {} 810 for i, d in enumerate(self._s1_donors): 811 tmp = self._get_bonded_hydrogens(d) 812 if tmp: 813 self._s1_donors_h[i] = tmp 814 self.logger_debug("Selection 1 donors: {0}".format(len(self._s1_donors))) 815 self.logger_debug("Selection 1 donor hydrogens: {0}".format(len(self._s1_donors_h))) 816 if self.selection1_type in ('acceptor', 'both'): 817 self._s1_acceptors = self._s1.select_atoms( 818 'name {0}'.format(' '.join(self.acceptors))) 819 self.logger_debug("Selection 1 acceptors: {0}".format(len(self._s1_acceptors))) 820 821 def _update_selection_2(self): 822 box = self.u.dimensions if self.pbc else None 823 self._s2 = self.u.select_atoms(self.selection2) 824 if self.filter_first and self._s2: 825 self.logger_debug('Size of selection 2 before filtering:' 826 ' {} atoms'.format(len(self._s2))) 827 ns_selection_2 = AtomNeighborSearch(self._s2, box) 828 self._s2 = ns_selection_2.search(self._s1, 3. * self.distance) 829 self.logger_debug('Size of selection 2: {0} atoms'.format(len(self._s2))) 830 if not self._s2: 831 logger.warning('Selection 2 "{0}" did not select any atoms.' 832 .format(str(self.selection2)[:80])) 833 self._s2_donors = {} 834 self._s2_donors_h = {} 835 self._s2_acceptors = {} 836 if not self._s2: 837 return None 838 if self.selection1_type in ('donor', 'both'): 839 self._s2_acceptors = self._s2.select_atoms( 840 'name {0}'.format(' '.join(self.acceptors))) 841 self.logger_debug("Selection 2 acceptors: {0:d}".format(len(self._s2_acceptors))) 842 if self.selection1_type in ('acceptor', 'both'): 843 self._s2_donors = self._s2.select_atoms( 844 'name {0}'.format(' '.join(self.donors))) 845 self._s2_donors_h = {} 846 for i, d in enumerate(self._s2_donors): 847 tmp = self._get_bonded_hydrogens(d) 848 if tmp: 849 self._s2_donors_h[i] = tmp 850 self.logger_debug("Selection 2 donors: {0:d}".format(len(self._s2_donors))) 851 self.logger_debug("Selection 2 donor hydrogens: {0:d}".format(len(self._s2_donors_h))) 852 853 def logger_debug(self, *args): 854 if self.debug: 855 logger.debug(*args) 856 857 def run(self, start=None, stop=None, step=None, verbose=None, **kwargs): 858 """Analyze trajectory and produce timeseries. 859 860 Stores the hydrogen bond data per frame as 861 :attr:`HydrogenBondAnalysis.timeseries` (see there for output 862 format). 863 864 Parameters 865 ---------- 866 start : int (optional) 867 starting frame-index for analysis, ``None`` is the first one, 0. 868 `start` and `stop` are 0-based frame indices and are used to slice 869 the trajectory (if supported) [``None``] 870 stop : int (optional) 871 last trajectory frame for analysis, ``None`` is the last one [``None``] 872 step : int (optional) 873 read every `step` between `start` (included) and `stop` (excluded), 874 ``None`` selects 1. [``None``] 875 verbose : bool (optional) 876 toggle progress meter output :class:`~MDAnalysis.lib.log.ProgressMeter` 877 [``True``] 878 debug : bool (optional) 879 enable detailed logging of debugging information; this can create 880 *very big* log files so it is disabled (``False``) by default; setting 881 `debug` toggles the debug status for :class:`HydrogenBondAnalysis`, 882 namely the value of :attr:`HydrogenBondAnalysis.debug`. 883 884 Other Parameters 885 ---------------- 886 remove_duplicates : bool (optional) 887 duplicate hydrogen bonds are removed from output if set to the 888 default value ``True``; normally, this should not be changed. 889 890 See Also 891 -------- 892 :meth:`HydrogenBondAnalysis.generate_table` : 893 processing the data into a different format. 894 895 896 .. versionchanged:: 0.7.6 897 Results are not returned, only stored in 898 :attr:`~HydrogenBondAnalysis.timeseries` and duplicate hydrogen bonds 899 are removed from output (can be suppressed with `remove_duplicates` = 900 ``False``) 901 902 .. versionchanged:: 0.11.0 903 Accept `quiet` keyword. Analysis will now proceed through frames even if 904 no donors or acceptors were found in a particular frame. 905 906 .. deprecated:: 0.16 907 The `quiet` keyword argument is deprecated in favor of the `verbose` 908 one. Previous use of `verbose` now corresponds to the new keyword 909 argument `debug`. 910 911 """ 912 # sets self.start/stop/step and _pm 913 self._setup_frames(self._trajectory, start, stop, step) 914 915 logger.info("HBond analysis: starting") 916 logger.debug("HBond analysis: donors %r", self.donors) 917 logger.debug("HBond analysis: acceptors %r", self.acceptors) 918 919 remove_duplicates = kwargs.pop('remove_duplicates', True) # False: old behaviour 920 if not remove_duplicates: 921 logger.warning("Hidden feature remove_duplicates=False activated: you will probably get duplicate H-bonds.") 922 923 debug = kwargs.pop('debug', None) 924 if debug is not None and debug != self.debug: 925 self.debug = debug 926 logger.debug("Toggling debug to %r", self.debug) 927 if not self.debug: 928 logger.debug("HBond analysis: For full step-by-step debugging output use debug=True") 929 930 self._timeseries = [] 931 self.timesteps = [] 932 933 pm = ProgressMeter(self.n_frames, 934 format="HBonds frame {current_step:5d}: {step:5d}/{numsteps} [{percentage:5.1f}%]\r", 935 verbose=kwargs.get('verbose', False)) 936 937 try: 938 self.u.trajectory.time 939 def _get_timestep(): 940 return self.u.trajectory.time 941 logger.debug("HBond analysis is recording time step") 942 except NotImplementedError: 943 # chained reader or xyz(?) cannot do time yet 944 def _get_timestep(): 945 return self.u.trajectory.frame 946 logger.warning("HBond analysis is recording frame number instead of time step") 947 948 logger.info("Starting analysis (frame index start=%d stop=%d, step=%d)", 949 self.start, self.stop, self.step) 950 951 for progress, ts in enumerate(self.u.trajectory[self.start:self.stop:self.step]): 952 # all bonds for this timestep 953 frame_results = [] 954 # dict of tuples (atom.index, atom.index) for quick check if 955 # we already have the bond (to avoid duplicates) 956 already_found = {} 957 958 frame = ts.frame 959 timestep = _get_timestep() 960 self.timesteps.append(timestep) 961 962 pm.echo(progress, current_step=frame) 963 self.logger_debug("Analyzing frame %(frame)d, timestep %(timestep)f ps", vars()) 964 if self.update_selection1: 965 self._update_selection_1() 966 if self.update_selection2: 967 self._update_selection_2() 968 969 box = self.u.dimensions if self.pbc else None 970 if self.selection1_type in ('donor', 'both') and self._s2_acceptors: 971 self.logger_debug("Selection 1 Donors <-> Acceptors") 972 ns_acceptors = AtomNeighborSearch(self._s2_acceptors, box) 973 for i, donor_h_set in self._s1_donors_h.items(): 974 d = self._s1_donors[i] 975 for h in donor_h_set: 976 res = ns_acceptors.search(h, self.distance) 977 for a in res: 978 angle = distances.calc_angles(d.position, 979 h.position, 980 a.position, box=box) 981 angle = np.rad2deg(angle) 982 donor_atom = h if self.distance_type != 'heavy' else d 983 dist = distances.calc_bonds(donor_atom.position, a.position, box=box) 984 if angle >= self.angle and dist <= self.distance: 985 self.logger_debug( 986 "S1-D: {0!s} <-> S2-A: {1!s} {2:f} A, {3:f} DEG".format(h.index, a.index, dist, angle)) 987 frame_results.append( 988 [h.index, a.index, 989 (h.resname, h.resid, h.name), 990 (a.resname, a.resid, a.name), 991 dist, angle]) 992 993 already_found[(h.index, a.index)] = True 994 if self.selection1_type in ('acceptor', 'both') and self._s1_acceptors: 995 self.logger_debug("Selection 1 Acceptors <-> Donors") 996 ns_acceptors = AtomNeighborSearch(self._s1_acceptors, box) 997 for i, donor_h_set in self._s2_donors_h.items(): 998 d = self._s2_donors[i] 999 for h in donor_h_set: 1000 res = ns_acceptors.search(h, self.distance) 1001 for a in res: 1002 if remove_duplicates and ( 1003 (h.index, a.index) in already_found or 1004 (a.index, h.index) in already_found): 1005 continue 1006 angle = distances.calc_angles(d.position, 1007 h.position, 1008 a.position, box=box) 1009 angle = np.rad2deg(angle) 1010 donor_atom = h if self.distance_type != 'heavy' else d 1011 dist = distances.calc_bonds(donor_atom.position, a.position, box=box) 1012 if angle >= self.angle and dist <= self.distance: 1013 self.logger_debug( 1014 "S1-A: {0!s} <-> S2-D: {1!s} {2:f} A, {3:f} DEG".format(a.index, h.index, dist, angle)) 1015 frame_results.append( 1016 [h.index, a.index, 1017 (h.resname, h.resid, h.name), 1018 (a.resname, a.resid, a.name), 1019 dist, angle]) 1020 1021 self._timeseries.append(frame_results) 1022 1023 logger.info("HBond analysis: complete; timeseries %s.timeseries", 1024 self.__class__.__name__) 1025 1026 @property 1027 def timeseries(self): 1028 """Time series of hydrogen bonds. 1029 1030 The results of the hydrogen bond analysis can be accessed as a `list` of `list` of `list`: 1031 1032 1. `timeseries[i]`: data for the i-th trajectory frame (at time 1033 `timesteps[i]`, see :attr:`timesteps`) 1034 2. `timeseries[i][j]`: j-th hydrogen bond that was detected at the i-th 1035 frame. 1036 3. ``donor_index, acceptor_index, 1037 donor_name_str, acceptor_name_str, distance, angle = 1038 timeseries[i][j]``: structure of one hydrogen bond data item 1039 1040 1041 In the following description, ``#`` indicates comments that are not 1042 part of the output:: 1043 1044 results = [ 1045 [ # frame 1 1046 [ # hbond 1 1047 <donor index (0-based)>, <acceptor index (0-based)>, 1048 <donor string>, <acceptor string>, <distance>, <angle> 1049 ], 1050 [ # hbond 2 1051 <donor index (0-based)>, <acceptor index (0-based)>, 1052 <donor string>, <acceptor string>, <distance>, <angle> 1053 ], 1054 .... 1055 ], 1056 [ # frame 2 1057 [ ... ], [ ... ], ... 1058 ], 1059 ... 1060 ] 1061 1062 The time of each step is not stored with each hydrogen bond frame but in 1063 :attr:`~HydrogenBondAnalysis.timesteps`. 1064 1065 1066 Note 1067 ---- 1068 For instance, to find an acceptor atom in :attr:`Universe.atoms` by 1069 *index* one would use ``u.atoms[acceptor_index]``. 1070 1071 The :attr:`timeseries` is a managed attribute and it is generated 1072 from the underlying data in :attr:`_timeseries` every time the 1073 attribute is accessed. It is therefore costly to call and if 1074 :attr:`timeseries` is needed repeatedly it is recommended that you 1075 assign to a variable:: 1076 1077 h = HydrogenBondAnalysis(u) 1078 h.run() 1079 timeseries = h.timeseries 1080 1081 1082 See Also 1083 -------- 1084 :attr:`table` : structured array of the data 1085 1086 1087 .. versionchanged:: 0.16.1 1088 :attr:`timeseries` has become a managed attribute and is generated from the stored 1089 :attr:`_timeseries` when needed. :attr:`_timeseries` contains the donor atom and 1090 acceptor atom specifiers as tuples `(resname, resid, atomid)` instead of strings. 1091 1092 .. versionchanged:: 0.17.0 1093 The 1-based indices "donor_idx" and "acceptor_idx" are being 1094 removed in favor of the 0-based indices "donor_index" and 1095 "acceptor_index". 1096 1097 """ 1098 return [[self._reformat_hb(hb) for hb in hframe] for hframe in self._timeseries] 1099 1100 @staticmethod 1101 def _reformat_hb(hb, atomformat="{0[0]!s}{0[1]!s}:{0[2]!s}"): 1102 """Convert 0.16.1 _timeseries hbond item to 0.16.0 hbond item. 1103 1104 In 0.16.1, donor and acceptor are stored as a tuple(resname, 1105 resid, atomid). In 0.16.0 and earlier they were stored as a string. 1106 1107 .. deprecated:: 1.0 1108 This is a compatibility layer so that we can provide the same output 1109 in timeseries as before. However, for 1.0 we should make timeseries 1110 just return _timeseries, i.e., change the format of timeseries to 1111 the un-ambiguous representation provided in _timeseries. 1112 1113 """ 1114 return (hb[:2] 1115 + [atomformat.format(hb[2]), atomformat.format(hb[3])] 1116 + hb[4:]) 1117 1118 def generate_table(self): 1119 """Generate a normalised table of the results. 1120 1121 The table is stored as a :class:`numpy.recarray` in the 1122 attribute :attr:`~HydrogenBondAnalysis.table`. 1123 1124 See Also 1125 -------- 1126 HydrogenBondAnalysis.table 1127 1128 """ 1129 if self._timeseries is None: 1130 msg = "No timeseries computed, do run() first." 1131 warnings.warn(msg, category=MissingDataWarning) 1132 logger.warning(msg) 1133 return 1134 1135 num_records = np.sum([len(hframe) for hframe in self._timeseries]) 1136 # build empty output table 1137 dtype = [ 1138 ("time", float), 1139 ("donor_index", int), ("acceptor_index", int), 1140 ("donor_resnm", "|U4"), ("donor_resid", int), ("donor_atom", "|U4"), 1141 ("acceptor_resnm", "|U4"), ("acceptor_resid", int), ("acceptor_atom", "|U4"), 1142 ("distance", float), ("angle", float)] 1143 # according to Lukas' notes below, using a recarray at this stage is ineffective 1144 # and speedups of ~x10 can be achieved by filling a standard array, like this: 1145 out = np.empty((num_records,), dtype=dtype) 1146 cursor = 0 # current row 1147 for t, hframe in zip(self.timesteps, self._timeseries): 1148 for (donor_index, acceptor_index, donor, 1149 acceptor, distance, angle) in hframe: 1150 # donor|acceptor = (resname, resid, atomid) 1151 out[cursor] = (t, donor_index, acceptor_index) + \ 1152 donor + acceptor + (distance, angle) 1153 cursor += 1 1154 assert cursor == num_records, "Internal Error: Not all HB records stored" 1155 self.table = out.view(np.recarray) 1156 logger.debug("HBond: Stored results as table with %(num_records)d entries.", vars()) 1157 1158 @deprecate(release="0.19.0", remove="1.0.0", 1159 message="You can instead use ``np.save(filename, " 1160 "HydrogendBondAnalysis.table)``.") 1161 def save_table(self, filename="hbond_table.pickle"): 1162 """Saves :attr:`~HydrogenBondAnalysis.table` to a pickled file. 1163 1164 If :attr:`~HydrogenBondAnalysis.table` does not exist yet, 1165 :meth:`generate_table` is called first. 1166 1167 Parameters 1168 ---------- 1169 filename : str (optional) 1170 path to the filename 1171 1172 Example 1173 ------- 1174 Load with :: 1175 1176 import cPickle 1177 table = cPickle.load(open(filename)) 1178 1179 """ 1180 if self.table is None: 1181 self.generate_table() 1182 with open(filename, 'w') as f: 1183 cPickle.dump(self.table, f, protocol=cPickle.HIGHEST_PROTOCOL) 1184 1185 def _has_timeseries(self): 1186 has_timeseries = self._timeseries is not None 1187 if not has_timeseries: 1188 msg = "No timeseries computed, do run() first." 1189 warnings.warn(msg, category=MissingDataWarning) 1190 logger.warning(msg) 1191 return has_timeseries 1192 1193 def count_by_time(self): 1194 """Counts the number of hydrogen bonds per timestep. 1195 1196 Processes :attr:`HydrogenBondAnalysis._timeseries` into the time series 1197 ``N(t)`` where ``N`` is the total number of observed hydrogen bonds at 1198 time ``t``. 1199 1200 Returns 1201 ------- 1202 counts : numpy.recarray 1203 The resulting array can be thought of as rows ``(time, N)`` where 1204 ``time`` is the time (in ps) of the time step and ``N`` is the 1205 total number of hydrogen bonds. 1206 1207 """ 1208 if not self._has_timeseries(): 1209 return 1210 1211 out = np.empty((len(self.timesteps),), dtype=[('time', float), ('count', int)]) 1212 for cursor, time_count in enumerate(zip(self.timesteps, 1213 (len(series) for series in self._timeseries))): 1214 out[cursor] = time_count 1215 return out.view(np.recarray) 1216 1217 def count_by_type(self): 1218 """Counts the frequency of hydrogen bonds of a specific type. 1219 1220 Processes :attr:`HydrogenBondAnalysis._timeseries` and returns a 1221 :class:`numpy.recarray` containing atom indices, residue names, residue 1222 numbers (for donors and acceptors) and the fraction of the total time 1223 during which the hydrogen bond was detected. 1224 1225 Returns 1226 ------- 1227 counts : numpy.recarray 1228 Each row of the array contains data to define a unique hydrogen 1229 bond together with the frequency (fraction of the total time) that 1230 it has been observed. 1231 1232 1233 .. versionchanged:: 0.17.0 1234 The 1-based indices "donor_idx" and "acceptor_idx" are being 1235 deprecated in favor of zero-based indices. 1236 """ 1237 if not self._has_timeseries(): 1238 return 1239 1240 hbonds = defaultdict(int) 1241 for hframe in self._timeseries: 1242 for (donor_index, acceptor_index, donor, 1243 acceptor, distance, angle) in hframe: 1244 donor_resnm, donor_resid, donor_atom = donor 1245 acceptor_resnm, acceptor_resid, acceptor_atom = acceptor 1246 # generate unambigous key for current hbond \ 1247 # (the donor_heavy_atom placeholder '?' is added later) 1248 # idx_zero is redundant for an unambigous key, but included for 1249 # consistency. 1250 hb_key = ( 1251 donor_index, acceptor_index, 1252 donor_resnm, donor_resid, "?", donor_atom, 1253 acceptor_resnm, acceptor_resid, acceptor_atom) 1254 1255 hbonds[hb_key] += 1 1256 1257 # build empty output table 1258 dtype = [ 1259 ("donor_index", int), ("acceptor_index", int), ('donor_resnm', 'U4'), 1260 ('donor_resid', int), ('donor_heavy_atom', 'U4'), ('donor_atom', 'U4'), 1261 ('acceptor_resnm', 'U4'), ('acceptor_resid', int), ('acceptor_atom', 'U4'), 1262 ('frequency', float) 1263 ] 1264 out = np.empty((len(hbonds),), dtype=dtype) 1265 1266 # float because of division later 1267 tsteps = float(len(self.timesteps)) 1268 for cursor, (key, count) in enumerate(six.iteritems(hbonds)): 1269 out[cursor] = key + (count / tsteps,) 1270 1271 # return array as recarray 1272 # The recarray has not been used within the function, because accessing the 1273 # the elements of a recarray (3.65 us) is much slower then accessing those 1274 # of a ndarray (287 ns). 1275 r = out.view(np.recarray) 1276 1277 # patch in donor heavy atom names (replaces '?' in the key) 1278 h2donor = self._donor_lookup_table_byindex() 1279 r.donor_heavy_atom[:] = [h2donor[idx] for idx in r.donor_index] 1280 1281 return r 1282 1283 def timesteps_by_type(self): 1284 """Frames during which each hydrogen bond existed, sorted by hydrogen bond. 1285 1286 Processes :attr:`HydrogenBondAnalysis._timeseries` and returns a 1287 :class:`numpy.recarray` containing atom indices, residue names, residue 1288 numbers (for donors and acceptors) and each timestep at which the 1289 hydrogen bond was detected. 1290 1291 In principle, this is the same as :attr:`~HydrogenBondAnalysis.table` 1292 but sorted by hydrogen bond and with additional data for the 1293 *donor_heavy_atom* and angle and distance omitted. 1294 1295 1296 Returns 1297 ------- 1298 data : numpy.recarray 1299 1300 1301 .. versionchanged:: 0.17.0 1302 The 1-based indices "donor_idx" and "acceptor_idx" are being 1303 replaced in favor of zero-based indices. 1304 1305 """ 1306 if not self._has_timeseries(): 1307 return 1308 1309 hbonds = defaultdict(list) 1310 for (t, hframe) in zip(self.timesteps, self._timeseries): 1311 for (donor_index, acceptor_index, donor, 1312 acceptor, distance, angle) in hframe: 1313 donor_resnm, donor_resid, donor_atom = donor 1314 acceptor_resnm, acceptor_resid, acceptor_atom = acceptor 1315 # generate unambigous key for current hbond 1316 # (the donor_heavy_atom placeholder '?' is added later) 1317 # idx_zero is redundant for key but added for consistency 1318 hb_key = ( 1319 donor_index, acceptor_index, 1320 donor_resnm, donor_resid, "?", donor_atom, 1321 acceptor_resnm, acceptor_resid, acceptor_atom) 1322 hbonds[hb_key].append(t) 1323 1324 out_nrows = 0 1325 # count number of timesteps per key to get length of output table 1326 for ts_list in six.itervalues(hbonds): 1327 out_nrows += len(ts_list) 1328 1329 # build empty output table 1330 dtype = [ 1331 ('donor_index', int), 1332 ('acceptor_index', int), ('donor_resnm', 'U4'), ('donor_resid', int), 1333 ('donor_heavy_atom', 'U4'), ('donor_atom', 'U4'),('acceptor_resnm', 'U4'), 1334 ('acceptor_resid', int), ('acceptor_atom', 'U4'), ('time', float)] 1335 out = np.empty((out_nrows,), dtype=dtype) 1336 1337 out_row = 0 1338 for (key, times) in six.iteritems(hbonds): 1339 for tstep in times: 1340 out[out_row] = key + (tstep,) 1341 out_row += 1 1342 1343 # return array as recarray 1344 # The recarray has not been used within the function, because accessing the 1345 # the elements of a recarray (3.65 us) is much slower then accessing those 1346 # of a ndarray (287 ns). 1347 r = out.view(np.recarray) 1348 1349 # patch in donor heavy atom names (replaces '?' in the key) 1350 h2donor = self._donor_lookup_table_byindex() 1351 r.donor_heavy_atom[:] = [h2donor[idx] for idx in r.donor_index] 1352 1353 return r 1354 1355 def _donor_lookup_table_byres(self): 1356 """Look-up table to identify the donor heavy atom from resid and hydrogen name. 1357 1358 Assumptions: 1359 * resids are unique 1360 * hydrogen atom names are unique within a residue 1361 * selections have not changed (because we are simply looking at the last content 1362 of the donors and donor hydrogen lists) 1363 1364 Donors from `selection1` and `selection2` are merged. 1365 1366 Output dictionary ``h2donor`` can be used as:: 1367 1368 heavy_atom_name = h2donor[resid][hydrogen_name] 1369 1370 """ 1371 s1d = self._s1_donors # list of donor Atom instances 1372 s1h = self._s1_donors_h # dict indexed by donor position in donor list, containg AtomGroups of H 1373 s2d = self._s2_donors 1374 s2h = self._s2_donors_h 1375 1376 def _make_dict(donors, hydrogens): 1377 # two steps so that entry for one residue can be UPDATED for multiple donors 1378 d = dict((donors[k].resid, {}) for k in range(len(donors)) if k in hydrogens) 1379 for k in range(len(donors)): 1380 if k in hydrogens: 1381 d[donors[k].resid].update(dict((atom.name, donors[k].name) for atom in hydrogens[k])) 1382 return d 1383 1384 h2donor = _make_dict(s2d, s2h) # 2 is typically the larger group 1385 # merge (in principle h2donor.update(_make_dict(s1d, s1h) should be sufficient 1386 # with our assumptions but the following should be really safe) 1387 for resid, names in _make_dict(s1d, s1h).items(): 1388 if resid in h2donor: 1389 h2donor[resid].update(names) 1390 else: 1391 h2donor[resid] = names 1392 1393 return h2donor 1394 1395 def _donor_lookup_table_byindex(self): 1396 """Look-up table to identify the donor heavy atom from hydrogen atom index. 1397 1398 Assumptions: 1399 * selections have not changed (because we are simply looking at the last content 1400 of the donors and donor hydrogen lists) 1401 1402 Donors from `selection1` and `selection2` are merged. 1403 1404 Output dictionary ``h2donor`` can be used as:: 1405 1406 heavy_atom_name = h2donor[index] 1407 1408 """ 1409 s1d = self._s1_donors # list of donor Atom instances 1410 s1h = self._s1_donors_h # dict indexed by donor position in donor list, containg AtomGroups of H 1411 s2d = self._s2_donors 1412 s2h = self._s2_donors_h 1413 1414 def _make_dict(donors, hydrogens): 1415 #return dict(flatten_1([(atom.id, donors[k].name) for atom in hydrogens[k]] for k in range(len(donors)) 1416 # if k in hydrogens)) 1417 x = [] 1418 for k in range(len(donors)): 1419 if k in hydrogens: 1420 x.extend([(atom.index, donors[k].name) for atom in hydrogens[k]]) 1421 return dict(x) 1422 1423 h2donor = _make_dict(s2d, s2h) # 2 is typically the larger group 1424 h2donor.update(_make_dict(s1d, s1h)) 1425 1426 return h2donor 1427