1Getting Started with the RDKit in Python 2%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 3 4Important note 5************** 6 7Beginning with the 2019.03 release, the RDKit is no longer supporting Python 2. 8If you need to continue using Python 2, please stick with a release from the 2018.09 9release cycle. 10 11What is this? 12************* 13 14This document is intended to provide an overview of how one can use 15the RDKit functionality from Python. It's not comprehensive and it's 16not a manual. 17 18If you find mistakes, or have suggestions for improvements, please 19either fix them yourselves in the source document (the .rst file) or 20send them to the mailing list: rdkit-devel@lists.sourceforge.net 21In particular, if you find yourself spending time working out how to 22do something that doesn't appear to be documented please contribute by writing 23it up for this document. Contributing to the documentation is a great service 24both to the RDKit community and to your future self. 25 26Reading and Writing Molecules 27***************************** 28 29Reading single molecules 30======================== 31 32.. testsetup:: 33 34 # clean up in case these tests are running in a python process that has already 35 # imported the IPythonConsole code 36 from rdkit.Chem.Draw import IPythonConsole 37 IPythonConsole.UninstallIPythonRenderer() 38 39The majority of the basic molecular functionality is found in module :py:mod:`rdkit.Chem`: 40 41.. doctest:: 42 43 >>> from rdkit import Chem 44 45Individual molecules can be constructed using a variety of approaches: 46 47.. doctest:: 48 49 >>> m = Chem.MolFromSmiles('Cc1ccccc1') 50 >>> m = Chem.MolFromMolFile('data/input.mol') 51 >>> stringWithMolData=open('data/input.mol','r').read() 52 >>> m = Chem.MolFromMolBlock(stringWithMolData) 53 54All of these functions return a :py:class:`rdkit.Chem.rdchem.Mol` object on success: 55 56.. doctest:: 57 58 >>> m 59 <rdkit.Chem.rdchem.Mol object at 0x...> 60 61or None on failure: 62 63.. doctest:: 64 65 >>> m = Chem.MolFromMolFile('data/invalid.mol') 66 >>> m is None 67 True 68 69An attempt is made to provide sensible error messages: 70 71.. doctest:: 72 73 >>> m1 = Chem.MolFromSmiles('CO(C)C') 74 75displays a message like: ``[12:18:01] Explicit valence for atom # 1 O greater than permitted`` and 76 77.. doctest:: 78 79 >>> m2 = Chem.MolFromSmiles('c1cc1') 80 81displays something like: ``[12:20:41] Can't kekulize mol``. In each case the value ``None`` is returned: 82 83.. doctest:: 84 85 >>> m1 is None 86 True 87 >>> m2 is None 88 True 89 90 91Reading sets of molecules 92========================= 93 94Groups of molecules are read using a Supplier (for example, an :py:class:`rdkit.Chem.rdmolfiles.SDMolSupplier` or a :py:class:`rdkit.Chem.rdmolfiles.SmilesMolSupplier`): 95 96.. doctest:: 97 98 >>> suppl = Chem.SDMolSupplier('data/5ht3ligs.sdf') 99 >>> for mol in suppl: 100 ... print(mol.GetNumAtoms()) 101 ... 102 20 103 24 104 24 105 26 106 107You can easily produce lists of molecules from a Supplier: 108 109.. doctest:: 110 111 >>> mols = [x for x in suppl] 112 >>> len(mols) 113 4 114 115or just treat the Supplier itself as a random-access object: 116 117.. doctest:: 118 119 >>> suppl[0].GetNumAtoms() 120 20 121 122Two good practices when working with Suppliers are to use a context manager and 123to test each molecule to see if it was correctly read before working with it: 124 125.. doctest:: 126 127 >>> with Chem.SDMolSupplier('data/5ht3ligs.sdf') as suppl: 128 ... for mol in suppl: 129 ... if mol is None: continue 130 ... print(mol.GetNumAtoms()) 131 ... 132 20 133 24 134 24 135 26 136 137An alternate type of Supplier, the :py:class:`rdkit.Chem.rdmolfiles.ForwardSDMolSupplier` 138can be used to read from file-like objects: 139 140.. doctest:: 141 142 >>> inf = open('data/5ht3ligs.sdf','rb') 143 >>> with Chem.ForwardSDMolSupplier(inf) as fsuppl: 144 ... for mol in fsuppl: 145 ... if mol is None: continue 146 ... print(mol.GetNumAtoms()) 147 ... 148 20 149 24 150 24 151 26 152 153This means that they can be used to read from compressed files: 154 155.. doctest:: 156 157 >>> import gzip 158 >>> inf = gzip.open('data/actives_5ht3.sdf.gz') 159 >>> with Chem.ForwardSDMolSupplier(inf) as gzsuppl: 160 ... ms = [x for x in gzsuppl if x is not None] 161 >>> len(ms) 162 180 163 164Note that ForwardSDMolSuppliers cannot be used as random-access objects: 165 166.. doctest:: 167 168 >>> inf = open('data/5ht3ligs.sdf','rb') 169 >>> with Chem.ForwardSDMolSupplier(inf) as fsuppl: 170 ... fsuppl[0] 171 Traceback (most recent call last): 172 ... 173 TypeError: 'ForwardSDMolSupplier' object does not support indexing 174 175For reading Smiles or SDF files with large number of records concurrently, MultithreadedMolSuppliers can be used like this: 176 177.. doctest:: 178 179 >>> i = 0 180 >>> with Chem.MultithreadedSDMolSupplier('data/5ht3ligs.sdf') as sdSupl: 181 ... for mol in sdSupl: 182 ... if mol is not None: 183 ... i += 1 184 ... 185 >>> print(i) 186 4 187 188By default a single reader thread is used to extract records from the file and a 189single writer thread is used to process them. Note that due to multithreading 190the output may not be in the expected order. Furthermore, the 191MultithreadedSmilesMolSupplier and the MultithreadedSDMolSupplier cannot be used 192as random-access objects. 193 194 195Writing molecules 196================= 197 198Single molecules can be converted to text using several functions present in the :py:mod:`rdkit.Chem` module. 199 200For example, for SMILES: 201 202.. doctest:: 203 204 >>> m = Chem.MolFromMolFile('data/chiral.mol') 205 >>> Chem.MolToSmiles(m) 206 'C[C@H](O)c1ccccc1' 207 >>> Chem.MolToSmiles(m,isomericSmiles=False) 208 'CC(O)c1ccccc1' 209 210Note that the SMILES provided is canonical, so the output should be the same no matter how a particular molecule is input: 211 212.. doctest:: 213 214 >>> Chem.MolToSmiles(Chem.MolFromSmiles('C1=CC=CN=C1')) 215 'c1ccncc1' 216 >>> Chem.MolToSmiles(Chem.MolFromSmiles('c1cccnc1')) 217 'c1ccncc1' 218 >>> Chem.MolToSmiles(Chem.MolFromSmiles('n1ccccc1')) 219 'c1ccncc1' 220 221If you'd like to have the Kekule form of the SMILES, first Kekulize the molecule, then use the “kekuleSmiles” option: 222 223.. doctest:: 224 225 >>> Chem.Kekulize(m) 226 >>> Chem.MolToSmiles(m,kekuleSmiles=True) 227 'C[C@H](O)C1=CC=CC=C1' 228 229Note: as of this writing (Aug 2008), the smiles provided when one requests kekuleSmiles are not canonical. 230The limitation is not in the SMILES generation, but in the kekulization itself. 231 232MDL Mol blocks are also available: 233 234.. doctest:: 235 236 >>> m2 = Chem.MolFromSmiles('C1CCC1') 237 >>> print(Chem.MolToMolBlock(m2)) # doctest: +NORMALIZE_WHITESPACE 238 <BLANKLINE> 239 RDKit 2D 240 <BLANKLINE> 241 4 4 0 0 0 0 0 0 0 0999 V2000 242 1.0607 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 243 -0.0000 -1.0607 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 244 -1.0607 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 245 0.0000 1.0607 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 246 1 2 1 0 247 2 3 1 0 248 3 4 1 0 249 4 1 1 0 250 M END 251 <BLANKLINE> 252 253To include names in the mol blocks, set the molecule's “_Name” property: 254 255.. doctest:: 256 257 >>> m2.SetProp("_Name","cyclobutane") 258 >>> print(Chem.MolToMolBlock(m2)) # doctest: +NORMALIZE_WHITESPACE 259 cyclobutane 260 RDKit 2D 261 <BLANKLINE> 262 4 4 0 0 0 0 0 0 0 0999 V2000 263 1.0607 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 264 -0.0000 -1.0607 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 265 -1.0607 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 266 0.0000 1.0607 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 267 1 2 1 0 268 2 3 1 0 269 3 4 1 0 270 4 1 1 0 271 M END 272 <BLANKLINE> 273 274In order for atom or bond stereochemistry to be recognised correctly by most 275software, it's essential that the mol block have atomic coordinates. 276It's also convenient for many reasons, such as drawing the molecules. 277Generating a mol block for a molecule that does not have coordinates will, by 278default, automatically cause coordinates to be generated. These are not, 279however, stored with the molecule. 280Coordinates can be generated and stored with the molecule using functionality 281in the :py:mod:`rdkit.Chem.AllChem` module (see the `Chem vs AllChem`_ section for 282more information). 283 284You can either include 2D coordinates (i.e. a depiction): 285 286.. doctest:: 287 288 >>> from rdkit.Chem import AllChem 289 >>> AllChem.Compute2DCoords(m2) 290 0 291 >>> print(Chem.MolToMolBlock(m2)) # doctest: +NORMALIZE_WHITESPACE 292 cyclobutane 293 RDKit 2D 294 <BLANKLINE> 295 4 4 0 0 0 0 0 0 0 0999 V2000 296 1.0607 -0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 297 -0.0000 -1.0607 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 298 -1.0607 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 299 0.0000 1.0607 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 300 1 2 1 0 301 2 3 1 0 302 3 4 1 0 303 4 1 1 0 304 M END 305 <BLANKLINE> 306 307Or you can add 3D coordinates by embedding the molecule (this uses the ETKDG 308method, which is described in more detail below). Note that we add Hs to the 309molecule before generating the conformer. This is essential to get good structures: 310 311.. doctest:: 312 313 >>> m3 = Chem.AddHs(m2) 314 >>> AllChem.EmbedMolecule(m3,randomSeed=0xf00d) # optional random seed for reproducibility) 315 0 316 >>> print(Chem.MolToMolBlock(m3)) # doctest: +NORMALIZE_WHITESPACE 317 cyclobutane 318 RDKit 3D 319 <BLANKLINE> 320 12 12 0 0 0 0 0 0 0 0999 V2000 321 1.0256 0.2491 -0.0964 C 0 0 0 0 0 0 0 0 0 0 0 0 322 -0.2041 0.9236 0.4320 C 0 0 0 0 0 0 0 0 0 0 0 0 323 -1.0435 -0.2466 -0.0266 C 0 0 0 0 0 0 0 0 0 0 0 0 324 0.2104 -0.9922 -0.3417 C 0 0 0 0 0 0 0 0 0 0 0 0 325 1.4182 0.7667 -0.9782 H 0 0 0 0 0 0 0 0 0 0 0 0 326 1.8181 0.1486 0.6820 H 0 0 0 0 0 0 0 0 0 0 0 0 327 -0.1697 1.0826 1.5236 H 0 0 0 0 0 0 0 0 0 0 0 0 328 -0.5336 1.8391 -0.1051 H 0 0 0 0 0 0 0 0 0 0 0 0 329 -1.6809 -0.0600 -0.8987 H 0 0 0 0 0 0 0 0 0 0 0 0 330 -1.6501 -0.6194 0.8220 H 0 0 0 0 0 0 0 0 0 0 0 0 331 0.4659 -1.7768 0.3858 H 0 0 0 0 0 0 0 0 0 0 0 0 332 0.3439 -1.3147 -1.3988 H 0 0 0 0 0 0 0 0 0 0 0 0 333 1 2 1 0 334 2 3 1 0 335 3 4 1 0 336 4 1 1 0 337 1 5 1 0 338 1 6 1 0 339 2 7 1 0 340 2 8 1 0 341 3 9 1 0 342 3 10 1 0 343 4 11 1 0 344 4 12 1 0 345 M END 346<BLANKLINE> 347 348If we don't want the Hs in our later analysis, they are easy to remove: 349 350.. doctest:: 351 352 >>> m3 = Chem.RemoveHs(m3) 353 >>> print(Chem.MolToMolBlock(m3)) # doctest: +NORMALIZE_WHITESPACE 354 cyclobutane 355 RDKit 3D 356 <BLANKLINE> 357 4 4 0 0 0 0 0 0 0 0999 V2000 358 1.0256 0.2491 -0.0964 C 0 0 0 0 0 0 0 0 0 0 0 0 359 -0.2041 0.9236 0.4320 C 0 0 0 0 0 0 0 0 0 0 0 0 360 -1.0435 -0.2466 -0.0266 C 0 0 0 0 0 0 0 0 0 0 0 0 361 0.2104 -0.9922 -0.3417 C 0 0 0 0 0 0 0 0 0 0 0 0 362 1 2 1 0 363 2 3 1 0 364 3 4 1 0 365 4 1 1 0 366 M END 367 <BLANKLINE> 368 369If you'd like to write the molecule to a file, use Python file objects: 370 371.. doctest:: 372 373 >>> print(Chem.MolToMolBlock(m2),file=open('data/foo.mol','w+')) 374 >>> 375 376 377Writing sets of molecules 378========================= 379 380Multiple molecules can be written to a file using an :py:class:`rdkit.Chem.rdmolfiles.SDWriter` object: 381 382.. doctest:: 383 384 >>> with Chem.SDWriter('data/foo.sdf') as w: 385 ... for m in mols: 386 ... w.write(m) 387 >>> 388 389An SDWriter can also be initialized using a file-like object: 390 391.. doctest:: 392 393 >>> from rdkit.six import StringIO 394 >>> sio = StringIO() 395 >>> with Chem.SDWriter(sio) as w: 396 ... for m in mols: 397 ... w.write(m) 398 >>> print(sio.getvalue()) 399 mol-295 400 RDKit 3D 401 <BLANKLINE> 402 20 22 0 0 1 0 0 0 0 0999 V2000 403 2.3200 0.0800 -0.1000 C 0 0 0 0 0 0 0 0 0 0 0 0 404 1.8400 -1.2200 0.1200 C 0 0 0 0 0 0 0 0 0 0 0 0 405 ... 406 1 3 1 0 407 1 4 1 0 408 2 5 1 0 409 M END 410 $$$$ 411 <BLANKLINE> 412 413 414 415Other available Writers include the :py:class:`rdkit.Chem.rdmolfiles.SmilesWriter` and the :py:class:`rdkit.Chem.rdmolfiles.TDTWriter`. 416 417 418Working with Molecules 419********************** 420 421 422Looping over Atoms and Bonds 423============================ 424 425Once you have a molecule, it's easy to loop over its atoms and bonds: 426 427.. doctest:: 428 429 >>> m = Chem.MolFromSmiles('C1OC1') 430 >>> for atom in m.GetAtoms(): 431 ... print(atom.GetAtomicNum()) 432 ... 433 6 434 8 435 6 436 >>> print(m.GetBonds()[0].GetBondType()) 437 SINGLE 438 439You can also request individual bonds or atoms: 440 441.. doctest:: 442 443 >>> m.GetAtomWithIdx(0).GetSymbol() 444 'C' 445 >>> m.GetAtomWithIdx(0).GetExplicitValence() 446 2 447 >>> m.GetBondWithIdx(0).GetBeginAtomIdx() 448 0 449 >>> m.GetBondWithIdx(0).GetEndAtomIdx() 450 1 451 >>> m.GetBondBetweenAtoms(0,1).GetBondType() 452 rdkit.Chem.rdchem.BondType.SINGLE 453 454Atoms keep track of their neighbors: 455 456.. doctest:: 457 458 >>> atom = m.GetAtomWithIdx(0) 459 >>> [x.GetAtomicNum() for x in atom.GetNeighbors()] 460 [8, 6] 461 >>> len(atom.GetNeighbors()[-1].GetBonds()) 462 2 463 464 465Ring Information 466================ 467 468Atoms and bonds both carry information about the molecule's rings: 469 470.. doctest:: 471 472 >>> m = Chem.MolFromSmiles('OC1C2C1CC2') 473 >>> m.GetAtomWithIdx(0).IsInRing() 474 False 475 >>> m.GetAtomWithIdx(1).IsInRing() 476 True 477 >>> m.GetAtomWithIdx(2).IsInRingSize(3) 478 True 479 >>> m.GetAtomWithIdx(2).IsInRingSize(4) 480 True 481 >>> m.GetAtomWithIdx(2).IsInRingSize(5) 482 False 483 >>> m.GetBondWithIdx(1).IsInRingSize(3) 484 True 485 >>> m.GetBondWithIdx(1).IsInRing() 486 True 487 488But note that the information is only about the smallest rings: 489 490.. doctest:: 491 492 >>> m.GetAtomWithIdx(1).IsInRingSize(5) 493 False 494 495More detail about the smallest set of smallest rings (SSSR) is available: 496 497.. doctest:: 498 499 >>> ssr = Chem.GetSymmSSSR(m) 500 >>> len(ssr) 501 2 502 >>> list(ssr[0]) 503 [1, 2, 3] 504 >>> list(ssr[1]) 505 [4, 5, 2, 3] 506 507As the name indicates, this is a symmetrized SSSR; if you are interested in the number of “true” SSSR, use the GetSSSR function. 508 509 510.. doctest:: 511 512 >>> Chem.GetSSSR(m) 513 2 514 515The distinction between symmetrized and non-symmetrized SSSR is discussed in more detail below in the section `The SSSR Problem`_. 516 517For more efficient queries about a molecule's ring systems (avoiding repeated calls to Mol.GetAtomWithIdx), use the :py:class:`rdkit.Chem.rdchem.RingInfo` class: 518 519.. doctest:: 520 521 >>> m = Chem.MolFromSmiles('OC1C2C1CC2') 522 >>> ri = m.GetRingInfo() 523 >>> ri.NumAtomRings(0) 524 0 525 >>> ri.NumAtomRings(1) 526 1 527 >>> ri.NumAtomRings(2) 528 2 529 >>> ri.IsAtomInRingOfSize(1,3) 530 True 531 >>> ri.IsBondInRingOfSize(1,3) 532 True 533 534Modifying molecules 535=================== 536 537Normally molecules are stored in the RDKit with the hydrogen atoms implicit (e.g. not explicitly present in the molecular graph. 538When it is useful to have the hydrogens explicitly present, for example when generating or optimizing the 3D geometry, the :py:func:rdkit.Chem.rdmolops.AddHs function can be used: 539 540.. doctest:: 541 542 >>> m=Chem.MolFromSmiles('CCO') 543 >>> m.GetNumAtoms() 544 3 545 >>> m2 = Chem.AddHs(m) 546 >>> m2.GetNumAtoms() 547 9 548 549The Hs can be removed again using the :py:func:`rdkit.Chem.rdmolops.RemoveHs` function: 550 551.. doctest:: 552 553 >>> m3 = Chem.RemoveHs(m2) 554 >>> m3.GetNumAtoms() 555 3 556 557RDKit molecules are usually stored with the bonds in aromatic rings having aromatic bond types. 558This can be changed with the :py:func:`rdkit.Chem.rdmolops.Kekulize` function: 559 560.. doctest:: 561 562 >>> m = Chem.MolFromSmiles('c1ccccc1') 563 >>> m.GetBondWithIdx(0).GetBondType() 564 rdkit.Chem.rdchem.BondType.AROMATIC 565 >>> Chem.Kekulize(m) 566 >>> m.GetBondWithIdx(0).GetBondType() 567 rdkit.Chem.rdchem.BondType.DOUBLE 568 >>> m.GetBondWithIdx(1).GetBondType() 569 rdkit.Chem.rdchem.BondType.SINGLE 570 571By default, the bonds are still marked as being aromatic: 572 573.. doctest:: 574 575 >>> m.GetBondWithIdx(1).GetIsAromatic() 576 True 577 578because the flags in the original molecule are not cleared (clearAromaticFlags defaults to False). 579You can explicitly force or decline a clearing of the flags: 580 581.. doctest:: 582 583 >>> m = Chem.MolFromSmiles('c1ccccc1') 584 >>> m.GetBondWithIdx(0).GetIsAromatic() 585 True 586 >>> m1 = Chem.MolFromSmiles('c1ccccc1') 587 >>> Chem.Kekulize(m1, clearAromaticFlags=True) 588 >>> m1.GetBondWithIdx(0).GetIsAromatic() 589 False 590 591Bonds can be restored to the aromatic bond type using the :py:func:`rdkit.Chem.rdmolops.SanitizeMol` function: 592 593.. doctest:: 594 595 >>> Chem.SanitizeMol(m) 596 rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_NONE 597 >>> m.GetBondWithIdx(0).GetBondType() 598 rdkit.Chem.rdchem.BondType.AROMATIC 599 600The value returned by `SanitizeMol()` indicates that no problems were encountered. 601 602Working with 2D molecules: Generating Depictions 603================================================ 604 605The RDKit has a library for generating depictions (sets of 2D) coordinates for molecules. 606This library, which is part of the AllChem module, is accessed using the :py:func:`rdkit.Chem.rdDepictor.Compute2DCoords` function: 607 608.. doctest:: 609 610 >>> m = Chem.MolFromSmiles('c1nccc2n1ccc2') 611 >>> AllChem.Compute2DCoords(m) 612 0 613 614The 2D conformer is constructed in a canonical orientation and is 615built to minimize intramolecular clashes, i.e. to maximize the clarity 616of the drawing. 617 618If you have a set of molecules that share a common template and you'd 619like to align them to that template, you can do so as follows: 620 621.. doctest:: 622 623 >>> template = Chem.MolFromSmiles('c1nccc2n1ccc2') 624 >>> AllChem.Compute2DCoords(template) 625 0 626 >>> ms = [Chem.MolFromSmiles(smi) for smi in ('OCCc1ccn2cnccc12','C1CC1Oc1cc2ccncn2c1','CNC(=O)c1nccc2cccn12')] 627 >>> for m in ms: 628 ... _ = AllChem.GenerateDepictionMatching2DStructure(m,template) 629 630Running this process for the molecules above gives: 631 632+---------------+---------------+---------------+ 633| |picture_1| | |picture_0| | |picture_3| | 634+---------------+---------------+---------------+ 635 636Another option for Compute2DCoords allows you to generate 2D depictions for molecules that closely mimic 3D conformers. 637This is available using the function :py:func:`rdkit.Chem.AllChem.GenerateDepictionMatching3DStructure`. 638 639Here is an illustration of the results using the ligand from PDB structure 1XP0: 640 641+---------------+---------------+ 642| |picture_2| | |picture_4| | 643+---------------+---------------+ 644 645More fine-grained control can be obtained using the core function 646:py:func:`rdkit.Chem.rdDepictor.Compute2DCoordsMimicDistmat`, but that is 647beyond the scope of this document. See the implementation of 648GenerateDepictionMatching3DStructure in AllChem.py for an example of 649how it is used. 650 651 652Working with 3D Molecules 653========================= 654 655The RDKit can generate conformers for molecules using two different 656methods. The original method used distance geometry. [#blaney]_ 657The algorithm followed is: 658 6591. The molecule's distance bounds matrix is calculated based on the connection table and a set of rules. 660 6612. The bounds matrix is smoothed using a triangle-bounds smoothing algorithm. 662 6633. A random distance matrix that satisfies the bounds matrix is generated. 664 6654. This distance matrix is embedded in 3D dimensions (producing coordinates for each atom). 666 6675. The resulting coordinates are cleaned up somewhat using a crude force field and the bounds matrix. 668 669Note that the conformers that result from this procedure tend to be fairly ugly. 670They should be cleaned up using a force field. This can be done within the RDKit 671using its implementation of the Universal Force Field (UFF). [#rappe]_ 672 673More recently, there is an implementation of the ETKDG method of Riniker and 674Landrum [#riniker2]_ which uses torsion angle preferences from the Cambridge 675Structural Database (CSD) to correct the conformers after distance geometry has 676been used to generate them. With this method, there should be no need to use a 677minimisation step to clean up the structures. 678 679More detailed information about the conformer generator and the parameters 680controlling it can be found in the "RDKit Book". 681 682Since the 2018.09 release of the RDKit, ETKDG is the default conformer generation method. 683 684The full process of embedding a molecule is easier than all the above verbiage makes it sound: 685 686.. doctest:: 687 688 >>> m2=Chem.AddHs(m) 689 >>> AllChem.EmbedMolecule(m2) 690 0 691 692The RDKit also has an implementation of the MMFF94 force field available. 693[#mmff1]_, [#mmff2]_, [#mmff3]_, [#mmff4]_, [#mmffs]_ Please note that the MMFF 694atom typing code uses its own aromaticity model, so the aromaticity flags of the 695molecule will be modified after calling MMFF-related methods. 696 697Here's an example of using MMFF94 to minimize an RDKit-generated conformer: 698.. doctest:: 699 700 >>> m = Chem.MolFromSmiles('C1CCC1OC') 701 >>> m2=Chem.AddHs(m) 702 >>> AllChem.EmbedMolecule(m2) 703 0 704 >>> AllChem.MMFFOptimizeMolecule(m2) 705 0 706 707Note the calls to `Chem.AddHs()` in the examples above. By default 708RDKit molecules do not have H atoms explicitly present in the graph, 709but they are important for getting realistic geometries, so they 710generally should be added. They can always be removed afterwards 711if necessary with a call to `Chem.RemoveHs()`. 712 713With the RDKit, multiple conformers can also be generated using the 714different embedding methods. In both cases this is simply a matter of 715running the distance geometry calculation multiple times from 716different random start points. The option `numConfs` allows the user to 717set the number of conformers that should be generated. Otherwise the 718procedures are as before. The conformers so generated can be aligned 719to each other and the RMS values calculated. 720 721.. doctest:: 722 723 >>> m = Chem.MolFromSmiles('C1CCC1OC') 724 >>> m2=Chem.AddHs(m) 725 >>> # run ETKDG 10 times 726 >>> cids = AllChem.EmbedMultipleConfs(m2, numConfs=10) 727 >>> print(len(cids)) 728 10 729 >>> rmslist = [] 730 >>> AllChem.AlignMolConformers(m2, RMSlist=rmslist) 731 >>> print(len(rmslist)) 732 9 733 734rmslist contains the RMS values between the first conformer and all others. 735The RMS between two specific conformers (e.g. 1 and 9) can also be calculated. 736The flag prealigned lets the user specify if the conformers are already aligned 737(by default, the function aligns them). 738 739.. doctest:: 740 741 >>> rms = AllChem.GetConformerRMS(m2, 1, 9, prealigned=True) 742 743If you are interested in running MMFF94 on a molecule's conformers (note that 744this is often not necessary when using ETKDG), there's a convenience 745function available: 746 747.. doctest:: 748 749 >>> res = AllChem.MMFFOptimizeMoleculeConfs(m2) 750 751The result is a list a containing 2-tuples: `(not_converged, energy)` for 752each conformer. If `not_converged` is 0, the minimization for that conformer 753converged. 754 755By default `AllChem.EmbedMultipleConfs` and `AllChem.MMFFOptimizeMoleculeConfs()` 756run single threaded, but you can cause them to use 757multiple threads simultaneously for these embarassingly parallel tasks 758via the `numThreads` argument: 759 760.. doctest:: 761 762 >>> cids = AllChem.EmbedMultipleConfs(m2, numThreads=0) 763 >>> res = AllChem.MMFFOptimizeMoleculeConfs(m2, numThreads=0) 764 765Setting `numThreads` to zero causes the software to use the maximum number 766of threads allowed on your computer. 767 768*Disclaimer/Warning*: Conformer generation is a difficult and subtle task. The 769plain distance-geometry 2D->3D conversion provided with the RDKit is not 770intended to be a replacement for a “real” conformer analysis tool; it merely 771provides quick 3D structures for cases when they are required. We believe, 772however, that the newer ETKDG method [#riniker2]_ is suitable for most purposes. 773 774 775Preserving Molecules 776==================== 777 778Molecules can be converted to and from text using Python's pickling machinery: 779 780.. doctest:: 781 782 >>> m = Chem.MolFromSmiles('c1ccncc1') 783 >>> import pickle 784 >>> pkl = pickle.dumps(m) 785 >>> m2=pickle.loads(pkl) 786 >>> Chem.MolToSmiles(m2) 787 'c1ccncc1' 788 789The RDKit pickle format is fairly compact and it is much, much faster to build a 790molecule from a pickle than from a Mol file or SMILES string, so storing 791molecules you will be working with repeatedly as pickles can be a good idea. 792 793The raw binary data that is encapsulated in a pickle can also be directly 794obtained from a molecule: 795 796.. doctest:: 797 798 >>> binStr = m.ToBinary() 799 800This can be used to reconstruct molecules using the Chem.Mol constructor: 801 802.. doctest:: 803 804 >>> m2 = Chem.Mol(binStr) 805 >>> Chem.MolToSmiles(m2) 806 'c1ccncc1' 807 >>> len(binStr) 808 123 809 810Note that this is smaller than the pickle: 811 812.. doctest:: 813 814 >>> len(binStr) < len(pkl) 815 True 816 817The small overhead associated with python's pickling machinery normally doesn't 818end up making much of a difference for collections of larger molecules (the 819extra data associated with the pickle is independent of the size of the 820molecule, while the binary string increases in length as the molecule gets 821larger). 822 823*Tip*: The performance difference associated with storing molecules in a pickled 824form on disk instead of constantly reparsing an SD file or SMILES table is 825difficult to overstate. In a test I just ran on my laptop, loading a set of 699 826drug-like molecules from an SD file took 10.8 seconds; loading the same 827molecules from a pickle file took 0.7 seconds. The pickle file is also smaller – 8281/3 the size of the SD file – but this difference is not always so dramatic 829(it's a particularly fat SD file). 830 831Drawing Molecules 832================= 833 834The RDKit has some built-in functionality for creating images from 835molecules found in the :py:mod:`rdkit.Chem.Draw` package: 836 837.. doctest:: 838 839 >>> with Chem.SDMolSupplier('data/cdk2.sdf') as suppl: 840 ... ms = [x for x in suppl if x is not None] 841 >>> for m in ms: tmp=AllChem.Compute2DCoords(m) 842 >>> from rdkit.Chem import Draw 843 >>> Draw.MolToFile(ms[0],'images/cdk2_mol1.o.png') # doctest: +SKIP 844 >>> Draw.MolToFile(ms[1],'images/cdk2_mol2.o.png') # doctest: +SKIP 845 846Producing these images: 847 848+----------------------------------+----------------------------------+ 849| .. image:: images/cdk2_mol1.png | .. image:: images/cdk2_mol2.png | 850+----------------------------------+----------------------------------+ 851 852It's also possible to produce an image grid out of a set of molecules: 853 854.. doctest:: 855 856 >>> img=Draw.MolsToGridImage(ms[:8],molsPerRow=4,subImgSize=(200,200),legends=[x.GetProp("_Name") for x in ms[:8]]) # doctest: +SKIP 857 858This returns a PIL image, which can then be saved to a file: 859 860.. doctest:: 861 862 >>> img.save('images/cdk2_molgrid.o.png') # doctest: +SKIP 863 864The result looks like this: 865 866.. image:: images/cdk2_molgrid.png 867 868These would of course look better if the common core were 869aligned. This is easy enough to do: 870 871.. doctest:: 872 873 >>> p = Chem.MolFromSmiles('[nH]1cnc2cncnc21') 874 >>> subms = [x for x in ms if x.HasSubstructMatch(p)] 875 >>> len(subms) 876 14 877 >>> AllChem.Compute2DCoords(p) 878 0 879 >>> for m in subms: 880 ... _ = AllChem.GenerateDepictionMatching2DStructure(m,p) 881 >>> img=Draw.MolsToGridImage(subms,molsPerRow=4,subImgSize=(200,200),legends=[x.GetProp("_Name") for x in subms]) # doctest: +SKIP 882 >>> img.save('images/cdk2_molgrid.aligned.o.png') # doctest: +SKIP 883 884 885The result looks like this: 886 887.. image:: images/cdk2_molgrid_aligned.png 888 889Atoms in a molecule can be highlighted by drawing a coloured solid or 890open circle around them, and bonds likewise can have a coloured 891outline applied. An obvious use is to show atoms and bonds that have 892matched a substructure query 893 894.. doctest:: 895 896 >>> from rdkit.Chem.Draw import rdMolDraw2D 897 >>> smi = 'c1cc(F)ccc1Cl' 898 >>> mol = Chem.MolFromSmiles(smi) 899 >>> patt = Chem.MolFromSmarts('ClccccF') 900 >>> hit_ats = list(mol.GetSubstructMatch(patt)) 901 >>> hit_bonds = [] 902 >>> for bond in patt.GetBonds(): 903 ... aid1 = hit_ats[bond.GetBeginAtomIdx()] 904 ... aid2 = hit_ats[bond.GetEndAtomIdx()] 905 ... hit_bonds.append(mol.GetBondBetweenAtoms(aid1,aid2).GetIdx()) 906 >>> d = rdMolDraw2D.MolDraw2DSVG(500, 500) # or MolDraw2DCairo to get PNGs 907 >>> rdMolDraw2D.PrepareAndDrawMolecule(d, mol, highlightAtoms=hit_ats, 908 ... highlightBonds=hit_bonds) 909 910will produce: 911 912.. image:: images/atom_highlights_1.png 913 914It is possible to specify the colours for individual atoms and bonds: 915 916.. doctest:: 917 918 >>> colours = [(0.8,0.0,0.8),(0.8,0.8,0),(0,0.8,0.8),(0,0,0.8)] 919 >>> atom_cols = {} 920 >>> for i, at in enumerate(hit_ats): 921 ... atom_cols[at] = colours[i%4] 922 >>> bond_cols = {} 923 >>> for i, bd in enumerate(hit_bonds): 924 ... bond_cols[bd] = colours[3 - i%4] 925 >>> 926 >>> d = rdMolDraw2D.MolDraw2DCairo(500, 500) 927 >>> rdMolDraw2D.PrepareAndDrawMolecule(d, mol, highlightAtoms=hit_ats, 928 ... highlightAtomColors=atom_cols, 929 ... highlightBonds=hit_bonds, 930 ... highlightBondColors=bond_cols) 931 932to give: 933 934.. image:: images/atom_highlights_2.png 935 936Atoms and bonds can also be highlighted with multiple colours if they 937fall into multiple sets, for example if they are matched by more 938than 1 substructure pattern. This is too complicated to show in this 939simple introduction, but there is an example in 940data/test_multi_colours.py, which produces the somewhat garish 941 942.. image:: images/atom_highlights_3.png 943 944As of version 2020.03, it is possible to add arbitrary small strings to annotate 945atoms and bonds in the drawing. The strings are added as properties 946``atomNote`` and ``bondNote`` and they will be placed automatically close to the 947atom or bond in question in a manner intended to minimise their clash with the 948rest of the drawing. For convenience, here are 3 flags in ``MolDraw2DOptions`` 949that will add stereo information (R/S to atoms, E/Z to bonds) and atom and bond 950sequence numbers. 951 952.. doctest:: 953 954 >>> mol = Chem.MolFromSmiles('Cl[C@H](F)NC\C=C\C') 955 >>> d = rdMolDraw2D.MolDraw2DCairo(250, 200) # or MolDraw2DSVG to get SVGs 956 >>> mol.GetAtomWithIdx(2).SetProp('atomNote', 'foo') 957 >>> mol.GetBondWithIdx(0).SetProp('bondNote', 'bar') 958 >>> d.drawOptions().addStereoAnnotation = True 959 >>> d.drawOptions().addAtomIndices = True 960 >>> d.DrawMolecule(mol) 961 >>> d.FinishDrawing() 962 >>> d.WriteDrawingText('atom_annotation_1.png') # doctest: +SKIP 963 964will produce 965 966.. image:: images/atom_annotation_1.png 967 968If atoms have an ``atomLabel`` property set, this will be used when drawing them: 969 970.. doctest:: 971 972 >>> smi = 'c1nc(*)ccc1* |$;;;R1;;;;R2$|' 973 >>> mol = Chem.MolFromSmiles(smi) 974 >>> mol.GetAtomWithIdx(3).GetProp("atomLabel") 975 'R1' 976 >>> mol.GetAtomWithIdx(7).GetProp("atomLabel") 977 'R2' 978 >>> d = rdMolDraw2D.MolDraw2DCairo(250, 250) 979 >>> rdMolDraw2D.PrepareAndDrawMolecule(d,mol) 980 >>> d.WriteDrawingText("./images/atom_labels_1.png") # doctest: +SKIP 981 982gives: 983 984.. image:: images/atom_labels_1.png 985 986Since the ``atomLabel`` property is also used for other things (for example in CXSMILES as demonstrated), 987if you want to provide your own atom labels, it's better to use the ``_displayLabel`` property: 988 989 >>> smi = 'c1nc(*)ccc1* |$;;;R1;;;;R2$|' 990 >>> mol = Chem.MolFromSmiles(smi) 991 >>> mol.GetAtomWithIdx(3).SetProp("_displayLabel","R<sub>1</sub>") 992 >>> mol.GetAtomWithIdx(7).SetProp("_displayLabel","R<sub>2</sub>") 993 >>> d = rdMolDraw2D.MolDraw2DCairo(250, 250) 994 >>> rdMolDraw2D.PrepareAndDrawMolecule(d,mol) 995 >>> d.WriteDrawingText("./images/atom_labels_2.png") # doctest: +SKIP 996 997this gives: 998 999.. image:: images/atom_labels_2.png 1000 1001Note that you can use ``<sup>`` and ``<sub>`` in these labels to provide super- and subscripts. 1002 1003Finally, if you have atom labels which should be displayed differently when the bond comes 1004into them from the right (the West), you can also set the ``_displayLabelW`` property: 1005 1006 1007.. doctest:: 1008 1009 >>> smi = 'c1nc(*)ccc1* |$;;;R1;;;;R2$|' 1010 >>> mol = Chem.MolFromSmiles(smi) 1011 >>> mol.GetAtomWithIdx(3).SetProp("_displayLabel","CO<sub>2</sub>H") 1012 >>> mol.GetAtomWithIdx(3).SetProp("_displayLabelW","HO<sub>2</sub>C") 1013 >>> mol.GetAtomWithIdx(7).SetProp("_displayLabel","CO<sub>2</sub><sup>-</sup>") 1014 >>> mol.GetAtomWithIdx(7).SetProp("_displayLabelW","<sup>-</sup>OOC") 1015 >>> d = rdMolDraw2D.MolDraw2DCairo(250, 250) 1016 >>> rdMolDraw2D.PrepareAndDrawMolecule(d,mol) 1017 >>> d.WriteDrawingText("./images/atom_labels_3.png") # doctest: +SKIP 1018 1019this gives: 1020 1021.. image:: images/atom_labels_3.png 1022 1023 1024Metadata in Molecule Images 1025=========================== 1026 1027*New in 2020.09 release* 1028 1029The PNG files generated by the `MolDraw2DCairo` class by default include 1030metadata about the molecule(s) or chemical reaction included in the drawing. 1031This metadata can be used later to reconstruct the molecule(s) or reaction. 1032 1033.. doctest:: 1034 1035 >>> template = Chem.MolFromSmiles('c1nccc2n1ccc2') 1036 >>> AllChem.Compute2DCoords(template) 1037 0 1038 >>> ms = [Chem.MolFromSmiles(smi) for smi in ('OCCc1ccn2cnccc12','C1CC1Oc1cc2ccncn2c1','CNC(=O)c1nccc2cccn12')] 1039 >>> _ = [AllChem.GenerateDepictionMatching2DStructure(m,template) for m in ms] 1040 >>> d = rdMolDraw2D.MolDraw2DCairo(250, 200) 1041 >>> d.DrawMolecule(ms[0]) 1042 >>> d.FinishDrawing() 1043 >>> png = d.GetDrawingText() 1044 >>> mol = Chem.MolFromPNGString(png) 1045 >>> Chem.MolToSmiles(mol) 1046 'OCCc1c2ccncn2cc1' 1047 1048The molecular metadata is stored using standard metadata tags in the PNG and is, 1049of course, not visible when you look at the PNG: 1050 1051.. image:: images/mol_metadata1.png 1052 1053If the PNG contains multiple molecules we can retrieve them all at once using 1054`Chem.MolsFromPNGString()`: 1055 1056.. doctest:: 1057 1058 >>> from rdkit.Chem import Draw 1059 >>> png = Draw.MolsToGridImage(ms,returnPNG=True) 1060 >>> mols = Chem.MolsFromPNGString(png) 1061 >>> for mol in mols: 1062 ... print(Chem.MolToSmiles(mol)) 1063 ... 1064 OCCc1c2ccncn2cc1 1065 c1cc2cc(OC3CC3)cn2cn1 1066 CNC(=O)c1nccc2cccn12 1067 1068Substructure Searching 1069********************** 1070 1071Substructure matching can be done using query molecules built from SMARTS: 1072 1073.. doctest:: 1074 1075 >>> m = Chem.MolFromSmiles('c1ccccc1O') 1076 >>> patt = Chem.MolFromSmarts('ccO') 1077 >>> m.HasSubstructMatch(patt) 1078 True 1079 >>> m.GetSubstructMatch(patt) 1080 (0, 5, 6) 1081 1082Those are the atom indices in ``m``, ordered as ``patt``'s atoms. To get all of the matches: 1083 1084.. doctest:: 1085 1086 >>> m.GetSubstructMatches(patt) 1087 ((0, 5, 6), (4, 5, 6)) 1088 1089This can be used to easily filter lists of molecules: 1090 1091.. doctest:: 1092 1093 >>> patt = Chem.MolFromSmarts('c[NH1]') 1094 >>> matches = [] 1095 >>> with Chem.SDMolSupplier('data/actives_5ht3.sdf') as suppl: 1096 ... for mol in suppl: 1097 ... if mol.HasSubstructMatch(patt): 1098 ... matches.append(mol) 1099 >>> len(matches) 1100 22 1101 1102We can write the same thing more compactly using Python's list comprehension syntax: 1103 1104.. doctest:: 1105 1106 >>> with Chem.SDMolSupplier('data/actives_5ht3.sdf') as suppl: 1107 ... matches = [x for x in suppl if x.HasSubstructMatch(patt)] 1108 >>> len(matches) 1109 22 1110 1111Substructure matching can also be done using molecules built from SMILES instead of SMARTS: 1112 1113.. doctest:: 1114 1115 >>> m = Chem.MolFromSmiles('C1=CC=CC=C1OC') 1116 >>> m.HasSubstructMatch(Chem.MolFromSmarts('CO')) 1117 True 1118 >>> m.HasSubstructMatch(Chem.MolFromSmiles('CO')) 1119 True 1120 1121But don't forget that the semantics of the two languages are not exactly equivalent: 1122 1123.. doctest:: 1124 1125 >>> m.HasSubstructMatch(Chem.MolFromSmiles('COC')) 1126 True 1127 >>> m.HasSubstructMatch(Chem.MolFromSmarts('COC')) 1128 False 1129 >>> m.HasSubstructMatch(Chem.MolFromSmarts('COc')) #<- need an aromatic C 1130 True 1131 1132 1133Stereochemistry in substructure matches 1134======================================= 1135 1136By default information about stereochemistry is not used in 1137substructure searches: 1138 1139.. doctest:: 1140 1141 >>> m = Chem.MolFromSmiles('CC[C@H](F)Cl') 1142 >>> m.HasSubstructMatch(Chem.MolFromSmiles('C[C@H](F)Cl')) 1143 True 1144 >>> m.HasSubstructMatch(Chem.MolFromSmiles('C[C@@H](F)Cl')) 1145 True 1146 >>> m.HasSubstructMatch(Chem.MolFromSmiles('CC(F)Cl')) 1147 True 1148 1149But this can be changed via the `useChirality` argument: 1150 1151.. doctest:: 1152 1153 >>> m.HasSubstructMatch(Chem.MolFromSmiles('C[C@H](F)Cl'),useChirality=True) 1154 True 1155 >>> m.HasSubstructMatch(Chem.MolFromSmiles('C[C@@H](F)Cl'),useChirality=True) 1156 False 1157 >>> m.HasSubstructMatch(Chem.MolFromSmiles('CC(F)Cl'),useChirality=True) 1158 True 1159 1160Notice that when `useChirality` is set a non-chiral query **does** match a chiral 1161molecule. The same is not true for a chiral query and a non-chiral molecule: 1162 1163.. doctest:: 1164 1165 >>> m.HasSubstructMatch(Chem.MolFromSmiles('CC(F)Cl')) 1166 True 1167 >>> m2 = Chem.MolFromSmiles('CCC(F)Cl') 1168 >>> m2.HasSubstructMatch(Chem.MolFromSmiles('C[C@H](F)Cl'),useChirality=True) 1169 False 1170 1171Atom Map Indices in SMARTS 1172========================== 1173 1174It is possible to attach indices to the atoms in the SMARTS 1175pattern. This is most often done in reaction SMARTS (see `Chemical 1176Reactions`_), but is more general than that. For example, in the 1177SMARTS patterns for torsion angle analysis published by Guba `et al.` 1178(``DOI: acs.jcim.5b00522``) indices are used to define the four atoms of 1179the torsion of interest. This allows additional atoms to be used to 1180define the environment of the four torsion atoms, as in 1181``[cH0:1][c:2]([cH0])!@[CX3!r:3]=[NX2!r:4]`` for an aromatic C=N 1182torsion. We might wonder in passing why they didn't use 1183recursive SMARTS for this, which would have made life easier, but it 1184is what it is. The atom lists from ``GetSubstructureMatches`` are 1185guaranteed to be in order of the SMARTS, but in this case we'll get five 1186atoms so we need a way of picking out, in the correct order, the four of 1187interest. When the SMARTS is parsed, the relevant atoms are assigned an 1188atom map number property that we can easily extract: 1189 1190.. doctest:: 1191 1192 >>> qmol = Chem.MolFromSmarts( '[cH0:1][c:2]([cH0])!@[CX3!r:3]=[NX2!r:4]' ) 1193 >>> ind_map = {} 1194 >>> for atom in qmol.GetAtoms() : 1195 ... map_num = atom.GetAtomMapNum() 1196 ... if map_num: 1197 ... ind_map[map_num-1] = atom.GetIdx() 1198 >>> ind_map 1199 {0: 0, 1: 1, 2: 3, 3: 4} 1200 >>> map_list = [ind_map[x] for x in sorted(ind_map)] 1201 >>> map_list 1202 [0, 1, 3, 4] 1203 1204Then, when using the query on a molecule you can get the indices of the four 1205matching atoms like this: 1206 1207.. doctest:: 1208 1209 >>> mol = Chem.MolFromSmiles('Cc1cccc(C)c1C(C)=NC') 1210 >>> for match in mol.GetSubstructMatches( qmol ) : 1211 ... mas = [match[x] for x in map_list] 1212 ... print(mas) 1213 [1, 7, 8, 10] 1214 1215 1216Advanced substructure matching 1217============================== 1218 1219Starting with the 2020.03 release, the RDKit allows you to provide an optional 1220function that is used to check whether or not a possible substructure match should 1221be accepted. This function is called with the molecule to be matched and the indices 1222of the matching atoms. 1223 1224Here's an example of how you can use the functionality to do "Markush-like" matching, 1225requiring that all atoms in a sidechain are either carbon (type "all_carbon") or aren't 1226aromatic (type "alkyl"). We start by defining the class that we'll use to test the 1227sidechains: 1228 1229.. testcode:: 1230 1231 from rdkit import Chem 1232 1233 class SidechainChecker(object): 1234 matchers = { 1235 'alkyl': lambda at: not at.GetIsAromatic(), 1236 'all_carbon': lambda at: at.GetAtomicNum() == 6 1237 } 1238 1239 def __init__(self, query, pName="queryType"): 1240 # identify the atoms that have the properties we care about 1241 self._atsToExamine = [(x.GetIdx(), x.GetProp(pName)) for x in query.GetAtoms() 1242 if x.HasProp(pName)] 1243 self._pName = pName 1244 1245 def __call__(self, mol, vect): 1246 seen = [0] * mol.GetNumAtoms() 1247 for idx in vect: 1248 seen[idx] = 1 1249 # loop over the atoms we care about: 1250 for idx, qtyp in self._atsToExamine: 1251 midx = vect[idx] 1252 stack = [midx] 1253 atom = mol.GetAtomWithIdx(midx) 1254 # now do a breadth-first search from that atom, checking 1255 # all of its neighbors that aren't in the substructure 1256 # query: 1257 stack = [atom] 1258 while stack: 1259 atom = stack.pop(0) 1260 if not self.matchers[qtyp](atom): 1261 return False 1262 seen[atom.GetIdx()] = 1 1263 for nbr in atom.GetNeighbors(): 1264 if not seen[nbr.GetIdx()]: 1265 stack.append(nbr) 1266 return True 1267 1268 1269Here's the molecule we'll use: 1270 1271.. image:: images/substruct_search_parameters1.png 1272 1273And the default behavior: 1274 1275.. doctest:: 1276 1277 >>> m = Chem.MolFromSmiles('C2NCC2CC1C(CCCC)C(OCCCC)C1c2ccccc2') 1278 >>> p = Chem.MolFromSmarts('C1CCC1*') 1279 >>> p.GetAtomWithIdx(4).SetProp("queryType", "all_carbon") 1280 >>> m.GetSubstructMatches(p) 1281 ((5, 6, 11, 17, 18), (5, 17, 11, 6, 7), (6, 5, 17, 11, 12), (6, 11, 17, 5, 4)) 1282 1283Now let's add the final check to filter the results: 1284 1285.. doctest:: 1286 1287 >>> params = Chem.SubstructMatchParameters() 1288 >>> checker = SidechainChecker(p) 1289 >>> params.setExtraFinalCheck(checker) 1290 >>> m.GetSubstructMatches(p,params) 1291 ((5, 6, 11, 17, 18), (5, 17, 11, 6, 7)) 1292 1293Repeat that using the 'alkyl' query: 1294 1295.. doctest:: 1296 1297 >>> p.GetAtomWithIdx(4).SetProp("queryType", "alkyl") 1298 >>> checker = SidechainChecker(p) 1299 >>> params.setExtraFinalCheck(checker) 1300 >>> m.GetSubstructMatches(p,params) 1301 ((5, 17, 11, 6, 7), (6, 5, 17, 11, 12), (6, 11, 17, 5, 4)) 1302 1303 1304Chemical Transformations 1305************************ 1306 1307The RDKit contains a number of functions for modifying molecules. Note 1308that these transformation functions are intended to provide an easy 1309way to make simple modifications to molecules. 1310For more complex transformations, use the `Chemical Reactions`_ functionality. 1311 1312Substructure-based transformations 1313================================== 1314 1315There's a variety of functionality for using the RDKit's 1316substructure-matching machinery for doing quick molecular transformations. 1317These transformations include deleting substructures: 1318 1319.. doctest:: 1320 1321 >>> m = Chem.MolFromSmiles('CC(=O)O') 1322 >>> patt = Chem.MolFromSmarts('C(=O)[OH]') 1323 >>> rm = AllChem.DeleteSubstructs(m,patt) 1324 >>> Chem.MolToSmiles(rm) 1325 'C' 1326 1327replacing substructures: 1328 1329.. doctest:: 1330 1331 >>> repl = Chem.MolFromSmiles('OC') 1332 >>> patt = Chem.MolFromSmarts('[$(NC(=O))]') 1333 >>> m = Chem.MolFromSmiles('CC(=O)N') 1334 >>> rms = AllChem.ReplaceSubstructs(m,patt,repl) 1335 >>> rms 1336 (<rdkit.Chem.rdchem.Mol object at 0x...>,) 1337 >>> Chem.MolToSmiles(rms[0]) 1338 'COC(C)=O' 1339 1340as well as simple SAR-table transformations like removing side chains: 1341 1342.. doctest:: 1343 1344 >>> m1 = Chem.MolFromSmiles('BrCCc1cncnc1C(=O)O') 1345 >>> core = Chem.MolFromSmiles('c1cncnc1') 1346 >>> tmp = Chem.ReplaceSidechains(m1,core) 1347 >>> Chem.MolToSmiles(tmp) 1348 '[1*]c1cncnc1[2*]' 1349 1350and removing cores: 1351 1352.. doctest:: 1353 1354 >>> tmp = Chem.ReplaceCore(m1,core) 1355 >>> Chem.MolToSmiles(tmp) 1356 '[1*]CCBr.[2*]C(=O)O' 1357 1358By default the sidechains are labeled based on the order they are found. 1359They can also be labeled according by the number of that core-atom they're attached to: 1360 1361.. doctest:: 1362 1363 >>> m1 = Chem.MolFromSmiles('c1c(CCO)ncnc1C(=O)O') 1364 >>> tmp=Chem.ReplaceCore(m1,core,labelByIndex=True) 1365 >>> Chem.MolToSmiles(tmp) 1366 '[1*]CCO.[5*]C(=O)O' 1367 1368:py:func:`rdkit.Chem.rdmolops.ReplaceCore` returns the sidechains in a single molecule. 1369This can be split into separate molecules using :py:func:`rdkit.Chem.rdmolops.GetMolFrags` : 1370 1371.. doctest:: 1372 1373 >>> rs = Chem.GetMolFrags(tmp,asMols=True) 1374 >>> len(rs) 1375 2 1376 >>> Chem.MolToSmiles(rs[0]) 1377 '[1*]CCO' 1378 >>> Chem.MolToSmiles(rs[1]) 1379 '[5*]C(=O)O' 1380 1381 1382Murcko Decomposition 1383==================== 1384 1385The RDKit provides standard Murcko-type decomposition [#bemis1]_ of molecules 1386into scaffolds: 1387 1388.. doctest:: 1389 1390 >>> from rdkit.Chem.Scaffolds import MurckoScaffold 1391 >>> with Chem.SDMolSupplier('data/cdk2.sdf') as cdk2mols: 1392 ... m1 = cdk2mols[0] 1393 >>> core = MurckoScaffold.GetScaffoldForMol(m1) 1394 >>> Chem.MolToSmiles(core) 1395 'c1ncc2nc[nH]c2n1' 1396 1397or into a generic framework: 1398 1399.. doctest:: 1400 1401 >>> fw = MurckoScaffold.MakeScaffoldGeneric(core) 1402 >>> Chem.MolToSmiles(fw) 1403 'C1CCC2CCCC2C1' 1404 1405 1406Maximum Common Substructure 1407*************************** 1408 1409The FindMCS function find a maximum common substructure (MCS) of two 1410or more molecules: 1411 1412.. doctest:: 1413 1414 >>> from rdkit.Chem import rdFMCS 1415 >>> mol1 = Chem.MolFromSmiles("O=C(NCc1cc(OC)c(O)cc1)CCCC/C=C/C(C)C") 1416 >>> mol2 = Chem.MolFromSmiles("CC(C)CCCCCC(=O)NCC1=CC(=C(C=C1)O)OC") 1417 >>> mol3 = Chem.MolFromSmiles("c1(C=O)cc(OC)c(O)cc1") 1418 >>> mols = [mol1,mol2,mol3] 1419 >>> res=rdFMCS.FindMCS(mols) 1420 >>> res 1421 <rdkit.Chem.rdFMCS.MCSResult object at 0x...> 1422 >>> res.numAtoms 1423 10 1424 >>> res.numBonds 1425 10 1426 >>> res.smartsString 1427 '[#6]1(-[#6]):[#6]:[#6](-[#8]-[#6]):[#6](:[#6]:[#6]:1)-[#8]' 1428 >>> res.canceled 1429 False 1430 1431It returns an MCSResult instance with information about the number of 1432atoms and bonds in the MCS, the SMARTS string which matches the 1433identified MCS, and a flag saying if the algorithm timed out. If no 1434MCS is found then the number of atoms and bonds is set to 0 and the 1435SMARTS to ``''``. 1436 1437By default, two atoms match if they are the same element and two bonds 1438match if they have the same bond type. Specify ``atomCompare`` and 1439``bondCompare`` to use different comparison functions, as in: 1440 1441.. doctest:: 1442 1443 >>> mols = (Chem.MolFromSmiles('NCC'),Chem.MolFromSmiles('OC=C')) 1444 >>> rdFMCS.FindMCS(mols).smartsString 1445 '[#6]' 1446 >>> rdFMCS.FindMCS(mols, atomCompare=rdFMCS.AtomCompare.CompareAny).smartsString 1447 '[#7,#8]-[#6]' 1448 >>> rdFMCS.FindMCS(mols, bondCompare=rdFMCS.BondCompare.CompareAny).smartsString 1449 '[#6]-,=[#6]' 1450 1451The options for the atomCompare argument are: CompareAny says that any 1452atom matches any other atom, CompareElements compares by element type, 1453and CompareIsotopes matches based on the isotope label. Isotope labels 1454can be used to implement user-defined atom types. A bondCompare of 1455CompareAny says that any bond matches any other bond, CompareOrderExact says 1456bonds are equivalent if and only if they have the same bond type, and 1457CompareOrder allows single and aromatic bonds to match each other, but 1458requires an exact order match otherwise: 1459 1460.. doctest:: 1461 1462 >>> mols = (Chem.MolFromSmiles('c1ccccc1'),Chem.MolFromSmiles('C1CCCC=C1')) 1463 >>> rdFMCS.FindMCS(mols,bondCompare=rdFMCS.BondCompare.CompareAny).smartsString 1464 '[#6]1:,-[#6]:,-[#6]:,-[#6]:,-[#6]:,=[#6]:,-1' 1465 >>> rdFMCS.FindMCS(mols,bondCompare=rdFMCS.BondCompare.CompareOrderExact).smartsString 1466 '[#6]' 1467 >>> rdFMCS.FindMCS(mols,bondCompare=rdFMCS.BondCompare.CompareOrder).smartsString 1468 '[#6](:,-[#6]:,-[#6]:,-[#6]):,-[#6]:,-[#6]' 1469 1470A substructure has both atoms and bonds. By default, the algorithm 1471attempts to maximize the number of bonds found. You can change this by 1472setting the ``maximizeBonds`` argument to False. 1473Maximizing the number of bonds tends to maximize the number of rings, 1474although two small rings may have fewer bonds than one large ring. 1475 1476You might not want a 3-valent nitrogen to match one which is 5-valent. 1477The default ``matchValences`` value of False ignores valence 1478information. When True, the atomCompare setting is modified to also 1479require that the two atoms have the same valency. 1480 1481.. doctest:: 1482 1483 >>> mols = (Chem.MolFromSmiles('NC1OC1'),Chem.MolFromSmiles('C1OC1[N+](=O)[O-]')) 1484 >>> rdFMCS.FindMCS(mols).numAtoms 1485 4 1486 >>> rdFMCS.FindMCS(mols, matchValences=True).numBonds 1487 3 1488 1489It can be strange to see a linear carbon chain match a carbon ring, 1490which is what the ``ringMatchesRingOnly`` default of False does. If 1491you set it to True then ring bonds will only match ring bonds. 1492 1493.. doctest:: 1494 1495 >>> mols = [Chem.MolFromSmiles("C1CCC1CCC"), Chem.MolFromSmiles("C1CCCCCC1")] 1496 >>> rdFMCS.FindMCS(mols).smartsString 1497 '[#6](-[#6]-[#6])-[#6]-[#6]-[#6]-[#6]' 1498 >>> rdFMCS.FindMCS(mols, ringMatchesRingOnly=True).smartsString 1499 '[#6&R](-&@[#6&R]-&@[#6&R])-&@[#6&R]' 1500 1501Notice that the SMARTS returned now include ring queries on the atoms and bonds. 1502 1503You can further restrict things and require that partial rings (as in 1504this case) are not allowed. That is, if an atom is part of the MCS and 1505the atom is in a ring of the entire molecule then that atom is also in 1506a ring of the MCS. Setting ``completeRingsOnly`` to True toggles this 1507requirement. 1508 1509.. doctest:: 1510 1511 >>> mols = [Chem.MolFromSmiles("CCC1CC2C1CN2"), Chem.MolFromSmiles("C1CC2C1CC2")] 1512 >>> rdFMCS.FindMCS(mols).smartsString 1513 '[#6]1-[#6]-[#6](-[#6]-1-[#6])-[#6]' 1514 >>> rdFMCS.FindMCS(mols, ringMatchesRingOnly=True).smartsString 1515 '[#6&R](-&@[#6&R]-&@[#6&R]-&@[#6&R]-&@[#6&R])-&@[#6&R]' 1516 >>> rdFMCS.FindMCS(mols, completeRingsOnly=True).smartsString 1517 '[#6]1-&@[#6]-&@[#6]-&@[#6]-&@1' 1518 1519Of course the two options can be combined with each other: 1520 1521.. doctest:: 1522 1523 >>> ms = [Chem.MolFromSmiles(x) for x in ('CC1CCC1','CCC1CC1',)] 1524 >>> rdFMCS.FindMCS(ms,ringMatchesRingOnly=True).smartsString 1525 '[#6&!R]-&!@[#6&R](-&@[#6&R])-&@[#6&R]' 1526 >>> rdFMCS.FindMCS(ms,completeRingsOnly=True).smartsString 1527 '[#6]-&!@[#6]' 1528 >>> rdFMCS.FindMCS(ms,ringMatchesRingOnly=True,completeRingsOnly=True).smartsString 1529 '[#6&!R]-&!@[#6&R]' 1530 1531 1532The MCS algorithm will exhaustively search for a maximum common substructure. 1533Typically this takes a fraction of a second, but for some comparisons this 1534can take minutes or longer. Use the ``timeout`` parameter to stop the search 1535after the given number of seconds (wall-clock seconds, not CPU seconds) and 1536return the best match found in that time. If timeout is reached then the 1537``canceled`` property of the MCSResult will be True instead of False. 1538 1539.. doctest:: 1540 1541 >>> mols = [Chem.MolFromSmiles("Nc1ccccc1"*10), Chem.MolFromSmiles("Nc1ccccccccc1"*10)] 1542 >>> rdFMCS.FindMCS(mols, timeout=1).canceled 1543 True 1544 1545(The MCS after 50 seconds contained 511 atoms.) 1546 1547 1548Fingerprinting and Molecular Similarity 1549*************************************** 1550 1551The RDKit has a variety of built-in functionality for generating molecular fingerprints and using them to calculate molecular similarity. 1552 1553 1554Topological Fingerprints 1555======================== 1556 1557.. doctest:: 1558 1559 >>> from rdkit import DataStructs 1560 >>> ms = [Chem.MolFromSmiles('CCOC'), Chem.MolFromSmiles('CCO'), 1561 ... Chem.MolFromSmiles('COC')] 1562 >>> fps = [Chem.RDKFingerprint(x) for x in ms] 1563 >>> DataStructs.FingerprintSimilarity(fps[0],fps[1]) 1564 0.6... 1565 >>> DataStructs.FingerprintSimilarity(fps[0],fps[2]) 1566 0.4... 1567 >>> DataStructs.FingerprintSimilarity(fps[1],fps[2]) 1568 0.25 1569 1570More details about the algorithm used for the RDKit fingerprint can be found in the "RDKit Book". 1571 1572The default set of parameters used by the fingerprinter is: 1573- minimum path size: 1 bond 1574- maximum path size: 7 bonds 1575- fingerprint size: 2048 bits 1576- number of bits set per hash: 2 1577- minimum fingerprint size: 64 bits 1578- target on-bit density 0.0 1579 1580You can control these when calling 1581:py:func:`rdkit.Chem.rdmolops.RDKFingerprint`. 1582The function 1583:py:func:`rdkit.Chem.Fingerprints.FingerprintMols.FingerprintMol` (written 1584in python) shows how this is done. 1585 1586The default similarity metric used by 1587:py:func:`rdkit.DataStructs.FingerprintSimilarity` is the Tanimoto 1588similarity. One can use different similarity metrics: 1589 1590>>> DataStructs.FingerprintSimilarity(fps[0],fps[1], metric=DataStructs.DiceSimilarity) 15910.75 1592 1593Available similarity metrics include Tanimoto, Dice, Cosine, Sokal, Russel, Kulczynski, McConnaughey, and Tversky. 1594 1595 1596MACCS Keys 1597========== 1598 1599There is a SMARTS-based implementation of the 166 public MACCS keys. 1600 1601.. doctest:: 1602 1603 >>> from rdkit.Chem import MACCSkeys 1604 >>> fps = [MACCSkeys.GenMACCSKeys(x) for x in ms] 1605 >>> DataStructs.FingerprintSimilarity(fps[0],fps[1]) 1606 0.5 1607 >>> DataStructs.FingerprintSimilarity(fps[0],fps[2]) 1608 0.538... 1609 >>> DataStructs.FingerprintSimilarity(fps[1],fps[2]) 1610 0.214... 1611 1612The MACCS keys were critically evaluated and compared to other MACCS 1613implementations in Q3 2008. In cases where the public keys are fully defined, 1614things looked pretty good. 1615 1616 1617Atom Pairs and Topological Torsions 1618=================================== 1619 1620Atom-pair descriptors [#carhart]_ are available in several different forms. 1621The standard form is as fingerprint including counts for each bit instead of just zeros and ones: 1622 1623.. doctest:: 1624 1625 >>> from rdkit.Chem.AtomPairs import Pairs 1626 >>> ms = [Chem.MolFromSmiles('C1CCC1OCC'),Chem.MolFromSmiles('CC(C)OCC'),Chem.MolFromSmiles('CCOCC')] 1627 >>> pairFps = [Pairs.GetAtomPairFingerprint(x) for x in ms] 1628 1629Because the space of bits that can be included in atom-pair fingerprints is 1630huge, they are stored in a sparse manner. We can get the list of bits and their 1631counts for each fingerprint as a dictionary: 1632 1633.. doctest:: 1634 1635 >>> d = pairFps[-1].GetNonzeroElements() 1636 >>> d[541732] 1637 1 1638 >>> d[1606690] 1639 2 1640 1641Descriptions of the bits are also available: 1642 1643.. doctest:: 1644 1645 >>> Pairs.ExplainPairScore(558115) 1646 (('C', 1, 0), 3, ('C', 2, 0)) 1647 1648The above means: C with 1 neighbor and 0 pi electrons which is 3 bonds 1649from a C with 2 neighbors and 0 pi electrons 1650 1651The usual metric for similarity between atom-pair fingerprints is Dice similarity: 1652 1653.. doctest:: 1654 1655 >>> from rdkit import DataStructs 1656 >>> DataStructs.DiceSimilarity(pairFps[0],pairFps[1]) 1657 0.333... 1658 >>> DataStructs.DiceSimilarity(pairFps[0],pairFps[2]) 1659 0.258... 1660 >>> DataStructs.DiceSimilarity(pairFps[1],pairFps[2]) 1661 0.56 1662 1663It's also possible to get atom-pair descriptors encoded as a standard 1664bit vector fingerprint (ignoring the count information): 1665 1666.. doctest:: 1667 1668 >>> pairFps = [Pairs.GetAtomPairFingerprintAsBitVect(x) for x in ms] 1669 1670Since these are standard bit vectors, the :py:mod:`rdkit.DataStructs` 1671module can be used for similarity: 1672 1673.. doctest:: 1674 1675 >>> from rdkit import DataStructs 1676 >>> DataStructs.DiceSimilarity(pairFps[0],pairFps[1]) 1677 0.48 1678 >>> DataStructs.DiceSimilarity(pairFps[0],pairFps[2]) 1679 0.380... 1680 >>> DataStructs.DiceSimilarity(pairFps[1],pairFps[2]) 1681 0.625 1682 1683Topological torsion descriptors [#nilakantan]_ are calculated in 1684essentially the same way: 1685 1686.. doctest:: 1687 1688 >>> from rdkit.Chem.AtomPairs import Torsions 1689 >>> tts = [Torsions.GetTopologicalTorsionFingerprintAsIntVect(x) for x in ms] 1690 >>> DataStructs.DiceSimilarity(tts[0],tts[1]) 1691 0.166... 1692 1693At the time of this writing, topological torsion fingerprints have too many bits 1694to be encodeable using the BitVector machinery, so there is no 1695GetTopologicalTorsionFingerprintAsBitVect function. 1696 1697 1698Morgan Fingerprints (Circular Fingerprints) 1699=========================================== 1700 1701This family of fingerprints, better known as circular fingerprints 1702[#rogers]_, is built by applying the Morgan algorithm to a set of 1703user-supplied atom invariants. When generating Morgan fingerprints, 1704the radius of the fingerprint must also be provided : 1705 1706.. doctest:: 1707 1708 >>> from rdkit.Chem import AllChem 1709 >>> m1 = Chem.MolFromSmiles('Cc1ccccc1') 1710 >>> fp1 = AllChem.GetMorganFingerprint(m1,2) 1711 >>> fp1 1712 <rdkit.DataStructs.cDataStructs.UIntSparseIntVect object at 0x...> 1713 >>> m2 = Chem.MolFromSmiles('Cc1ncccc1') 1714 >>> fp2 = AllChem.GetMorganFingerprint(m2,2) 1715 >>> DataStructs.DiceSimilarity(fp1,fp2) 1716 0.55... 1717 1718Morgan fingerprints, like atom pairs and topological torsions, use 1719counts by default, but it's also possible to calculate them as bit 1720vectors: 1721 1722.. doctest:: 1723 1724 >>> fp1 = AllChem.GetMorganFingerprintAsBitVect(m1,2,nBits=1024) 1725 >>> fp1 1726 <rdkit.DataStructs.cDataStructs.ExplicitBitVect object at 0x...> 1727 >>> fp2 = AllChem.GetMorganFingerprintAsBitVect(m2,2,nBits=1024) 1728 >>> DataStructs.DiceSimilarity(fp1,fp2) 1729 0.51... 1730 1731The default atom invariants use connectivity information similar to 1732those used for the well known ECFP family of fingerprints. 1733Feature-based invariants, similar to those used for the FCFP 1734fingerprints, can also be used. The feature definitions used are 1735defined in the section `Feature Definitions Used in the Morgan 1736Fingerprints`_. At times this can lead to quite different similarity 1737scores: 1738 1739.. doctest:: 1740 1741 >>> m1 = Chem.MolFromSmiles('c1ccccn1') 1742 >>> m2 = Chem.MolFromSmiles('c1ccco1') 1743 >>> fp1 = AllChem.GetMorganFingerprint(m1,2) 1744 >>> fp2 = AllChem.GetMorganFingerprint(m2,2) 1745 >>> ffp1 = AllChem.GetMorganFingerprint(m1,2,useFeatures=True) 1746 >>> ffp2 = AllChem.GetMorganFingerprint(m2,2,useFeatures=True) 1747 >>> DataStructs.DiceSimilarity(fp1,fp2) 1748 0.36... 1749 >>> DataStructs.DiceSimilarity(ffp1,ffp2) 1750 0.90... 1751 1752When comparing the ECFP/FCFP fingerprints and the Morgan fingerprints 1753generated by the RDKit, remember that the 4 in ECFP4 corresponds to 1754the diameter of the atom environments considered, while the Morgan 1755fingerprints take a radius parameter. So the examples above, with 1756radius=2, are roughly equivalent to ECFP4 and FCFP4. 1757 1758The user can also provide their own atom invariants using the optional 1759invariants argument to 1760:py:func:`rdkit.Chem.rdMolDescriptors.GetMorganFingerprint`. Here's a 1761simple example that uses a constant for the invariant; the resulting 1762fingerprints compare the topology of molecules: 1763 1764.. doctest:: 1765 1766 >>> m1 = Chem.MolFromSmiles('Cc1ccccc1') 1767 >>> m2 = Chem.MolFromSmiles('Cc1ncncn1') 1768 >>> fp1 = AllChem.GetMorganFingerprint(m1,2,invariants=[1]*m1.GetNumAtoms()) 1769 >>> fp2 = AllChem.GetMorganFingerprint(m2,2,invariants=[1]*m2.GetNumAtoms()) 1770 >>> fp1==fp2 1771 True 1772 1773Note that bond order is by default still considered: 1774 1775.. doctest:: 1776 1777 >>> m3 = Chem.MolFromSmiles('CC1CCCCC1') 1778 >>> fp3 = AllChem.GetMorganFingerprint(m3,2,invariants=[1]*m3.GetNumAtoms()) 1779 >>> fp1==fp3 1780 False 1781 1782But this can also be turned off: 1783 1784.. doctest:: 1785 1786 >>> fp1 = AllChem.GetMorganFingerprint(m1,2,invariants=[1]*m1.GetNumAtoms(), 1787 ... useBondTypes=False) 1788 >>> fp3 = AllChem.GetMorganFingerprint(m3,2,invariants=[1]*m3.GetNumAtoms(), 1789 ... useBondTypes=False) 1790 >>> fp1==fp3 1791 True 1792 1793 1794Explaining bits from Morgan Fingerprints 1795---------------------------------------- 1796 1797Information is available about the atoms that contribute to particular 1798bits in the Morgan fingerprint via the bitInfo argument. The 1799dictionary provided is populated with one entry per bit set in the 1800fingerprint, the keys are the bit ids, the values are lists of (atom 1801index, radius) tuples. 1802 1803 1804.. doctest:: 1805 1806 >>> m = Chem.MolFromSmiles('c1cccnc1C') 1807 >>> info={} 1808 >>> fp = AllChem.GetMorganFingerprint(m,2,bitInfo=info) 1809 >>> len(fp.GetNonzeroElements()) 1810 16 1811 >>> len(info) 1812 16 1813 >>> info[98513984] 1814 ((1, 1), (2, 1)) 1815 >>> info[4048591891] 1816 ((5, 2),) 1817 1818Interpreting the above: bit 98513984 is set twice: once by atom 1 and 1819once by atom 2, each at radius 1. Bit 4048591891 is set once by atom 5 1820at radius 2. 1821 1822Focusing on bit 4048591891, we can extract the submolecule consisting 1823of all atoms within a radius of 2 of atom 5: 1824 1825.. doctest:: 1826 1827 >>> env = Chem.FindAtomEnvironmentOfRadiusN(m,2,5) 1828 >>> amap={} 1829 >>> submol=Chem.PathToSubmol(m,env,atomMap=amap) 1830 >>> submol.GetNumAtoms() 1831 6 1832 >>> amap 1833 {0: 0, 1: 1, 3: 2, 4: 3, 5: 4, 6: 5} 1834 1835And then “explain” the bit by generating SMILES for that submolecule: 1836 1837.. doctest:: 1838 1839 >>> Chem.MolToSmiles(submol) 1840 'ccc(C)nc' 1841 1842This is more useful when the SMILES is rooted at the central atom: 1843 1844.. doctest:: 1845 1846 >>> Chem.MolToSmiles(submol,rootedAtAtom=amap[5],canonical=False) 1847 'c(cc)(nc)C' 1848 1849An alternate (and faster, particularly for large numbers of molecules) 1850approach to do the same thing, using the function :py:func:`rdkit.Chem.MolFragmentToSmiles` : 1851 1852.. doctest:: 1853 1854 >>> atoms=set() 1855 >>> for bidx in env: 1856 ... atoms.add(m.GetBondWithIdx(bidx).GetBeginAtomIdx()) 1857 ... atoms.add(m.GetBondWithIdx(bidx).GetEndAtomIdx()) 1858 ... 1859 >>> Chem.MolFragmentToSmiles(m,atomsToUse=list(atoms),bondsToUse=env,rootedAtAtom=5) 1860 'c(C)(cc)nc' 1861 1862Generating images of fingerprint bits 1863===================================== 1864 1865For the Morgan and RDKit fingerprint types, it's possible to generate images of 1866the atom environment that defines the bit using the functions 1867:py:func:`rdkit.Chem.Draw.DrawMorganBit()` and :py:func:`rdkit.Chem.Draw.DrawRDKitBit()` 1868 1869.. doctest:: 1870 1871 >>> from rdkit.Chem import Draw 1872 >>> mol = Chem.MolFromSmiles('c1ccccc1CC1CC1') 1873 >>> bi = {} 1874 >>> fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, bitInfo=bi) 1875 >>> bi[872] 1876 ((6, 2),) 1877 >>> mfp2_svg = Draw.DrawMorganBit(mol, 872, bi, useSVG=True) 1878 >>> rdkbi = {} 1879 >>> rdkfp = Chem.RDKFingerprint(mol, maxPath=5, bitInfo=rdkbi) 1880 >>> rdkbi[1553] 1881 [[0, 1, 9, 5, 4], [2, 3, 4, 9, 5]] 1882 >>> rdk_svg = Draw.DrawRDKitBit(mol, 1553, rdkbi, useSVG=True) 1883 1884Producing these images: 1885 1886+-----------------------------------+-----------------------------------+ 1887| .. image:: images/mfp2_bit872.svg | .. image:: images/rdk_bit1553.svg | 1888+-----------------------------------+-----------------------------------+ 1889| Morgan bit | RDKit bit | 1890+-----------------------------------+-----------------------------------+ 1891 1892The default highlight colors for the Morgan bits indicate: 1893 1894 - blue: the central atom in the environment 1895 - yellow: aromatic atoms 1896 - gray: aliphatic ring atoms 1897 1898The default highlight colors for the RDKit bits indicate: 1899 1900 - yellow: aromatic atoms 1901 1902Note that in cases where the same bit is set by multiple atoms in the molecule 1903(as for bit 1553 for the RDKit fingerprint in the example above), the drawing 1904functions will display the first example. You can change this by specifying which 1905example to show: 1906 1907.. doctest:: 1908 1909 >>> rdk_svg = Draw.DrawRDKitBit(mol, 1553, rdkbi, whichExample=1, useSVG=True) 1910 1911Producing this image: 1912 1913+-------------------------------------+ 1914| .. image:: images/rdk_bit1553_2.svg | 1915+-------------------------------------+ 1916| RDKit bit | 1917+-------------------------------------+ 1918 1919 1920Picking Diverse Molecules Using Fingerprints 1921============================================ 1922 1923A common task is to pick a small subset of diverse molecules from a larger set. 1924The RDKit provides a number of approaches for doing this in the 1925:py:mod:`rdkit.SimDivFilters` module. The most efficient of these uses the 1926MaxMin algorithm. [#ashton]_ Here's an example: 1927 1928Start by reading in a set of molecules and generating Morgan fingerprints: 1929 1930.. doctest:: 1931 1932 >>> from rdkit import Chem 1933 >>> from rdkit.Chem.rdMolDescriptors import GetMorganFingerprint 1934 >>> from rdkit import DataStructs 1935 >>> from rdkit.SimDivFilters.rdSimDivPickers import MaxMinPicker 1936 >>> with Chem.SDMolSupplier('data/actives_5ht3.sdf') as suppl: 1937 ... ms = [x for x in suppl if x is not None] 1938 >>> fps = [GetMorganFingerprint(x,3) for x in ms] 1939 >>> nfps = len(fps) 1940 1941The algorithm requires a function to calculate distances between 1942objects, we'll do that using DiceSimilarity: 1943 1944.. doctest:: 1945 1946 >>> def distij(i,j,fps=fps): 1947 ... return 1-DataStructs.DiceSimilarity(fps[i],fps[j]) 1948 1949Now create a picker and grab a set of 10 diverse molecules: 1950 1951.. doctest:: 1952 1953 >>> picker = MaxMinPicker() 1954 >>> pickIndices = picker.LazyPick(distij,nfps,10,seed=23) 1955 >>> list(pickIndices) 1956 [93, 109, 154, 6, 95, 135, 151, 61, 137, 139] 1957 1958Note that the picker just returns indices of the fingerprints; we can 1959get the molecules themselves as follows: 1960 1961.. doctest:: 1962 1963 >>> picks = [ms[x] for x in pickIndices] 1964 1965Generating Similarity Maps Using Fingerprints 1966============================================= 1967 1968Similarity maps are a way to visualize the atomic contributions to 1969the similarity between a molecule and a reference molecule. The 1970methodology is described in Ref. [#riniker]_ . 1971They are in the :py:mod:`rdkit.Chem.Draw.SimilarityMaps` module : 1972 1973Start by creating two molecules: 1974 1975.. doctest:: 1976 1977 >>> from rdkit import Chem 1978 >>> mol = Chem.MolFromSmiles('COc1cccc2cc(C(=O)NCCCCN3CCN(c4cccc5nccnc54)CC3)oc21') 1979 >>> refmol = Chem.MolFromSmiles('CCCN(CCCCN1CCN(c2ccccc2OC)CC1)Cc1ccc2ccccc2c1') 1980 1981The SimilarityMaps module supports three kind of fingerprints: 1982atom pairs, topological torsions and Morgan fingerprints. 1983 1984.. doctest:: 1985 1986 >>> from rdkit.Chem import Draw 1987 >>> from rdkit.Chem.Draw import SimilarityMaps 1988 >>> fp = SimilarityMaps.GetAPFingerprint(mol, fpType='normal') 1989 >>> fp = SimilarityMaps.GetTTFingerprint(mol, fpType='normal') 1990 >>> fp = SimilarityMaps.GetMorganFingerprint(mol, fpType='bv') 1991 1992The types of atom pairs and torsions are normal (default), hashed and bit vector (bv). 1993The types of the Morgan fingerprint are bit vector (bv, default) and count vector (count). 1994 1995The function generating a similarity map for two fingerprints requires the 1996specification of the fingerprint function and optionally the similarity metric. 1997The default for the latter is the Dice similarity. Using all the default arguments 1998of the Morgan fingerprint function, the similarity map can be generated like this: 1999 2000.. doctest:: 2001 2002 >>> fig, maxweight = SimilarityMaps.GetSimilarityMapForFingerprint(refmol, mol, SimilarityMaps.GetMorganFingerprint) 2003 2004Producing this image: 2005 2006.. image:: images/similarity_map_fp1.png 2007 2008For a different type of Morgan (e.g. count) and radius = 1 instead of 2, as well as a different 2009similarity metric (e.g. Tanimoto), the call becomes: 2010 2011.. doctest:: 2012 2013 >>> from rdkit import DataStructs 2014 >>> fig, maxweight = SimilarityMaps.GetSimilarityMapForFingerprint(refmol, mol, lambda m,idx: SimilarityMaps.GetMorganFingerprint(m, atomId=idx, radius=1, fpType='count'), metric=DataStructs.TanimotoSimilarity) 2015 2016Producing this image: 2017 2018.. image:: images/similarity_map_fp2.png 2019 2020The convenience function GetSimilarityMapForFingerprint involves the normalisation 2021of the atomic weights such that the maximum absolute weight is 1. Therefore, the 2022function outputs the maximum weight that was found when creating the map. 2023 2024.. doctest:: 2025 2026 >>> print(maxweight) 2027 0.05747... 2028 2029If one does not want the normalisation step, the map can be created like: 2030 2031.. doctest:: 2032 2033 >>> weights = SimilarityMaps.GetAtomicWeightsForFingerprint(refmol, mol, SimilarityMaps.GetMorganFingerprint) 2034 >>> print(["%.2f " % w for w in weights]) 2035 ['0.05 ', ... 2036 >>> fig = SimilarityMaps.GetSimilarityMapFromWeights(mol, weights) 2037 2038Producing this image: 2039 2040.. image:: images/similarity_map_fp3.png 2041 2042 2043Descriptor Calculation 2044********************** 2045 2046A variety of descriptors are available within the RDKit. 2047The complete list is provided in `List of Available Descriptors`_. 2048 2049Most of the descriptors are straightforward to use from Python via the 2050centralized :py:mod:`rdkit.Chem.Descriptors` module : 2051 2052.. doctest:: 2053 2054 >>> from rdkit.Chem import Descriptors 2055 >>> m = Chem.MolFromSmiles('c1ccccc1C(=O)O') 2056 >>> Descriptors.TPSA(m) 2057 37.3 2058 >>> Descriptors.MolLogP(m) 2059 1.3848 2060 2061Partial charges are handled a bit differently: 2062 2063.. doctest:: 2064 2065 >>> m = Chem.MolFromSmiles('c1ccccc1C(=O)O') 2066 >>> AllChem.ComputeGasteigerCharges(m) 2067 >>> m.GetAtomWithIdx(0).GetDoubleProp('_GasteigerCharge') 2068 -0.047... 2069 2070 2071Visualization of Descriptors 2072============================ 2073 2074Similarity maps can be used to visualize descriptors that can be divided into 2075atomic contributions. 2076 2077The Gasteiger partial charges can be visualized as (using a different color scheme): 2078 2079.. doctest:: 2080 2081 >>> from rdkit.Chem.Draw import SimilarityMaps 2082 >>> mol = Chem.MolFromSmiles('COc1cccc2cc(C(=O)NCCCCN3CCN(c4cccc5nccnc54)CC3)oc21') 2083 >>> AllChem.ComputeGasteigerCharges(mol) 2084 >>> contribs = [mol.GetAtomWithIdx(i).GetDoubleProp('_GasteigerCharge') for i in range(mol.GetNumAtoms())] 2085 >>> fig = SimilarityMaps.GetSimilarityMapFromWeights(mol, contribs, colorMap='jet', contourLines=10) 2086 2087Producing this image: 2088 2089.. image:: images/similarity_map_charges.png 2090 2091Or for the Crippen contributions to logP: 2092 2093.. doctest:: 2094 2095 >>> from rdkit.Chem import rdMolDescriptors 2096 >>> contribs = rdMolDescriptors._CalcCrippenContribs(mol) 2097 >>> fig = SimilarityMaps.GetSimilarityMapFromWeights(mol,[x for x,y in contribs], colorMap='jet', contourLines=10) 2098 2099Producing this image: 2100 2101.. image:: images/similarity_map_crippen.png 2102 2103Chemical Reactions 2104****************** 2105 2106The RDKit also supports applying chemical reactions to sets of 2107molecules. One way of constructing chemical reactions is to use a 2108SMARTS-based language similar to Daylight's Reaction SMILES 2109[#rxnsmarts]_: 2110 2111.. doctest:: 2112 2113 >>> rxn = AllChem.ReactionFromSmarts('[C:1](=[O:2])-[OD1].[N!H0:3]>>[C:1](=[O:2])[N:3]') 2114 >>> rxn 2115 <rdkit.Chem.rdChemReactions.ChemicalReaction object at 0x...> 2116 >>> rxn.GetNumProductTemplates() 2117 1 2118 >>> ps = rxn.RunReactants((Chem.MolFromSmiles('CC(=O)O'),Chem.MolFromSmiles('NC'))) 2119 >>> len(ps) # one entry for each possible set of products 2120 1 2121 >>> len(ps[0]) # each entry contains one molecule for each product 2122 1 2123 >>> Chem.MolToSmiles(ps[0][0]) 2124 'CNC(C)=O' 2125 >>> ps = rxn.RunReactants((Chem.MolFromSmiles('C(COC(=O)O)C(=O)O'),Chem.MolFromSmiles('NC'))) 2126 >>> len(ps) 2127 2 2128 >>> Chem.MolToSmiles(ps[0][0]) 2129 'CNC(=O)OCCC(=O)O' 2130 >>> Chem.MolToSmiles(ps[1][0]) 2131 'CNC(=O)CCOC(=O)O' 2132 2133Reactions can also be built from MDL rxn files: 2134 2135.. doctest:: 2136 2137 >>> rxn = AllChem.ReactionFromRxnFile('data/AmideBond.rxn') 2138 >>> rxn.GetNumReactantTemplates() 2139 2 2140 >>> rxn.GetNumProductTemplates() 2141 1 2142 >>> ps = rxn.RunReactants((Chem.MolFromSmiles('CC(=O)O'), Chem.MolFromSmiles('NC'))) 2143 >>> len(ps) 2144 1 2145 >>> Chem.MolToSmiles(ps[0][0]) 2146 'CNC(C)=O' 2147 2148It is, of course, possible to do reactions more complex than amide 2149bond formation: 2150 2151.. doctest:: 2152 2153 >>> rxn = AllChem.ReactionFromSmarts('[C:1]=[C:2].[C:3]=[*:4][*:5]=[C:6]>>[C:1]1[C:2][C:3][*:4]=[*:5][C:6]1') 2154 >>> ps = rxn.RunReactants((Chem.MolFromSmiles('OC=C'), Chem.MolFromSmiles('C=CC(N)=C'))) 2155 >>> Chem.MolToSmiles(ps[0][0]) 2156 'NC1=CCCC(O)C1' 2157 2158Note in this case that there are multiple mappings of the reactants 2159onto the templates, so we have multiple product sets: 2160 2161.. doctest:: 2162 2163 >>> len(ps) 2164 4 2165 2166You can use canonical smiles and a python dictionary to get the unique products: 2167 2168.. doctest:: 2169 2170 >>> uniqps = {} 2171 >>> for p in ps: 2172 ... smi = Chem.MolToSmiles(p[0]) 2173 ... uniqps[smi] = p[0] 2174 ... 2175 >>> sorted(uniqps.keys()) 2176 ['NC1=CCC(O)CC1', 'NC1=CCCC(O)C1'] 2177 2178Note that the molecules that are produced by the chemical reaction 2179processing code are not sanitized, as this artificial reaction 2180demonstrates: 2181 2182.. doctest:: 2183 2184 >>> rxn = AllChem.ReactionFromSmarts('[C:1]=[C:2][C:3]=[C:4].[C:5]=[C:6]>>[C:1]1=[C:2][C:3]=[C:4][C:5]=[C:6]1') 2185 >>> ps = rxn.RunReactants((Chem.MolFromSmiles('C=CC=C'), Chem.MolFromSmiles('C=C'))) 2186 >>> Chem.MolToSmiles(ps[0][0]) 2187 'C1=CC=CC=C1' 2188 >>> p0 = ps[0][0] 2189 >>> Chem.SanitizeMol(p0) 2190 rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_NONE 2191 >>> Chem.MolToSmiles(p0) 2192 'c1ccccc1' 2193 2194Drawing Chemical Reactions 2195========================== 2196 2197The RDKit's MolDraw2D-based rendering can also handle chemical reactions. 2198 2199.. doctest:: 2200 2201 >>> from rdkit.Chem import Draw 2202 >>> rxn = AllChem.ReactionFromSmarts('[cH:5]1[cH:6][c:7]2[cH:8][n:9][cH:10][cH:11][c:12]2[c:3]([cH:4]1)[C:2](=[O:1])O.[N-:13]=[N+:14]=[N-:15]>C(Cl)Cl.C(=O)(C(=O)Cl)Cl>[cH:5]1[cH:6][c:7]2[cH:8][n:9][cH:10][cH:11][c:12]2[c:3]([cH:4]1)[C:2](=[O:1])[N:13]=[N+:14]=[N-:15]',useSmiles=True) 2203 >>> d2d = Draw.MolDraw2DCairo(800,300) 2204 >>> d2d.DrawReaction(rxn) 2205 >>> png = d2d.GetDrawingText() 2206 >>> open('./images/reaction1.o.png','wb+').write(png) # doctest: +SKIP 2207 2208the result looks like this: 2209 2210.. image:: images/reaction1.png 2211 2212There's another drawing mode which leaves out the atom map information but which 2213highlights which of the reactants atoms in the products come from: 2214 2215.. doctest:: 2216 2217 >>> d2d = Draw.MolDraw2DCairo(800,300) 2218 >>> d2d.DrawReaction(rxn,highlightByReactant=True) 2219 >>> png = d2d.GetDrawingText() 2220 >>> open('./images/reaction1_highlight.o.png','wb+').write(png) # doctest: +SKIP 2221 2222.. image:: images/reaction1_highlight.png 2223 2224As of the 2020.09 release, PNG images of reactions include metadata allowing the 2225reaction to be reconstructed: 2226 2227.. doctest:: 2228 2229 >>> newRxn = AllChem.ReactionFromPNGString(png) 2230 >>> AllChem.ReactionToSmarts(newRxn) 2231 '[#6H:5]1:[#6H:6]:[#6:7]2:[#6H:8]:[#7:9]:[#6H:10]:[#6H:11]:[#6:12]:2:[#6:3](:[#6H:4]:1)-[#6:2](=[#8:1])-[#8].[#7-:13]=[#7+:14]=[#7-:15]>[#6](-[#17])-[#17].[#6](=[#8])(-[#6](=[#8])-[#17])-[#17]>[#6H:5]1:[#6H:6]:[#6:7]2:[#6H:8]:[#7:9]:[#6H:10]:[#6H:11]:[#6:12]:2:[#6:3](:[#6H:4]:1)-[#6:2](=[#8:1])-[#7:13]=[#7+:14]=[#7-:15]' 2232 2233Advanced Reaction Functionality 2234=============================== 2235 2236Protecting Atoms 2237---------------- 2238 2239Sometimes, particularly when working with rxn files, it is difficult 2240to express a reaction exactly enough to not end up with extraneous 2241products. The RDKit provides a method of "protecting" atoms to 2242disallow them from taking part in reactions. 2243 2244This can be demonstrated re-using the amide-bond formation reaction used 2245above. The query for amines isn't specific enough, so it matches any 2246nitrogen that has at least one H attached. So if we apply the reaction 2247to a molecule that already has an amide bond, the amide N is also 2248treated as a reaction site: 2249 2250.. doctest:: 2251 2252 >>> rxn = AllChem.ReactionFromRxnFile('data/AmideBond.rxn') 2253 >>> acid = Chem.MolFromSmiles('CC(=O)O') 2254 >>> base = Chem.MolFromSmiles('CC(=O)NCCN') 2255 >>> ps = rxn.RunReactants((acid,base)) 2256 >>> len(ps) 2257 2 2258 >>> Chem.MolToSmiles(ps[0][0]) 2259 'CC(=O)N(CCN)C(C)=O' 2260 >>> Chem.MolToSmiles(ps[1][0]) 2261 'CC(=O)NCCNC(C)=O' 2262 2263The first product corresponds to the reaction at the amide N. 2264 2265We can prevent this from happening by protecting all amide Ns. Here we 2266do it with a substructure query that matches amides and thioamides and 2267then set the "_protected" property on matching atoms: 2268 2269.. doctest:: 2270 2271 >>> amidep = Chem.MolFromSmarts('[N;$(NC=[O,S])]') 2272 >>> for match in base.GetSubstructMatches(amidep): 2273 ... base.GetAtomWithIdx(match[0]).SetProp('_protected','1') 2274 2275 2276Now the reaction only generates a single product: 2277 2278.. doctest:: 2279 2280 >>> ps = rxn.RunReactants((acid,base)) 2281 >>> len(ps) 2282 1 2283 >>> Chem.MolToSmiles(ps[0][0]) 2284 'CC(=O)NCCNC(C)=O' 2285 2286 2287Recap Implementation 2288==================== 2289 2290Associated with the chemical reaction functionality is an 2291implementation of the Recap algorithm. [#lewell]_ Recap uses a set of 2292chemical transformations mimicking common reactions carried out in the 2293lab in order to decompose a molecule into a series of reasonable 2294fragments. 2295 2296The RDKit :py:mod:`rdkit.Chem.Recap` implementation keeps track of the hierarchy of 2297transformations that were applied: 2298 2299.. doctest:: 2300 2301 >>> from rdkit import Chem 2302 >>> from rdkit.Chem import Recap 2303 >>> m = Chem.MolFromSmiles('c1ccccc1OCCOC(=O)CC') 2304 >>> hierarch = Recap.RecapDecompose(m) 2305 >>> type(hierarch) 2306 <class 'rdkit.Chem.Recap.RecapHierarchyNode'> 2307 2308The hierarchy is rooted at the original molecule: 2309 2310.. doctest:: 2311 2312 >>> hierarch.smiles 2313 'CCC(=O)OCCOc1ccccc1' 2314 2315and each node tracks its children using a dictionary keyed by SMILES: 2316 2317.. doctest:: 2318 2319 >>> ks=hierarch.children.keys() 2320 >>> sorted(ks) 2321 ['*C(=O)CC', '*CCOC(=O)CC', '*CCOc1ccccc1', '*OCCOc1ccccc1', '*c1ccccc1'] 2322 2323The nodes at the bottom of the hierarchy (the leaf nodes) are easily 2324accessible, also as a dictionary keyed by SMILES: 2325 2326.. doctest:: 2327 2328 >>> ks=hierarch.GetLeaves().keys() 2329 >>> ks=sorted(ks) 2330 >>> ks 2331 ['*C(=O)CC', '*CCO*', '*CCOc1ccccc1', '*c1ccccc1'] 2332 2333Notice that dummy atoms are used to mark points where the molecule was fragmented. 2334 2335The nodes themselves have associated molecules: 2336 2337.. doctest:: 2338 2339 >>> leaf = hierarch.GetLeaves()[ks[0]] 2340 >>> Chem.MolToSmiles(leaf.mol) 2341 '*C(=O)CC' 2342 2343 2344BRICS Implementation 2345==================== 2346 2347The RDKit also provides an implementation of the BRICS 2348algorithm. [#degen]_ BRICS provides another 2349method for fragmenting molecules along synthetically accessible bonds: 2350 2351.. doctest:: 2352 2353 >>> from rdkit.Chem import BRICS 2354 >>> with Chem.SDMolSupplier('data/cdk2.sdf') as cdk2mols: 2355 ... m1 = cdk2mols[0] 2356 ... m2 = cdk2mols[20] 2357 >>> sorted(BRICS.BRICSDecompose(m1)) 2358 ['[14*]c1nc(N)nc2[nH]cnc12', '[3*]O[3*]', '[4*]CC(=O)C(C)C'] 2359 >>> sorted(BRICS.BRICSDecompose(m2)) 2360 ['[1*]C(=O)NN(C)C', '[14*]c1[nH]nc2c1C(=O)c1c([16*])cccc1-2', '[16*]c1ccc([16*])cc1', '[3*]OC', '[5*]N[5*]'] 2361 2362Notice that RDKit BRICS implementation returns the unique fragments 2363generated from a molecule and that the dummy atoms are tagged to 2364indicate which type of reaction applies. 2365 2366It's quite easy to generate the list of all fragments for a 2367group of molecules: 2368 2369.. doctest:: 2370 2371 >>> allfrags=set() 2372 >>> with Chem.SDMolSupplier('data/cdk2.sdf') as cdk2mols: 2373 ... for m in cdk2mols: 2374 ... if m is None: 2375 ... continue 2376 ... pieces = BRICS.BRICSDecompose(m) 2377 ... allfrags.update(pieces) 2378 >>> len(allfrags) 2379 90 2380 >>> sorted(allfrags)[:5] 2381 ['NS(=O)(=O)c1ccc(N/N=C2\\C(=O)Nc3ccc(Br)cc32)cc1', '[1*]C(=O)C(C)C', '[1*]C(=O)NN(C)C', '[1*]C(=O)NN1CC[NH+](C)CC1', '[1*]C(C)=O'] 2382 2383The BRICS module also provides an option to apply the BRICS rules to a 2384set of fragments to create new molecules: 2385 2386.. doctest:: 2387 2388 >>> import random 2389 >>> random.seed(127) 2390 >>> fragms = [Chem.MolFromSmiles(x) for x in sorted(allfrags)] 2391 >>> random.seed(0xf00d) 2392 >>> ms = BRICS.BRICSBuild(fragms) 2393 2394The result is a generator object: 2395 2396.. doctest:: 2397 2398 >>> ms 2399 <generator object BRICSBuild at 0x...> 2400 2401That returns molecules on request: 2402 2403.. doctest:: 2404 2405 >>> prods = [next(ms) for x in range(10)] 2406 >>> prods[0] 2407 <rdkit.Chem.rdchem.Mol object at 0x...> 2408 2409The molecules have not been sanitized, so it's a good idea to at least update the valences before continuing: 2410 2411.. doctest:: 2412 2413 >>> for prod in prods: 2414 ... prod.UpdatePropertyCache(strict=False) 2415 ... 2416 >>> Chem.MolToSmiles(prods[0],True) 2417 'CC(C)C(=O)N/C=C1\\C(=O)Nc2ccc3ncsc3c21' 2418 >>> Chem.MolToSmiles(prods[1],True) 2419 'CC(C)C(=O)N/C=C1\\C(=O)Nc2ccccc21' 2420 >>> Chem.MolToSmiles(prods[2],True) 2421 'CNC(=O)C(C)C' 2422 2423 2424By default those results come back in a random order (technically the example 2425above will always return the same results since we seeded Python's random number 2426generator just before calling BRICSBuild()). If you want the results to be 2427returned in a consistent order use the scrambleReagents argument: 2428 2429 >>> ms = BRICS.BRICSBuild(fragms, scrambleReagents=False) 2430 >>> prods = [next(ms) for x in range(10)] 2431 >>> for prod in prods: 2432 ... prod.UpdatePropertyCache(strict=False) 2433 ... 2434 >>> Chem.MolToSmiles(prods[0],True) 2435 'COC(=O)C(C)C' 2436 >>> Chem.MolToSmiles(prods[1],True) 2437 'CNC(=O)C(C)C' 2438 >>> Chem.MolToSmiles(prods[2],True) 2439 'CC(C)C(=O)NC(=N)N' 2440 2441Other fragmentation approaches 2442============================== 2443 2444In addition to the methods described above, the RDKit provide a very 2445flexible generic function for fragmenting molecules along 2446user-specified bonds. 2447 2448Here's a quick demonstration of using that to break all bonds between 2449atoms in rings and atoms not in rings. We start by finding all the 2450atom pairs: 2451 2452.. doctest:: 2453 2454 >>> m = Chem.MolFromSmiles('CC1CC(O)C1CCC1CC1') 2455 >>> bis = m.GetSubstructMatches(Chem.MolFromSmarts('[!R][R]')) 2456 >>> bis 2457 ((0, 1), (4, 3), (6, 5), (7, 8)) 2458 2459then we get the corresponding bond indices: 2460 2461.. doctest:: 2462 2463 >>> bs = [m.GetBondBetweenAtoms(x,y).GetIdx() for x,y in bis] 2464 >>> bs 2465 [0, 3, 5, 7] 2466 2467then we use those bond indices as input to the fragmentation function: 2468 2469.. doctest:: 2470 2471 >>> nm = Chem.FragmentOnBonds(m,bs) 2472 2473the output is a molecule that has dummy atoms marking the places where 2474bonds were broken: 2475 2476.. doctest:: 2477 2478 >>> Chem.MolToSmiles(nm,True) 2479 '*C1CC([4*])C1[6*].[1*]C.[3*]O.[5*]CC[8*].[7*]C1CC1' 2480 2481By default the attachment points are labelled (using isotopes) with 2482the index of the atom that was removed. We can also provide our own set of 2483atom labels in the form of pairs of unsigned integers. The first value 2484in each pair is used as the label for the dummy that replaces the 2485bond's begin atom, the second value in each pair is for the dummy that 2486replaces the bond's end atom. Here's an example, repeating the 2487analysis above and marking the positions where the non-ring atoms were 2488with the label 10 and marking the positions where the ring atoms were 2489with label 1: 2490 2491.. doctest:: 2492 2493 >>> bis = m.GetSubstructMatches(Chem.MolFromSmarts('[!R][R]')) 2494 >>> bs = [] 2495 >>> labels=[] 2496 >>> for bi in bis: 2497 ... b = m.GetBondBetweenAtoms(bi[0],bi[1]) 2498 ... if b.GetBeginAtomIdx()==bi[0]: 2499 ... labels.append((10,1)) 2500 ... else: 2501 ... labels.append((1,10)) 2502 ... bs.append(b.GetIdx()) 2503 >>> nm = Chem.FragmentOnBonds(m,bs,dummyLabels=labels) 2504 >>> Chem.MolToSmiles(nm,True) 2505 '[1*]C.[1*]CC[1*].[1*]O.[10*]C1CC([10*])C1[10*].[10*]C1CC1' 2506 2507 2508Chemical Features and Pharmacophores 2509************************************ 2510 2511 2512Chemical Features 2513================= 2514 2515Chemical features in the RDKit are defined using a SMARTS-based feature 2516definition language (described in detail in the RDKit book). To identify 2517chemical features in molecules, you first must build a feature factory: 2518 2519.. doctest:: 2520 2521 >>> from rdkit import Chem 2522 >>> from rdkit.Chem import ChemicalFeatures 2523 >>> from rdkit import RDConfig 2524 >>> import os 2525 >>> fdefName = os.path.join(RDConfig.RDDataDir,'BaseFeatures.fdef') 2526 >>> factory = ChemicalFeatures.BuildFeatureFactory(fdefName) 2527 2528and then use the factory to search for features: 2529 2530.. doctest:: 2531 2532 >>> m = Chem.MolFromSmiles('OCc1ccccc1CN') 2533 >>> feats = factory.GetFeaturesForMol(m) 2534 >>> len(feats) 2535 8 2536 2537The individual features carry information about their family (e.g. donor, 2538acceptor, etc.), type (a more detailed description), and the atom(s) that is/are 2539associated with the feature: 2540 2541.. doctest:: 2542 2543 >>> feats[0].GetFamily() 2544 'Donor' 2545 >>> feats[0].GetType() 2546 'SingleAtomDonor' 2547 >>> feats[0].GetAtomIds() 2548 (0,) 2549 >>> feats[4].GetFamily() 2550 'Aromatic' 2551 >>> feats[4].GetAtomIds() 2552 (2, 3, 4, 5, 6, 7) 2553 2554If the molecule has coordinates, then the features will also have reasonable locations: 2555 2556.. doctest:: 2557 2558 >>> from rdkit.Chem import AllChem 2559 >>> AllChem.Compute2DCoords(m) 2560 0 2561 >>> feats[0].GetPos() 2562 <rdkit.Geometry.rdGeometry.Point3D object at 0x...> 2563 >>> list(feats[0].GetPos()) 2564 [2.07..., -2.335..., 0.0] 2565 2566 25672D Pharmacophore Fingerprints 2568============================= 2569 2570Combining a set of chemical features with the 2D (topological) 2571distances between them gives a 2D pharmacophore. When the distances 2572are binned, unique integer ids can be assigned to each of these 2573pharmacophores and they can be stored in a fingerprint. Details of 2574the encoding are in the :doc:`RDKit_Book`. 2575 2576Generating pharmacophore fingerprints requires chemical features 2577generated via the usual RDKit feature-typing mechanism: 2578 2579.. doctest:: 2580 2581 >>> from rdkit import Chem 2582 >>> from rdkit.Chem import ChemicalFeatures 2583 >>> fdefName = 'data/MinimalFeatures.fdef' 2584 >>> featFactory = ChemicalFeatures.BuildFeatureFactory(fdefName) 2585 2586The fingerprints themselves are calculated using a signature 2587(fingerprint) factory, which keeps track of all the parameters 2588required to generate the pharmacophore: 2589 2590.. doctest:: 2591 2592 >>> from rdkit.Chem.Pharm2D.SigFactory import SigFactory 2593 >>> sigFactory = SigFactory(featFactory,minPointCount=2,maxPointCount=3) 2594 >>> sigFactory.SetBins([(0,2),(2,5),(5,8)]) 2595 >>> sigFactory.Init() 2596 >>> sigFactory.GetSigSize() 2597 885 2598 2599The signature factory is now ready to be used to generate 2600fingerprints, a task which is done using the 2601:py:mod:`rdkit.Chem.Pharm2D.Generate` module: 2602 2603.. doctest:: 2604 2605 >>> from rdkit.Chem.Pharm2D import Generate 2606 >>> mol = Chem.MolFromSmiles('OCC(=O)CCCN') 2607 >>> fp = Generate.Gen2DFingerprint(mol,sigFactory) 2608 >>> fp 2609 <rdkit.DataStructs.cDataStructs.SparseBitVect object at 0x...> 2610 >>> len(fp) 2611 885 2612 >>> fp.GetNumOnBits() 2613 57 2614 2615Details about the bits themselves, including the features that are 2616involved and the binned distance matrix between the features, can be 2617obtained from the signature factory: 2618 2619.. doctest:: 2620 2621 >>> list(fp.GetOnBits())[:5] 2622 [1, 2, 6, 7, 8] 2623 >>> sigFactory.GetBitDescription(1) 2624 'Acceptor Acceptor |0 1|1 0|' 2625 >>> sigFactory.GetBitDescription(2) 2626 'Acceptor Acceptor |0 2|2 0|' 2627 >>> sigFactory.GetBitDescription(8) 2628 'Acceptor Donor |0 2|2 0|' 2629 >>> list(fp.GetOnBits())[-5:] 2630 [704, 706, 707, 708, 714] 2631 >>> sigFactory.GetBitDescription(707) 2632 'Donor Donor PosIonizable |0 1 2|1 0 1|2 1 0|' 2633 >>> sigFactory.GetBitDescription(714) 2634 'Donor Donor PosIonizable |0 2 2|2 0 0|2 0 0|' 2635 2636For the sake of convenience (to save you from having to edit the fdef 2637file every time) it is possible to disable particular feature types 2638within the SigFactory: 2639 2640.. doctest:: 2641 2642 >>> sigFactory.skipFeats=['PosIonizable'] 2643 >>> sigFactory.Init() 2644 >>> sigFactory.GetSigSize() 2645 510 2646 >>> fp2 = Generate.Gen2DFingerprint(mol,sigFactory) 2647 >>> fp2.GetNumOnBits() 2648 36 2649 2650Another possible set of feature definitions for 2D pharmacophore 2651fingerprints in the RDKit are those published by Gobbi and 2652Poppinger. [#gobbi]_ The module 2653:py:mod:`rdkit.Chem.Pharm2D.Gobbi_Pharm2D` has a pre-configured signature 2654factory for these fingerprint types. Here's an example of using it: 2655 2656.. doctest:: 2657 2658 >>> from rdkit import Chem 2659 >>> from rdkit.Chem.Pharm2D import Gobbi_Pharm2D,Generate 2660 >>> m = Chem.MolFromSmiles('OCC=CC(=O)O') 2661 >>> fp = Generate.Gen2DFingerprint(m,Gobbi_Pharm2D.factory) 2662 >>> fp 2663 <rdkit.DataStructs.cDataStructs.SparseBitVect object at 0x...> 2664 >>> fp.GetNumOnBits() 2665 8 2666 >>> list(fp.GetOnBits()) 2667 [23, 30, 150, 154, 157, 185, 28878, 30184] 2668 >>> Gobbi_Pharm2D.factory.GetBitDescription(157) 2669 'HA HD |0 3|3 0|' 2670 >>> Gobbi_Pharm2D.factory.GetBitDescription(30184) 2671 'HA HD HD |0 3 0|3 0 3|0 3 0|' 2672 2673 2674Molecular Fragments 2675******************* 2676 2677The RDKit contains a collection of tools for fragmenting molecules and 2678working with those fragments. Fragments are defined to be made up of 2679a set of connected atoms that may have associated functional groups. 2680This is more easily demonstrated than explained: 2681 2682.. doctest:: 2683 2684 >>> fName=os.path.join(RDConfig.RDDataDir,'FunctionalGroups.txt') 2685 >>> from rdkit.Chem import FragmentCatalog 2686 >>> fparams = FragmentCatalog.FragCatParams(1,6,fName) 2687 >>> fparams.GetNumFuncGroups() 2688 39 2689 >>> fcat=FragmentCatalog.FragCatalog(fparams) 2690 >>> fcgen=FragmentCatalog.FragCatGenerator() 2691 >>> m = Chem.MolFromSmiles('OCC=CC(=O)O') 2692 >>> fcgen.AddFragsFromMol(m,fcat) 2693 3 2694 >>> fcat.GetEntryDescription(0) 2695 'C<-O>C' 2696 >>> fcat.GetEntryDescription(1) 2697 'C=C<-C(=O)O>' 2698 >>> fcat.GetEntryDescription(2) 2699 'C<-C(=O)O>=CC<-O>' 2700 2701The fragments are stored as entries in a 2702:py:class:`rdkit.Chem.rdfragcatalog.FragCatalog`. Notice that the 2703entry descriptions include pieces in angular brackets (e.g. between 2704'<' and '>'). These describe the functional groups attached to the 2705fragment. For example, in the above example, the catalog entry 0 2706corresponds to an ethyl fragment with an alcohol attached to one of 2707the carbons and entry 1 is an ethylene with a carboxylic acid on one 2708carbon. Detailed information about the functional groups can be 2709obtained by asking the fragment for the ids of the functional groups 2710it contains and then looking those ids up in the 2711:py:class:`rdkit.Chem.rdfragcatalog.FragCatParams` 2712object: 2713 2714.. doctest:: 2715 2716 >>> list(fcat.GetEntryFuncGroupIds(2)) 2717 [34, 1] 2718 >>> fparams.GetFuncGroup(1) 2719 <rdkit.Chem.rdchem.Mol object at 0x...> 2720 >>> Chem.MolToSmarts(fparams.GetFuncGroup(1)) 2721 '*-C(=O)[O&D1]' 2722 >>> Chem.MolToSmarts(fparams.GetFuncGroup(34)) 2723 '*-[O&D1]' 2724 >>> fparams.GetFuncGroup(1).GetProp('_Name') 2725 '-C(=O)O' 2726 >>> fparams.GetFuncGroup(34).GetProp('_Name') 2727 '-O' 2728 2729The catalog is hierarchical: smaller fragments are combined to form 2730larger ones. From a small fragment, one can find the larger fragments 2731to which it contributes using the 2732:py:meth:`rdkit.Chem.rdfragcatalog.FragCatalog.GetEntryDownIds` 2733method: 2734 2735.. doctest:: 2736 2737 >>> fcat=FragmentCatalog.FragCatalog(fparams) 2738 >>> m = Chem.MolFromSmiles('OCC(NC1CC1)CCC') 2739 >>> fcgen.AddFragsFromMol(m,fcat) 2740 15 2741 >>> fcat.GetEntryDescription(0) 2742 'C<-O>C' 2743 >>> fcat.GetEntryDescription(1) 2744 'CN<-cPropyl>' 2745 >>> list(fcat.GetEntryDownIds(0)) 2746 [3, 4] 2747 >>> fcat.GetEntryDescription(3) 2748 'C<-O>CC' 2749 >>> fcat.GetEntryDescription(4) 2750 'C<-O>CN<-cPropyl>' 2751 2752The fragments from multiple molecules can be added to a catalog: 2753 2754.. doctest:: 2755 2756 >>> with Chem.SmilesMolSupplier('data/bzr.smi') as suppl: 2757 ... ms = [x for x in suppl] 2758 >>> fcat=FragmentCatalog.FragCatalog(fparams) 2759 >>> for m in ms: nAdded=fcgen.AddFragsFromMol(m,fcat) 2760 >>> fcat.GetNumEntries() 2761 1169 2762 >>> fcat.GetEntryDescription(0) 2763 'Cc' 2764 >>> fcat.GetEntryDescription(100) 2765 'cc-nc(C)n' 2766 2767The fragments in a catalog are unique, so adding a molecule a second 2768time doesn't add any new entries: 2769 2770.. doctest:: 2771 2772 >>> fcgen.AddFragsFromMol(ms[0],fcat) 2773 0 2774 >>> fcat.GetNumEntries() 2775 1169 2776 2777Once a :py:class:`rdkit.Chem.rdfragcatalog.FragCatalog` has been 2778generated, it can be used to fingerprint molecules: 2779 2780.. doctest:: 2781 2782 >>> fpgen = FragmentCatalog.FragFPGenerator() 2783 >>> fp = fpgen.GetFPForMol(ms[8],fcat) 2784 >>> fp 2785 <rdkit.DataStructs.cDataStructs.ExplicitBitVect object at 0x...> 2786 >>> fp.GetNumOnBits() 2787 189 2788 2789The rest of the machinery associated with fingerprints can now be 2790applied to these fragment fingerprints. For example, it's easy to 2791find the fragments that two molecules have in common by taking the 2792intersection of their fingerprints: 2793 2794.. doctest:: 2795 2796 >>> fp2 = fpgen.GetFPForMol(ms[7],fcat) 2797 >>> andfp = fp&fp2 2798 >>> obl = list(andfp.GetOnBits()) 2799 >>> fcat.GetEntryDescription(obl[-1]) 2800 'ccc(cc)NC<=O>' 2801 >>> fcat.GetEntryDescription(obl[-5]) 2802 'c<-X>ccc(N)cc' 2803 2804or we can find the fragments that distinguish one molecule from 2805another: 2806 2807.. doctest:: 2808 2809 >>> combinedFp=fp&(fp^fp2) # can be more efficient than fp&(!fp2) 2810 >>> obl = list(combinedFp.GetOnBits()) 2811 >>> fcat.GetEntryDescription(obl[-1]) 2812 'cccc(N)cc' 2813 2814Or we can use the bit ranking functionality from the 2815:py:class:`rdkit.ML.InfoTheory.rdInfoTheory.InfoBitRanker` class to identify fragments 2816that distinguish actives from inactives: 2817 2818.. doctest:: 2819 2820 >>> with Chem.SDMolSupplier('data/bzr.sdf') as suppl: 2821 ... sdms = [x for x in suppl] 2822 >>> fps = [fpgen.GetFPForMol(x,fcat) for x in sdms] 2823 >>> from rdkit.ML.InfoTheory import InfoBitRanker 2824 >>> ranker = InfoBitRanker(len(fps[0]),2) 2825 >>> acts = [x.GetDoubleProp('ACTIVITY') for x in sdms] 2826 >>> for i,fp in enumerate(fps): 2827 ... act = int(acts[i]>7) 2828 ... ranker.AccumulateVotes(fp,act) 2829 ... 2830 >>> top5 = ranker.GetTopN(5) 2831 >>> for id,gain,n0,n1 in top5: 2832 ... print(int(id),'%.3f'%gain,int(n0),int(n1)) 2833 ... 2834 702 0.081 20 17 2835 328 0.073 23 25 2836 341 0.073 30 43 2837 173 0.073 30 43 2838 1034 0.069 5 53 2839 2840The columns above are: bitId, infoGain, nInactive, nActive. Note that 2841this approach isn't particularly effective for this artificial 2842example. 2843 2844 2845R-Group Decomposition 2846********************* 2847 2848Let's look at how it works. We'll read in a group of molecules (these were taken 2849ChEMBL), define a core with labelled R groups, and then use the simplest call to 2850do R-group decomposition: 2851:py:func:`rdkit.Chem.rdRGroupDecomposition.RGroupDecompose` 2852 2853.. doctest:: 2854 2855 >>> from rdkit import Chem 2856 >>> from rdkit.Chem import rdRGroupDecomposition as rdRGD 2857 >>> with Chem.SmilesMolSupplier('data/s1p_chembldoc89753.txt',delimiter=",", 2858 ... smilesColumn=9,nameColumn=10) as suppl: 2859 ... ms = [x for x in suppl if x is not None] 2860 >>> len(ms) 2861 40 2862 >>> core = Chem.MolFromSmarts('[*:1]c1nc([*:2])on1') 2863 >>> res,unmatched = rdRGD.RGroupDecompose([core],ms,asSmiles=True) 2864 >>> unmatched 2865 [] 2866 >>> len(res) 2867 40 2868 >>> res[:2] # doctest: +NORMALIZE_WHITESPACE 2869 [{'Core': 'n1oc([*:2])nc1[*:1]', 'R1': 'O=C(O)CCCC1NCCOc2c1cccc2[*:1]', 'R2': 'CC(C)Oc1ccc([*:2])cc1Cl'}, 2870 {'Core': 'n1oc([*:2])nc1[*:1]', 'R1': 'O=C(O)CCC1NCCOc2c1cccc2[*:1]', 'R2': 'CC(C)Oc1ccc([*:2])cc1Cl'}] 2871 2872The `unmatched` return value has the indices of the molecules that did not match 2873a core; in this case there are none. The other result is a list with one dict 2874for each molecule; each dict contains the core that matched the molecule (in 2875this case there was only one) and the molecule's R groups. 2876 2877As an aside, if you are a Pandas user, it's very easy to get the R-group 2878decomposition results into a DataFrame: 2879 2880.. doctest:: 2881 2882 >>> import pandas as pd 2883 >>> res,unmatched = rdRGD.RGroupDecompose([core],ms,asSmiles=True,asRows=False) 2884 >>> df= pd.DataFrame(res) 2885 >>> df.head() 2886 Core R1 R2 2887 0 n1oc([*:2])nc1[*:1] O=C(O)CCCC1NCCOc2c1cccc2[*:1] CC(C)Oc1ccc([*:2])cc1Cl 2888 1 n1oc([*:2])nc1[*:1] O=C(O)CCC1NCCOc2c1cccc2[*:1] CC(C)Oc1ccc([*:2])cc1Cl 2889 2 n1oc([*:2])nc1[*:1] O=C(O)CCC1COc2ccc([*:1])cc2CN1 CC(C)Oc1ccc([*:2])cc1Cl 2890 3 n1oc([*:2])nc1[*:1] O=C(O)CCCC1NCCOc2c1cccc2[*:1] CC(C)Oc1ncc([*:2])cc1Cl 2891 4 n1oc([*:2])nc1[*:1] O=C(O)CCCC1NCCOc2c1cccc2[*:1] CC(C)Oc1ncc([*:2])cc1Cl 2892 2893It's not necessary to label the attachment points on the core, if you leave them 2894out the code will automatically assign labels: 2895 2896.. doctest:: 2897 2898 >>> core2 = Chem.MolFromSmarts('c1ncon1') 2899 >>> res,unmatched = rdRGD.RGroupDecompose([core2],ms,asSmiles=True) 2900 >>> res[:2] # doctest: +NORMALIZE_WHITESPACE 2901 [{'Core': 'n1oc([*:1])nc1[*:2]', 'R1': 'CC(C)Oc1ccc([*:1])cc1Cl', 'R2': 'O=C(O)CCCC1NCCOc2c1cccc2[*:2]'}, 2902 {'Core': 'n1oc([*:1])nc1[*:2]', 'R1': 'CC(C)Oc1ccc([*:1])cc1Cl', 'R2': 'O=C(O)CCC1NCCOc2c1cccc2[*:2]'}] 2903 2904R-group decomposition is actually pretty complex, so there's a lot more there. 2905Hopefully this is enough to get you started. 2906 2907Non-Chemical Functionality 2908************************** 2909 2910 2911Bit vectors 2912=========== 2913 2914Bit vectors are containers for efficiently storing a set number of binary values, e.g. for fingerprints. 2915The RDKit includes two types of fingerprints differing in how they store the values internally; the two types are easily interconverted but are best used for different purpose: 2916 2917- SparseBitVects store only the list of bits set in the vector; they are well suited for storing very large, very sparsely occupied vectors like pharmacophore fingerprints. 2918 Some operations, such as retrieving the list of on bits, are quite fast. 2919 Others, such as negating the vector, are very, very slow. 2920 2921- ExplicitBitVects keep track of both on and off bits. 2922 They are generally faster than SparseBitVects, but require more memory to store. 2923 2924 2925Discrete value vectors 2926====================== 2927 2928 29293D grids 2930======== 2931 2932 2933Points 2934====== 2935 2936 2937Getting Help 2938************ 2939 2940There is a reasonable amount of documentation available within from the RDKit's docstrings. 2941These are accessible using Python's help command: 2942 2943.. doctest:: 2944 2945 >>> m = Chem.MolFromSmiles('Cc1ccccc1') 2946 >>> m.GetNumAtoms() 2947 7 2948 >>> help(m.GetNumAtoms) 2949 Help on method GetNumAtoms: 2950 <BLANKLINE> 2951 GetNumAtoms(...) method of rdkit.Chem.rdchem.Mol instance 2952 GetNumAtoms( (Mol)arg1 [, (int)onlyHeavy=-1 [, (bool)onlyExplicit=True]]) -> int : 2953 Returns the number of atoms in the molecule. 2954 <BLANKLINE> 2955 ARGUMENTS: 2956 - onlyExplicit: (optional) include only explicit atoms (atoms in the molecular graph) 2957 defaults to 1. 2958 NOTE: the onlyHeavy argument is deprecated 2959 <BLANKLINE> 2960 <BLANKLINE> 2961 C++ signature : 2962 int GetNumAtoms(...) 2963 <BLANKLINE> 2964 >>> m.GetNumAtoms(onlyExplicit=False) 2965 15 2966 2967When working in an environment that does command completion or tooltips, one can see the available methods quite easily. 2968Here's a sample screenshot from within the Jupyter notebook: 2969 2970.. image:: images/picture_6.png 2971 2972 2973Advanced Topics/Warnings 2974************************ 2975 2976 2977Editing Molecules 2978================= 2979 2980Some of the functionality provided allows molecules to be edited “in place”: 2981 2982.. doctest:: 2983 2984 >>> m = Chem.MolFromSmiles('c1ccccc1') 2985 >>> m.GetAtomWithIdx(0).SetAtomicNum(7) 2986 >>> Chem.SanitizeMol(m) 2987 rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_NONE 2988 >>> Chem.MolToSmiles(m) 2989 'c1ccncc1' 2990 2991Do not forget the sanitization step, without it one can end up with results that look ok (so long as you don't think): 2992 2993.. doctest:: 2994 2995 >>> m = Chem.MolFromSmiles('c1ccccc1') 2996 >>> m.GetAtomWithIdx(0).SetAtomicNum(8) 2997 >>> Chem.MolToSmiles(m) 2998 'c1ccocc1' 2999 3000but that are, of course, complete nonsense, as sanitization will indicate: 3001 3002.. doctest:: 3003 3004 >>> Chem.SanitizeMol(m) 3005 Traceback (most recent call last): 3006 File "/usr/lib/python3.6/doctest.py", line 1253, in __run 3007 compileflags, 1) in test.globs 3008 File "<doctest default[0]>", line 1, in <module> 3009 Chem.SanitizeMol(m) 3010 rdkit.Chem.rdchem.KekulizeException: Can't kekulize mol. Unkekulized atoms: 1 2 3 4 5 3011 <BLANKLINE> 3012 3013More complex transformations can be carried out using the 3014:py:class:`rdkit.Chem.rdchem.RWMol` class: 3015 3016.. doctest:: 3017 3018 >>> m = Chem.MolFromSmiles('CC(=O)C=CC=C') 3019 >>> mw = Chem.RWMol(m) 3020 >>> mw.ReplaceAtom(4,Chem.Atom(7)) 3021 >>> mw.AddAtom(Chem.Atom(6)) 3022 7 3023 >>> mw.AddAtom(Chem.Atom(6)) 3024 8 3025 >>> mw.AddBond(6,7,Chem.BondType.SINGLE) 3026 7 3027 >>> mw.AddBond(7,8,Chem.BondType.DOUBLE) 3028 8 3029 >>> mw.AddBond(8,3,Chem.BondType.SINGLE) 3030 9 3031 >>> mw.RemoveAtom(0) 3032 >>> mw.GetNumAtoms() 3033 8 3034 3035The RWMol can be used just like an ROMol: 3036 3037.. doctest:: 3038 3039 >>> Chem.MolToSmiles(mw) 3040 'O=CC1=NC=CC=C1' 3041 >>> Chem.SanitizeMol(mw) 3042 rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_NONE 3043 >>> Chem.MolToSmiles(mw) 3044 'O=Cc1ccccn1' 3045 3046The RDKit also has functionality enabling batch edits of molecules which provides a 3047more efficient way to remove multiple atoms or bonds at once. 3048 3049.. doctest:: 3050 3051 >>> m = Chem.MolFromSmiles('CC(=O)C=CC=C') 3052 >>> mw = Chem.RWMol(m) 3053 >>> mw.BeginBatchEdit() 3054 >>> mw.RemoveAtom(3) 3055 >>> mw.RemoveBond(1,2) #<- these are the begin and end atoms of the bond 3056 3057None of the changes actually happen until we "commit" them: 3058.. doctest:: 3059>>> Chem.MolToSmiles(mw) 3060'C=CC=CC(C)=O' 3061>>> mw.CommitBatchEdit() 3062>>> Chem.MolToSmiles(mw) 3063'C=CC.CC.O' 3064 3065You can make this more concise using a context manager, which takes care of the commit for you: 3066>>> with Chem.RWMol(m) as mw: 3067... mw.RemoveAtom(3) 3068... mw.RemoveBond(1,2) 3069... 3070>>> Chem.MolToSmiles(mw) 3071'C=CC.CC.O' 3072 3073It is even easier to generate nonsense using the RWMol than it 3074is with standard molecules. If you need chemically reasonable 3075results, be certain to sanitize the results. 3076 3077Miscellaneous Tips and Hints 3078**************************** 3079 3080 3081Chem vs AllChem 3082=============== 3083 3084The majority of “basic” chemical functionality (e.g. reading/writing 3085molecules, substructure searching, molecular cleanup, etc.) is in the 3086:py:mod:`rdkit.Chem` module. More advanced, or less frequently used, 3087functionality is in :py:mod:`rdkit.Chem.AllChem`. The distinction has 3088been made to speed startup and lower import times; there's no sense in 3089loading the 2D->3D library and force field implementation if one is 3090only interested in reading and writing a couple of molecules. If you 3091find the Chem/AllChem thing annoying or confusing, you can use 3092python's “import ... as ...” syntax to remove the irritation: 3093 3094.. doctest:: 3095 3096 >>> from rdkit.Chem import AllChem as Chem 3097 >>> m = Chem.MolFromSmiles('CCC') 3098 3099 3100The SSSR Problem 3101================ 3102 3103As others have ranted about with more energy and eloquence than I 3104intend to, the definition of a molecule's smallest set of smallest 3105rings is not unique. In some high symmetry molecules, a “true” SSSR 3106will give results that are unappealing. For example, the SSSR for 3107cubane only contains 5 rings, even though there are 3108“obviously” 6. This problem can be fixed by implementing a *small* 3109(instead of *smallest*) set of smallest rings algorithm that returns 3110symmetric results. This is the approach that we took with the RDKit. 3111 3112Because it is sometimes useful to be able to count how many SSSR rings 3113are present in the molecule, there is a 3114:py:func:`rdkit.Chem.rdmolops.GetSSSR` function, but this only returns the 3115SSSR count, not the potentially non-unique set of rings. 3116 3117 3118List of Available Descriptors 3119***************************** 3120 3121 3122+-----------------------------------------------------+------------------------------------------------------------+----------+ 3123|Descriptor/Descriptor |Notes | Language | 3124|Family | | | 3125+-----------------------------------------------------+------------------------------------------------------------+----------+ 3126|Gasteiger/Marsili |*Tetrahedron* | C++ | 3127|Partial Charges |**36**:3219\-28 | | 3128| |(1980) | | 3129+-----------------------------------------------------+------------------------------------------------------------+----------+ 3130|BalabanJ |*Chem. Phys. Lett.* | Python | 3131| |**89**:399\-404 | | 3132| |(1982) | | 3133+-----------------------------------------------------+------------------------------------------------------------+----------+ 3134|BertzCT |*J. Am. Chem. Soc.* | Python | 3135| |**103**:3599\-601 | | 3136| |(1981) | | 3137+-----------------------------------------------------+------------------------------------------------------------+----------+ 3138|Ipc |*J. Chem. Phys.* | Python | 3139| |**67**:4517\-33 | | 3140| |(1977) | | 3141+-----------------------------------------------------+------------------------------------------------------------+----------+ 3142|HallKierAlpha |*Rev. Comput. Chem.* | C++ | 3143| |**2**:367\-422 | | 3144| |(1991) | | 3145+-----------------------------------------------------+------------------------------------------------------------+----------+ 3146|Kappa1 \- Kappa3 |*Rev. Comput. Chem.* | C++ | 3147| |**2**:367\-422 | | 3148| |(1991) | | 3149+-----------------------------------------------------+------------------------------------------------------------+----------+ 3150|Phi |New in 2021.03 release | C++ | 3151| |*Quant. Struct.-Act. Rel.* | | 3152| |**8**:221\-224 | | 3153| |(1989) | | 3154+-----------------------------------------------------+------------------------------------------------------------+----------+ 3155|Chi0, Chi1 |*Rev. Comput. Chem.* | Python | 3156| |**2**:367\-422 | | 3157| |(1991) | | 3158+-----------------------------------------------------+------------------------------------------------------------+----------+ 3159|Chi0n \- Chi4n |*Rev. Comput. Chem.* | C++ | 3160| |**2**:367\-422 | | 3161| |(1991) | | 3162+-----------------------------------------------------+------------------------------------------------------------+----------+ 3163|Chi0v \- Chi4v |*Rev. Comput. Chem.* | C++ | 3164| |**2**:367\-422 | | 3165| |(1991) | | 3166+-----------------------------------------------------+------------------------------------------------------------+----------+ 3167|MolLogP |Wildman and Crippen | C++ | 3168| |*JCICS* | | 3169| |**39**:868\-73 | | 3170| |(1999) | | 3171+-----------------------------------------------------+------------------------------------------------------------+----------+ 3172|MolMR |Wildman and Crippen | C++ | 3173| |*JCICS* | | 3174| |**39**:868\-73 | | 3175| |(1999) | | 3176+-----------------------------------------------------+------------------------------------------------------------+----------+ 3177|MolWt | | C++ | 3178+-----------------------------------------------------+------------------------------------------------------------+----------+ 3179|ExactMolWt | | C++ | 3180+-----------------------------------------------------+------------------------------------------------------------+----------+ 3181|HeavyAtomCount | | C++ | 3182+-----------------------------------------------------+------------------------------------------------------------+----------+ 3183|HeavyAtomMolWt | | C++ | 3184+-----------------------------------------------------+------------------------------------------------------------+----------+ 3185|NHOHCount | | C++ | 3186+-----------------------------------------------------+------------------------------------------------------------+----------+ 3187|NOCount | | C++ | 3188+-----------------------------------------------------+------------------------------------------------------------+----------+ 3189|NumHAcceptors | | C++ | 3190+-----------------------------------------------------+------------------------------------------------------------+----------+ 3191|NumHDonors | | C++ | 3192+-----------------------------------------------------+------------------------------------------------------------+----------+ 3193|NumHeteroatoms | | C++ | 3194+-----------------------------------------------------+------------------------------------------------------------+----------+ 3195|NumRotatableBonds | | C++ | 3196+-----------------------------------------------------+------------------------------------------------------------+----------+ 3197|NumValenceElectrons | | C++ | 3198+-----------------------------------------------------+------------------------------------------------------------+----------+ 3199|NumAmideBonds | | C++ | 3200+-----------------------------------------------------+------------------------------------------------------------+----------+ 3201|Num{Aromatic,Saturated,Aliphatic}Rings | | C++ | 3202+-----------------------------------------------------+------------------------------------------------------------+----------+ 3203|Num{Aromatic,Saturated,Aliphatic}{Hetero,Carbo}cycles| | C++ | 3204+-----------------------------------------------------+------------------------------------------------------------+----------+ 3205|RingCount | | C++ | 3206+-----------------------------------------------------+------------------------------------------------------------+----------+ 3207|FractionCSP3 | | C++ | 3208+-----------------------------------------------------+------------------------------------------------------------+----------+ 3209|NumSpiroAtoms |Number of spiro atoms | C++ | 3210| |(atoms shared between rings that share | | 3211| |exactly one atom) | | 3212+-----------------------------------------------------+------------------------------------------------------------+----------+ 3213|NumBridgeheadAtoms |Number of bridgehead atoms | C++ | 3214| |(atoms shared between rings that share | | 3215| |at least two bonds) | | 3216+-----------------------------------------------------+------------------------------------------------------------+----------+ 3217|TPSA |*J. Med. Chem.* | C++ | 3218| |**43**:3714\-7, | | 3219| |(2000) | | 3220| |See the section in the RDKit book describing differences | | 3221| |to the original publication. | | 3222+-----------------------------------------------------+------------------------------------------------------------+----------+ 3223|LabuteASA |*J. Mol. Graph. Mod.* | C++ | 3224| |**18**:464\-77 (2000) | | 3225+-----------------------------------------------------+------------------------------------------------------------+----------+ 3226|PEOE_VSA1 \- PEOE_VSA14 |MOE\-type descriptors using partial charges | C++ | 3227| |and surface area contributions | | 3228| |http://www.chemcomp.com/journal/vsadesc.htm | | 3229+-----------------------------------------------------+------------------------------------------------------------+----------+ 3230|SMR_VSA1 \- SMR_VSA10 |MOE\-type descriptors using MR | C++ | 3231| |contributions and surface area | | 3232| |contributions | | 3233| |http://www.chemcomp.com/journal/vsadesc.htm | | 3234+-----------------------------------------------------+------------------------------------------------------------+----------+ 3235|SlogP_VSA1 \- SlogP_VSA12 |MOE\-type descriptors using LogP | C++ | 3236| |contributions and surface area | | 3237| |contributions | | 3238| |http://www.chemcomp.com/journal/vsadesc.htm | | 3239+-----------------------------------------------------+------------------------------------------------------------+----------+ 3240|EState_VSA1 \- EState_VSA11 |MOE\-type descriptors using EState indices | Python | 3241| |and surface area contributions (developed | | 3242| |at RD, not described in the CCG paper) | | 3243+-----------------------------------------------------+------------------------------------------------------------+----------+ 3244|VSA_EState1 \- VSA_EState10 |MOE\-type descriptors using EState indices | Python | 3245| |and surface area contributions (developed | | 3246| |at RD, not described in the CCG paper) | | 3247+-----------------------------------------------------+------------------------------------------------------------+----------+ 3248|MQNs |Nguyen et al. *ChemMedChem* **4**:1803\-5 | C++ | 3249| |(2009) | | 3250+-----------------------------------------------------+------------------------------------------------------------+----------+ 3251|Topliss fragments |implemented using a set of SMARTS | Python | 3252| |definitions in | | 3253| |$(RDBASE)/Data/FragmentDescriptors.csv | | 3254+-----------------------------------------------------+------------------------------------------------------------+----------+ 3255|Autocorr2D |New in 2017.09 release. Todeschini and Consoni "Descriptors | C++ | 3256| |from Molecular Geometry" Handbook of Chemoinformatics | | 3257| |https://doi.org/10.1002/9783527618279.ch37 | | 3258+-----------------------------------------------------+------------------------------------------------------------+----------+ 3259|BCUT2D |New in 2020.09 release. Pearlman and Smith in "3D-QSAR and | C++ | 3260| |Drug design: Recent Advances" (1997) | | 3261+-----------------------------------------------------+------------------------------------------------------------+----------+ 3262 3263 3264List of Available 3D Descriptors 3265******************************** 3266 3267These all require the molecule to have a 3D conformer. 3268 3269+-----------------------------------------------------+-------------------------------------------------------------+----------+ 3270|Descriptor/Descriptor |Notes | Language | 3271|Family | | | 3272+-----------------------------------------------------+-------------------------------------------------------------+----------+ 3273|Plane of best fit (PBF) |Nicholas C. Firth, Nathan Brown, and Julian | C++ | 3274| |Blagg, *JCIM* **52**:2516\-25 | | 3275+-----------------------------------------------------+-------------------------------------------------------------+----------+ 3276|PMI1, PMI2, PMI3 |Principal moments of inertia | C++ | 3277+-----------------------------------------------------+-------------------------------------------------------------+----------+ 3278|NPR1, NPR2 |Normalized principal moments ratios Sauer | C++ | 3279| |and Schwarz *JCIM* **43**:987\-1003 (2003) | | 3280+-----------------------------------------------------+-------------------------------------------------------------+----------+ 3281|Radius of gyration |G. A. Arteca "Molecular Shape Descriptors" | C++ | 3282| |Reviews in Computational Chemistry vol 9 | | 3283| |https://doi.org/10.1002/9780470125861.ch5 | | 3284+-----------------------------------------------------+-------------------------------------------------------------+----------+ 3285|Inertial shape factor |Todeschini and Consoni "Descriptors from Molecular Geometry" | C++ | 3286| |Handbook of Chemoinformatics | | 3287| |https://doi.org/10.1002/9783527618279.ch37 | | 3288+-----------------------------------------------------+-------------------------------------------------------------+----------+ 3289|Eccentricity |G. A. Arteca "Molecular Shape Descriptors" | C++ | 3290| |Reviews in Computational Chemistry vol 9 | | 3291| |https://doi.org/10.1002/9780470125861.ch5 | | 3292+-----------------------------------------------------+-------------------------------------------------------------+----------+ 3293|Asphericity |A. Baumgaertner, "Shapes of flexible vesicles" | C++ | 3294| |J. Chem. Phys. 98:7496 | | 3295| |(1993) | | 3296| |https://doi.org/10.1063/1.464689 | | 3297+-----------------------------------------------------+-------------------------------------------------------------+----------+ 3298|Spherocity Index |Todeschini and Consoni "Descriptors from Molecular Geometry" | C++ | 3299| |Handbook of Chemoinformatics | | 3300| |https://doi.org/10.1002/9783527618279.ch37 | | 3301+-----------------------------------------------------+-------------------------------------------------------------+----------+ 3302|Autocorr3D |New in 2017.09 release. Todeschini and Consoni "Descriptors | C++ | 3303| |from Molecular Geometry" Handbook of Chemoinformatics | | 3304| |https://doi.org/10.1002/9783527618279.ch37 | | 3305+-----------------------------------------------------+-------------------------------------------------------------+----------+ 3306|RDF |New in 2017.09 release. Todeschini and Consoni "Descriptors | C++ | 3307| |from Molecular Geometry" Handbook of Chemoinformatics | | 3308| |https://doi.org/10.1002/9783527618279.ch37 | | 3309+-----------------------------------------------------+-------------------------------------------------------------+----------+ 3310|MORSE |New in 2017.09 release. Todeschini and Consoni "Descriptors | C++ | 3311| |from Molecular Geometry" Handbook of Chemoinformatics | | 3312| |https://doi.org/10.1002/9783527618279.ch37 | | 3313+-----------------------------------------------------+-------------------------------------------------------------+----------+ 3314|WHIM |New in 2017.09 release. Todeschini and Consoni "Descriptors | C++ | 3315| |from Molecular Geometry" Handbook of Chemoinformatics | | 3316| |https://doi.org/10.1002/9783527618279.ch37 | | 3317| | | | 3318| |**Note** insufficient information is available to exactly | | 3319| |reproduce values from DRAGON for these descriptors. We | | 3320| |believe that this is close. | | 3321+-----------------------------------------------------+-------------------------------------------------------------+----------+ 3322|GETAWAY |New in 2017.09 release. Todeschini and Consoni "Descriptors | C++ | 3323| |from Molecular Geometry" Handbook of Chemoinformatics | | 3324| |https://doi.org/10.1002/9783527618279.ch37 | | 3325| | | | 3326| |**Note** insufficient information is available to exactly | | 3327| |reproduce values from DRAGON for these descriptors. We | | 3328| |believe that this is close. | | 3329+-----------------------------------------------------+-------------------------------------------------------------+----------+ 3330 3331 3332 3333List of Available Fingerprints 3334****************************** 3335 3336+----------------------+-----------------------------------------------------------------------------------------------------------+----------+ 3337| Fingerprint Type | Notes | Language | 3338+----------------------+-----------------------------------------------------------------------------------------------------------+----------+ 3339| RDKit | a Daylight\-like fingerprint based on hashing molecular subgraphs | C++ | 3340+----------------------+-----------------------------------------------------------------------------------------------------------+----------+ 3341| Atom Pairs | *JCICS* **25**:64\-73 (1985) | C++ | 3342+----------------------+-----------------------------------------------------------------------------------------------------------+----------+ 3343| Topological Torsions | *JCICS* **27**:82\-5 (1987) | C++ | 3344+----------------------+-----------------------------------------------------------------------------------------------------------+----------+ 3345| MACCS keys | Using the 166 public keys implemented as SMARTS | C++ | 3346+----------------------+-----------------------------------------------------------------------------------------------------------+----------+ 3347| Morgan/Circular | Fingerprints based on the Morgan algorithm, similar to the ECFP/FCFP fingerprints | C++ | 3348| | *JCIM* **50**:742\-54 (2010). | | 3349+----------------------+-----------------------------------------------------------------------------------------------------------+----------+ 3350| 2D Pharmacophore | Uses topological distances between pharmacophoric points. | C++ | 3351+----------------------+-----------------------------------------------------------------------------------------------------------+----------+ 3352| Pattern | a topological fingerprint optimized for substructure screening | C++ | 3353+----------------------+-----------------------------------------------------------------------------------------------------------+----------+ 3354| Extended Reduced | Derived from the ErG fingerprint published by Stiefl et al. in | C++ | 3355| Graphs | *JCIM* **46**:208\–20 (2006). | | 3356| | NOTE: these functions return an array of floats, not the usual fingerprint types | | 3357+----------------------+-----------------------------------------------------------------------------------------------------------+----------+ 3358| MHFP and SECFP | Derived from the ErG fingerprint published by Probst et al. in | C++ | 3359| | *J Cheminformatics* **10** (2018). | | 3360| | NOTE: these functions return different types of values | | 3361+----------------------+-----------------------------------------------------------------------------------------------------------+----------+ 3362 3363 3364Feature Definitions Used in the Morgan Fingerprints 3365*************************************************** 3366 3367These are adapted from the definitions in Gobbi, A. & Poppinger, D. “Genetic optimization of combinatorial libraries.” *Biotechnology and Bioengineering* **61**, 47-54 (1998). 3368 3369+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 3370| Feature | SMARTS | 3371+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 3372| Donor | ``[$([N;!H0;v3,v4&+1]),$([O,S;H1;+0]),n&H1&+0]`` | 3373+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 3374| Acceptor | ``[$([O,S;H1;v2;!$(*-*=[O,N,P,S])]),$([O,S;H0;v2]),$([O,S;-]),$([N;v3;!$(N-*=[O,N,P,S])]),n&H0&+0,$([o,s;+0;!$([o,s]:n);!$([o,s]:c:n)])]`` | 3375+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 3376| Aromatic | ``[a]`` | 3377+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 3378| Halogen | ``[F,Cl,Br,I]`` | 3379+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 3380| Basic | ``[#7;+,$([N;H2&+0][$([C,a]);!$([C,a](=O))]),$([N;H1&+0]([$([C,a]);!$([C,a](=O))])[$([C,a]);!$([C,a](=O))]),$([N;H0&+0]([C;!$(C(=O))])([C;!$(C(=O))])[C;!$(C(=O))])]`` | 3381+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 3382| Acidic | ``[$([C,S](=[O,S,P])-[O;H1,-1])]`` | 3383+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 3384 3385.. rubric:: Footnotes 3386 3387.. [#blaney] Blaney, J. M.; Dixon, J. S. "Distance Geometry in Molecular Modeling". *Reviews in Computational Chemistry*; VCH: New York, 1994. 3388.. [#rappe] Rappé, A. K.; Casewit, C. J.; Colwell, K. S.; Goddard III, W. A.; Skiff, W. M. "UFF, a full periodic table force field for molecular mechanics and molecular dynamics simulations". *J. Am. Chem. Soc.* **114**:10024-35 (1992) . 3389.. [#carhart] Carhart, R.E.; Smith, D.H.; Venkataraghavan R. “Atom Pairs as Molecular Features in Structure-Activity Studies: Definition and Applications” *J. Chem. Inf. Comp. Sci.* **25**:64-73 (1985). 3390.. [#nilakantan] Nilakantan, R.; Bauman N.; Dixon J.S.; Venkataraghavan R. “Topological Torsion: A New Molecular Descriptor for SAR Applications. Comparison with Other Desciptors.” *J. Chem.Inf. Comp. Sci.* **27**:82-5 (1987). 3391.. [#rogers] Rogers, D.; Hahn, M. “Extended-Connectivity Fingerprints.” *J. Chem. Inf. and Model.* **50**:742-54 (2010). 3392.. [#ashton] Ashton, M. et al. “Identification of Diverse Database Subsets using Property-Based and Fragment-Based Molecular Descriptions.” *Quantitative Structure-Activity Relationships* **21**:598-604 (2002). 3393.. [#bemis1] Bemis, G. W.; Murcko, M. A. "The Properties of Known Drugs. 1. Molecular Frameworks." *J. Med. Chem.* **39**:2887-93 (1996). 3394.. [#lewell] Lewell, X.Q.; Judd, D.B.; Watson, S.P.; Hann, M.M. “RECAP-Retrosynthetic Combinatorial Analysis Procedure: A Powerful New Technique for Identifying Privileged Molecular Fragments with Useful Applications in Combinatorial Chemistry” *J. Chem. Inf. Comp. Sci.* **38**:511-22 (1998). 3395.. [#degen] Degen, J.; Wegscheid-Gerlach, C.; Zaliani, A; Rarey, M. "On the Art of Compiling and Using ‘Drug-Like’ Chemical Fragment Spaces." *ChemMedChem* **3**:1503–7 (2008). 3396.. [#gobbi] Gobbi, A. & Poppinger, D. "Genetic optimization of combinatorial libraries." *Biotechnology and Bioengineering* **61**:47-54 (1998). 3397.. [#rxnsmarts] A more detailed description of reaction smarts, as defined by the rdkit, is in the :doc:`RDKit_Book`. 3398.. [#mmff1] Halgren, T. A. "Merck molecular force field. I. Basis, form, scope, parameterization, and performance of MMFF94." *J. Comp. Chem.* **17**:490–19 (1996). 3399.. [#mmff2] Halgren, T. A. "Merck molecular force field. II. MMFF94 van der Waals and electrostatic parameters for intermolecular interactions." *J. Comp. Chem.* **17**:520–52 (1996). 3400.. [#mmff3] Halgren, T. A. "Merck molecular force field. III. Molecular geometries and vibrational frequencies for MMFF94." *J. Comp. Chem.* **17**:553–86 (1996). 3401.. [#mmff4] Halgren, T. A. & Nachbar, R. B. "Merck molecular force field. IV. conformational energies and geometries for MMFF94." *J. Comp. Chem.* **17**:587-615 (1996). 3402.. [#mmffs] Halgren, T. A. "MMFF VI. MMFF94s option for energy minimization studies." *J. Comp. Chem.* **20**:720–9 (1999). 3403.. [#riniker] Riniker, S.; Landrum, G. A. "Similarity Maps - A Visualization Strategy for Molecular Fingerprints and Machine-Learning Methods" *J. Cheminf.* **5**:43 (2013). 3404.. [#riniker2] Riniker, S.; Landrum, G. A. "Better Informed Distance Geometry: Using What We Know To Improve Conformation Generation" *J. Chem. Inf. Comp. Sci.* **55**:2562-74 (2015) 3405 3406 3407 3408 3409License 3410******* 3411 3412.. image:: images/picture_5.png 3413 3414This document is copyright (C) 2007-2021 by Greg Landrum 3415 3416This work is licensed under the Creative Commons Attribution-ShareAlike 4.0 License. 3417To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/4.0/ or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA. 3418 3419 3420The intent of this license is similar to that of the RDKit itself. 3421In simple words: “Do whatever you want with it, but please give us some credit.” 3422 3423.. |picture_0| image:: images/picture_0.png 3424 :scale: 75 % 3425 3426.. |picture_1| image:: images/picture_1.png 3427 :scale: 75 % 3428 3429.. |picture_3| image:: images/picture_3.png 3430 :scale: 75 % 3431 3432.. |picture_2| image:: images/picture_2.png 3433 :scale: 50 % 3434 3435.. |picture_4| image:: images/picture_4.png 3436 :scale: 75 % 3437