README

FilterCatalogs give RDKit the ability to screen out or reject undesirable molecules
based on various criteria.  The following filter sets are supplied with RDKit:

  * PAINS - Pan assay interference patterns.  These are separated into three
    sets: PAINS_A, PAINS_B and PAINS_C.
    Reference: Baell JB, Holloway GA. New Substructure Filters for Removal of Pan Assay
               Interference Compounds (PAINS) from Screening Libraries and for Their
               Exclusion in Bioassays.
               J Med Chem 53 (2010) 2719-40. doi:10.1021/jm901137j.

  * BRENK - filters unwanted functionality due to potential toxicity or unfavorable
    pharmacokinetics.
    Reference: Brenk R et al. Lessons Learnt from Assembling Screening Libraries for
               Drug Discovery for Neglected Diseases.
               ChemMedChem 3 (2008) 435-444. doi:10.1002/cmdc.200700139.

  * NIH - annotated compounds with problematic functional groups.
    Reference: Doveston R, et al. A Unified Lead-oriented Synthesis of over Fifty
               Molecular Scaffolds. Org Biomol Chem 13 (2014) 859-65.
               doi:10.1039/C4OB02287D.
    Reference: Jadhav A, et al. Quantitative Analyses of Aggregation, Autofluorescence,
               and Reactivity Artifacts in a Screen for Inhibitors of a Thiol Protease.
               J Med Chem 53 (2009) 37-51. doi:10.1021/jm901070c.

  * ZINC - filtering based on drug-likeness and unwanted functional groups.
    Reference: http://blaster.docking.org/filtering/

The following are C++ and Python examples of how to filter molecules.

[C++]

#include <iostream>
#include <memory>
#include <vector>
#include <GraphMol/FileParsers/MolSupplier.h>
#include <GraphMol/FilterCatalog/FilterCatalog.h>
using namespace RDKit;

    SmilesMolSupplier suppl(…);

    // set up the desired catalogs
    FilterCatalogParams params;
    params.addCatalog(FilterCatalogParams::PAINS_A);
    params.addCatalog(FilterCatalogParams::PAINS_B);
    params.addCatalog(FilterCatalogParams::PAINS_C);

    // create the catalog
    FilterCatalog catalog(params);

    std::unique_ptr<ROMol> mol; // automatically cleans up after us
    int count = 0;
    while (!suppl.atEnd()) {
      mol.reset(suppl.next());
      if (!mol) {
        continue;  // the supplier returns a null pointer for unparsable input
      }

      // Does a PAINS filter hit?
      if (catalog.hasMatch(*mol)) {
        std::cerr << "Warning: molecule failed filter" << std::endl;
      }

      // More detailed data by retrieving the catalog entry;
      // getFirstMatch returns a shared pointer that is empty if nothing matched
      auto entry = catalog.getFirstMatch(*mol);
      if (entry) {
        std::cerr << "Warning: molecule failed filter: reason "
                  << entry->getDescription() << std::endl;

        // get the matched substructure atoms for visualization
        std::vector<FilterMatch> matches;
        if (entry->getFilterMatches(*mol, matches)) {
          for (const FilterMatch &fm : matches) {
            // the matcher (e.g. a SmartsMatcher) that produced this match
            const auto &matchingFilter = fm.filterMatch;

            // atomPairs holds (query atom index, molecule atom index) pairs
            for (const auto &atomPair : fm.atomPairs) {
              int atomIdx = atomPair.second;  // e.g. collect for highlighting
            }
          }
        }
      }
      ++count;
    } // end while

Python API

  import sys
  from rdkit.Chem import FilterCatalog

  params = FilterCatalog.FilterCatalogParams()
  params.AddCatalog(FilterCatalog.FilterCatalogParams.FilterCatalogs.PAINS_A)
  params.AddCatalog(FilterCatalog.FilterCatalogParams.FilterCatalogs.PAINS_B)
  params.AddCatalog(FilterCatalog.FilterCatalogParams.FilterCatalogs.PAINS_C)
  catalog = FilterCatalog.FilterCatalog(params)

  ...
  for mol in mols:
      if catalog.HasMatch(mol):
          print("Warning: molecule failed filter", file=sys.stderr)
      # more detailed
      entry = catalog.GetFirstMatch(mol)
      if entry:
          print("Warning: molecule failed filter: reason %s" % (
              entry.GetDescription()), file=sys.stderr)

          # get to the atoms involved in the substructure;
          # there may be many matching filters here...
          for filterMatch in entry.GetFilterMatches(mol):
              matcher = filterMatch.filterMatch
              # get a description of the matching filter
              print(matcher.GetName())
              for queryAtomIdx, atomIdx in filterMatch.atomPairs:
                  # do something with the substructure matches
                  pass

Advanced

  FilterCatalogs are fully serializable and can be stored for later use.

  To serialize a catalog, use the catalog.Serialize() method:
     std::string pickle = catalog.Serialize();

  To deserialize, pass the resulting string to the constructor:
     FilterCatalog catalog(pickle);

  The underlying matchers can be arbitrarily complicated, and new
  ones with more complicated semantics can be created.  The default
  matching objects are:

  SmartsMatcher - match a SMARTS pattern or query molecule with a minimum and maximum count
  ExclusionList - returns false if any of the supplied patterns match

  And - true only if both matchers are true
  Or  - true if either matcher is true
  Not - invert the match (note that this can have confusing semantics
        when dealing with substructure matches)

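To make these combination rules concrete, here is a small stdlib-only Python model of them. It is a hypothetical toy, not RDKit code: the class names (ToySmartsMatcher, ToyExclusionList, ToyAnd, ToyOr, ToyNot) are invented for illustration, a "molecule" is just a list of element symbols, and a toy pattern matches single atoms by symbol. The point is the boolean logic, in particular that a successful Not (or ExclusionList) has no atoms to report.

```python
# Toy model of FilterCatalog matcher semantics (hypothetical names; not RDKit's API).
# A "molecule" is a list of element symbols, and a toy pattern matches single atoms
# by symbol, so each match is one (query atom index, molecule atom index) pair.
from typing import List, Optional, Tuple

Mol = List[str]
AtomPairs = List[Tuple[int, int]]


class ToySmartsMatcher:
    def __init__(self, name: str, symbol: str, min_count: int = 1,
                 max_count: Optional[int] = None) -> None:
        self.name, self.symbol = name, symbol
        self.min_count, self.max_count = min_count, max_count

    def _hits(self, mol: Mol) -> List[AtomPairs]:
        return [[(0, i)] for i, sym in enumerate(mol) if sym == self.symbol]

    def has_match(self, mol: Mol) -> bool:
        # Matches when the hit count lies in [min_count, max_count].
        n = len(self._hits(mol))
        return n >= self.min_count and (self.max_count is None or n <= self.max_count)

    def get_matches(self, mol: Mol) -> List[AtomPairs]:
        return self._hits(mol) if self.has_match(mol) else []


class ToyExclusionList:
    def __init__(self, patterns) -> None:
        self.patterns = list(patterns)

    def has_match(self, mol: Mol) -> bool:
        # False as soon as any excluded pattern is present.
        return not any(p.has_match(mol) for p in self.patterns)

    def get_matches(self, mol: Mol) -> List[AtomPairs]:
        return []  # success means "absent", so there are no atoms to report


class ToyAnd:
    def __init__(self, a, b) -> None:
        self.a, self.b = a, b

    def has_match(self, mol: Mol) -> bool:
        return self.a.has_match(mol) and self.b.has_match(mol)

    def get_matches(self, mol: Mol) -> List[AtomPairs]:
        if not self.has_match(mol):
            return []
        return self.a.get_matches(mol) + self.b.get_matches(mol)


class ToyOr:
    def __init__(self, a, b) -> None:
        self.a, self.b = a, b

    def has_match(self, mol: Mol) -> bool:
        return self.a.has_match(mol) or self.b.has_match(mol)

    def get_matches(self, mol: Mol) -> List[AtomPairs]:
        # Collect atoms from whichever side(s) matched.
        return self.a.get_matches(mol) + self.b.get_matches(mol)


class ToyNot:
    def __init__(self, arg) -> None:
        self.arg = arg

    def has_match(self, mol: Mol) -> bool:
        return not self.arg.has_match(mol)

    def get_matches(self, mol: Mol) -> List[AtomPairs]:
        # When Not succeeds the inner pattern is absent, so no atoms can be reported.
        return []


mol = ["C", "C", "O", "N"]  # toy molecule: two carbons, one oxygen, one nitrogen
has_oxygen = ToySmartsMatcher("oxygen", "O")
has_sulfur = ToySmartsMatcher("sulfur", "S")
print(ToyAnd(has_oxygen, ToyNot(has_sulfur)).has_match(mol))      # True
print(ToyNot(has_sulfur).get_matches(mol))                        # []
print(ToyExclusionList([has_oxygen, has_sulfur]).has_match(mol))  # False
print(ToyOr(has_oxygen, has_sulfur).get_matches(mol))             # [[(0, 2)]]
```

The last two calls show the asymmetry the list above warns about: Or can point at the oxygen atom that triggered it, while Not can only say that it succeeded.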
  Entries can be added to a catalog at any time:

    ExclusionList excludedList;
    excludedList.addPattern(SmartsMatcher("Pattern 1", smarts));
    excludedList.addPattern(SmartsMatcher("Pattern 2", smarts2));
    // the entry matches molecules that contain none of the patterns
    catalog.addEntry(new FilterCatalogEntry("No excluded patterns", excludedList));

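A catalog can be pictured as an ordered list of entries, each pairing a description with a matcher. The stdlib-only sketch below is a hypothetical toy model, not RDKit code (ToyFilterCatalog and contains() are invented names); it assumes that the "first match" is the first matching entry in insertion order, and it shows that adding entries after a query simply changes the results of later queries.

```python
# Hypothetical toy model of a filter catalog (not RDKit's API): an ordered list of
# (description, matcher) entries that can grow at any time and is scanned in order.
from typing import Callable, List, Optional, Tuple

Mol = List[str]                  # toy "molecule": a list of element symbols
Matcher = Callable[[Mol], bool]


def contains(symbol: str) -> Matcher:
    """Toy matcher: true if any atom has the given element symbol."""
    return lambda mol: symbol in mol


class ToyFilterCatalog:
    def __init__(self) -> None:
        self.entries: List[Tuple[str, Matcher]] = []

    def add_entry(self, description: str, matcher: Matcher) -> None:
        # Appending is allowed even after the catalog has been queried.
        self.entries.append((description, matcher))

    def get_first_match(self, mol: Mol) -> Optional[str]:
        # Description of the first entry, in insertion order, whose matcher fires.
        for description, matcher in self.entries:
            if matcher(mol):
                return description
        return None

    def get_matches(self, mol: Mol) -> List[str]:
        # Descriptions of every entry whose matcher fires.
        return [description for description, matcher in self.entries if matcher(mol)]

    def has_match(self, mol: Mol) -> bool:
        return self.get_first_match(mol) is not None


catalog = ToyFilterCatalog()
catalog.add_entry("contains sulfur", contains("S"))
mol = ["C", "C", "O"]
print(catalog.has_match(mol))        # False
catalog.add_entry("contains oxygen", contains("O"))
catalog.add_entry("contains carbon", contains("C"))
print(catalog.get_first_match(mol))  # contains oxygen
print(catalog.get_matches(mol))      # ['contains oxygen', 'contains carbon']
```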
  A FilterCatalog supports a few different types of matching.  One is
  a traditional rejection filter: if a substructure exists in the
  target molecule, the molecule is rejected.

  These types of queries can indicate the substructure that triggered
  the rejection through the FilterCatalogEntry::getFilterMatches(mol, matches)
  method (GetFilterMatches(mol) in Python).

  The FilterCatalog also supports acceptance filters, which are
  designed to indicate which molecules are OK.  These have to be
  transformed into rejection filters, or simply wrapped in a
  Not( acceptanceFilter ), when entered into the catalog.  For
  example, from ZINC:

    carbons [#6] 40

  means that a molecule may have at most 40 carbon atoms.  We can write
  this as a rejection filter by converting the maximum count into a
  minimum count (i.e. the pattern is triggered when the molecule has at
  least minCount matching atoms):

    const unsigned int minCount = 40+1;
    SmartsMatcher( "Too many carbons", "[#6]", minCount );

  This is an ordinary substructure match, so the matching carbon atoms
  can be reported for highlighting.

  Alternatively, we can wrap the acceptance filter in a Not:

    const unsigned int minCount = 0;
    const unsigned int maxCount = 40;
    Not( SmartsMatcher( "ok number of carbons", "[#6]", minCount, maxCount) );

  Note: Wrapping in a Not loses the ability to highlight the rejecting
    pattern when visualizing the molecule.

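The two encodings of the ZINC carbon rule can be checked against each other with a stdlib-only toy model. This is a hypothetical sketch, not RDKit code: the function names are invented, a "molecule" is a list of element symbols, and it assumes that for a single-atom pattern such as [#6] the match count equals the number of carbon atoms. It shows that both forms reject exactly the same molecules, but only the minCount = 40+1 form has atoms to highlight.

```python
# Hypothetical toy model (not RDKit's API) of the two ways to turn ZINC's
# "at most 40 carbons" acceptance rule into a rejection filter.
# For a single-atom pattern such as [#6], the number of matches equals the
# number of matching atoms.
from typing import List, Optional, Tuple

Mol = List[str]  # toy "molecule": a list of element symbols


def carbon_atoms(mol: Mol) -> List[int]:
    return [i for i, sym in enumerate(mol) if sym == "C"]


def count_in_range(mol: Mol, min_count: int, max_count: Optional[int] = None) -> bool:
    n = len(carbon_atoms(mol))
    return n >= min_count and (max_count is None or n <= max_count)


def too_many_carbons(mol: Mol) -> Tuple[bool, List[int]]:
    # Rejection form: minCount = 40 + 1; the matched carbons can be highlighted.
    rejected = count_in_range(mol, min_count=40 + 1)
    atoms = carbon_atoms(mol) if rejected else []
    return rejected, atoms


def not_ok_carbons(mol: Mol) -> Tuple[bool, List[int]]:
    # Not(acceptance) form: rejects when "0..40 carbons" fails; nothing to highlight.
    rejected = not count_in_range(mol, min_count=0, max_count=40)
    return rejected, []


for n_carbons in (0, 40, 41, 60):
    mol = ["C"] * n_carbons + ["O", "N"]
    rejected_a, atoms_a = too_many_carbons(mol)
    rejected_b, atoms_b = not_ok_carbons(mol)
    assert rejected_a == rejected_b  # both forms make the same decision
    print(n_carbons, rejected_a, len(atoms_a), len(atoms_b))
# 0 False 0 0
# 40 False 0 0
# 41 True 41 0
# 60 True 60 0
```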