README
FilterCatalogs give RDKit the ability to screen out or reject undesirable molecules
based on various criteria. The following filter sets are supplied with RDKit:

 * PAINS - Pan assay interference patterns. These are separated into three
   sets: PAINS_A, PAINS_B and PAINS_C.
   Reference: Baell JB, Holloway GA. New Substructure Filters for Removal of Pan Assay
   Interference Compounds (PAINS) from Screening Libraries and for Their
   Exclusion in Bioassays. J Med Chem 53 (2010) 2719-2740. doi:10.1021/jm901137j.

 * BRENK - filters unwanted functionality for potential toxicity reasons or
   unfavorable pharmacokinetics.
   Reference: Brenk R, et al. Lessons Learnt from Assembling Screening Libraries for
   Drug Discovery for Neglected Diseases. ChemMedChem 3 (2008) 435-444.
   doi:10.1002/cmdc.200700139.

 * NIH - annotated compounds with problematic functional groups.
   Reference: Doveston R, et al. A Unified Lead-oriented Synthesis of over Fifty
   Molecular Scaffolds. Org Biomol Chem 13 (2014) 859-865. doi:10.1039/C4OB02287D.
   Reference: Jadhav A, et al. Quantitative Analyses of Aggregation, Autofluorescence,
   and Reactivity Artifacts in a Screen for Inhibitors of a Thiol Protease.
   J Med Chem 53 (2009) 37-51. doi:10.1021/jm901070c.

 * ZINC - filtering based on drug-likeness and unwanted functional groups.
   Reference: http://blaster.docking.org/filtering/

The following are C++ and Python examples of how to filter molecules.

[C++]

    #include <GraphMol/FilterCatalog/FilterCatalog.h>
    #include <GraphMol/FileParsers/MolSupplier.h>
    #include <iostream>
    #include <memory>
    #include <vector>

    using namespace RDKit;

    SmilesMolSupplier suppl(…);

    // set up the desired catalogs
    FilterCatalogParams params;
    params.addCatalog(FilterCatalogParams::PAINS_A);
    params.addCatalog(FilterCatalogParams::PAINS_B);
    params.addCatalog(FilterCatalogParams::PAINS_C);

    // create the catalog
    FilterCatalog catalog(params);

    std::unique_ptr<ROMol> mol;  // automatically cleans up after us
    int count = 0;
    while (!suppl.atEnd()) {
      mol.reset(suppl.next());
      ++count;
      if (!mol) {
        continue;  // skip records that could not be parsed
      }

      // Does any PAINS filter match?
      if (catalog.hasMatch(*mol)) {
        std::cerr << "Warning: molecule failed filter" << std::endl;
      }

      // More detailed data by retrieving the first matching catalog entry
      auto entry = catalog.getFirstMatch(*mol);
      if (entry) {
        std::cerr << "Warning: molecule failed filter: reason "
                  << entry->getDescription() << std::endl;

        // get the matched substructure atoms for visualization
        std::vector<FilterMatch> matches;
        if (entry->getFilterMatches(*mol, matches)) {
          for (const FilterMatch &fm : matches) {
            // the matcher (a FilterMatcherBase, e.g. a SmartsMatcher) that fired
            const auto &matchingFilter = fm.filterMatch;

            // atomPairs holds (query atom index, molecule atom index) pairs
            for (const auto &pair : fm.atomPairs) {
              int atomIdx = pair.second;  // molecule atom to highlight
            }
          }
        }
      }
    }  // end while

Python API

    import sys
    from rdkit.Chem import FilterCatalog
    from rdkit.Chem.FilterCatalog import FilterCatalogParams

    # set up the desired catalogs and create the catalog
    params = FilterCatalogParams()
    params.AddCatalog(FilterCatalogParams.FilterCatalogs.PAINS_A)
    params.AddCatalog(FilterCatalogParams.FilterCatalogs.PAINS_B)
    params.AddCatalog(FilterCatalogParams.FilterCatalogs.PAINS_C)
    catalog = FilterCatalog.FilterCatalog(params)

    ...
    for mol in mols:
        if catalog.HasMatch(mol):
            print("Warning: molecule failed filter", file=sys.stderr)

        # more detailed information from the first matching entry
        entry = catalog.GetFirstMatch(mol)
        if entry:
            print("Warning: molecule failed filter: reason %s" %
                  entry.GetDescription(), file=sys.stderr)

            # get the atoms involved in the substructure;
            # there may be many matching filters here...
            for filterMatch in entry.GetFilterMatches(mol):
                matcher = filterMatch.filterMatch
                # print the name of the matching filter
                print(matcher.GetName())
                for queryAtomIdx, atomIdx in filterMatch.atomPairs:
                    # do something with the substructure matches,
                    # e.g. collect atomIdx for highlighting
                    pass

Advanced

FilterCatalogs are fully serializable and can be stored for later use.

To serialize a catalog, call its Serialize() method:

    std::string pickle = catalog.Serialize();

To deserialize, pass the resulting string to the constructor:

    FilterCatalog catalog(pickle);

The underlying matchers can be arbitrarily complicated, and new ones with
more complicated semantics can be created. The default matching objects are:

    SmartsMatcher - matches a SMARTS pattern or query molecule that occurs
                    between a minimum and a maximum number of times
    ExclusionList - returns false if any of the supplied patterns match

    And - true only if both matchers are true
    Or  - true if either of the two matchers is true
    Not - inverts the match (note that this can have confusing semantics
          when dealing with substructure matches)

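The combinator semantics listed above can be illustrated with a toy model. This is a sketch, not RDKit code: it assumes a "molecule" is just a list of atom symbols, and the hypothetical helpers count_matcher, exclusion_list, and_, or_ and not_ only mimic the described behaviour, with each matcher returning True when it fires.

```python
from typing import Callable, Optional, Sequence

# Toy model only (not the RDKit API): a "molecule" is a sequence of atom
# symbols, and a matcher is a callable that returns True when it fires.
Molecule = Sequence[str]
Matcher = Callable[[Molecule], bool]


def count_matcher(symbol: str, min_count: int = 1,
                  max_count: Optional[int] = None) -> Matcher:
    """Hypothetical SmartsMatcher stand-in: fires when the number of atoms
    equal to `symbol` lies in [min_count, max_count] (no upper bound if None)."""
    def match(mol: Molecule) -> bool:
        n = sum(1 for atom in mol if atom == symbol)
        return n >= min_count and (max_count is None or n <= max_count)
    return match


def exclusion_list(*patterns: Matcher) -> Matcher:
    """Returns False if any of the patterns matches, True otherwise."""
    return lambda mol: not any(p(mol) for p in patterns)


def and_(a: Matcher, b: Matcher) -> Matcher:
    """True only if both matchers are true."""
    return lambda mol: a(mol) and b(mol)


def or_(a: Matcher, b: Matcher) -> Matcher:
    """True if either matcher is true."""
    return lambda mol: a(mol) or b(mol)


def not_(m: Matcher) -> Matcher:
    """Inverts the match."""
    return lambda mol: not m(mol)


# Hypothetical rule: flag a nitrogen pattern only when no sulfur is present.
has_n = count_matcher("N")
has_s = count_matcher("S")
nitrogen_without_sulfur = and_(has_n, exclusion_list(has_s))
```

Note that not_(has_s) fires for every molecule that lacks sulfur: it reports an absence, so there is no substructure to point at, which is one source of the confusing semantics mentioned for Not.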
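Entries can be added to a catalog at any time. Here smarts and smarts2 hold
SMARTS strings:

    ExclusionList excludedList;

    excludedList.addPattern(SmartsMatcher("Pattern 1", smarts));
    excludedList.addPattern(SmartsMatcher("Pattern 2", smarts2));

    // wrap the matcher in a catalog entry and add it to the catalog
    catalog.addEntry(new FilterCatalogEntry("Excluded patterns", excludedList));
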
A FilterCatalog supports a few different types of matching. One is a
traditional rejection filter: if a substructure exists in the target
molecule, the molecule is rejected.

These queries can report the substructure that triggered the rejection
through FilterCatalogEntry::getFilterMatches() (GetFilterMatches() in Python).

The FilterCatalog also supports acceptance filters, which indicate which
molecules are acceptable. These must either be transformed into rejection
filters or wrapped in a Not(acceptanceFilter) when added to the catalog.
For example, the ZINC filter set contains the rule:

    carbons [#6] 40

which means that a molecule may contain at most 40 carbon atoms. We can write
this as a rejection filter by converting the maximum count into a minimum count
(i.e. the pattern is triggered when the molecule has at least minCount carbons):

    const unsigned int minCount = 40 + 1;
    SmartsMatcher("Too many carbons", "[#6]", minCount);

This is an ordinary substructure search, so the matching atoms can be reported.

Alternatively, we can wrap the acceptance filter in a Not:

    const unsigned int minCount = 0;
    const unsigned int maxCount = 40;
    Not(SmartsMatcher("ok number of carbons", "[#6]", minCount, maxCount));

Note: Wrapping in a Not loses the ability to highlight the rejecting
pattern when visualizing the molecule.

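To see why the two formulations reject the same molecules but differ in what they can report, here is a toy Python model. It is a sketch, not RDKit code: it assumes a "molecule" is a list of atom symbols, and count_filter and negate are hypothetical stand-ins for SmartsMatcher and Not that return the decision together with the atom indices that could be highlighted.

```python
from typing import List, Optional, Sequence, Tuple

# Toy model, not the RDKit API: a "molecule" is a sequence of atom symbols and
# a filter result is (fired, indices of atoms that can be highlighted).
Result = Tuple[bool, List[int]]

MAX_CARBONS = 40  # the ZINC rule "carbons [#6] 40"


def count_filter(mol: Sequence[str], symbol: str, min_count: int,
                 max_count: Optional[int] = None) -> Result:
    """Hypothetical SmartsMatcher stand-in: fires when the number of `symbol`
    atoms lies in [min_count, max_count] and reports those atoms."""
    hits = [i for i, atom in enumerate(mol) if atom == symbol]
    fired = len(hits) >= min_count and (max_count is None or len(hits) <= max_count)
    return fired, (hits if fired else [])


def negate(result: Result) -> Result:
    """Hypothetical Not stand-in: flips the decision. The inner hits describe
    why the molecule was acceptable, so nothing is left to highlight."""
    fired, _ = result
    return (not fired), []


def too_many_carbons(mol: Sequence[str]) -> Result:
    # Rejection form: triggers at MAX_CARBONS + 1 or more carbons.
    return count_filter(mol, "C", MAX_CARBONS + 1)


def not_ok_number_of_carbons(mol: Sequence[str]) -> Result:
    # Acceptance form (0..MAX_CARBONS carbons) wrapped in a Not.
    return negate(count_filter(mol, "C", 0, MAX_CARBONS))


big = ["C"] * 41 + ["O"]
print(too_many_carbons(big)[0], not_ok_number_of_carbons(big)[0])  # True True
print(len(too_many_carbons(big)[1]), not_ok_number_of_carbons(big)[1])  # 41 []
```

Both forms reject the 41-carbon molecule, but only the min-count rejection form can hand back the offending carbon atoms for highlighting.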