1) The researchers redesigned their fragment library to enhance chemical diversity and 3D content by combining commercial and in-house databases.
2) They analyzed the shape, chemical and pharmacophore diversity of the new library and found it had greater diversity than both the original library and a commercial diversity library, while maintaining good physicochemical properties and lower complexity than a commercial shape-biased library.
3) Future plans include synthesizing fragments from biologically active molecules and enumerated scaffolds to further increase diversity of the library.
Expanding the Scope of Fragment Screening Libraries
– Thinking in
The Beatson Drug Discovery Unit, CRUK Beatson Institute, Switchback Road, Glasgow, G61 1BD, UK
Background
• Fragment screening is an established method to generate high quality hits.
• Current fragment libraries are largely composed of flat molecules.
• Increasing the 3D content of fragment libraries will increase diversity, opening opportunities for "challenging targets" such as protein-protein interactions (PPIs).
• Known PPI inhibitors are less flat than typical drugs.
Aims
• Redesign our fragment library to enhance chemical diversity and pharmacophoric content using commercial and
in-house chemical space
• Enrich the fragment screening library with 3-dimensionally-enhanced molecules in order to increase diversity
• Maintain excellent physicochemical properties and chemical tractability without increasing molecule complexity
• Synthesise molecules to fill gaps in unexploited chemical and 3-dimensional space
Commercial Fragment Space Analysis
Maximising Diversity
Fragment Complexity
Key Achievements
• Built a fragment set with enhanced chemical, pharmacophore and shape diversity compared to our current library
and a typical “diverse” commercial library
• Shape coverage comparable with a shape-biased commercial library but achieving greater chemical diversity
• Pharmacophore rich but size and molecular complexity not compromised
• Designed and implemented in silico protocols to achieve all of the above
Shape diversity of the Beatson fragment library (A) and fragmented biologically active molecules (B), showing the increase in three-dimensionality from the left to the right of the PMI plot. Fragmented molecules were generated from the InMan subset of the ZINC database (all molecules that reached clinical trials, including marketed drugs) using the RECAP fragmentation method.
(A) Process defined to select a "clean" fragment database: eMolecules, ZINC and AldrichMarketSelect were combined (~20M molecules) and filtered for 1. duplicates, 2. quality and 3. physicochemical (PC) properties, leaving ~236K "clean" fragments. PMI plot (B) binned by PMIsum or Flat Distance (NPR1+NPR2), and pie chart of the Flat Distance distribution (C) of the ~236,000 commercially available fragments: 71% have a Flat Distance of 0–1.1 and 29% of 1.1–2.0.
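The three filtering steps in panel (A) can be sketched as below. This is a minimal, hypothetical illustration rather than the protocol used for this work: it assumes each vendor record already carries a canonical identifier (for example a canonical SMILES) and precomputed properties. The `has_unwanted_group` flag and the Rule-of-Three-style limits are illustrative assumptions; the poster itself only implies an upper bound of MW 300 Da for fragment property space.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Compound:
    key: str                  # canonical identifier, assumed precomputed
    mw: float                 # molecular weight (Da)
    clogp: float
    hbd: int                  # H-bond donors
    hba: int                  # H-bond acceptors
    has_unwanted_group: bool  # hypothetical outcome of a substructure "quality" filter


# Hypothetical Rule-of-Three-style limits; the exact thresholds are not given on the poster.
FRAGMENT_LIMITS = {"mw": 300.0, "clogp": 3.0, "hbd": 3, "hba": 3}


def clean_fragments(records: Iterable[Compound], limits=FRAGMENT_LIMITS) -> List[Compound]:
    seen = set()
    kept = []
    for c in records:
        # 1. Duplicates: the same molecule may be listed by several vendors.
        if c.key in seen:
            continue
        seen.add(c.key)
        # 2. Quality: drop compounds flagged with unwanted groups.
        if c.has_unwanted_group:
            continue
        # 3. PC properties: keep only molecules inside fragment property space.
        if (c.mw <= limits["mw"] and c.clogp <= limits["clogp"]
                and c.hbd <= limits["hbd"] and c.hba <= limits["hba"]):
            kept.append(c)
    return kept


# Toy records standing in for the combined eMolecules/ZINC/AldrichMarketSelect database.
combined = [
    Compound("frag-001", 180.2, 1.5, 1, 2, False),
    Compound("frag-001", 180.2, 1.5, 1, 2, False),  # same molecule from a second vendor
    Compound("frag-002", 210.0, 2.0, 1, 1, True),   # fails the quality filter
    Compound("frag-003", 350.4, 2.0, 1, 1, False),  # too heavy for a fragment
    Compound("frag-004", 150.1, 4.2, 0, 2, False),  # too lipophilic
    Compound("frag-005", 120.1, 0.5, 2, 3, False),
]
print([c.key for c in clean_fragments(combined)])  # ['frag-001', 'frag-005']
```

Deduplicating first means each molecule is quality- and property-checked only once, however many vendors list it.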
Future Plans
We have carried out an analysis of compounds with reported biological activity. The output from this work will form the basis of future synthetic chemistry work, to be undertaken in collaboration with other members of the 3D fragment consortium. In detail, we compiled a set of filtered compounds from ChEMBL, the ZINC "InMan" subset and SURECHEM (compounds from the patent literature), totalling around 2.5 million. From this initial set, 730 molecules were selected using the process outlined below. The PMI plot shows the shape coverage of the set, highlighting how fragments derived from biologically active compounds cover a wide range of shape space, considerably more than that covered by a commercial "diversity" library or a shape-biased library.
(A) Process defined to select fragments to enhance diversity. PMI plots highlighting
shape space covered by initial BDDP library (B) and diversity enhanced library (C). (D)
Examples of compounds selected to enhance the library (taken from PMI plot C); flat
distances are reported under the structures.
Flowchart (A): ~236K "clean" fragments; diversity selection (1. shape, 2. chemistry, 3. pharmacophore, 4. intra/inter-set redundancy); Med Chem eyeballing; further diversity refinement (1. in-house chemistry addition, 2. adding the 3DFrag core-set, 3. adding FS2 "to keep" compounds). Fragment counts shown in the flowchart: 4729, 198 and 661. Flat distances of the example structures in (D): 1.15, 1.20 and 1.38.
Principal moments of inertia (PMI) plots are a simple way to evaluate three-dimensional diversity:
• Flat molecules occupy the extreme left-hand diagonal axis, with increasingly three-dimensional compounds extending to the right.
Application of fragment property space filters removes the vast majority of molecules from the 20 million compound set, demonstrating the more "drug-like" and lead-like nature of commercial vendor collections. This set of fragment molecules not only acted as a source of compounds for our analysis, but also as a readily searchable database in which we can run fast near-neighbour and substructure searches around fragment hits. The PMI plot of this filtered set, binned by the sum of the normalized principal moments of inertia (NPR1+NPR2, or "Flat Distance"), indicates a good spread of shape diversity from which to select diverse fragments. A similar process has been used to identify approximately 2 million lead-like molecules that will be used for future virtual screening campaigns against suitable drug discovery targets. As this analysis shows, approximately 71% of commercial fragment property space is populated with "flat" molecules, defined as having a PMIsum (flat distance) of <1.1. Whilst this may still provide chemically diverse molecules, it limits the potential to access chemically attractive, three-dimensionally biased molecules.
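The flat distance used throughout can be computed from a single 3D conformer as sketched below. This is a minimal illustration under stated assumptions: coordinates come from a conformer generator (not shown), masses default to 1 unless atomic masses are passed, and the principal moments are obtained with the closed-form eigenvalue formula for a symmetric 3×3 matrix rather than with any particular cheminformatics toolkit. With I1 ≤ I2 ≤ I3, NPR1 = I1/I3 and NPR2 = I2/I3, and the flat distance is their sum.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def sym3_eigenvalues(a) -> List[float]:
    """Eigenvalues (ascending) of a symmetric 3x3 matrix via the trigonometric closed form."""
    p1 = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    if p1 == 0.0:  # matrix is already diagonal
        return sorted((a[0][0], a[1][1], a[2][2]))
    q = (a[0][0] + a[1][1] + a[2][2]) / 3.0
    p2 = sum((a[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det_b = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
             - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
             + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    r = max(-1.0, min(1.0, det_b / 2.0))  # clamp rounding error before acos
    phi = math.acos(r) / 3.0
    largest = q + 2.0 * p * math.cos(phi)
    smallest = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    return [smallest, 3.0 * q - largest - smallest, largest]


def principal_moments(coords: Sequence[Vec3],
                      masses: Optional[Sequence[float]] = None) -> List[float]:
    """Principal moments of inertia I1 <= I2 <= I3 about the centre of mass."""
    masses = list(masses) if masses is not None else [1.0] * len(coords)
    total = sum(masses)
    cx, cy, cz = (sum(m * c[k] for m, c in zip(masses, coords)) / total for k in range(3))
    ixx = iyy = izz = ixy = ixz = iyz = 0.0
    for m, (x, y, z) in zip(masses, coords):
        x, y, z = x - cx, y - cy, z - cz
        ixx += m * (y * y + z * z)
        iyy += m * (x * x + z * z)
        izz += m * (x * x + y * y)
        ixy -= m * x * y
        ixz -= m * x * z
        iyz -= m * y * z
    return sym3_eigenvalues(((ixx, ixy, ixz), (ixy, iyy, iyz), (ixz, iyz, izz)))


def flat_distance(coords, masses=None) -> Tuple[float, float, float]:
    """Return (NPR1, NPR2, NPR1 + NPR2); rod ~ (0, 1), disc ~ (0.5, 0.5), sphere ~ (1, 1)."""
    i1, i2, i3 = principal_moments(coords, masses)
    npr1, npr2 = i1 / i3, i2 / i3
    return npr1, npr2, npr1 + npr2


rod = [(-1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
print(flat_distance(rod))  # (0.0, 1.0, 1.0)
```

Under the definition above, any conformer whose flat distance falls below 1.1 would be counted as "flat"; rods and discs both score 1.0, while a perfectly spherical shape scores 2.0.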
Fragment Library Design
In order to maintain a “usable” fragment screening library we undertook a phased approach to compound selection and initially focussed
our analysis on ~50% of the library, named FS2 (Fragment Set 2).
The first step we undertook was to improve the intra-set chemical diversity. Following comparison of FS2 with the 3D fragment foundation library, clustering and a subsequent diversity selection, 181 compounds were retained as part of the new, enhanced fragment library. To maintain a balanced shape distribution in our fragment library, we selected chemically diverse, pharmacophore-rich and chemically attractive molecules from across the shape diversity space defined by the 236K commercial set. In part, we achieved this by restricting shape space to a PMIsum ≤ 1.5, which also enabled us to keep the complexity of the selected molecules at an acceptable level. In addition to selecting molecules fitting the above criteria, we developed Pipeline Pilot protocols to identify fragments with a higher number of commercially available elaborated analogues in our virtual lead-like library. After removing all compounds too similar to FS2 and the 3D fragment foundation library, followed by medicinal chemistry eyeballing, 198 compounds were selected. Following further diversity refinement and the addition of in-house synthesised fragments, this was boosted to 290 molecules. In total, 661 fragments have been selected to enhance the BDDP fragment screening set. The three-dimensional space exemplified by the enhanced fragment library (661 fragments, PMI plot C) is more comprehensive than that of the initial BDDP fragment library (PMI plot B), with approximately one third (33%) of the fragments having a flat distance > 1.1, compared with only 13% in the original set.
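Once flat distances are known, the two shape criteria used in this step reduce to simple filters: a PMIsum window of ≤ 1.5 applied while picking new fragments, and the share of a library with flat distance > 1.1 as the summary statistic. A minimal sketch, assuming flat distances have already been computed per compound; the function names and the (id, flat distance) pairs are hypothetical.

```python
from typing import List, Sequence, Tuple

FLAT_CUTOFF = 1.1  # fragments above this flat distance count as non-flat ("3D")
MAX_PMISUM = 1.5   # upper shape limit used when picking new fragments


def select_in_shape_window(candidates: Sequence[Tuple[str, float]],
                           max_pmisum: float = MAX_PMISUM) -> List[Tuple[str, float]]:
    """Keep (compound_id, flat_distance) pairs whose PMIsum is at most max_pmisum."""
    return [(cid, fd) for cid, fd in candidates if fd <= max_pmisum]


def fraction_non_flat(flat_distances: Sequence[float], cutoff: float = FLAT_CUTOFF) -> float:
    """Share of a library with flat distance strictly above the cutoff."""
    if not flat_distances:
        raise ValueError("library is empty")
    return sum(fd > cutoff for fd in flat_distances) / len(flat_distances)


candidates = [("f1", 1.02), ("f2", 1.05), ("f3", 1.15), ("f4", 1.30), ("f5", 1.08), ("f6", 1.60)]
picked = select_in_shape_window(candidates)
print([cid for cid, _ in picked])                   # ['f1', 'f2', 'f3', 'f4', 'f5']
print(fraction_non_flat([fd for _, fd in picked]))  # 0.4
```

Capping PMIsum keeps the selection away from the most globular (and typically more complex) shapes, while the non-flat fraction tracks how far the library has moved off the flat diagonal.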
We have significantly enhanced the chemical diversity (diversity coefficient (MACCS keys) of 0.72, compared with 0.82 for the entire original fragment library and 0.80 for the original FS2 only; lower values indicate greater diversity), providing more singletons and thus greater coverage of chemical space. Radial fingerprints (ECFP4, which, unlike MACCS keys, do not depend on a predefined list of structural features) also indicate an improved diversity coefficient (0.24) over the original set (0.37), with a similar shift in the distribution of similarity values. The above (right) figure shows the maximum similarity plots for our original FS2 library (A) and our enhanced version (B); the original FS2 visibly shows the higher similarity.
Pharmacophore diversity of the enhanced FS2 (based on 3D pharmacophore fingerprints) also improved, with a diversity coefficient of 0.40 compared with 0.51 for the original entire set. Chemical and pharmacophore diversity also compare favourably with a commercially available "diversity" library of 1000 fragments. The above table compares the two libraries using three different diversity metrics (coefficients); the enhanced FS2 profile is favourable on all of them.
We also analysed a commercially available 3D-biased fragment library. In this case, the majority (134, 65%) of its 205 fragments had a flat distance > 1.1, but this is achieved at the expense of lower chemical diversity and greater molecule complexity (see the Fragment Complexity section): its fingerprint-based (MACCS keys) diversity coefficient is 0.82, compared with 0.72 for our enhanced FS2.
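The poster does not spell out how the diversity coefficient is computed. Its usage (lower values for the more diverse library) is consistent with a mean pairwise Tanimoto similarity, so the sketch below assumes that definition, together with the per-compound maximum (nearest-neighbour) similarity shown in the maximum similarity plots. Fingerprints are represented as sets of on-bit indices (for example MACCS or ECFP4 bits generated by a cheminformatics toolkit, not shown); the exact Pipeline Pilot definition may differ.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence

Fingerprint = FrozenSet[int]  # indices of "on" bits


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 0.0  # assume 0 for two empty fingerprints


def diversity_coefficient(fps: Sequence[Fingerprint]) -> float:
    """Mean pairwise Tanimoto similarity (assumed definition; lower = more diverse)."""
    if len(fps) < 2:
        raise ValueError("need at least two fingerprints")
    sims = [tanimoto(a, b) for a, b in combinations(fps, 2)]
    return sum(sims) / len(sims)


def max_similarities(fps: Sequence[Fingerprint]) -> List[float]:
    """Similarity of each compound to its nearest neighbour elsewhere in the same set."""
    if len(fps) < 2:
        raise ValueError("need at least two fingerprints")
    return [max(tanimoto(fp, other) for j, other in enumerate(fps) if j != i)
            for i, fp in enumerate(fps)]


library = [frozenset({1, 2, 3}), frozenset({1, 2, 4}), frozenset({5, 6})]
print(round(diversity_coefficient(library), 3))  # 0.167
print(max_similarities(library))                 # [0.5, 0.5, 0.0]
```

A compound with a low maximum similarity has no close neighbour in the set, so a library whose maximum similarity values sit lower contains more singletons.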
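Maximum similarity plots: (A) original FS2, (B) enhanced FS2.
Diversity coefficients, enhanced FS2 vs a commercial "diversity" library:
Diversity metric          Enhanced FS2   Commercial library
Chemical (MACCS keys)     0.72           0.78
Chemical (ECFP4)          0.24           0.33
3D pharmacophore          0.40           0.44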
Complexity vs flat distance plots (average complexity and MW per library):
Library                    Avg. complexity   Avg. MW
(A) BDDP                   38.0              189.8
(B) BDDP enhanced          42.4              193.1
(C) "Diversity" library    30.3              180.0
(D) Shape-biased           59.4              223.0
Increasing the diversity of a fragment library has clear advantages, and we were keen to improve this facet, but not at the expense of dramatically increasing molecule complexity. Reduced "interaction" complexity is known to correlate with an increased probability of achieving binding to a chosen target. We measured the complexity of our libraries using the method of Nilakantan, which is independent of atom types, bond types and connectivity and follows the intuitive view of molecular complexity. The left figure plots this complexity measure against flat distance for four libraries: the original BDDP fragment library (A), the enhanced fragment library (B), a commercially available diversity fragment library (C) and a commercially available shape-biased fragment library (D). Although we have increased shape diversity, this has not significantly increased molecule complexity (BDDP enhanced mean complexity score 42.4) compared with our original fragment set (BDDP mean complexity score 38.0), and the new library is considerably less complex than the commercially available shape-biased library (mean complexity score 59.4).
A recent publication highlights data for 145 fragment hits against a range of targets and the subsequent elaboration of these hits into leads. We have calculated the complexity of these published fragment hits (removing molecules lying outside fragment property space, for consistency with our fragment library). The plots on the right show the complexity distribution of both sets of compounds, indicating that our enhanced library (A) has a favourable profile versus quantified fragment hits against a broad range of drug discovery targets.
Complexity distribution plots:
Set                                Avg. complexity   Avg. MW
(A) BDDP enhanced                  42.4              193.1
(B) 127 published fragment hits    42.1              206.0
Flowchart: ChEMBL, InMan and SURECHEM compounds → RECAP fragmentation, filtering and enumeration → 56857 enumerated fragments → duplicates and unwanted groups removed (54006 fragments left). Subsequent selection steps (MaxSim < 0.7 (Tanimoto) vs the 3D core-set; inter-set diversity selection; inter- and intra-set diversity/property optimisation to select diverse structures, fill holes in the existing library and optimise a set of properties, with property biasing to flat distance 1.2 to 1.8 and MW 110 to 200) reduce the set to 26994 and then 1000 fragments (diversity coefficient 0.70). Med Chem eyeballing then gives 730 selected fragments for synthesis.
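Three of the flowchart steps can be sketched together: the MaxSim < 0.7 (Tanimoto) novelty filter against the 3D core-set, the property bias (flat distance 1.2 to 1.8, MW 110 to 200), and a diversity pick. The greedy MaxMin picker is a swapped-in, commonly used diversity-selection technique; the poster does not state which algorithm was used. Fingerprints as bit sets, the `Fragment` record and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Fragment:
    name: str
    fp: frozenset         # fingerprint on-bits (assumed precomputed)
    mw: float
    flat_distance: float  # NPR1 + NPR2


def tanimoto(a: frozenset, b: frozenset) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def max_sim(fp: frozenset, reference: Sequence[frozenset]) -> float:
    return max((tanimoto(fp, r) for r in reference), default=0.0)


def novel_vs_core(frags: Sequence[Fragment], core_fps, cutoff: float = 0.7) -> List[Fragment]:
    """Drop fragments whose nearest core-set neighbour has Tanimoto >= cutoff."""
    return [f for f in frags if max_sim(f.fp, core_fps) < cutoff]


def in_property_window(f: Fragment, fd=(1.2, 1.8), mw=(110.0, 200.0)) -> bool:
    return fd[0] <= f.flat_distance <= fd[1] and mw[0] <= f.mw <= mw[1]


def maxmin_pick(frags: Sequence[Fragment], k: int, seed_fps=()) -> List[Fragment]:
    """Greedy MaxMin: repeatedly add the fragment least similar to everything chosen so far."""
    picked: List[Fragment] = []
    chosen_fps = list(seed_fps)  # e.g. the existing library, so new picks "fill holes"
    pool = list(frags)
    while pool and len(picked) < k:
        best = min(pool, key=lambda f: max_sim(f.fp, chosen_fps))
        picked.append(best)
        chosen_fps.append(best.fp)
        pool.remove(best)
    return picked


core = [frozenset({1, 2, 3})]
frags = [
    Fragment("A", frozenset({1, 2, 3}), 150.0, 1.3),  # identical to the core-set: removed
    Fragment("B", frozenset({1, 2, 9}), 150.0, 1.3),
    Fragment("C", frozenset({7, 8}), 160.0, 1.4),
    Fragment("D", frozenset({7, 8, 9}), 250.0, 1.5),  # MW outside the window
    Fragment("E", frozenset({4, 5}), 140.0, 1.0),     # too flat
]
windowed = [f for f in novel_vs_core(frags, core) if in_property_window(f)]
print([f.name for f in maxmin_pick(windowed, 2, seed_fps=core)])  # ['C', 'B']
```

Seeding the picker with the existing library's fingerprints is one way to favour structures that fill holes in that library rather than duplicating it.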
In addition, we have undertaken a fragmentation of the Broad Institute DOS collection identifying 114 unique scaffolds that lie within
fragment property space and 118 scaffolds that lie outside (MW > 300 Da). For the scaffolds that lie within fragment property space (114)
we have undertaken an enumeration of these molecules capping basic amine, acid and alcohol functionalities. This set will be filtered
(fragment property criteria) and the shape diversity will be plotted. We will identify and synthesise, in collaboration with the Broad Institute,
all scaffolds sitting in fragment property space and those that we select from the “capped” molecules within the enumerated set that
present different chemical and shape diversity profiles.
We are also working on developing a complexity score (potentially called “Interaction Efficiency”) that should take into account synthetic
accessibility and interaction complexity (e.g. Hann type complexity).
Angelo Pugliese on behalf of the Team