The elucidation of natural product structures from analytical data, specifically NMR and MS, remains a major challenge. With an enormous palette of NMR experiments to choose from, supported by breakthrough technologies in hardware, generating data of sufficient quality to determine even the most complex natural product structures is no longer the major hurdle. The challenge is in the analysis of the data. We are in a new era of structure elucidation: one where computers, databases, and a synergy between scientists and algorithms can offer an accelerated path forward. Software tools are capable of digesting spectroscopic data to elucidate extremely complex natural products. Scientists can now elucidate chemical structures using multinuclear chemical shift data and correlation data from an array of 2D NMR experiments, and can use existing data sets for dereplication and computer-assisted structure elucidation. With the explosion of online data, especially in public databases such as PubChem and ChemSpider, many tens of millions of chemical structures are available to seed fragment databases for inclusion in the elucidation process. This presentation will provide an overview of how cheminformatics and chemical databases have been brought together to assist in the identification of natural products. It will include an examination of state-of-the-art developments in Computer-Assisted Structure Elucidation.
Big Data Helps Elucidate Natural Product Structures
1. Cheminformatics and the Structure Elucidation of Natural Products
(or can Big Data help elucidate structures!)
Antony Williams
5th Brazilian Conference of Natural Products
October 27th, 2015
ORCID ID: 0000-0002-2668-4821
2. A Bit About Me…
• NMR spectroscopist by training
• Chief Science Officer, ACD/Labs Software
• One of the founders of the ChemSpider database
• VP for Cheminformatics at RSC
3. Why is this important?
• Structure verification and elucidation of 1000s of compounds
• NMR predictors with >2,000,000 shifts & Computer-Assisted Structure Elucidation
• Made >20,000,000 chemical compounds & data freely accessible to the community
• Grew the dataset to >30,000,000 chemicals & used it for structure elucidation
• Big data can assist structure identification
4. The Agenda…
• Dereplication using prior knowledge
• The increasing prevalence of online content
• Data generation is not the issue. Analysis is.
• Computer-assisted structure elucidation
• New experiments to improve elucidation
• Rethink data-sharing through publications!
6. …for each natural product dereplicated, at an average cost of $300 … a savings of $50,000 is incurred in isolation and identification time.
7. Dereplication
• There are ca. 200,000 known natural products
• The chance for rediscovery is very high!
• We need efficient “dereplication” processes
• Most general approach: acquire analytical data and search existing databases…
8. Scale of Dereplication Exercise
0.5–2 mg extract
4 mL agar slope Petri dish
Bioassay & HPLC/UV/MS/NMR evaluation
100 mg sponge
With gratitude to John Blunt
9. Approaches to Dereplication
For each compound isolated:
Desirable to know:
Taxonomy of organism
Molecular Wt/formula
UV Spectrum
1H NMR Spectrum
[13C NMR Spectrum if possible]
Identify as known or new compound. If known STOP.
If new then acquire data:
1D and 2D NMR array, MS with fragmentation, IR, [α]D, ORD
Fully elucidate structure
10. What Databases are Available?
Public
ChemSpider
CSLS
PubChem
NMRShift DB
Naproc-13
SuperNatural
SDBS
Private
All Pharma
GVK Biosciences NPD
UC UV DB
DTU UV DB
Marine NP DB
GVK NP DB
InterMed UV DB
InterMed NMR DB
Novartis IR DB
Natl. Centre Plant Metabol.
CH-NMR-NP
Commercial
SciFinder
SpecInfo
(Crossfire) Beilstein
Crossfire Gmelin
Reaxys
ACD Spectral Libraries
NaprAlert
Dict. Natural Products
Dict. Marine Nat. Prods
AntiBase
MarinLit
AntiMarin
With gratitude to John Blunt
23. Dereplication in MarinLit Online
• Can be achieved using
• 1H NMR features e.g. number of Me groups
• 13C and 1H chemical shifts
• Molecular formula (complete or partial)
• UV maxima
• Exact mass
• OR a combination of any or all of the above.
25. 1H NMR Spectrum - new or known?
9 Me groups are obvious (from integrals)
Search of MarinLit: 9 Me gave 628 answers
26. Characterizing the spectrum further
4 Me singlets
4 Me doublets
1 OMe singlet
Aromatic protons
Searching MarinLit for 9 total methyls (4 singlets, 4 doublets, 1 OMe) gave 39 answers.
27. COSY spectrum
This implies a 1,2,4-trisubstituted aromatic system
A broad singlet coupled/on-coupled to 2 doublets
28. 4 Me singlets 4 Me doublets
1 OMe singlet
4 singlets, 4 doublets, 1 OMe, 1,2,4-trisubstituted aromatic
2 answers only
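The narrowing shown on the last few slides (628, then 39, then 2 answers) is essentially a conjunction of feature filters over a database. The following is a minimal sketch of that idea only: the `Record` fields, the `search` helper and every library entry are hypothetical and invented for illustration, and this is not how MarinLit itself is implemented or queried.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical database entry holding 1H NMR feature counts."""
    name: str
    me_singlets: int
    me_doublets: int
    ome_singlets: int
    aromatic_pattern: str  # e.g. "1,2,4-trisubstituted", or "" if none


def search(records, **required):
    """Return the records whose fields equal every required feature value."""
    return [r for r in records
            if all(getattr(r, key) == value for key, value in required.items())]


# Invented entries, purely for illustration.
LIBRARY = [
    Record("compound A", 4, 4, 1, "1,2,4-trisubstituted"),
    Record("compound B", 4, 4, 1, ""),
    Record("compound C", 5, 3, 1, "1,2,4-trisubstituted"),
    Record("compound D", 4, 4, 0, "1,2,4-trisubstituted"),
]

# Methyl pattern only, then methyl pattern plus the aromatic substitution.
broad = search(LIBRARY, me_singlets=4, me_doublets=4, ome_singlets=1)
narrow = search(LIBRARY, me_singlets=4, me_doublets=4, ome_singlets=1,
                aromatic_pattern="1,2,4-trisubstituted")

print([r.name for r in broad])
print([r.name for r in narrow])
```

Each additional feature can only shrink the hit list, which is why a few easily read spectral features can be enough to decide between "known" and "possibly new".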
39. Online content also available!
NMRShiftDB http://nmrshiftdb.nmr.uni-koeln.de/
42. • ~35 million chemicals and growing
• Data sourced from ~500 different sources
• Structure-centric hub for web-searching
• Already used by many mass spectrometry software packages for structure ID
Mining Big Data for Natural Products???
49. NMR Predictions on ChemSpider Data for Dereplication
• fC = full composition (C0-100 H0-100 O0-20 N0-10)
• lC = limited composition (C10-30 H25-40 O0-15 N0-5)
[Figure: structures of Compound 1 and Compound 2]
50. Large Fragments can be found
Top 2 hits searched by 1H chemical shifts. Hits ranked by the 1H NMR deviation and filtered with C10-30 H25-40 O0-15 N0-5, Good List and Bad List. Good List was determined from 1H shifts, integrals and 1H-1H COSY
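The composition limits quoted on these slides (fC and lC) act as a simple element-count filter on candidate formulae. Below is a minimal sketch of such a filter; the function names and the parsing approach are assumptions made for illustration, the parser handles only flat formulae such as `C30H50O5` (no brackets, charges or isotopes), and it is not the filtering code used by ChemSpider or any CASE package.

```python
import re

# Element-count ranges taken from the slide text.
FULL = {"C": (0, 100), "H": (0, 100), "O": (0, 20), "N": (0, 10)}
LIMITED = {"C": (10, 30), "H": (25, 40), "O": (0, 15), "N": (0, 5)}


def parse_formula(formula):
    """Parse a flat molecular formula into a dict of element counts."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def within(formula, limits):
    """True if every element is allowed and each count lies in its range."""
    counts = parse_formula(formula)
    if any(element not in limits for element in counts):
        return False
    return all(low <= counts.get(element, 0) <= high
               for element, (low, high) in limits.items())


# Hypothetical candidate formulae used only to exercise the filter.
for candidate in ["C20H30O3", "C30H50O5", "C31H54N4SO6"]:
    print(candidate, within(candidate, FULL), within(candidate, LIMITED))
```

Elements outside the allowed set (sulfur in the last candidate) cause rejection, which is the same mechanism by which an overly narrow element assumption can exclude the correct answer.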
51. Marine Natural Product Example
• Search of nominal mass 490-491 gave the following results:
ChemSpider: 46,234
SciFinder: 171,904
Dictionary of Natural Products: 537
Dictionary of Marine Natural Products: 90
MarinLit: 94
AntiMarin: 131
• Molecular formula obtained, C30H50O5 (490.3658):
ChemSpider: 208
SciFinder: 2,366
Dictionary of Natural Products: 238
Dictionary of Marine Natural Products: 43
MarinLit: 43
AntiMarin: 48
Focused Datasets Valuable
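The value 490.3658 quoted for C30H50O5 is its monoisotopic mass, which can be reproduced from the masses of the most abundant isotopes. The sketch below is illustrative only: the helper names are hypothetical, the isotope masses are standard rounded values, and the parser accepts only flat formulae.

```python
import re

# Monoisotopic masses (most abundant isotope) in unified atomic mass units.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "S": 31.97207117,
}
# Integer mass numbers of the same isotopes, for nominal-mass searches.
NOMINAL = {"C": 12, "H": 1, "N": 14, "O": 16, "S": 32}


def parse_formula(formula):
    """Parse a flat molecular formula into a dict of element counts."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def monoisotopic_mass(formula):
    """Sum of monoisotopic masses for a neutral molecule."""
    return sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())


def nominal_mass(formula):
    """Sum of integer mass numbers for a neutral molecule."""
    return sum(NOMINAL[el] * n for el, n in parse_formula(formula).items())


print(round(monoisotopic_mass("C30H50O5"), 4))
print(nominal_mass("C30H50O5"))
```

A nominal mass matches every formula that sums to the same integer, while an exact mass narrows the search to one or a few formulae, which is consistent with the drop in hit counts between the two searches on the slide.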
53. Approaches to Dereplication
For each compound isolated:
Desirable to know:
Taxonomy of organism
Molecular wt/formula
UV Spectrum
1H NMR Spectrum
[13C NMR Spectrum]
Identify as known or new compound. If known STOP.
If new then acquire data:
1D and 2D NMR array, MS with fragmentation, IR, [α]D, ORD
Fully elucidate structure
55. Modern NMR Technologies
• Even a basic array of 1D/2D experiments can provide the relevant data in the majority of cases
• The past few years have seen improvements in:
• Hardware: Magnets, Probes and RF
• Software: Data acquisition and processing
• Pulse sequences to probe direct and (very) long-range homo- and heteronuclear correlations
57. NMR Developments: 30 years of improvements
• 1984: First report of cryogenic NMR probe
• 1986: HMBC experiment reported
• 1991: First commercial 3 mm gradient inverse probes
• 1996: ADEQUATE NMR experiments first reported
• 1996: 1H-15N HMBC applications reported
• 1998: Commercial 1.7 mm gradient inverse triple probes
• 1999: First commercial cryogenic NMR probes delivered
• 2000: First 3 mm prototype cryoprobe developed
• 2006: First 1.7 mm MicroCryoProbes™ delivered
• 2009: Pure shift HSQC experiments developed
• 2014: 1,1- and 1,n-HD-ADEQUATE experiments
With gratitude to Gary E. Martin
58. COSY Correlations
Vicinal H-H couplings
Geminal H-H couplings
[Figure: numbered structure showing COSY correlations]
59. HMBC Correlations (8 Hz Optimized)
[Figure: numbered structure showing HMBC correlations]
60. Always new sequences coming: 1,1- and 1,n-HD-ADEQUATE
Examples show all three scenarios for 1,1- and 1,n-HD-ADEQUATE correlations for cryptospirolepine.
61. Adoption can take a long time
HSQC vs. HMQC took > 20 years!
• HMQC is an older technique and affords lower F1 resolution.
• HSQC is a better technique but SLOWLY supplanted HMQC!
Year Range   #HMQC reports   #HSQC reports
1990-94      52              10
1995-99      177             39
2000-04      346             111
2005-09      358             266
2010-14      345             423
Totals       1278            849
From: A. Williams, G.E. Martin, & D.J. Rovnyak, “Increasing the Adoption of Advanced Techniques for the Structure Elucidation of Natural Products,” from Modern NMR Approaches to the Structure Elucidation of Natural Products, vol. 1, A.J. Williams, G.E. Martin, and D.J. Rovnyak, Eds., RSC, London, 2015.
64. 50 years of iterative development
DENDRAL
NMR-SAMS
SENECA
SpecInfo
ACD/Labs
CMC-SE
LSD
Others…
65. Computer Assisted Structure Elucidation: Methodology
• Interpret data to extract knowledge
  • Molecular Formula
  • Integrals
  • Chemical shifts
  • Multiplicity
  • Connectivity
  • Known fragments
  • Known exclusions
• Search structure space to derive all structures
• Rank-order based on set criteria
  • Predicted chemical shift
  • Mass Spec Fragmentation
66. Remember how many isomers
C10H17Br2ClO2: 50,502,293
C15H22O2: 138,136,211,624
C15H20O1: 37,568,150,635
C12H12O3: 68,930,547,646
C13H20O3: 14,431,269,166
C11H12N2O2: 3×10^11 < n ≤ 10^12
67. Computer-Aided Structure Elucidation
• Eliminate “superfluous” isomers by imposing different structural constraints
• Structural constraints are from:
• Spectral data of various types:
• NMR shifts/multiplicity constrain atom types; correlations constrain connectivities
• MS constrains formula and fragments
• IR constrains functional groups
• Prior information: sample origin
• Chemical rules: valence, ring size, charge, etc.
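One formula-level consequence of the valence rules mentioned above is the count of rings plus double bonds (RDBE, also called degrees of unsaturation), which any candidate structure for a given molecular formula must satisfy. The following is a minimal sketch using the textbook formula RDBE = C − (H + X)/2 + N/2 + 1; it assumes standard valences (trivalent N, divalent O and S, monovalent halogens) and is offered as an illustration, not as a description of what any particular CASE program computes.

```python
def rdbe(counts):
    """Rings plus double bonds for a neutral formula given as element counts.

    Assumes standard valences: C = 4, N = 3, O/S = 2, H and halogens = 1.
    """
    carbons = counts.get("C", 0)
    hydrogens = counts.get("H", 0)
    nitrogens = counts.get("N", 0)
    halogens = sum(counts.get(el, 0) for el in ("F", "Cl", "Br", "I"))
    return carbons - (hydrogens + halogens) / 2 + nitrogens / 2 + 1


# Formulae taken from the isomer-count slide.
print(rdbe({"C": 15, "H": 22, "O": 2}))                      # C15H22O2
print(rdbe({"C": 10, "H": 17, "Br": 2, "Cl": 1, "O": 2}))    # C10H17Br2ClO2
print(rdbe({"C": 11, "H": 12, "N": 2, "O": 2}))              # C11H12N2O2
```

A structure generator can discard any candidate whose ring and multiple-bond count disagrees with this number before spectral constraints are even considered.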
70. 1D & 2D NMR Synchronized Processing
The software displays correlations for assigned spectra and structures, and highlights correlations that are likely to be erroneous.
72. Not that easy though…
“Nonstandard Correlations”
“Standard” and “Nonstandard” correlations are experimentally indistinguishable
If 2D NMR data contain both “Standard” and “Nonstandard” correlations we see contradictions in interpretation
[Figure: standard COSY and HMBC correlations]
75. Structure Generation combined with Structural and Spectral Filtering
• Internal Badlist
• User Badlist
• User Goodlist
• Rings: Obligatory, Forbidden
• Bredt’s Rule
• Maximum Match Factor
• Filter Tolerance: Tight, Medium, Loose
76. Selection of the Preferable Structure
• Remove duplicates
• 1H and 13C shift calculation for all output structures
• Rank structures in ascending order of average chemical shift deviation, d
• The structure with the minimum d is the most probable.
77. Low Structural Information in 2D Spectral Data: Use Fragment DB
• Number of observed 2D NMR correlations is smaller than expected
• Deficit of hydrogen atoms results in a low number of correlations
• Search in Fragment Library using the 13C NMR spectrum and embed in the MCD
79. Example of Fragment Usage.
Symmetric molecule C56H78O12S1
[Figure: structure of ashwagandhanolide annotated with 1H chemical shifts]
Ashwagandhanolide
Small number of correlations
80. 13C NMR Fragment search - 5524 found
[Figure: experimental (Exp.) vs. fragment (Frag.) 13C spectra]
Fragment #1: C17H22O2
81. Solution
• 960 MCDs were created using fragment #1
• Structure generation from the 960 MCDs gave 24 structures after filtering and 6 output structures.
• Total generation time was tg = 29 min 30 s
84. Wrong Molecular Formula
Only CHNO in formula assumed
J. Am. Chem. Soc., 2001, 123, 10870-10876.
Tetrahedron Letters, 2002, 43, 5707-5710.
FAB-MS: C31H54N4O8 ESI-MS: C31H54N4SO6
86. Wrong Initial Suggestion
13C shift at 173.50 ppm is O-C=O group
J. Nat. Prod., 2000, 63, 1677-1678.
J. Nat. Prod., 2003, 66, 716-718.
13C signal at 173 ppm led to COO bias
Data compared to a similar compound
88. Misinterpretation of 2D NMR Data
Presence of a guanidine group substituted with 2xCH3 groups
was hypothesized. Absence of an expected HMBC correlation
from methyls to C(159.0) ignored.
J. Org. Chem., 2004, 69, 9025-9029.
J. Org. Chem., 2008, 73, 8719-8722.
Misinterpreted HMBC signal
Verified by X-ray crystallography
95. New Experiments Influence CASE!
Cervinomycin
[Figure: numbered structure of cervinomycin and its connectivity diagram]
96. The Influence of Data on Elucidation Time: Cervinomycin
Experiment columns: COSY, HSQC; 1H-13C HMBC (8 Hz); 1H-13C HMBC (4 Hz); 1H-13C LR-HSQMBC (4 Hz); 1H-13C LR-HSQMBC (2 Hz)
Data sets included   Structure Generation Time   # of Structures Generated
+ + +                49 h                        314
+ + + +              37 h                        4
+ + + +              150 s                       7
+ + + + +            104 s                       1
97. New Experiments
Cryptospirolepine over 20 years!
Inexplicably, the vinyl proton has no evident 2JCH correlation to the carbonyl! DFT predicted ~0.3 Hz coupling!
Synergistic interpretation and CASE applied to an array of 2D data elucidated this compound. Included new 1,1-ADEQUATE and 1,n-ADEQUATE data.
The absence of a 2JCH correlation from the vinyl proton to the adjacent carbonyl is perplexing. A new long-range heteronuclear correlation NMR experiment was acquired: LR-HSQMBC.
98. Key 1,1-HD-ADEQUATE Correlations
• Experiment was optimized for 60 Hz
• Typical range for 1JCC sp2 couplings is 60-75 Hz
• The 2JCC coupling from C13 to C1/C11' was calculated (DFT) to be 15.4 Hz, which would give a calculated intensity of 0.16 in this experiment.
99. Key 1,n-HD-ADEQUATE Correlations
• Experiment optimized for 7 Hz
• Typical range for nJCC couplings is approximately 2-7 Hz
• 2JCC couplings across carbonyls are typically 10-16 Hz
• Correlations were observed, including the 1JCC correlations from C13 to C2 and C13a that unavoidably “leak” into all 1,n-ADEQUATE spectra.
100. Revision of the [7.5.5] Core of Cryptospirolepine to a [6.6.5] System
• Based on correlations from the 1,1- and 1,n-HD-ADEQUATE spectra, the [7.5.5] core shown in red was revised to a [6.6.5] system.
• The γ-lactam was rearranged to a dehydropiperidinone.
• Key correlations were the 1JCC correlations from the vinyl CH to the flanking carbonyl and quaternary carbons.
101. Could CASE methods sort out the structure?
Experiment columns: 1,1-ADEQUATE (60 Hz); 1,n-ADEQUATE (7 Hz); 1H-13C HMBC (8 Hz); 1H-13C HMBC (4 Hz); IDR HSQC-TOCSY (15 ms); 1H-13C LR-HSQMBC (2 Hz); 1H-13C LR-HSQMBC (4 Hz); 1H-15N LR-HSQMBC (2 Hz)
Data sets included   Generation Time (s)   # Structures
+                    >420 h                >10,400
+ + +                140                   6816
+ + + +              142                   3360
+ + + +              40                    522
+ + + + +            45                    258
+ + + + + + + +      7                     24
• The modern “1993” data set used as input failed to lead to the generation of the structure in a 3-week calculation!
• More complete input data reduced the calculation to seconds!
110. What would it take???
• PDFs containing text descriptions of spectra are problematic for reinterpretation of data
• Publishers should host at least high resolution images of all spectra
• Really we need the data files!!!
111. Conclusions
• Dereplication is increasingly feasible using online content
• Analysis of data is generally a bigger issue than data generation itself
• Computer-assisted structure elucidation works
• Data-sharing associated with publications needs rethinking
The number of possible isomers can be extremely large. It is impossible to generate all isomers even for relatively simple compounds (the number of stars in our galaxy is about 10^11).
This is a natural product dataset, and the software provided a possible molecule within 30 seconds.
Magnetic field strength has grown year on year, with a related increase in dispersion and sensitivity.
A variety of methods have been employed, using IR, NMR, and MS. Different philosophies, methodologies, interfaces.
We can formulate a general CASE strategy:
The Molecular Connectivity Diagram is automatically generated. This can be used for an alternative check on structures.
Atom 1 to 17: 6-bond HMBC. Atom 2 to 7: a=6 COSY.
Fragments are ranked in descending order of numbers of carbon atoms. Carbon atoms already possess chemical shifts.
Even when you are confident of a structure, further confirmation of its correctness can still be helpful. A CASE program can ensure that all appropriate candidates within a given structure space are considered.
Randazzo et al. [4] isolated a new compound named halipeptin A. An elemental formula containing only C, H, N and O was assumed: C31H54N4O9 (calculated 627.3969 for C31H55N4O9, with Δm = 0.0104, i.e., 16.6 ppm). Structure A, which contains an unusual fragment (colored in red in Fig. 1), was suggested from 2D NMR data. In a follow-up article [5], the same group found the formula C31H54N4SO7 from HRMS, and the correct structure B was suggested.
Both molecular formulae and the 2D NMR data were input into ACD/Structure Elucidator. The software generated 303 structures in 36 seconds. Ranking the generated structures using 13C chemical shift prediction placed the correct structure (B) in the first position.
[4] Randazzo, A.; Bifulco, G.; Giannini, C.; Bucci, M.; Debitus, C.; Cirino, G.; Gomez-Paloma, L. J. Am. Chem. Soc., 123:10870-10876, 2001.
[5] Monica, C. D.; Randazzo, A.; Bifulco, G.; Cimino, P.; Aquino, M.; Izzo, I.; De Riccardis, F.; Gomez-Paloma, L. Tetrahedron Letters, 43:5707-5710, 2002.
Poster: Are Pitfalls Unavoidable During the Structure Elucidation of New Organic Compounds? M. E. Elyashberg, K. A. Blinov, S. G. Molodtsov, A. J. Williams, Ryan Sasaki.
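The mass error quoted in the halipeptin A note (Δm = 0.0104 on a calculated 627.3969, i.e. 16.6 ppm) follows from the usual relative-error formula, ppm = Δm / m × 10^6. The short sketch below reproduces that arithmetic; the function name is hypothetical, and the comment about acceptance thresholds is a general remark rather than something stated in the source.

```python
def ppm_error(delta_mass, reference_mass):
    """Relative mass error in parts per million."""
    return delta_mass / reference_mass * 1e6


# Values quoted in the note: delta m = 0.0104 against a calculated 627.3969.
error = ppm_error(0.0104, 627.3969)
print(f"{error:.1f} ppm")
```

An error of this size is large by the standards of typical high-resolution measurements, which is consistent with the note's point that the CHNO-only formula assumption was the source of the wrong structure.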
Sakuno et al. [6] isolated a natural product with molecular formula C20H18O6. The authors [6] postulated that the 13C chemical shift at 173.50 ppm was the resonance of an O-C=O group, and with this assumption structure A (Fig. 2) was suggested. Wipf and Kerekes [7] compared the NMR and IR spectra of this compound with spectra of a number of its structural relatives and proved that it was identical to viridol (structure B).
The 2D NMR data from article [6] were input into ACD/Structure Elucidator. No assumptions were used. The software generated 272 structures in 1 min 40 s. Ranking the generated structures using 13C chemical shift prediction placed the correct structure, viridol (B), in the first position. The originally proposed structure A was placed in the second position, but with a large difference in chemical shift deviation.
[6] Sakuno, E.; Yabe, K.; Hamasaki, T.; Nakajima, H. J. Nat. Prod., 63:1677-1678, 2000.
[7] Wipf, P.; Kerekes, A. D. J. Nat. Prod., 66:716-718, 2003.
Ralifo and Crews [8] reported the isolation of (-)-spiroleucettadine (C20H23N3O4), structure A (Fig. 3). The presence of a guanidine group (δC 159.0) substituted with two CH3 groups was hypothesized. The absence of an expected HMBC correlation from one of the methyls to C(159.0) was ignored. Several attempts to synthesize this compound were undertaken without success, which raised questions about the original structure elucidation. Crews's group [9] successfully re-isolated spiroleucettadine, and X-ray analysis established the correct structure, shown as B in Fig. 3. It was revealed that the postulated guanidine group was erroneous and that one HMBC correlation had been misinterpreted in the previous work.
When the old 2D NMR data were used in ACD/Structure Elucidator, it was immediately found that the original structure produced deviations that were too large for a positive identification. When the 2D NMR data from the later study were used with the software, the correct structure was generated and placed in the first position after ranking using 13C chemical shift prediction.
[8] Ralifo, P.; Crews, P. J. Org. Chem., 69:9025-9029, 2004.
[9] White, K. N.; Amagata, T.; Oliver, A. G.; Tenney, K.; Wenzel, P. J.; Crews, P. J. Org. Chem., 73:8719-8722, 2008.
Characteristics of known drug space. Natural products, their derivatives and synthetic drugs
Results obtained from various Structure Elucidator CASE program computation runs with different sets of input data for the xanthone antibiotic cervinomycin A2 (see Figure X.17B for the structure). As the first two rows of the table show, restricting the input data file to data likely to contain primarily 2JCH and 3JCH correlations, with perhaps only sparse 4JCH correlations, leads to lengthy computation runs. However, when 2 Hz optimized LR-HSQMBC data, which can contain 4JCH–6JCH correlations, are included in the input file (rows 3 and 4), computation times drop precipitously and the number of structures generated is also significantly reduced.