Cheminformatics and the Structure
Elucidation of Natural Products
(or can Big Data help elucidate structures!)
Antony Williams
5th Brazilian Conference of Natural Products
October 27th, 2015
ORCID ID:0000-0002-2668-4821
A Bit About Me…
• NMR spectroscopist by training
• Chief Science Officer ACD/Labs Software
• One of the founders of the ChemSpider database
• VP for Cheminformatics at RSC
Why is this important?
• Structure verification and elucidation of
1000s of compounds
• NMR predictors with >2,000,000 shifts &
Computer-Assisted Structure Elucidation
• Made >20,000,000 chemical compounds
& data freely accessible to the community
• Grew the dataset to >30,000,000 chemicals & used it for structure elucidation
• Big data can assist structure identification
The Agenda…
• Dereplication using prior knowledge
• The increasing prevalence of online content
• Data generation is not the issue. Analysis is.
• Computer-assisted structure elucidation
• New experiments to improve elucidation
• Rethink data-sharing through publications!
…for each natural product dereplicated, at an
average cost of $300 … a savings of $50,000 is
incurred in isolation and identification time.
Dereplication
• There are ca. 200,000 known natural products
• The chance for rediscovery is very high!
• We need efficient “dereplication” processes
• Most general approach – acquire analytical
data and search existing databases…
Scale of Dereplication Exercise
0.5 – 2 mg extract
4 mL agar slope Petri dish
Bioassay & HPLC/UV/MS/NMR evaluation
100 mg sponge
With gratitude to John Blunt
Approaches to Dereplication
For each compound isolated:
Desirable to know:
• Taxonomy of organism
• Molecular Wt/formula
• UV Spectrum
• 1H NMR Spectrum
• [13C NMR Spectrum if possible]
Identify as known or new compound. If known STOP.
If new then acquire data:
• 1D and 2D NMR array, MS with fragmentation, IR, [α]D, ORD
Fully elucidate structure
What Databases are Available?
Public
ChemSpider
CSLS
PubChem
NMRShift DB
Naproc-13
SuperNatural
SDBS
Private
All Pharma
GVK Biosciences NPD
UC UV DB
DTU UV DB
Marine NP DB
GVK NP DB
InterMed UV DB
InterMed NMR DB
Novartis IR DB
Natl. Centre Plant Metabol.
CH-NMR-NP
Commercial
SciFinder
SpecInfo
(Crossfire) Beilstein
Crossfire Gmelin
Reaxys
ACD Spectral Libraries
NaprAlert
Dict. Natural Products
Dict. Marine Nat. Prods
AntiBase
MarinLit
AntiMarin
With gratitude to John Blunt
[Figure: TOF MS ES+ mass spectrum of PU10-F2 (SSA0006), m/z 220-560, with the M+H peak labeled]
Nominal Mass Searching
MW 480
Search MW = 480 in Dict. Nat. Prod.
562 hits out of 230,000 compounds!!!
Molecular Formula Searching
MF = C28H36N2O5
Search MF = C28H36N2O5 in Dict. Nat. Prod.
2 hits out of 230,000 compounds!!!
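The difference between a nominal-mass query and a molecular-formula query can be sketched in a few lines of Python. This is an illustration only, not taken from the slides: the function names and the small isotope-mass table are assumptions, and only the formula C28H36N2O5 comes from the example above.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes (standard values).
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "S": 31.97207117,
}

def parse_formula(formula):
    """Parse a simple formula such as 'C28H36N2O5' into {element: count}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts

def monoisotopic_mass(formula):
    """Sum of monoisotopic masses, the value compared with an exact-mass measurement."""
    return sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())

def nominal_mass(formula):
    """Integer mass used for a nominal-mass (MW) search."""
    return sum(round(MONOISOTOPIC[el]) * n for el, n in parse_formula(formula).items())

print(nominal_mass("C28H36N2O5"))                 # 480
print(round(monoisotopic_mass("C28H36N2O5"), 4))  # 480.2624
```

Many different formulae share the nominal mass 480, which is why the formula query is so much more selective than the MW query.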
Compare UV spectrum and 1H NMR features
How many isomers for a formula?
C10H17Br2ClO2: 50,502,293
C15H22O2: 138,136,211,624
C15H20O: 37,568,150,635
C12H12O3: 68,930,547,646
C13H20O3: 14,431,269,166
C11H12N2O2: 3⋅10^11 < n < 10^12
[Figure: 1H NMR spectrum, CD3OD (1-7 ppm), annotated: 1 x triplet methyl, 3 x methoxy, 3 x olefinic H, solvent]
Marinlit Dereplication
• 1 of 5 hits from 230,000 compounds
• The ONLY hit if MW = 480 included
NMR Features Dereplication
Marinlit Enhanced Features
1H/13C Predicted Spectra
HSQC-DEPT Predicted Spectrum
Dereplication in MarinLit Online
• Can be achieved using
• 1H NMR features e.g. number of Me groups
• 13C and 1H chemical shifts
• Molecular formula (complete or partial)
• UV maxima
• Exact mass
• OR a combination of any or all of the above.
1H NMR Spectrum - new or known?
9 Me groups are obvious (from integrals)
Search of MarinLit: 9 Me gave 628 answers
4 Me singlets
4 Me doublets
1 OMe singlet
Aromatic protons
Characterizing the spectrum further
Search MarinLit for 9 total methyls (4 singlets, 4 doublets, 1 OMe): 39 answers
COSY spectrum
This implies a 1,2,4-trisubstituted aromatic system
A broad singlet coupled/on-coupled to 2 doublets
4 Me singlets 4 Me doublets
1 OMe singlet
4 singlets, 4 doublets, 1 OMe, 1,2,4-trisubstituted aromatic
2 answers only
Comparison of NMR data confirmed that the unknown had this structure
Commercial Assigned Databases
>320,000 assigned
chemical structures
>2,500,000 shifts
Searching Assigned Databases
• mI = 306.1 – 306.2
• 591/322,319 hits
Searching Assigned Databases
• 10 13C shifts to +/- 3.0 ppm
• 5 1H shifts to +/- 0.3 ppm
• 7 hits – very different
Including 15N, 19F and 31P data
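A tolerance-based shift search of the kind described above can be sketched as follows. This is a minimal illustration, not the search used by any commercial database: the record layout, the function names and the two example entries are hypothetical, and only the tolerances (3.0 ppm for 13C, 0.3 ppm for 1H) come from the slide.

```python
def matches(query_shifts, candidate_shifts, tolerance):
    """True if every query shift pairs with a distinct candidate shift within tolerance (ppm)."""
    remaining = sorted(candidate_shifts)
    for q in sorted(query_shifts):
        # Greedy pairing: take the smallest unused candidate shift inside the window.
        hit = next((c for c in remaining if abs(c - q) <= tolerance), None)
        if hit is None:
            return False
        remaining.remove(hit)
    return True

def search(database, c13_query, h1_query, c13_tol=3.0, h1_tol=0.3):
    """Return names of records whose 13C and 1H shifts both satisfy the query."""
    return [
        name
        for name, record in database.items()
        if matches(c13_query, record["13C"], c13_tol)
        and matches(h1_query, record["1H"], h1_tol)
    ]

# Hypothetical records, for illustration only.
database = {
    "entry_A": {"13C": [17.6, 31.4, 174.1], "1H": [0.88, 1.12, 4.29]},
    "entry_B": {"13C": [20.0, 55.0, 128.0], "1H": [2.10, 3.80, 7.20]},
}

print(search(database, c13_query=[18.0, 173.0], h1_query=[0.9, 4.2]))  # ['entry_A']
```

Searching only a subset of the observed shifts, as on the slide, keeps the query tolerant of missing or misassigned peaks.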
Experimental vs. Experimental
Differences between C13 shifts are generally small
Searching experimental data
30 seconds from peak-picking to suggested molecules
Experimental vs. Predicted
Differences between exp. and pred. C13 shifts can be
larger – useful to limit number of shifts searched
The Agenda…
• Dereplication using prior knowledge
• Increasing prevalence of free online content
• Data generation is not the issue. Analysis is.
• Computer-assisted structure elucidation
• New experiments to improve elucidation
• Rethink data-sharing through publications!
Online content also available!
NMRShiftDB http://nmrshiftdb.nmr.uni-koeln.de/
Online content also available!
www.nmrdb.org
• ~35 million chemicals and growing
• Data sourced from ~500 different sources
• Structure centric hub for web-searching
• Already used by many mass spectrometry software packages for structure ID
Mining Big Data for
Natural Products???
ChemSpider Interface – no NMR
26 of 35,000,000 Hits
Ranked by # of References
Top Ranked Hit
What can I find on ChemSpider?
What can I find? All for free…
NMR Predictions on ChemSpider
Data for Dereplication
• fC = full composition (C0-100 H0-100 O0-20 N0-10)
• lC = limited composition (C10-30 H25-40 O0-15 N0-5)
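The two composition windows can be expressed as a simple element-count filter. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and formulae containing elements outside the window (for example Br or Cl) are rejected, which is a choice made here rather than something stated on the slide.

```python
import re

# Element-count windows from the slide: (minimum, maximum) per element.
FULL = {"C": (0, 100), "H": (0, 100), "O": (0, 20), "N": (0, 10)}
LIMITED = {"C": (10, 30), "H": (25, 40), "O": (0, 15), "N": (0, 5)}

def parse_formula(formula):
    """Parse a simple formula such as 'C30H50O5' into {element: count}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts

def within_composition(formula, limits):
    """True if every element count lies inside the window (assumption: other elements are rejected)."""
    counts = parse_formula(formula)
    if any(element not in limits for element in counts):
        return False
    return all(low <= counts.get(element, 0) <= high
               for element, (low, high) in limits.items())

print(within_composition("C28H36N2O5", LIMITED))  # True
print(within_composition("C30H50O5", LIMITED))    # False: H50 exceeds H25-40
print(within_composition("C30H50O5", FULL))       # True
```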
NMR Predictions on ChemSpider
Data for Dereplication
Compound 1 Compound 2
Large Fragments can be found
Top 2 hits searched by 1H chemical shifts. Hits ranked by the 1H NMR deviation and filtered with C10-30 H25-40 O0-15 N0-5, Good List and Bad List. Good List was determined from 1H shifts, integrals and 1H-1H COSY
Marine Natural Product Example
• Search nominal mass 490-491 gave the following results:
ChemSpider: 46,234
SciFinder: 171,904
Dictionary of Natural Products: 537
Dictionary of Marine Natural Products: 90
MarinLit: 94
AntiMarin: 131
• Molecular formula obtained C30H50O5 (490.3658):
ChemSpider: 208
SciFinder: 2,366
Dictionary of Natural Products: 238
Dictionary of Marine Natural Products: 43
MarinLit: 43
AntiMarin: 48
Focused Datasets Valuable
Approaches to Dereplication
For each compound isolated:
Desirable to know:
• Taxonomy of organism
• Molecular wt/formula
• UV Spectrum
• 1H NMR Spectrum
• [13C NMR Spectrum]
Identify as known or new compound. If known STOP.
If new then acquire data:
• 1D and 2D NMR array, MS with fragmentation, IR, [α]D, ORD
Fully elucidate structure
The Agenda…
• Dereplication using prior knowledge
• The increasing prevalence of online content
• Data generation is not the issue. Analysis is.
• Computer-assisted structure elucidation
• New experiments to improve elucidation
• Rethink data-sharing through publications!
Modern NMR Technologies
• Even a basic array of 1D/2D experiments can
provide the relevant data in the majority of cases
• The past few years have seen improvements in:
• Hardware: Magnets, Probes and RF
• Software: Data acquisition and processing
• Pulse sequences to probe direct and (very) long-range homo- and heteronuclear correlations
Magnetic Field Strength over time
NMR Developments –
30 years of improvements
• 1984 – First report of cryogenic NMR probe
• 1986 – HMBC experiment reported
• 1991 – First commercial 3 mm gradient inverse probes.
• 1996 – ADEQUATE NMR experiments first reported.
• 1996 – 1H-15N HMBC applications reported.
• 1998 – Commercial 1.7 mm gradient inverse triple probes.
• 1999 – First commercial cryogenic NMR probes delivered.
• 2000 – First 3 mm prototype cryoprobe developed.
• 2006 – First 1.7 mm MicroCryoProbes™ delivered.
• 2009 – Pure shift HSQC experiments developed.
• 2014 – 1,1- and 1,n-HD-ADEQUATE experiments
With gratitude to Gary E. Martin
COSY Correlations
Vicinal H-H couplings
Geminal H-H couplings
[Figure: numbered structure diagram (atoms 1-23) with COSY correlations marked]
HMBC Correlations (8Hz Optimized)
[Figure: numbered structure diagram with HMBC correlations marked]
New sequences are always coming:
1,1- and 1,n-HD-ADEQUATE
Examples show all three scenarios for 1,1- and 1,n-HD-ADEQUATE correlations for cryptospirolepine.
Adoption can take a long time
HSQC vs. HMQC took > 20 years!
• HMQC is an older technique and affords lower F1 resolution.
• HSQC is a better technique but SLOWLY supplanted HMQC!
Year Range   #HMQC reports   #HSQC reports
1990-94      52              10
1995-99      177             39
2000-04      346             111
2005-09      358             266
2010-14      345             423
Totals       1278            849
From: A. Williams, G.E. Martin, & D.J. Rovnyak, “Increasing the Adoption of Advanced Techniques for the Structure Elucidation of Natural
Products,” from Modern NMR Approaches to the Structure Elucidation of Natural Products, vol. 1, A.J. Williams, G.E. Martin, and D.J. Rovnyak,
Eds., RSC, London, 2015.
The Agenda…
• Dereplication using prior knowledge
• The increasing prevalence of online content
• Data generation is not the issue. Analysis is.
• Computer-assisted structure elucidation
• New experiments to improve elucidation
• Rethink data-sharing through publications!
AI Research in 1965…
50 years of iterative development
DENDRAL
NMR-SAMS
SENECA
SpecInfo
ACD/Labs
CMC-SE
LSD
Others…
Computer Assisted Structure
Elucidation: Methodology
• Interpret data to extract knowledge
• Molecular Formula
• Integrals
• Chemical shifts
• Multiplicity
• Connectivity
• Known fragments
• Known exclusions
• Search structure space to derive all structures
• Rank-order based on set criteria
• Predicted chemical shift
• Mass Spec Fragmentation
Remember how many isomers
C10H17Br2ClO2: 50,502,293
C15H22O2: 138,136,211,624
C15H20O: 37,568,150,635
C12H12O3: 68,930,547,646
C13H20O3: 14,431,269,166
C11H12N2O2: 3⋅10^11 < n < 10^12
Computer-Aided Structure Elucidation
• Eliminate “superfluous” isomers by
imposing different structural constraints
• Structural constraints are from:
• Spectral data of various types:
• NMR shifts/multiplicity constrain atom
types; Correlations constrain connectivities
• MS constrains formula and fragments
• IR constrains functional groups
• Prior information – sample origin
• Chemical rules – valence, ring size,
charge, etc.
2D NMR spectra: Extraction of Structural Information: COSY/HMBC
• COSY: 1H-1H coupling through 3 bonds
• HMBC: 1H-13C coupling through 2/3 bonds
[Figure: structure annotated with 13C chemical shifts from 17.60 to 174.10 ppm]
1D & 2D NMR Synchronized
Processing
The Software displays correlations for assigned spectra and structures, and highlights
correlations that are likely to be erroneous.
[Figure: molecular connectivity diagram of CH3, CH2, CH and C atoms labeled with 13C shifts (17.60-174.10 ppm), plus O and OH atoms]
Molecular Connectivity Diagram (MCD)
Molecular Formula C20H30O4
Use the spectroscopist's experience to add bonds:
Create C=O, COOH, ring systems, etc.
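A quick check before adding bonds is how many rings plus multiple bonds the formula requires. The sketch below applies the standard degrees-of-unsaturation formula to C20H30O4; the function name is an assumption and the calculation is an illustration, not something shown on the slides.

```python
def degrees_of_unsaturation(c, h, n=0, halogens=0):
    """Rings plus pi bonds implied by a CcHhNnXx(O,S) formula: (2C + 2 + N - H - X) / 2."""
    return (2 * c + 2 + n - h - halogens) / 2

# C20H30O4: oxygen does not enter the formula.
print(degrees_of_unsaturation(c=20, h=30))  # 6.0
```

For C20H30O4 this gives six, so any structure assembled from the MCD must account for six rings and/or multiple bonds (the carbonyl at 174.10 ppm being one of them).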
Not that easy though…
“Nonstandard Correlations”
“Standard” and “Nonstandard”
correlations are experimentally
indistinguishable
If 2D NMR data contain both
“Standard” and “Nonstandard”
correlations we see
contradictions in interpretation
[Figure: "Standard" COSY and HMBC correlations shown on a carbon chain, with a numbered example structure]
Non-standard Correlation Example
[Figure: example structure with two 6-bond correlations marked]
Strychnine Non-standard Correlations
[Figure: strychnine structure diagrams with atom numbering and correlations labeled 2JCH, 3JCH, 4JCH and 5JCH]
Structure Generation combined with
Structural and Spectral Filtering
• Internal Badlist
• User Badlist
• User Goodlist
• Rings: Obligatory, Forbidden
• Bredt’s Rule
• Maximum Match Factor
• Filter Tolerance: Tight, Medium, Loose
Selection of the Preferable Structure
• Remove duplicates
• 1H and 13C shift calculation for all output structures
• Rank structures in ascending order of average chemical shift deviation
• Structure with minimum deviation d is the most probable.
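The ranking step can be sketched as below. This is a simplified illustration, not the algorithm of any particular CASE program: shifts are paired in sorted order instead of by atom assignment, and the function names, candidate names and predicted shifts are hypothetical.

```python
def mean_abs_deviation(experimental, predicted):
    """Average |exp - pred| in ppm, pairing shifts in sorted order (a simplifying assumption)."""
    if len(experimental) != len(predicted):
        raise ValueError("shift lists must have the same length")
    pairs = zip(sorted(experimental), sorted(predicted))
    return sum(abs(e - p) for e, p in pairs) / len(experimental)

def rank_candidates(experimental, candidates):
    """Return (deviation, name) tuples in ascending order of deviation."""
    return sorted(
        (mean_abs_deviation(experimental, predicted), name)
        for name, predicted in candidates.items()
    )

# Hypothetical experimental and predicted 13C shifts, for illustration only.
experimental = [17.6, 31.4, 174.1]
candidates = {
    "structure_X": [18.0, 30.9, 173.5],
    "structure_Y": [25.0, 40.0, 160.0],
}

for deviation, name in rank_candidates(experimental, candidates):
    print(f"{name}: d = {deviation:.2f} ppm")
```

The candidate listed first has the smallest average deviation and would be taken as the most probable structure.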
Low Structural Information in 2D
Spectral Data: Use Fragment DB
• Number of observed 2D NMR correlations is
smaller than expected
• Deficit of hydrogen atoms results in a low number of
correlations
• Search in Fragment Library using the 13C NMR
spectrum and embed in the MCD
Example of Fragment Usage.
Symmetric molecule C56H78O12S1
[Figure: two numbered structure fragments of the symmetric molecule annotated with chemical shifts (0.65-6.42 ppm)]
Ashwagandhanolide
Small number of correlations
13C NMR Fragment search - 5524 found
[Figure: experimental (Exp.) vs. fragment (Frag.) 13C spectra]
Fragment #1: C17H22O2
Solution
• 960 MCDs were created using fragment #1
• Structure Generation from 960 MCDs gave 24
structures after filtering and 6 output structures.
• Total time was tg = 29 min 30 s
Compare Hypotheses with Data
Wrong Molecular Formula
Only CHNO in formula assumed
J. Am. Chem. Soc., 2001, 123, 10870-10876.
Tetrahedron Letters, 2002, 43, 5707-5710.
FAB-MS: C31H54N4O8 ESI-MS: C31H54N4SO6
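The two formulae differ only by O2 versus S, which are close in mass. The sketch below, an illustration with assumed function names and a standard isotope-mass table, shows how small the difference is; it is not a calculation taken from the slides or the cited papers.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes (standard values).
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "S": 31.97207117,
}

def monoisotopic_mass(formula):
    """Monoisotopic mass of a simple formula such as 'C31H54N4SO6'."""
    total = 0.0
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(number) if number else 1)
    return total

chno = monoisotopic_mass("C31H54N4O8")    # CHNO-only assumption
with_s = monoisotopic_mass("C31H54N4SO6")  # sulfur-containing formula

difference = chno - with_s
ppm = difference / chno * 1e6
print(f"{chno:.4f} vs {with_s:.4f}: difference {difference:.4f} u ({ppm:.0f} ppm)")
```

Both formulae have nominal mass 610 and differ by under 0.02 u, so a measurement without sufficient mass accuracy cannot separate them if only C, H, N and O are considered.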
Wrong Initial Suggestion
13C shift at 173.50 ppm is O-C=O group
J. Nat. Prod., 2000, 63, 1677-1678.
J. Nat. Prod., 2003, 66, 716-718.
13C signal at 173 ppm led to COO bias
Data compared to a similar compound
Misinterpretation of 2D NMR Data
Presence of a guanidine group substituted with 2xCH3 groups
was hypothesized. Absence of an expected HMBC correlation
from methyls to C(159.0) ignored.
J. Org. Chem., 2004, 69, 9025-9029.
J. Org. Chem., 2008, 73, 8719-8722.
Misinterpreted HMBC signal
Verified by X-ray crystallography
J. Cheminf. 2012, 4:5
Number of Skeletal Atoms
J. Cheminf. 2012, 4:5
MW Distribution
J. Cheminf. 2012, 4:5
The Agenda…
• Dereplication using prior knowledge
• The increasing prevalence of online content
• Data generation is not the issue. Analysis is.
• Computer-assisted structure elucidation
• New experiments to improve elucidation
• Rethink data-sharing through publications!
New Experiments Influence CASE!
Cervinomycin
[Figure: cervinomycin structure with atom numbering and its molecular connectivity diagram]
The Influence of Data on Elucidation Time: Cervinomycin
Columns: COSY, HSQC | 1H-13C HMBC (8 Hz) | 1H-13C HMBC (4 Hz) | 1H-13C LR-HSQMBC (4 Hz) | 1H-13C LR-HSQMBC (2 Hz) | Structure Generation Time | # of Structures Generated
+ + + 49 h 314
+ + + + 37 h 4
+ + + + 150 s 7
+ + + + + 104 s 1
New Experiments
Cryptospirolepine over 20 years!
Inexplicably, the vinyl proton has no evident 2JCH correlation to the carbonyl! DFT predicted ~0.3 Hz coupling!
Synergistic interpretation and CASE applied to an array of 2D data elucidated this compound. Included new 1,1-ADEQUATE and 1,n-ADEQUATE data.
The absence of a 2JCH correlation from the vinyl proton to the adjacent carbonyl is perplexing. A new long-range heteronuclear correlation NMR experiment was acquired: LR-HSQMBC.
Key 1,1-HD-ADEQUATE Correlations
• Experiment was optimized for 60 Hz
• Typical range for 1JCC sp2 couplings is 60-75 Hz
• The 2JCC coupling from C13 to C1/C11’ was calculated (DFT) to be 15.4 Hz, which would give a calculated intensity of 0.16 in this experiment.
• Experiment optimized for 7 Hz
• Typical range for nJCC couplings is approximately 2-7 Hz
• 2JCC correlations across carbonyls are typically 10-16 Hz
• Correlations were observed, including the 1JCC correlations from C13 to C2 and C13a that unavoidably “leak” into all 1,n-ADEQUATE spectra.
Key 1,1-HD-ADEQUATE Correlations
Revision of the [7.5.5] Core of
Cryptospirolepine to a [6.6.5] System
• Based on correlations from the 1,1- and 1,n-HD-ADEQUATE spectra, the [7.5.5] core shown in red was revised to a [6.6.5] system.
• The γ-lactam was rearranged to a dehydropiperidinone.
• Key correlations were the 1JCC correlations from the vinyl CH to the flanking carbonyl and quaternary carbons.
Could CASE methods sort out the
structure?
Columns: 1,1-ADEQUATE (60 Hz) | 1,n-ADEQUATE (7 Hz) | 1H-13C HMBC (8 Hz) | 1H-13C HMBC (4 Hz) | IDR HSQC-TOCSY (15 ms) | 1H-13C LR-HSQMBC (2 Hz) | 1H-13C LR-HSQMBC (4 Hz) | 1H-15N LR-HSQMBC (2 Hz) | Generation Time (s) | # Structures
+ >420 h >10,400
+ + + 140 6816
+ + + + 142 3360
+ + + + 40 522
+ + + + + 45 258
+ + + + + + + + 7 24
• Modern “1993” data set used as input failed to lead to the generation of the structure in a 3-week calculation!
• More complete input data reduced the calculation to seconds!
The Agenda…
• Dereplication using prior knowledge
• The increasing prevalence of online content
• Data generation is not the issue. Analysis is.
• Computer-assisted structure elucidation
• New experiments to improve elucidation
• Rethink data-sharing through publications!
Errors in published structures…
ESI – Text Spectra
ChemSpider ID 24528095 H1 NMR
ChemSpider ID 24528095 C13 NMR
ChemSpider ID 24528095 HHCOSY
ChemSpider ID 24528095 HSQC
ChemSpider ID 24528095 HMBC
What would it take???
• PDFs containing text descriptions of spectra
are problematic for reinterpretation of data
• Publishers should host at least high
resolution images of all spectra
• Really we need the data files!!!
Conclusions
• Dereplication is increasingly feasible using
online content
• Analysis of data is generally a bigger issue
than data generation itself
• Computer-assisted structure elucidation works
• Data-sharing associated with publications
needs rethinking
Books of Interest?
Acknowledgements
RSC/ChemSpider/Marinlit
•John Blunt
•Serin Dabb
•Valery Tkachenko
NMR (Book) Collaborators
•Gary Martin
•David Rovnyak
ACD/Labs
•Structure Elucidator
•Mikhail Elyashberg
•Kirill Blinov
•Arvin Moser
•Patrick Wheeler
Thank you
ORCID: 0000-0002-2668-4821
Twitter: @ChemConnector
Personal Blog: www.chemconnector.com
SLIDES: www.slideshare.net/AntonyWilliams

More Related Content

What's hot

Radiation detection &amp; measurement
Radiation detection &amp; measurementRadiation detection &amp; measurement
Radiation detection &amp; measurementmahbubul hassan
 
Small angle x ray scattering
Small angle x ray scatteringSmall angle x ray scattering
Small angle x ray scatteringKalyan Jyoti Kalita
 
Chapter-11-Atomic-Mass-Spectrometry (1).ppt
Chapter-11-Atomic-Mass-Spectrometry (1).pptChapter-11-Atomic-Mass-Spectrometry (1).ppt
Chapter-11-Atomic-Mass-Spectrometry (1).pptNhokRean
 
Carbon dots characterization and applications
Carbon dots characterization and applicationsCarbon dots characterization and applications
Carbon dots characterization and applicationspchandrasekaran
 
Department of chemistry and chemical sciences
Department of chemistry and chemical sciencesDepartment of chemistry and chemical sciences
Department of chemistry and chemical sciencesKANUPRIYASINGH19
 
Presentation on nano technology
Presentation on nano technologyPresentation on nano technology
Presentation on nano technologyharshid panchal
 
Mass Spectroscopy
Mass SpectroscopyMass Spectroscopy
Mass SpectroscopyMVS Rao
 
Spectroscopy Introduction
Spectroscopy IntroductionSpectroscopy Introduction
Spectroscopy IntroductionDIVYA_C
 
Quantum calculations and calculational chemistry
Quantum calculations and calculational chemistryQuantum calculations and calculational chemistry
Quantum calculations and calculational chemistrynazanin25
 
Linear combination of tomic orbitals
Linear combination of tomic orbitalsLinear combination of tomic orbitals
Linear combination of tomic orbitalsudhay roopavath
 
Bio nano (Top-down bottom up approach)
Bio nano (Top-down bottom up approach) Bio nano (Top-down bottom up approach)
Bio nano (Top-down bottom up approach) ManojKumar6080
 
LASER SPECTROSCOPY
LASER SPECTROSCOPYLASER SPECTROSCOPY
LASER SPECTROSCOPYPrasanth Nair
 
Introduction to spectroscopy
Introduction to spectroscopyIntroduction to spectroscopy
Introduction to spectroscopyUSTC, Hefei, PRC
 
Applications of raman spectroscopy
Applications of raman spectroscopyApplications of raman spectroscopy
Applications of raman spectroscopykaavyabalachandran
 
Electron Spin Resonance (ESR) Spectroscopy
Electron Spin Resonance (ESR) SpectroscopyElectron Spin Resonance (ESR) Spectroscopy
Electron Spin Resonance (ESR) SpectroscopyHaris Saleem
 

What's hot (20)

Plasma
PlasmaPlasma
Plasma
 
Radiation detection &amp; measurement
Radiation detection &amp; measurementRadiation detection &amp; measurement
Radiation detection &amp; measurement
 
RADIOACTIVE DECAY AND HALF-LIFE CONCEPTS
RADIOACTIVE DECAY AND HALF-LIFE CONCEPTSRADIOACTIVE DECAY AND HALF-LIFE CONCEPTS
RADIOACTIVE DECAY AND HALF-LIFE CONCEPTS
 
Small angle x ray scattering
Small angle x ray scatteringSmall angle x ray scattering
Small angle x ray scattering
 
Chapter-11-Atomic-Mass-Spectrometry (1).ppt
Chapter-11-Atomic-Mass-Spectrometry (1).pptChapter-11-Atomic-Mass-Spectrometry (1).ppt
Chapter-11-Atomic-Mass-Spectrometry (1).ppt
 
Seminar on nmr
Seminar on nmrSeminar on nmr
Seminar on nmr
 
Carbon dots characterization and applications
Carbon dots characterization and applicationsCarbon dots characterization and applications
Carbon dots characterization and applications
 
Metallobiomolecules
MetallobiomoleculesMetallobiomolecules
Metallobiomolecules
 
Department of chemistry and chemical sciences
Department of chemistry and chemical sciencesDepartment of chemistry and chemical sciences
Department of chemistry and chemical sciences
 
Presentation on nano technology
Presentation on nano technologyPresentation on nano technology
Presentation on nano technology
 
Mass Spectroscopy
Mass SpectroscopyMass Spectroscopy
Mass Spectroscopy
 
Zeeman Effect
Zeeman EffectZeeman Effect
Zeeman Effect
 
Spectroscopy Introduction
Spectroscopy IntroductionSpectroscopy Introduction
Spectroscopy Introduction
 
Quantum calculations and calculational chemistry
Quantum calculations and calculational chemistryQuantum calculations and calculational chemistry
Quantum calculations and calculational chemistry
 
Linear combination of tomic orbitals
Linear combination of tomic orbitalsLinear combination of tomic orbitals
Linear combination of tomic orbitals
 
Bio nano (Top-down bottom up approach)
Bio nano (Top-down bottom up approach) Bio nano (Top-down bottom up approach)
Bio nano (Top-down bottom up approach)
 
LASER SPECTROSCOPY
LASER SPECTROSCOPYLASER SPECTROSCOPY
LASER SPECTROSCOPY
 
Introduction to spectroscopy
Introduction to spectroscopyIntroduction to spectroscopy
Introduction to spectroscopy
 
Applications of raman spectroscopy
Applications of raman spectroscopyApplications of raman spectroscopy
Applications of raman spectroscopy
 
Electron Spin Resonance (ESR) Spectroscopy
Electron Spin Resonance (ESR) SpectroscopyElectron Spin Resonance (ESR) Spectroscopy
Electron Spin Resonance (ESR) Spectroscopy
 

Similar to Big Data Helps Elucidate Natural Product Structures

NOMAD
NOMADNOMAD
NOMADJisc RDM
 
Applying Royal Society of Chemistry cheminformatics skills to support the Pha...
Applying Royal Society of Chemistry cheminformatics skills to support the Pha...Applying Royal Society of Chemistry cheminformatics skills to support the Pha...
Applying Royal Society of Chemistry cheminformatics skills to support the Pha...Ken Karapetyan
 
NMR, deep learning and molecular structure: a call for data
NMR, deep learning and molecular structure: a call for dataNMR, deep learning and molecular structure: a call for data
NMR, deep learning and molecular structure: a call for dataJeff White
 
Chemical Analysis Facility
Chemical Analysis FacilityChemical Analysis Facility
Chemical Analysis Facilitychristinejcardin
 
Vcu Chemistry Reasearch Facilities
Vcu Chemistry Reasearch FacilitiesVcu Chemistry Reasearch Facilities
Vcu Chemistry Reasearch FacilitiesJoseph Turner 'Jody'
 
Fei sun chemical presentation 070114 2c [compatibility mode]
Fei sun chemical presentation 070114 2c [compatibility mode]Fei sun chemical presentation 070114 2c [compatibility mode]
Fei sun chemical presentation 070114 2c [compatibility mode]inscore
 
Mass Spectrometry Applications and spectral interpretation: Basics
Mass Spectrometry Applications and spectral interpretation: BasicsMass Spectrometry Applications and spectral interpretation: Basics
Mass Spectrometry Applications and spectral interpretation: BasicsShreekant Deshpande
 
Computer-Assisted Structure Elucidation (CloudMet 2017)
Computer-Assisted Structure Elucidation (CloudMet 2017)Computer-Assisted Structure Elucidation (CloudMet 2017)
Computer-Assisted Structure Elucidation (CloudMet 2017)Christoph Steinbeck
 
The importance of data curation on QSAR Modeling: PHYSPROP open data as a cas...
The importance of data curation on QSAR Modeling: PHYSPROP open data as a cas...The importance of data curation on QSAR Modeling: PHYSPROP open data as a cas...
The importance of data curation on QSAR Modeling: PHYSPROP open data as a cas...Kamel Mansouri
 
Virtual screening of chemicals for endocrine disrupting activity: Case studie...
Virtual screening of chemicals for endocrine disrupting activity: Case studie...Virtual screening of chemicals for endocrine disrupting activity: Case studie...
Virtual screening of chemicals for endocrine disrupting activity: Case studie...Kamel Mansouri
 
Using open bioactivity data for developing machine-learning prediction models...
Using open bioactivity data for developing machine-learning prediction models...Using open bioactivity data for developing machine-learning prediction models...
Using open bioactivity data for developing machine-learning prediction models...Sunghwan Kim
 
Mass spectroscopy
Mass spectroscopyMass spectroscopy
Mass spectroscopyZainab&Sons
 
Mass spectrometry assay optimization using functional programming patterns in...
Mass spectrometry assay optimization using functional programming patterns in...Mass spectrometry assay optimization using functional programming patterns in...
Mass spectrometry assay optimization using functional programming patterns in...Bennett Kalafut
 
CoMPARA: Collaborative Modeling Project for Androgen Receptor Activity
CoMPARA: Collaborative Modeling Project for Androgen Receptor ActivityCoMPARA: Collaborative Modeling Project for Androgen Receptor Activity
CoMPARA: Collaborative Modeling Project for Androgen Receptor ActivityKamel Mansouri
 

Similar to Big Data Helps Elucidate Natural Product Structures (20)

NOMAD
NOMADNOMAD
NOMAD
 
Applying Royal Society of Chemistry cheminformatics skills to support the Pha...
Applying Royal Society of Chemistry cheminformatics skills to support the Pha...Applying Royal Society of Chemistry cheminformatics skills to support the Pha...
Applying Royal Society of Chemistry cheminformatics skills to support the Pha...
 
Applying Royal Society of Chemistry cheminformatics skills to support the Pha...
Applying Royal Society of Chemistry cheminformatics skills to support the Pha...Applying Royal Society of Chemistry cheminformatics skills to support the Pha...
Applying Royal Society of Chemistry cheminformatics skills to support the Pha...
 
NMR, deep learning and molecular structure: a call for data
NMR, deep learning and molecular structure: a call for dataNMR, deep learning and molecular structure: a call for data
NMR, deep learning and molecular structure: a call for data
 

Big Data Helps Elucidate Natural Product Structures

  • 1. Cheminformatics and the Structure Elucidation of Natural Products (or can Big Data help elucidate structures!) Antony Williams 5th Brazilian Conference of Natural Products October 27th 2015 ORCID ID:0000-0002-2668-4821
  • 2. A Bit About Me… • NMR spectroscopist by training • Chief Science Officer ACD/Labs Software • One of founders of ChemSpider database • VP for Cheminformatics at RSC
  • 3. Why is this important? • Structure verification and elucidation of 1000s of compounds • NMR predictors with >2,000,000 shifts & Computer-Assisted Structure Elucidation • Made >20,000,000 chemical compounds & data freely accessible to the community • Grew the dataset to >30,000,000 chemicals & used it for structure elucidation • Big data can assist structure identification
  • 4. The Agenda… • Dereplication using prior knowledge • The increasing prevalence of online content • Data generation is not the issue. Analysis is. • Computer-assisted structure elucidation • New experiments to improve elucidation • Rethink data-sharing through publications!
  • 6. …for each natural product dereplicated, at an average cost of $300 … a savings of $50,000 is incurred in isolation and identification time.
  • 7. Dereplication • There are ca. 200,000 known natural products • The chance for rediscovery is very high! • We need efficient “dereplication” processes • Most general approach – acquire analytical data and search existing databases…
  • 8. Scale of Dereplication Exercise 0.5 – 2 mg extract 4 mL agar slope Petri dish Bioassay & HPLC/UV/MS/NMR evaluation 100 mg sponge With gratitude to John Blunt
  • 9. Approaches to Dereplication
      Desirable to know: Taxonomy of organism; Molecular Wt/formula; UV Spectrum; 1H NMR Spectrum [13C NMR Spectrum if possible]
      For each compound isolated: Identify as known or new compound. If known STOP.
      If new then acquire data: 1D and 2D NMR array, MS with fragmentation, IR, [α]D, ORD
      Fully elucidate structure
  • 10. What Databases are Available?
      Public: ChemSpider CSLS PubChem NMRShift DB Naproc-13 SuperNatural SDBS
      Private: All Pharma GVK Biosciences NPD UC UV DB DTU UV DB Marine NP DB GVK NP DB InterMed UV DB InterMed NMR DB Novartis IR DB Natl. Centre Plant Metabol. CH-NMR-NP
      Commercial: SciFinder SpecInfo (Crossfire) Beilstein Crossfire Gmelin Reaxys ACD Spectral Libraries NaprAlert Dict. Natural Products Dict. Marine Nat. Prods AntiBase MarinLit AntiMarin
      With gratitude to John Blunt
  • 11. Nominal Mass Searching. [Mass spectrum: TOF MS ES+, m/z 220-560, M+H peak labeled.] MW 480. Search MW = 480 in Dict. Nat. Prod.: 562 hits out of 230,000 compounds! MF = C28H36N2O5
  • 12. Molecular Formula Searching. Search MF = C28H36N2O5 in Dict. Nat. Prod.: 2 hits out of 230,000 compounds! Compare UV spectrum and 1H NMR features
  • 13. How many isomers for a formula?
      C10H17Br2ClO2: 50,502,293
      C15H22O2: 138,136,211,624
      C15H20O1: 37,568,150,635
      C12H12O3: 68,930,547,646
      C13H20O3: 14,431,269,166
      C11H12N2O2: 3·10^11 < n < 10^12
  • 16. 1H NMR spectrum, CD3OD [spectrum, 1-7 ppm]: 1 x triplet methyl, 3 x methoxy, 3 x olefinic H, solvent peak
  • 18. NMR Features Dereplication • 1 of 5 hits from 230,000 compounds • The ONLY hit if MW = 480 is included
  • 23. Dereplication in MarinLit Online • Can be achieved using • 1H NMR features, e.g. number of Me groups • 13C and 1H chemical shifts • Molecular formula (complete or partial) • UV maxima • Exact mass • OR a combination of any or all of the above.
  • 25. 1H NMR Spectrum - new or known? 9 Me groups are obvious (from integrals). Search of MarinLit: 9 Me gave 628 answers
  • 26. Characterizing the spectrum further: 4 Me singlets, 4 Me doublets, 1 OMe singlet, aromatic protons. Search of MarinLit for 9 total methyls (4 singlets, 4 doublets, 1 OMe) gave 39 answers
  • 27. COSY spectrum: a broad singlet coupled/on-coupled to 2 doublets. This implies a 1,2,4-trisubstituted aromatic system
  • 28. 4 Me singlets, 4 Me doublets, 1 OMe singlet. Search for 4 singlets, 4 doublets, 1 OMe, 1,2,4-trisubstituted aromatic: 2 answers only
  • 29. Comparison of NMR data confirmed that the unknown had this structure
  • 30. Commercial Assigned Databases >320,000 assigned chemical structures >2,500,000 shifts
  • 31. Searching Assigned Databases • mI = 306.1 – 306.2 • 591/322,319 hits
  • 32. Searching Assigned Databases • 10 13C shifts to +/- 3.0 ppm • 5 1H shifts to +/- 0.3 ppm • 7 hits – very different
  • 33. Including 15N, 19F and 31P data
  • 34. Experimental vs. Experimental Differences between C13 shifts are generally small
  • 36. Searching experimental data 30 seconds from peak-picking to suggested molecules
  • 37. Experimental vs. Predicted Differences between exp. and pred. C13 shifts can be larger – useful to limit number of shifts searched
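The shift-based searches on slides 32 and 37 accept a database entry when each query shift has a counterpart within a tolerance (for example +/- 3.0 ppm for 13C). A minimal sketch of that idea follows, assuming a hypothetical in-memory list of records; the `Record` class, the function names, and the greedy one-to-one matching are illustrative assumptions, not how any of the databases named in the talk work internally.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    """Hypothetical database entry: a name plus its assigned 13C shifts (ppm)."""
    name: str
    c13_shifts: List[float]


def count_matched(query: List[float], reference: List[float], tol: float) -> int:
    """Count query shifts that pair with a distinct reference shift within tol ppm.

    Both lists are sorted and each reference shift is used at most once (greedy).
    """
    remaining = sorted(reference)
    matched = 0
    for q in sorted(query):
        for i, r in enumerate(remaining):
            if abs(q - r) <= tol:
                matched += 1
                del remaining[i]  # a reference shift can satisfy only one query shift
                break
    return matched


def search(records: List[Record], query: List[float], tol: float = 3.0) -> List[str]:
    """Return the names of records in which every query shift finds a match."""
    return [
        rec.name
        for rec in records
        if count_matched(query, rec.c13_shifts, tol) == len(query)
    ]


# Illustrative data only; these are not entries from a real database.
records = [
    Record("candidate A", [17.6, 31.4, 174.1]),
    Record("candidate B", [20.0, 60.0, 120.0]),
]
hits = search(records, [18.0, 173.5], tol=3.0)
print(hits)
```

Searching with only a subset of the observed shifts, as slide 37 suggests for predicted data, corresponds to passing a shorter `query` list.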
  • 38. The Agenda… • Dereplication using prior knowledge • Increasing prevalence of free online content • Data generation is not the issue. Analysis is. • Computer-assisted structure elucidation • New experiments to improve elucidation • Rethink data-sharing through publications!
  • 39. Online content also available! NMRShiftDB http://nmrshiftdb.nmr.uni-koeln.de/
  • 41. Online content also available! www.nmrdb.org
  • 42. Mining Big Data for Natural Products??? • ~35 million chemicals and growing • Data sourced from ~500 different sources • Structure-centric hub for web searching • Already used by many mass spectrometry software packages for structure ID
  • 44. 26 of 35,000,000 hits, ranked by # of references
  • 46. What can I find on ChemSpider?
  • 47. What can I find? All for free…
  • 48. NMR Predictions on ChemSpider Data for Dereplication
  • 49. NMR Predictions on ChemSpider Data for Dereplication (Compound 1, Compound 2) • fC = full composition (C0-100 H0-100 O0-20 N0-10) • lC = limited composition (C10-30 H25-40 O0-15 N0-5)
  • 50. Large Fragments can be found. Top 2 hits searched by 1H chemical shifts. Hits ranked by the 1H NMR deviation and filtered with C10-30 H25-40 O0-15 N0-5, Good List and Bad List. Good List was determined from 1H shifts, integrals and 1H-1H COSY
  • 52. Marine Natural Product Example: Focused Datasets Valuable
      • Search of nominal mass 490-491 gave the following results: ChemSpider: 46,234; SciFinder: 171,904; Dictionary of Natural Products: 537; Dictionary of Marine Natural Products: 90; MarinLit: 94; AntiMarin: 131
      • Molecular formula obtained, C30H50O5 (490.3658): ChemSpider: 208; SciFinder: 2,366; Dictionary of Natural Products: 238; Dictionary of Marine Natural Products: 43; MarinLit: 43; AntiMarin: 48
  • 53. Approaches to Dereplication
      Desirable to know: Taxonomy of organism; Molecular wt/formula; UV Spectrum; 1H NMR Spectrum [13C NMR Spectrum]
      For each compound isolated: Identify as known or new compound. If known STOP.
      If new then acquire data: 1D and 2D NMR array, MS with fragmentation, IR, [α]D, ORD
      Fully elucidate structure
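The marine natural product example quotes a mass of 490.3658 for C30H50O5, and that number can be reproduced from monoisotopic atomic masses. The sketch below is illustrative only: the small element table, the simple formula parser (no brackets or charges), and the function names are assumptions made for this example.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes, rounded to 8 decimals.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "S": 31.97207117,
}


def parse_formula(formula: str) -> dict:
    """Parse a flat formula such as 'C30H50O5' into {'C': 30, 'H': 50, 'O': 5}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def monoisotopic_mass(formula: str) -> float:
    """Sum monoisotopic atomic masses for every atom in the formula."""
    return sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())


mass = monoisotopic_mass("C30H50O5")
print(f"{mass:.4f}")

# The nominal-mass search window used on the slide (490-491) contains this mass.
in_window = 490.0 <= mass < 491.0
```

A formula search is far more selective than a nominal-mass window because many different elemental compositions share the same integer mass, which is why the hit counts on the slide drop so sharply between the two searches.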
  • 54. The Agenda… • Dereplication using prior knowledge • The increasing prevalence of online content • Data generation is not the issue. Analysis is. • Computer-assisted structure elucidation • New experiments to improve elucidation • Rethink data-sharing through publications!
  • 55. Modern NMR Technologies • Even a basic array of 1D/2D experiments can provide the relevant data in the majority of cases • The past few years have seen improvements in: • Hardware: Magnets, Probes and RF • Software: Data acquisition and processing • Pulse sequences to probe direct and (very) long-range homo- and heteronuclear correlations
  • 57. NMR Developments – 30 years of improvements
      • 1984 – First report of cryogenic NMR probe
      • 1986 – HMBC experiment reported
      • 1991 – First commercial 3 mm gradient inverse probes
      • 1996 – ADEQUATE NMR experiments first reported
      • 1996 – 1H-15N HMBC applications reported
      • 1998 – Commercial 1.7 mm gradient inverse triple probes
      • 1999 – First commercial cryogenic NMR probes delivered
      • 2000 – First 3 mm prototype cryoprobe developed
      • 2006 – First 1.7 mm MicroCryoProbes™ delivered
      • 2009 – Pure shift HSQC experiments developed
      • 2014 – 1,1- and 1,n-HD-ADEQUATE experiments
      With gratitude to Gary E. Martin
  • 58. COSY Correlations: vicinal H-H couplings and geminal H-H couplings [structure diagram with atom numbering]
  • 59. HMBC Correlations (8 Hz optimized) [structure diagram with atom numbering]
  • 60. Always new sequences coming: 1,1- and 1,n-HD-ADEQUATE. Examples show all three scenarios for 1,1- and 1,n-HD-ADEQUATE correlations for cryptospirolepine.
  • 61. Adoption can take a long time: HSQC vs. HMQC took > 20 years!
      • HMQC is an older technique and affords lower F1 resolution.
      • HSQC is a better technique but SLOWLY supplanted HMQC!
      Year range: # HMQC reports, # HSQC reports
      1990-94: 52, 10
      1995-99: 177, 39
      2000-04: 346, 111
      2005-09: 358, 266
      2010-14: 345, 423
      Totals: 1278, 849
      From: A. Williams, G.E. Martin, & D.J. Rovnyak, “Increasing the Adoption of Advanced Techniques for the Structure Elucidation of Natural Products,” from Modern NMR Approaches to the Structure Elucidation of Natural Products, vol. 1, A.J. Williams, G.E. Martin, and D.J. Rovnyak, Eds., RSC, London, 2015.
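The adoption argument on slide 61 is easier to see as a share of reports per period. The short sketch below recomputes the totals and the HSQC fraction from the counts on the slide; the variable and function names are illustrative.

```python
# (HMQC reports, HSQC reports) per five-year period, as given on slide 61.
reports = {
    "1990-94": (52, 10),
    "1995-99": (177, 39),
    "2000-04": (346, 111),
    "2005-09": (358, 266),
    "2010-14": (345, 423),
}


def hsqc_share(hmqc: int, hsqc: int) -> float:
    """Fraction of the combined HMQC + HSQC reports that used HSQC."""
    return hsqc / (hmqc + hsqc)


total_hmqc = sum(hmqc for hmqc, _ in reports.values())
total_hsqc = sum(hsqc for _, hsqc in reports.values())

for period, (hmqc, hsqc) in reports.items():
    print(f"{period}: HSQC share {hsqc_share(hmqc, hsqc):.0%}")

# First period in which HSQC reports outnumber HMQC reports.
first_majority = next(p for p, (hmqc, hsqc) in reports.items() if hsqc > hmqc)
```

On these counts HSQC only overtakes HMQC in the final period, which is the basis for the "> 20 years" remark.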
  • 62. The Agenda… • Dereplication using prior knowledge • The increasing prevalence of online content • Data generation is not the issue. Analysis is. • Computer-assisted structure elucidation • New experiments to improve elucidation • Rethink data-sharing through publications!
  • 63. AI Research in 1965…
  • 64. 50 years of iterative development DENDRAL NMR-SAMS SENECA SpecInfo ACD/Labs CMC-SE LSD Others…
  • 65. Computer Assisted Structure Elucidation: Methodology
      • Interpret data to extract knowledge: Molecular Formula; Integrals; Chemical shifts; Multiplicity; Connectivity; Known fragments; Known exclusions
      • Search structure space to derive all structures
      • Rank-order based on set criteria: Predicted chemical shift; Mass Spec Fragmentation
  • 66. Remember how many isomers
      C10H17Br2ClO2: 50,502,293
      C15H22O2: 138,136,211,624
      C15H20O1: 37,568,150,635
      C12H12O3: 68,930,547,646
      C13H20O3: 14,431,269,166
      C11H12N2O2: 3·10^11 < n < 10^12
  • 67. Computer-Aided Structure Elucidation
      • Eliminate “superfluous” isomers by imposing different structural constraints
      • Structural constraints are from:
      • Spectral data of various types: NMR shifts/multiplicity constrain atom types; correlations constrain connectivities; MS constrains formula and fragments; IR constrains functional groups
      • Prior information – sample origin
      • Chemical rules – valence, ring size, charge, etc.
  • 69. 2D NMR spectra: Extraction of Structural Information: COSY/HMBC [structure diagram annotated with 13C shifts]. COSY: 1H-1H coupling through 3 bonds. HMBC: 1H-13C coupling through 2/3 bonds.
  • 70. 1D & 2D NMR Synchronized Processing The Software displays correlations for assigned spectra and structures, and highlights correlations that are likely to be erroneous.
  • 72. Not that easy though… “Nonstandard Correlations”. “Standard” and “Nonstandard” correlations are experimentally indistinguishable. If 2D NMR data contain both “Standard” and “Nonstandard” correlations we see contradictions in interpretation. [Diagram: standard COSY and HMBC correlation pathways]
  • 75. Structure Generation combined with Structural and Spectral Filtering • Internal Badlist • User Badlist • User Goodlist • Rings: Obligatory, Forbidden • Bredt’s Rule • Maximum Match Factor • Filter Tolerance: Tight, Medium, Loose
  • 76. Selection of the Preferable Structure • Remove duplicates • 1H and 13C shift calculation for all output structures • Rank structures in ascending order of average chemical shift deviation • Structure with minimum d is the most probable.
  • 77. Low Structural Information in 2D Spectral Data: Use Fragment DB • Number of observed 2D NMR correlations is smaller than expected • Deficit of hydrogen atoms results in a low number of correlations • Search in Fragment Library using the 13C NMR spectrum and embed in the MCD
  • 79. Example of Fragment Usage. Symmetric molecule C56H78O12S1: Ashwagandhanolide [structure diagrams annotated with 1H shifts]. Small number of correlations
  • 80. 13C NMR Fragment search - 5524 found. Exp. vs. Frag.: Fragment #1, C17H22O2
  • 81. Solution • 960 MCDs were created using fragment #1 • Structure Generation from 960 MCDs gave 24 structures after filtering and 6 output structures. • Total time was tg = 29 m 30 s
  • 84. Wrong Molecular Formula Only CHNO in formula assumed J. Am. Chem. Soc., 2001, 123, 10870-10876. Tetrahedron Letters, 2002, 43, 5707-5710. FAB-MS: C31H54N4O8 ESI-MS: C31H54N4SO6
  • 86. Wrong Initial Suggestion: 13C shift at 173.50 ppm is O-C=O group. J. Nat. Prod., 2000, 63, 1677-1678. J. Nat. Prod., 2003, 66, 716-718. 13C signal at 173 ppm led to COO bias. Data compared to a similar compound
  • 88. Misinterpretation of 2D NMR Data. Presence of a guanidine group substituted with 2 x CH3 groups was hypothesized. Absence of an expected HMBC correlation from methyls to C(159.0) ignored. J. Org. Chem., 2004, 69, 9025-9029. J. Org. Chem., 2008, 73, 8719-8722. Misinterpreted HMBC signal. Verified by X-ray crystallography
  • 92. Number of Skeletal Atoms J. Cheminf. 2012, 4:5
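The selection step on slide 76 ranks generated structures in ascending order of average chemical shift deviation and takes the minimum as the most probable. A minimal sketch of that ranking follows. It assumes the predicted shifts are already available from some predictor (none is called here), and it pairs experimental and predicted shifts by sorted order, which is a simplification of a real assignment; all names and numbers are illustrative.

```python
from typing import Dict, List, Tuple


def mean_abs_deviation(experimental: List[float], predicted: List[float]) -> float:
    """Average |experimental - predicted| in ppm after pairing shifts by sorted order."""
    if len(experimental) != len(predicted):
        raise ValueError("shift lists must have the same length")
    pairs = zip(sorted(experimental), sorted(predicted))
    return sum(abs(e - p) for e, p in pairs) / len(experimental)


def rank_candidates(
    experimental: List[float], candidates: Dict[str, List[float]]
) -> List[Tuple[float, str]]:
    """Return (deviation, name) tuples in ascending order of deviation."""
    scored = [
        (mean_abs_deviation(experimental, shifts), name)
        for name, shifts in candidates.items()
    ]
    return sorted(scored)


# Illustrative numbers only; not predictions for any real structure.
experimental = [17.6, 63.3, 174.1]
candidates = {
    "structure A": [18.0, 64.0, 173.5],
    "structure B": [25.0, 70.0, 160.0],
}
ranking = rank_candidates(experimental, candidates)
best_deviation, best_name = ranking[0]
print(best_name, round(best_deviation, 2))
```

The structure with the smallest deviation heads the list, matching the "minimum d" rule on the slide.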
  • 94. The Agenda… • Dereplication using prior knowledge • The increasing prevalence of online content • Data generation is not the issue. Analysis is. • Computer-assisted structure elucidation • New experiments to improve elucidation • Rethink data-sharing through publications!
  • 95. New Experiments Influence CASE! Cervinomycin [structure with atom numbering, and molecular connectivity diagram with atoms marked (ob) and (fb)]
  • 96. The Influence of Data on Elucidation Time: Cervinomycin
      Columns: COSY, HSQC; 1H-13C HMBC (8 Hz, 4 Hz); 1H-13C LR-HSQMBC (4 Hz, 2 Hz); structure generation time; # of structures generated
      Rows (data sets included: time, # structures):
      + + +: 49 h, 314
      + + + +: 37 h, 4
      + + + +: 150 s, 7
      + + + + +: 104 s, 1
  • 97. New Experiments: Cryptospirolepine over 20 years! Inexplicably, the vinyl proton has no evident 2JCH correlation to the carbonyl! DFT predicted ~0.3 Hz coupling! Synergistic interpretation and CASE applied to an array of 2D data elucidated this compound. Included new 1,1-ADEQUATE and 1,n-ADEQUATE data. The absence of a 2JCH correlation from the vinyl proton to the adjacent carbonyl is perplexing. A new long-range heteronuclear correlation NMR experiment was acquired: LR-HSQMBC.
  • 98. Key 1,1-HD-ADEQUATE Correlations • Experiment was optimized for 60 Hz • Typical range for 1JCC sp2 couplings is 60-75 Hz • The 2JCC coupling from C13 to C1/C11’ was calculated (DFT) to be 15.4 Hz, which would give a calculated intensity of 0.16 in this experiment.
  • 99. Key 1,1-HD-ADEQUATE Correlations • Experiment optimized for 7 Hz • Typical range for nJCC couplings is approximately 2-7 Hz • 2JCC correlations across carbonyls are typically 10-16 Hz • Correlations were observed, including the 1JCC correlations from C13 to C2 and C13a that unavoidably “leak” into all 1,n-ADEQUATE spectra.
  • 100. Revision of the [7.5.5] Core of Cryptospirolepine to a [6.6.5] System • Based on correlations from the 1,1- and 1,n-HD-ADEQUATE spectra, the [7.5.5] core shown in red was revised to a [6.6.5] system. • The γ-lactam was rearranged to a dehydropiperidinone. • Key correlations were the 1JCC correlation from the vinyl CH to the flanking carbonyl and quaternary carbons.
  • 101. Could CASE methods sort out the structure?
      Experiments (columns): 1,1-ADEQUATE; 1,n-ADEQUATE; 1H-13C HMBC; IDR HSQC-TOCSY; 1H-13C LR-HSQMBC; 1H-15N LR-HSQMBC
      Optimizations (in column order): 60 Hz, 7 Hz, 8 Hz, 4 Hz, 15 ms, 2 Hz, 4 Hz, 2 Hz
      Rows (data sets included: generation time in s unless noted, # structures):
      +: >420 h, >10,400
      + + +: 140, 6816
      + + + +: 142, 3360
      + + + +: 40, 522
      + + + + +: 45, 258
      + + + + + + + +: 7, 24
      • Modern “1993” data set used as input failed to lead to the generation of the structure in a 3-week calculation!
      • More complete input data reduced the calculation to seconds!
  • 102. The Agenda… • Dereplication using prior knowledge • The increasing prevalence of online content • Data generation is not the issue. Analysis is. • Computer-assisted structure elucidation • New experiments to improve elucidation • Rethink data-sharing through publications!
  • 103. Errors in published structures…
  • 104. ESI – Text Spectra
  • 110. What would it take??? • PDFs containing text descriptions of spectra are problematic for reinterpretation of data • Publishers should host at least high resolution images of all spectra • Really we need the data files!!!
  • 111. Conclusions • Dereplication is increasingly feasible using online content • Analysis of data is generally a bigger issue than data generation itself • Computer-assisted structure elucidation works • Data-sharing associated with publications needs rethinking
  • 113. Acknowledgements
      RSC/ChemSpider/MarinLit: John Blunt, Serin Dabb, Valery Tkachenko
      NMR (Book) Collaborators: Gary Martin, David Rovnyak
      ACD/Labs (Structure Elucidator): Mikhail Elyashberg, Kirill Blinov, Arvin Moser, Patrick Wheeler
  • 114. Thank you ORCID: 0000-0002-2668-4821 Twitter: @ChemConnector Personal Blog: www.chemconnector.com SLIDES: www.slideshare.net/AntonyWilliams

Editor's Notes

  1. The number of possible isomers can be extremely large. It is impossible to generate all isomers, even for relatively simple compounds (for comparison, the number of stars in our galaxy is about 10^11).
  3. This is a natural product dataset and software provided a possible molecule within 30 seconds
  4. Magnetic field strength has grown year on year with the related increase in dispersion and sensitivity
  5. A variety of methods have been employed, using IR, NMR, and MS. Different philosophies, methodologies, interfaces.
  8. We can formulate a general CASE strategy:
  9. Molecular Conneciivity diagram is automatically generated. This can be used for an alternative check on structures.
  10. Atom 1 to 17: 6-bond HMBC. Atom 2 to 7: a=6 COSY.
  11. Fragments are ranked in descending order of numbers of carbon atoms. Carbon atoms already possess chemical shifts.
  12. Even when you are confident of a structure, it can still be helpful to have further confirmation that it is correct. A CASE program can ensure that all appropriate candidates within a given structure space are considered.
  13. Randazzo et al. [4] isolated a new compound named Halipeptin A. An elemental formula containing only C, H, N, and O was assumed: C31H54N4O9 (calculated 627.3969 for C31H55N4O9 with Δm = 0.0104, i.e., 16.6 ppm). Structure A, which contains an unusual fragment (colored in red in Fig. 1), was suggested from 2D NMR data. In a follow-up article [5], the same group found the formula C31H54N4SO7 from HRMS and suggested the correct structure B. Both molecular formulae and the 2D NMR data were input into ACD/Structure Elucidator. The software generated 303 structures in 36 seconds. Ranking the generated structures using 13C chemical shift prediction placed the correct structure (B) in the first position. References: [4] Randazzo, A.; Bifulco, G.; Giannini, C.; Bucci, M.; Debitus, C.; Cirino, G.; Gomez-Paloma, L., J. Am. Chem. Soc., 123:10870-10876, 2001. [5] Monica, C. D.; Randazzo, A.; Bifulco, G.; Cimino, P.; Aquino, M.; Izzo, I.; De Riccardis, F.; Gomez-Paloma, L., Tetrahedron Letters, 43:5707-5710, 2002. Poster: Are Pitfalls Unavoidable During the Structure Elucidation of New Organic Compounds? M. E. Elyashberg, K. A. Blinov, S. G. Molodtsov, A. J. Williams, Ryan Sasaki.
  15. Sakuno et al. [6] isolated a natural product with molecular formula C20H18O6. The authors postulated that the 13C chemical shift at 173.50 ppm was associated with the resonance of an O-C=O group, and on this assumption structure A (Fig. 2) was suggested. Wipf and Kerekes [7] compared the NMR and IR spectra of this compound with spectra of a number of its structural relatives and proved that it was identical to viridol (structure B). The 2D NMR data from ref. [6] were input into ACD/Structure Elucidator with no assumptions. The software generated 272 structures in 1 min 40 sec. Ranking the generated structures using 13C chemical shift prediction placed the correct structure, viridol (B), in the first position. The originally proposed structure A was placed second, but with a much larger chemical shift deviation. References: [6] Sakuno, E.; Yabe, K.; Hamasaki, T.; Nakajima, H., J. Nat. Prod., 63:1677-1678, 2000. [7] Wipf, P.; Kerekes, A. D., J. Nat. Prod., 66:716-718, 2003.
  17. Ralifo and Crews [8] reported the isolation of (-)-spiroleucettadine (C20H23N3O4), structure A (Fig. 3). The presence of a guanidine group (δC 159.0) substituted with two CH3 groups was hypothesized. The absence of an expected HMBC correlation from one of the methyls to the carbon at 159.0 ppm was ignored. Several attempts to synthesize this compound were made, all without success, which raised questions about the original structure elucidation. Crews's group [9] carried out a successful re-isolation of spiroleucettadine, and X-ray analysis established the correct structure, shown as B in Fig. 3. It emerged that the postulated guanidine group was erroneous and that one HMBC correlation had been misinterpreted in the earlier work. When the old 2D NMR data were used in ACD/Structure Elucidator, it was immediately found that the original structure produced deviations too large for a positive identification. When the 2D NMR data from the later study were used with the software, the correct structure was generated and appeared in the first position after ranking by 13C chemical shift prediction. References: [8] Ralifo, P.; Crews, P., J. Org. Chem., 69:9025-9029, 2004. [9] White, K. N.; Amagata, T.; Oliver, A. G.; Tenney, K.; Wenzel, P. J.; Crews, P., J. Org. Chem., 73:8719-8722, 2008.
  19. Characteristics of known drug space. Natural products, their derivatives and synthetic drugs
  20. Results obtained from Structure Elucidator CASE program computation runs with various sets of input data for the xanthone antibiotic cervinomycin A2 (see Figure X.17B for the structure). As rows 1 and 2 of the table show, restricting the input file to data that are likely to contain primarily ²JCH and ³JCH correlations, with perhaps only sparse ⁴JCH correlations, leads to lengthy computation runs. However, when 2 Hz optimized LR-HSQMBC data, which can contain ⁴JCH to ⁶JCH correlations, are included in the input file (rows 3 and 4), computation times drop precipitously and the number of structures generated is also significantly reduced.
  21. Errors in published structures are rampant.
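
The mass-accuracy figures quoted in the Halipeptin A note (calculated 627.3969 for C31H55N4O9, a mass difference of 0.0104, 16.6 ppm) can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical, the atomic masses are standard monoisotopic values rounded to eight decimals, and the electron mass is neglected, which appears to be how the quoted calculated value was obtained.

```python
# Monoisotopic masses (in u) of the most abundant isotopes.
# Standard reference values, rounded; not taken from the slides.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "S": 31.97207100,
}


def formula_mass(counts):
    """Sum monoisotopic atomic masses for an element-count mapping.

    Assumption: no electron-mass correction is applied for the charge.
    """
    return sum(MONOISOTOPIC[element] * n for element, n in counts.items())


def ppm_error(delta_m, reference_mass):
    """Express a mass difference as parts per million of the reference mass."""
    return abs(delta_m) / reference_mass * 1e6


# Protonated form of the originally assumed CHNO-only formula C31H54N4O9.
protonated_chno = {"C": 31, "H": 55, "N": 4, "O": 9}
calculated = formula_mass(protonated_chno)

print(round(calculated, 4))                   # 627.3969
print(round(ppm_error(0.0104, 627.3969), 1))  # 16.6
```

A deviation of this size is large by high-resolution MS standards, which is consistent with the note's point that the CHNO-only formula was later replaced by one containing sulfur.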