On Friday, September 16th, I was honored with the North Carolina American Chemical Society Distinguished Speaker Award and got to review the past 20 years of my career. This was my short intro bio:
"Antony Williams is a Ph.D. NMR spectroscopist and cheminformatician who has worked in academia, government, a Fortune 500 company, and two start-ups. He is co-founder of the free online chemical database ChemSpider, originally started as a hobby project and ultimately acquired by the Royal Society of Chemistry (in the UK) and now used by over 50,000 users per day. He is now a computational chemist at the Environmental Protection Agency in the National Center for Computational Toxicology and is focused on developing web applications to support data dissemination and progress efforts in allowing for faster and cheaper approaches to identify potential toxicological effects of chemicals. He has published >180 papers, >25 book chapters and a number of books. He is known as the ChemConnector on social networks. "
How One Monkey on a Typewriter Made a Difference to Online Chemistry
1. How One Monkey on a Typewriter
Made a Difference to
Online Chemistry
Antony Williams
ORCID ID:0000-0002-2668-4821
I do know chimpanzees
are not monkeys but the
photos are more fun!
Keyboard
2. Before we begin…
• It’s going to be kind of a random walk…
• The slides will go online at SlideShare
http://www.slideshare.net/AntonyWilliams/.
Some slides for tonight are ca. 10 years old!
• Any offense is unintentional…I am Welsh!
3. What type of chemist am I?
…sometimes colorful
…sometimes a monkey
as you will soon see…
4. Career to date…
• NMR Spectroscopist (PhD) 1985-88
• EPR Spectroscopist (NRC) 1988-90
• NMR Facility Manager (U. of Ottawa) 1990-92
• NMR Leader (Kodak) & cheminformatics 1992-97
• Chief Science Officer (ACD/Labs) 1997-2007
• ChemSpider development (2 years) 2007-09
• VP eScience (RSC) 2009-15
• …and now at NCCT at the EPA
10. >10 Years at ACD/Labs
Analytical data processing
NMR Prediction
CASE Systems
QSAR modeling
PhysChem prediction
Structure Drawing
Nomenclature
11. Structure Drawing and Nomenclature
Free ChemSketch for Home/Education
Understanding structure representation and nomenclature
became ESSENTIAL for building and curating databases!
12. The Web is the Way
Structure Drawing
(First drawing applet)
NMR Prediction ONLINE
PhysChem prediction ONLINE
Nomenclature ONLINE
13. My Greatest Pride – CASE
Anyone struggling
with a Structure
Elucidation??
14. How many isomers for a formula?
C10H17Br2ClO2: 50,502,293
C15H22O2: 138,136,211,624
C15H20O1: 37,568,150,635
C12H12O3: 68,930,547,646
C13H20O3: 14,431,269,166
C11H12N2O2: 3⋅10^11 < n < 10^12
16. COSY Correlations
Vicinal H-H couplings
Geminal H-H couplings
[Figure: atom-numbered structure diagram with the COSY correlations marked]
17. HMBC Correlations (8 Hz Optimized)
[Figure: atom-numbered structure diagram with the HMBC correlations marked]
24. A hobby gone wild…Year 1
• Hobby-project connecting chemistry data on the web
• Three servers – one purchased, two hand-built
• Software begged and borrowed
• Some late nights – 10pm to 2am for over a year
• Surviving the naysayers in the community
• Taking advantage of a changing world of data availability and crowdsourcing by willing participants
• NO funding
26. But in WEEK 1 of release…
“…The Zoo is filled with monkeys. (The same
monkeys who are trying to write Shakespeare
by hitting typewriter keys at random).”
30. Building a Structure Centric Community for Chemists
Ability to curate and add to the database
• Add structures and sets
• “Clean” structures
• Add data (spectra, CIFs, images)
• Add links to other pages (URLs)
• Add publication details
Year 2 - Will anybody help us?
31. Will anybody help us???
Daily crowdsourced curation underway
• 40 curation emails per day
• 100 identifiers per day removed, approved or added
37. Data Quality just LAST NIGHT!
Carbon felt, 1.27cm (0.5in) thick
Single-walled carbon nanotubes
Multi-walled carbon nanotubes
(MWNTs), 95+%
Graphite rod, 13cm (5.125in) dia x
30.5cm (12in) long
Graphite rod, 6.15mm (0.242in) dia x
152mm (6in) long
Graphite rod, 6.15mm (0.242in) dia x
305mm (12in) long
Carbon powder
acetylene carbon
acetylene carbon
a methyl group
Acheson graphite
C5M
Methylidyne radical
Carbon rods, 5N
308068-56-6
Activated Carbon Powder
GRAPHITE SYNTHETIC
Activated carbon, Graphite
Fullerene soot, as produced
46. Types of Errors Found
• Structure drawing errors
• Misassociation of names and structures
• IUPAC Name Errors
• Links out to databases were to wrong structures
• Property errors/validation
• CAS Number validation
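As an illustration of the last item, CAS Registry Numbers carry a check digit, so a mistyped number can often be caught without looking anything up. The sketch below is a minimal example of that check and is not taken from the slides or from ChemSpider's code. The function names `cas_check_digit` and `is_valid_cas` are hypothetical. It uses 308068-56-6, which appears on the data quality slide, as a test value.

```python
import re


def cas_check_digit(body_digits: str) -> int:
    """Return the expected check digit for the digits before the final hyphen.

    The rightmost digit is weighted 1, the next 2, and so on; the check
    digit is the weighted sum modulo 10.
    """
    total = sum(
        weight * int(digit)
        for weight, digit in enumerate(reversed(body_digits), start=1)
    )
    return total % 10


def is_valid_cas(cas: str) -> bool:
    """Check the format (2-7 digits, 2 digits, 1 digit) and the check digit."""
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", cas.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    return cas_check_digit(body) == int(match.group(3))


if __name__ == "__main__":
    for candidate in ["308068-56-6", "308068-56-5", "7732-18-5"]:
        print(candidate, is_valid_cas(candidate))
```

A passing check digit only shows that the number is well formed. It says nothing about whether the number is attached to the right structure or name, which is why the other error types in the list still need manual curation.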
64. What we tried to fix…
• What’s the structure?
• Are they in our file?
• What’s similar?
• What’s the target?
• Pharmacology data?
• Known pathways?
• Working on now?
• Connections to disease?
• Expressed in right cell type?
• Competitors?
• IP?
68. So what did I learn??
So what did I learn over the years….
• Connecting people, data and systems
• Integration of disparate data sources and systems can be so enabling
• Data quality is an overlooked imperative
• Crowdsourcing, even with a small crowd, shares the load and speeds progress
• Embrace ODOSOS (Open Data, Open Source, Open Standards) for greater benefit
• And so… to EPA-NCCT
69. What am I involved with at EPA?
https://comptox.epa.gov/dashboard
76. Ways to make an impact…
• Publish, share, validate and curate data
• Publish chemicals, syntheses and data
• “Publish” – Papers, Blogs, Reports, Tweets,
Presentations, Videos
• Contribute to Wikipedia
• Participate in chemistry communities
• Contribute to the Big Data of Chemistry
78. An OLD Monkey on a keyboard...
What I helped with…
•Drawing software on >1,000,000 desktops
•ChemSpider for >50,000 users/day
•Cleaned a lot of Wikipedia chemicals
•Almost fought with the Olympics Committee
•…I hope it’s been useful?
79. Not just 1 monkey on a keyboard!
…so many friends, colleagues, known and
unknown that helped…
The number of possible isomers can be extremely large. It is impossible to generate all isomers even for relatively simple compounds (the number of stars in our galaxy is about 10^11).