SlideShare a Scribd company logo
1 of 49
Building a Data Repository to 
Manage Chemistry Research 
Data 
Antony Williams 
University of Connecticut 
10/15/2014
Chemistry for the Community 
• The Royal Society of Chemistry as a 
provider of chemistry for the community: 
• As a charity 
• As a scientific publisher 
• As a host of commercial databases 
• As a partner in grant-based projects 
• As the host of ChemSpider 
• And now in development : the RSC Data 
Repository for Chemistry
• ~30 million chemicals and growing 
• Data sourced from hundreds of sources 
• Crowd sourced curation and annotation 
• Ongoing deposition of data from our 
journals and our collaborators 
• Structure centric hub for web-searching 
• …and a really big dictionary!!!
ChemSpider
ChemSpider
Experimental/Predicted Properties
Literature references
Patents references
Books
Vendors and data sources
Aspirin on ChemSpider
Many Names, One Structure
Crowdsourced “Annotations” 
• Users can add 
• Descriptions, Syntheses and Commentaries 
• Links to PubMed articles 
• Links to articles via DOIs 
• Add spectral data 
• Add Crystallographic Information Files 
• Add photos 
• Add MP3 files 
• Add Videos
Crowdsourced Enhancement 
• The community can clean and enhance the 
database by providing Feedback and direct 
curation 
• Tens of thousands of edits made
Adding data to ChemSpider 
• Users can contribute data and use as a shared 
resource for research group
ChemSpider Spectra
ChemSpider SyntheticPages
Micropublishing with Peer Review 
(a chemical synthesis blog?)
Multi-Step Synthesis
Interactive Data
Creating a CSSP Page – ask 
Trevor or Christopher
Creating a CSSP Page
Creating a CSSP page
Searching CSSP 
• Reaction name or type 
• Reactants or products 
• Author
Search for “Leadbeater”
Search by structure
Anatomy of a Synthetic Page 
Reaction type Product 
Reaction 
scheme 
DOI for easy citation 
Chemicals used, 
including source 
(when important) 
Detailed procedure 
with no length limit 
View the structure 
of a compound – 
and search for it on 
other sites
Anatomy of a Synthetic Page 
Valuable tips 
and tricks 
Downloadable spectra and 
structures 
Leave feedback, 
ask questions, 
contribute your 
own experiences 
Analytical data 
(1H NMR at 
minimum)
What are we building now? 
• We are building the “RSC Data Repository” 
• Containers for compounds, reactions, analytical 
data, tabular data 
• Algorithms for data validation and standardization 
• Flexible indexing and search technologies 
• A platform for modeling data and hosting existing 
models and predictive algorithms
Deposition of Data
Compounds
Reactions
Analytical data
Crystallography data
Deposition of Data 
• Developing systems that provides 
feedback to users regarding data quality 
• Validate/standardize chemical compounds 
• Check for balanced reactions 
• Checks spectral data 
• EXAMPLE Future work 
• Properties – compare experimental to pred. 
• Automated structure verification - NMR
Can we get historical data? 
• Text and data can be mined 
• Spectra can be extracted and converted 
• SO MUCH Open Source Code available
Text Mining 
The N-(β-hydroxyethyl)-N-methyl-N'-(2-trifluoromethyl-1,3,4- 
thiadiazol-5-yl)urea prepared in Example 6 , thionyl chloride 
( 5 ml ) and benzene ( 50 ml ) were charged into a glass 
reaction vessel equipped with a mechanical stirrer , 
thermometer and reflux condenser . 
The reaction mixture was heated at reflux with stirring , for a 
period of about one-half hour . 
After this time the benzene and unreacted thionyl chloride 
were stripped from the reaction mixture under reduced 
pressure to yield the desired product N-(β-chloroethyl)-N-methyl- 
N'-(2-trifluoromethyl-1,3,4-thiaidazol-5-yl)urea as a 
solid residue
Text Mining 
The N-(β-hydroxyethyl)-N-methyl-N'-(2-trifluoromethyl-1,3,4- 
thiadiazol-5-yl)urea prepared in Example 6 , thionyl chloride 
( 5 ml ) and benzene ( 50 ml ) were charged into a glass 
reaction vessel equipped with a mechanical stirrer , 
thermometer and reflux condenser . 
The reaction mixture was heated at reflux with stirring , for a 
period of about one-half hour . 
After this time the benzene and unreacted thionyl chloride 
were stripped from the reaction mixture under reduced 
pressure to yield the desired product N-(β-chloroethyl)-N-methyl- 
N'-(2-trifluoromethyl-1,3,4-thiaidazol-5-yl)urea as a 
solid residue
Text spectra? 
13C NMR (CDCl3, 100 MHz): δ = 14.12 (CH3), 
30.11 (CH, benzylic methane), 30.77 (CH, 
benzylic methane), 66.12 (CH2), 68.49 (CH2), 
117.72, 118.19, 120.29, 122.67, 123.37, 125.69, 
125.84, 129.03, 130.00, 130.53 (ArCH), 99.42, 
123.60, 134.69, 139.23, 147.21, 147.61, 149.41, 
152.62, 154.88 (ArC)
1H NMR (CDCl3, 400 MHz): 
δ = 2.57 (m, 4H, Me, C(5a)H), 4.24 (d, 1H, J = 4.8 Hz, 
C(11b)H), 4.35 (t, 1H, Jb = 10.8 Hz, C(6)H), 4.47 (m, 2H, 
C(5)H), 4.57 (dd, 1H, J = 2.8 Hz, C(6)H), 6.95 (d, 1H, J = 
8.4 Hz, ArH), 7.18–7.94 (m, 11H, ArH)
Turn “Figures” Into Data
Make it interactive
SO MANY reactions!
Extracting our Archive 
• What could we get from our archive? 
• Find chemical names and generate structures 
• Find chemical images and generate structures 
• Find reactions 
• Find data (MP, BP, LogP) and deposit 
• Find figures and database them 
• Find spectra (and link to structures)
Models published from data
Text-mining Data to compare
The Future 
Internet Data 
Commercial Software 
Pre-competitive Data 
Open Science 
Open Data 
Publishers 
Educators 
Open Databases 
Chemical Vendors 
Small organic molecules 
Undefined materials 
Organometallics 
Nanomaterials 
Polymers 
Minerals 
Particle bound 
Links to Biologicals
A Global Chemistry Network 
• The Global Chemistry Network is much bigger 
than just data - scientific networking, 
micro/publishing, integration hub. 
• The data repository as a handler for data, GCN 
as a submission interface, GCN as a profile 
handler, rewards and recognition platform etc. 
• Data repository architecture designed to deliver 
the underpinning data containers and 
visualization widgets etc.
Thank you 
Email: williamsa@rsc.org 
ORCID: 0000-0002-2668-4821 
Twitter: @ChemConnector 
Personal Blog: www.chemconnector.com 
SLIDES: www.slideshare.net/AntonyWilliams

More Related Content

What's hot

Making solubility models with reaxy
Making solubility models with reaxyMaking solubility models with reaxy
Making solubility models with reaxyAnn-Marie Roche
 
Building a semantic chemistry platform with the royal society of chemistry
Building a semantic chemistry platform with the royal society of chemistryBuilding a semantic chemistry platform with the royal society of chemistry
Building a semantic chemistry platform with the royal society of chemistryValery Tkachenko
 
Open Access to Knowledge@ Novartis
Open Access to Knowledge@ NovartisOpen Access to Knowledge@ Novartis
Open Access to Knowledge@ NovartisAref Jdey
 
SciFinder and its utility in Drug discovery
SciFinder and its utility in Drug discoverySciFinder and its utility in Drug discovery
SciFinder and its utility in Drug discoveryAlichy Sowmya
 
2013 CrossRef Annual Meeting, How CrossRef has Accelerated Science and Its Pr...
2013 CrossRef Annual Meeting, How CrossRef has Accelerated Science and Its Pr...2013 CrossRef Annual Meeting, How CrossRef has Accelerated Science and Its Pr...
2013 CrossRef Annual Meeting, How CrossRef has Accelerated Science and Its Pr...Crossref
 

What's hot (19)

A chemistry data repository to serve them all
A chemistry data repository to serve them allA chemistry data repository to serve them all
A chemistry data repository to serve them all
 
Data Mining Dissertations and Adventures and Experiences in the World of Chem...
Data Mining Dissertations and Adventures and Experiences in the World of Chem...Data Mining Dissertations and Adventures and Experiences in the World of Chem...
Data Mining Dissertations and Adventures and Experiences in the World of Chem...
 
Serving the medicinal chemistry community with Royal Society of Chemistry che...
Serving the medicinal chemistry community with Royal Society of Chemistry che...Serving the medicinal chemistry community with Royal Society of Chemistry che...
Serving the medicinal chemistry community with Royal Society of Chemistry che...
 
The UK National Chemical Database Service – an integration of commercial and ...
The UK National Chemical Database Service – an integration of commercial and ...The UK National Chemical Database Service – an integration of commercial and ...
The UK National Chemical Database Service – an integration of commercial and ...
 
Data integration and building a profile for yourself as an online scientist
Data integration and building a profile for yourself as an online scientistData integration and building a profile for yourself as an online scientist
Data integration and building a profile for yourself as an online scientist
 
ChemSpider - building an online database of open spectra
ChemSpider - building an online database of open spectra ChemSpider - building an online database of open spectra
ChemSpider - building an online database of open spectra
 
Cheminformatics and the Structure Elucidation of Natural Products
Cheminformatics and the Structure Elucidation of Natural ProductsCheminformatics and the Structure Elucidation of Natural Products
Cheminformatics and the Structure Elucidation of Natural Products
 
Value of the mediawiki platform for providing content to the chemistry community
Value of the mediawiki platform for providing content to the chemistry communityValue of the mediawiki platform for providing content to the chemistry community
Value of the mediawiki platform for providing content to the chemistry community
 
Open innovation contributions from RSC resulting from the Open Phacts project
Open innovation contributions from RSC resulting from the Open Phacts projectOpen innovation contributions from RSC resulting from the Open Phacts project
Open innovation contributions from RSC resulting from the Open Phacts project
 
Making solubility models with reaxy
Making solubility models with reaxyMaking solubility models with reaxy
Making solubility models with reaxy
 
The US-EPA CompTox Chemicals Dashboard to support Non-Targeted Analysis
The US-EPA CompTox Chemicals Dashboard to support Non-Targeted AnalysisThe US-EPA CompTox Chemicals Dashboard to support Non-Targeted Analysis
The US-EPA CompTox Chemicals Dashboard to support Non-Targeted Analysis
 
Building a semantic chemistry platform with the royal society of chemistry
Building a semantic chemistry platform with the royal society of chemistryBuilding a semantic chemistry platform with the royal society of chemistry
Building a semantic chemistry platform with the royal society of chemistry
 
The needs for chemistry standards, database tools and data curation at the ch...
The needs for chemistry standards, database tools and data curation at the ch...The needs for chemistry standards, database tools and data curation at the ch...
The needs for chemistry standards, database tools and data curation at the ch...
 
Open Access to Knowledge@ Novartis
Open Access to Knowledge@ NovartisOpen Access to Knowledge@ Novartis
Open Access to Knowledge@ Novartis
 
Web Crawling Chemistry
Web Crawling ChemistryWeb Crawling Chemistry
Web Crawling Chemistry
 
Why Chemistry and the Web Will Benefit from a ChemSpider
Why Chemistry and the Web Will Benefit from a ChemSpiderWhy Chemistry and the Web Will Benefit from a ChemSpider
Why Chemistry and the Web Will Benefit from a ChemSpider
 
Royal society of chemistry activities to develop a data repository for chemis...
Royal society of chemistry activities to develop a data repository for chemis...Royal society of chemistry activities to develop a data repository for chemis...
Royal society of chemistry activities to develop a data repository for chemis...
 
SciFinder and its utility in Drug discovery
SciFinder and its utility in Drug discoverySciFinder and its utility in Drug discovery
SciFinder and its utility in Drug discovery
 
2013 CrossRef Annual Meeting, How CrossRef has Accelerated Science and Its Pr...
2013 CrossRef Annual Meeting, How CrossRef has Accelerated Science and Its Pr...2013 CrossRef Annual Meeting, How CrossRef has Accelerated Science and Its Pr...
2013 CrossRef Annual Meeting, How CrossRef has Accelerated Science and Its Pr...
 

Viewers also liked

Genetics power point
Genetics power pointGenetics power point
Genetics power pointoacore2
 
Brian Covello: Diabetes Research Presentation Semester 2
Brian Covello: Diabetes Research Presentation Semester 2Brian Covello: Diabetes Research Presentation Semester 2
Brian Covello: Diabetes Research Presentation Semester 2Brian Covello
 
Sigma Xi Showcase
Sigma Xi ShowcaseSigma Xi Showcase
Sigma Xi Showcasewrhsiang
 
REU Research Poster
REU Research PosterREU Research Poster
REU Research PosterKrista Chew
 
02 The Role of DNA in Protein Synthesis
02 The Role of DNA in Protein Synthesis02 The Role of DNA in Protein Synthesis
02 The Role of DNA in Protein SynthesisJaya Kumar
 
Why Coffee Protects Against Type II Diabetes
Why Coffee Protects Against Type II DiabetesWhy Coffee Protects Against Type II Diabetes
Why Coffee Protects Against Type II DiabetesBuyOrganicCoffee
 
Pnirs Poster Final June 10 2005
Pnirs Poster Final June 10 2005Pnirs Poster Final June 10 2005
Pnirs Poster Final June 10 2005jdecarli
 
Polymerase chain reaction (pcr)
Polymerase chain reaction (pcr)Polymerase chain reaction (pcr)
Polymerase chain reaction (pcr)Raju Bishnoi
 
polymerase chain reaction
polymerase chain reactionpolymerase chain reaction
polymerase chain reactionVipin Kannan
 
DNA replication, transcription, and translation
DNA replication, transcription, and translationDNA replication, transcription, and translation
DNA replication, transcription, and translationjun de la Ceruz
 
Obesity & adipokines
Obesity & adipokinesObesity & adipokines
Obesity & adipokinesRazavi Nader
 
Current Pharmacology & Toxicology Guidlines For Pharmaceutical Industry
Current Pharmacology & Toxicology Guidlines For Pharmaceutical IndustryCurrent Pharmacology & Toxicology Guidlines For Pharmaceutical Industry
Current Pharmacology & Toxicology Guidlines For Pharmaceutical IndustryProf. Dr. Basavaraj Nanjwade
 
BCIs and DNA Nanotechnology
BCIs and DNA NanotechnologyBCIs and DNA Nanotechnology
BCIs and DNA NanotechnologyMelanie Swan
 
Lecture on nucleic acid and proteins
Lecture on nucleic acid and proteinsLecture on nucleic acid and proteins
Lecture on nucleic acid and proteinsMarilen Parungao
 

Viewers also liked (20)

Genetics power point
Genetics power pointGenetics power point
Genetics power point
 
Brian Covello: Diabetes Research Presentation Semester 2
Brian Covello: Diabetes Research Presentation Semester 2Brian Covello: Diabetes Research Presentation Semester 2
Brian Covello: Diabetes Research Presentation Semester 2
 
ORS Abstract
ORS AbstractORS Abstract
ORS Abstract
 
Sigma Xi Showcase
Sigma Xi ShowcaseSigma Xi Showcase
Sigma Xi Showcase
 
JL Poster
JL PosterJL Poster
JL Poster
 
REU Research Poster
REU Research PosterREU Research Poster
REU Research Poster
 
02 The Role of DNA in Protein Synthesis
02 The Role of DNA in Protein Synthesis02 The Role of DNA in Protein Synthesis
02 The Role of DNA in Protein Synthesis
 
Why Coffee Protects Against Type II Diabetes
Why Coffee Protects Against Type II DiabetesWhy Coffee Protects Against Type II Diabetes
Why Coffee Protects Against Type II Diabetes
 
Pnirs Poster Final June 10 2005
Pnirs Poster Final June 10 2005Pnirs Poster Final June 10 2005
Pnirs Poster Final June 10 2005
 
Polymerase chain reaction (pcr)
Polymerase chain reaction (pcr)Polymerase chain reaction (pcr)
Polymerase chain reaction (pcr)
 
Genetics
GeneticsGenetics
Genetics
 
polymerase chain reaction
polymerase chain reactionpolymerase chain reaction
polymerase chain reaction
 
DNA replication, transcription, and translation
DNA replication, transcription, and translationDNA replication, transcription, and translation
DNA replication, transcription, and translation
 
Fenugreek final
Fenugreek finalFenugreek final
Fenugreek final
 
Obesity & adipokines
Obesity & adipokinesObesity & adipokines
Obesity & adipokines
 
Current Pharmacology & Toxicology Guidlines For Pharmaceutical Industry
Current Pharmacology & Toxicology Guidlines For Pharmaceutical IndustryCurrent Pharmacology & Toxicology Guidlines For Pharmaceutical Industry
Current Pharmacology & Toxicology Guidlines For Pharmaceutical Industry
 
BCIs and DNA Nanotechnology
BCIs and DNA NanotechnologyBCIs and DNA Nanotechnology
BCIs and DNA Nanotechnology
 
Pcr
Pcr Pcr
Pcr
 
Human genome
Human genomeHuman genome
Human genome
 
Lecture on nucleic acid and proteins
Lecture on nucleic acid and proteinsLecture on nucleic acid and proteins
Lecture on nucleic acid and proteins
 

Similar to Building a data repository to manage chemistry research data

How the InChI identifier is used to underpin our online chemistry databases a...
How the InChI identifier is used to underpin our online chemistry databases a...How the InChI identifier is used to underpin our online chemistry databases a...
How the InChI identifier is used to underpin our online chemistry databases a...Ken Karapetyan
 
Dealing with the complex challenge of managing diverse chemistry data online
Dealing with the complex challenge of managing diverse chemistry data onlineDealing with the complex challenge of managing diverse chemistry data online
Dealing with the complex challenge of managing diverse chemistry data onlineKen Karapetyan
 
Data enhancing the royal society of chemistry publication archive
Data enhancing the royal society of chemistry publication archiveData enhancing the royal society of chemistry publication archive
Data enhancing the royal society of chemistry publication archiveKen Karapetyan
 
ICIC 2013 Conference Proceedings Antony Williams Royal Society of Chemistry
ICIC 2013 Conference Proceedings Antony Williams Royal Society of ChemistryICIC 2013 Conference Proceedings Antony Williams Royal Society of Chemistry
ICIC 2013 Conference Proceedings Antony Williams Royal Society of ChemistryDr. Haxel Consult
 
A Presentation At Nature Publishing Group Crowdsourcing, Collaborations And T...
A Presentation At Nature Publishing Group Crowdsourcing, Collaborations And T...A Presentation At Nature Publishing Group Crowdsourcing, Collaborations And T...
A Presentation At Nature Publishing Group Crowdsourcing, Collaborations And T...guest01a117
 
Open innovation contributions from RSC resulting from the Open Phacts project
Open innovation contributions from RSC resulting from the Open Phacts projectOpen innovation contributions from RSC resulting from the Open Phacts project
Open innovation contributions from RSC resulting from the Open Phacts projectKen Karapetyan
 
The application of cloud computing to royal society of chemistry data platforms
The application of cloud computing to royal society of chemistry data platformsThe application of cloud computing to royal society of chemistry data platforms
The application of cloud computing to royal society of chemistry data platformsValery Tkachenko
 

Similar to Building a data repository to manage chemistry research data (20)

Current initiatives in developing research data repositories at the Royal Soc...
Current initiatives in developing research data repositories at the Royal Soc...Current initiatives in developing research data repositories at the Royal Soc...
Current initiatives in developing research data repositories at the Royal Soc...
 
How the InChI identifier is used to underpin our online chemistry databases a...
How the InChI identifier is used to underpin our online chemistry databases a...How the InChI identifier is used to underpin our online chemistry databases a...
How the InChI identifier is used to underpin our online chemistry databases a...
 
The importance of the InChI identifier as a foundation technology for eScienc...
The importance of the InChI identifier as a foundation technology for eScienc...The importance of the InChI identifier as a foundation technology for eScienc...
The importance of the InChI identifier as a foundation technology for eScienc...
 
Using online chemistry databases to facilitate structure identification in ma...
Using online chemistry databases to facilitate structure identification in ma...Using online chemistry databases to facilitate structure identification in ma...
Using online chemistry databases to facilitate structure identification in ma...
 
ChemSpider as an integration hub for interlinked chemistry data
ChemSpider as an integration hub for interlinked chemistry dataChemSpider as an integration hub for interlinked chemistry data
ChemSpider as an integration hub for interlinked chemistry data
 
Dealing with the complex challenge of managing diverse chemistry data online
Dealing with the complex challenge of managing diverse chemistry data onlineDealing with the complex challenge of managing diverse chemistry data online
Dealing with the complex challenge of managing diverse chemistry data online
 
Dealing with the complex challenge of managing diverse chemistry data online
Dealing with the complex challenge of managing diverse chemistry data onlineDealing with the complex challenge of managing diverse chemistry data online
Dealing with the complex challenge of managing diverse chemistry data online
 
Hosting public domain chemicals data online for the community – the challenge...
Hosting public domain chemicals data online for the community – the challenge...Hosting public domain chemicals data online for the community – the challenge...
Hosting public domain chemicals data online for the community – the challenge...
 
Data enhancing the royal society of chemistry publication archive
Data enhancing the royal society of chemistry publication archiveData enhancing the royal society of chemistry publication archive
Data enhancing the royal society of chemistry publication archive
 
Data enhancing the royal society of chemistry publication archive
Data enhancing the royal society of chemistry publication archiveData enhancing the royal society of chemistry publication archive
Data enhancing the royal society of chemistry publication archive
 
Activities at the Royal Society of Chemistry to gather, extract and analyze b...
Activities at the Royal Society of Chemistry to gather, extract and analyze b...Activities at the Royal Society of Chemistry to gather, extract and analyze b...
Activities at the Royal Society of Chemistry to gather, extract and analyze b...
 
Ontology work at the Royal Society of Chemistry
Ontology work at the Royal Society of ChemistryOntology work at the Royal Society of Chemistry
Ontology work at the Royal Society of Chemistry
 
ICIC 2013 Conference Proceedings Antony Williams Royal Society of Chemistry
ICIC 2013 Conference Proceedings Antony Williams Royal Society of ChemistryICIC 2013 Conference Proceedings Antony Williams Royal Society of Chemistry
ICIC 2013 Conference Proceedings Antony Williams Royal Society of Chemistry
 
Our dire need to mandate data standards and expectations for scientific publi...
Our dire need to mandate data standards and expectations for scientific publi...Our dire need to mandate data standards and expectations for scientific publi...
Our dire need to mandate data standards and expectations for scientific publi...
 
A Presentation At Nature Publishing Group Crowdsourcing, Collaborations And T...
A Presentation At Nature Publishing Group Crowdsourcing, Collaborations And T...A Presentation At Nature Publishing Group Crowdsourcing, Collaborations And T...
A Presentation At Nature Publishing Group Crowdsourcing, Collaborations And T...
 
Open innovation contributions from RSC resulting from the Open Phacts project
Open innovation contributions from RSC resulting from the Open Phacts projectOpen innovation contributions from RSC resulting from the Open Phacts project
Open innovation contributions from RSC resulting from the Open Phacts project
 
Accessing Environmental Chemistry Data via Data Dashboards
Accessing Environmental Chemistry Data via Data Dashboards Accessing Environmental Chemistry Data via Data Dashboards
Accessing Environmental Chemistry Data via Data Dashboards
 
The application of cloud computing to royal society of chemistry data platforms
The application of cloud computing to royal society of chemistry data platformsThe application of cloud computing to royal society of chemistry data platforms
The application of cloud computing to royal society of chemistry data platforms
 
Apps and approaches to mobilizing chemistry from the Royal Society of Chemistry
Apps and approaches to mobilizing chemistry from the Royal Society of ChemistryApps and approaches to mobilizing chemistry from the Royal Society of Chemistry
Apps and approaches to mobilizing chemistry from the Royal Society of Chemistry
 
US-EPA Cheminformatics Support for Delivering Data Related to Chemicals of E...
US-EPA Cheminformatics Support for Delivering Data Related to Chemicals of E...US-EPA Cheminformatics Support for Delivering Data Related to Chemicals of E...
US-EPA Cheminformatics Support for Delivering Data Related to Chemicals of E...
 

Recently uploaded

Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
TOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsTOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsssuserddc89b
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfSwapnil Therkar
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxyaramohamed343013
 
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tantaDashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tantaPraksha3
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxUmerFayaz5
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.PraveenaKalaiselvan1
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |aasikanpl
 
Module 4: Mendelian Genetics and Punnett Square
Module 4:  Mendelian Genetics and Punnett SquareModule 4:  Mendelian Genetics and Punnett Square
Module 4: Mendelian Genetics and Punnett SquareIsiahStephanRadaza
 
zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzohaibmir069
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxAleenaTreesaSaji
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhousejana861314
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfnehabiju2046
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trssuser06f238
 
Luciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptxLuciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptxAleenaTreesaSaji
 

Recently uploaded (20)

Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
TOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsTOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physics
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docx
 
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tantaDashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
 
Module 4: Mendelian Genetics and Punnett Square
Module 4:  Mendelian Genetics and Punnett SquareModule 4:  Mendelian Genetics and Punnett Square
Module 4: Mendelian Genetics and Punnett Square
 
zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistan
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptx
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhouse
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdf
 
Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 tr
 
Luciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptxLuciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptx
 

Building a data repository to manage chemistry research data

  • 1. Building a Data Repository to Manage Chemistry Research Data Antony Williams University of Connecticut 10/15/2014
  • 2. Chemistry for the Community • The Royal Society of Chemistry as a provider of chemistry for the community: • As a charity • As a scientific publisher • As a host of commercial databases • As a partner in grant-based projects • As the host of ChemSpider • And now in development : the RSC Data Repository for Chemistry
  • 3. • ~30 million chemicals and growing • Data sourced from hundreds of sources • Crowd sourced curation and annotation • Ongoing deposition of data from our journals and our collaborators • Structure centric hub for web-searching • …and a really big dictionary!!!
  • 10. Vendors and data sources
  • 12. Many Names, One Structure
  • 13. Crowdsourced “Annotations” • Users can add • Descriptions, Syntheses and Commentaries • Links to PubMed articles • Links to articles via DOIs • Add spectral data • Add Crystallographic Information Files • Add photos • Add MP3 files • Add Videos
  • 14. Crowdsourced Enhancement • The community can clean and enhance the database by providing Feedback and direct curation • Tens of thousands of edits made
  • 15. Adding data to ChemSpider • Users can contribute data and use as a shared resource for research group
  • 18. Micropublishing with Peer Review (a chemical synthesis blog?)
  • 21. Creating a CSSP Page – ask Trevor or Christopher
  • 24. Searching CSSP • Reaction name or type • Reactants or products • Author
  • 27. Anatomy of a Synthetic Page Reaction type Product Reaction scheme DOI for easy citation Chemicals used, including source (when important) Detailed procedure with no length limit View the structure of a compound – and search for it on other sites
  • 28. Anatomy of a Synthetic Page Valuable tips and tricks Downloadable spectra and structures Leave feedback, ask questions, contribute your own experiences Analytical data (1H NMR at minimum)
  • 29. What are we building now? • We are building the “RSC Data Repository” • Containers for compounds, reactions, analytical data, tabular data • Algorithms for data validation and standardization • Flexible indexing and search technologies • A platform for modeling data and hosting existing models and predictive algorithms
  • 35. Deposition of Data • Developing systems that provides feedback to users regarding data quality • Validate/standardize chemical compounds • Check for balanced reactions • Checks spectral data • EXAMPLE Future work • Properties – compare experimental to pred. • Automated structure verification - NMR
  • 36. Can we get historical data? • Text and data can be mined • Spectra can be extracted and converted • SO MUCH Open Source Code available
  • 37. Text Mining The N-(β-hydroxyethyl)-N-methyl-N'-(2-trifluoromethyl-1,3,4- thiadiazol-5-yl)urea prepared in Example 6 , thionyl chloride ( 5 ml ) and benzene ( 50 ml ) were charged into a glass reaction vessel equipped with a mechanical stirrer , thermometer and reflux condenser . The reaction mixture was heated at reflux with stirring , for a period of about one-half hour . After this time the benzene and unreacted thionyl chloride were stripped from the reaction mixture under reduced pressure to yield the desired product N-(β-chloroethyl)-N-methyl- N'-(2-trifluoromethyl-1,3,4-thiaidazol-5-yl)urea as a solid residue
  • 38. Text Mining The N-(β-hydroxyethyl)-N-methyl-N'-(2-trifluoromethyl-1,3,4- thiadiazol-5-yl)urea prepared in Example 6 , thionyl chloride ( 5 ml ) and benzene ( 50 ml ) were charged into a glass reaction vessel equipped with a mechanical stirrer , thermometer and reflux condenser . The reaction mixture was heated at reflux with stirring , for a period of about one-half hour . After this time the benzene and unreacted thionyl chloride were stripped from the reaction mixture under reduced pressure to yield the desired product N-(β-chloroethyl)-N-methyl- N'-(2-trifluoromethyl-1,3,4-thiaidazol-5-yl)urea as a solid residue
  • 39. Text spectra? 13C NMR (CDCl3, 100 MHz): δ = 14.12 (CH3), 30.11 (CH, benzylic methane), 30.77 (CH, benzylic methane), 66.12 (CH2), 68.49 (CH2), 117.72, 118.19, 120.29, 122.67, 123.37, 125.69, 125.84, 129.03, 130.00, 130.53 (ArCH), 99.42, 123.60, 134.69, 139.23, 147.21, 147.61, 149.41, 152.62, 154.88 (ArC)
  • 40. 1H NMR (CDCl3, 400 MHz): δ = 2.57 (m, 4H, Me, C(5a)H), 4.24 (d, 1H, J = 4.8 Hz, C(11b)H), 4.35 (t, 1H, Jb = 10.8 Hz, C(6)H), 4.47 (m, 2H, C(5)H), 4.57 (dd, 1H, J = 2.8 Hz, C(6)H), 6.95 (d, 1H, J = 8.4 Hz, ArH), 7.18–7.94 (m, 11H, ArH)
  • 44. Extracting our Archive • What could we get from our archive? • Find chemical names and generate structures • Find chemical images and generate structures • Find reactions • Find data (MP, BP, LogP) and deposit • Find figures and database them • Find spectra (and link to structures)
  • 47. The Future Internet Data Commercial Software Pre-competitive Data Open Science Open Data Publishers Educators Open Databases Chemical Vendors Small organic molecules Undefined materials Organometallics Nanomaterials Polymers Minerals Particle bound Links to Biologicals
  • 48. A Global Chemistry Network • The Global Chemistry Network is much bigger than just data - scientific networking, micro/publishing, integration hub. • The data repository as a handler for data, GCN as a submission interface, GCN as a profile handler, rewards and recognition platform etc. • Data repository architecture designed to deliver the underpinning data containers and visualization widgets etc.
  • 49. Thank you Email: williamsa@rsc.org ORCID: 0000-0002-2668-4821 Twitter: @ChemConnector Personal Blog: www.chemconnector.com SLIDES: www.slideshare.net/AntonyWilliams

Editor's Notes

  1. Search with a chemical name, common name, tradename or other identifier.
  2. There are several structure drawing applets available.