Serving the Medicinal Chemistry Community with RSC Cheminformatics Platforms
Antony Williams
Brazilian Medicinal Chemistry Conference, November 11th 2014
www.slideshare.net/AntonyWilliams
Chemistry for the Community 
• The Royal Society of Chemistry as a provider of chemistry for the community:
• As a charity 
• As a scientific publisher 
• As a host of commercial databases 
• As a partner in grant-based projects 
• As the host of ChemSpider 
• New: the RSC Data Repository for Chemistry
Overwhelmed with data…
So much online data…
Organizations releasing data
Funders encourage openness
We model data – then lose it 
• What if we could share models and the underlying data via a central repository?
• This is MOSTLY not a technology issue!!!
Pharma Companies Repeat Work 
Pre-competitive Informatics: Pharma are all accessing, processing, storing & re-processing external research data
• Literature, GenBank, PubChem, Patents, Databases, Downloads
• Data Integration, Data Analysis
• Firewalled Databases
• Repeat at each company
Publications lock up data
When I finish this article…
The data will be locked up…
But what if we could navigate? 
• What’s the structure?
• Are they in our file?
• What’s similar?
• What’s the target?
• Pharmacology data?
• Known Pathways?
• Working On Now?
• Connections to disease?
• Expressed in right cell type?
• Competitors?
• IP?
• ~30 million chemicals and growing
• Data sourced from >500 different sources
• Crowdsourced curation and annotation
• Ongoing deposition of data from our journals and our collaborators
• Structure-centric hub for web searching
• It’s a really big dictionary!!!
ChemSpider
Experimental/Predicted Properties
Literature references
Google Books
Patent references
RSC Databases
Vendors and data sources
With structures and names…
Name Searching
Standards for Integration
Structure Searching
What did we learn??? 
• Data quality is an enormous challenge
• Crowdsourced annotation can help!
Crowdsourced Enhancement 
• The community can clean and enhance the database by providing feedback and direct curation
• Tens of thousands of edits made
But Software Can Help 
• FDA Substance Registration System (SRS) as guidance for standardization rules
http://cvsp.chemspider.com
ChemSpider is a building block
…for grant-based services 
• Use ChemSpider data slices and API/Web services to support grant-based projects
• Multiple European consortium-based grants 
• PharmaSea (FP7 funded) 
• Open PHACTS (IMI funded) 
• Support Open Drug Discovery projects
Over half of all drugs introduced between 1940 and 2006 were of natural origin or inspired by natural compounds
http://www.pharma-sea.eu/
PharmaSea
…as a dereplication platform
http://www.openphacts.org/
• 3-year Innovative Medicines Initiative project 
• Integrating chemistry and biology data using semantic web technologies
• Open code, open data, open standards 
• Academics, Pharma Companies, Publishers…
The Open PHACTS community ecosystem
Open PHACTS http://ops.rsc.org
Chemistry Searching…
But what about Biology???
http://explorer.openphacts.org
Open PHACTS Explorer
Pharmacology by Compound
Compounds and enzymes
Pharmacology by Target
Facilitated by ChemSpider RDF
Open Sourcing Data and Code 
• Open PHACTS data licensed as Open Data (approx. 2 million chemicals)
• Open Source code to release to community (from the Open PHACTS GitHub site)
• Chemical Registration Service 
• Chemical Validation and Standardization Platform
ChemSpider as a “dictionary” 
• Systematic name(s) 
• Trivial Name(s) 
• SMILES 
• InChI Strings 
• InChIKeys 
• Database IDs 
• Registry Number
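The "dictionary" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how ChemSpider is implemented: the class names `CompoundRecord` and `ChemicalDictionary` are hypothetical, names are matched case-insensitively, and identifiers such as InChIKeys and registry numbers are matched exactly. The diazepam (Valium) identifiers are shown for illustration and should be checked against the ChemSpider record.

```python
from dataclasses import dataclass, field


@dataclass
class CompoundRecord:
    """Hypothetical record: one structure, many names and identifiers."""
    names: list = field(default_factory=list)        # systematic and trivial names
    identifiers: list = field(default_factory=list)  # InChIKey, registry number, database IDs


class ChemicalDictionary:
    """Toy lookup table mapping any known term to a single record."""

    def __init__(self):
        self._by_identifier = {}
        self._by_name = {}

    def add(self, record):
        # Identifiers are stored exactly as given.
        for ident in record.identifiers:
            self._by_identifier[ident] = record
        # Names are case-folded so "Valium" and "valium" resolve alike.
        for name in record.names:
            self._by_name[name.casefold()] = record

    def lookup(self, term):
        term = term.strip()
        hit = self._by_identifier.get(term)
        if hit is not None:
            return hit
        return self._by_name.get(term.casefold())


# Illustrative entry; verify the values against the live record before reuse.
diazepam = CompoundRecord(
    names=["Valium", "diazepam"],
    identifiers=["439-14-5", "AAOVKJBEBIDNHE-UHFFFAOYSA-N"],
)
dictionary = ChemicalDictionary()
dictionary.add(diazepam)
```

With this in place, `dictionary.lookup("valium")` and `dictionary.lookup("439-14-5")` return the same record, which is the property that lets names found in text be linked to structures.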
Valium on ChemSpider 
With strong dictionaries connections can be made…
Semantic Mark-up of Articles
Linking Names to Structures
…and providing more links… 
MedChemComm
Mark-up of MedChemComm
Links out from MedChemComm
What about old publications? 
• We would LOVE to bring data out of our archive 
• Find chemical names and generate structures 
• Find chemical images and generate structures 
• Find reactions – and make a database! 
• Find data (MP, BP, LogP) and host. Build models! 
• Find figures and database them 
• Find spectra (and link to structures) 
• Validate the data algorithmically
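As a rough sketch of what algorithmic validation of extracted property data might look like, the Python function below applies two physical sanity checks to a melting point and boiling point. The function name, its parameters, and the choice of checks are assumptions made for illustration, and it assumes both values are in °C and were measured at the same pressure.

```python
ABSOLUTE_ZERO_C = -273.15  # lower bound for any temperature in degrees Celsius


def validate_properties(mp_c=None, bp_c=None):
    """Return a list of human-readable issues; an empty list means no issue found."""
    issues = []
    # Check each supplied temperature against absolute zero.
    for label, value in (("melting point", mp_c), ("boiling point", bp_c)):
        if value is not None and value < ABSOLUTE_ZERO_C:
            issues.append(f"{label} {value} °C is below absolute zero")
    # A melting point above the boiling point suggests swapped or mis-extracted values.
    if mp_c is not None and bp_c is not None and mp_c > bp_c:
        issues.append("melting point is above boiling point")
    return issues


# Hypothetical extracted values, used only to exercise the checks.
clean = validate_properties(mp_c=80.0, bp_c=218.0)
swapped = validate_properties(mp_c=218.0, bp_c=80.0)
```

Checks like these cannot confirm that a value is right, only flag values that cannot be right, so flagged records would still need review.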
RSC Archive – since 1841
SO MANY reactions!
Text Mining 
The N-(β-hydroxyethyl)-N-methyl-N'-(2-trifluoromethyl-1,3,4-thiadiazol-5-yl)urea prepared in Example 6, thionyl chloride (5 ml) and benzene (50 ml) were charged into a glass reaction vessel equipped with a mechanical stirrer, thermometer and reflux condenser.
The reaction mixture was heated at reflux with stirring, for a period of about one-half hour.
After this time the benzene and unreacted thionyl chloride were stripped from the reaction mixture under reduced pressure to yield the desired product N-(β-chloroethyl)-N-methyl-N'-(2-trifluoromethyl-1,3,4-thiadiazol-5-yl)urea as a solid residue.
But names = structures 
• Systematic names can be generated FROM chemical structures algorithmically
…and Context Gives Reactions
The N-(β-hydroxyethyl)-N-methyl-N'-(2-trifluoromethyl-1,3,4-thiadiazol-5-yl)urea prepared in Example 6, thionyl chloride (5 ml) and benzene (50 ml) were charged into a glass reaction vessel equipped with a mechanical stirrer, thermometer and reflux condenser.
The reaction mixture was heated at reflux with stirring, for a period of about one-half hour.
After this time the benzene and unreacted thionyl chloride were stripped from the reaction mixture under reduced pressure to yield the desired product N-(β-chloroethyl)-N-methyl-N'-(2-trifluoromethyl-1,3,4-thiadiazol-5-yl)urea as a solid residue.
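The passage above states reagent amounts in parentheses right after each name, and that pattern is one piece of context a text-mining step could use. The Python sketch below is a deliberately simple rule-based illustration, not the method used for ChemSpider Reactions. The pattern `AMOUNT` and the function `extract_amounts` are hypothetical, and the pattern only handles plain names (letters, spaces, hyphens) followed by a quantity in ml, mg or g.

```python
import re

# name, then "(<number> <unit>)"; a leading "and" is skipped rather than captured.
AMOUNT = re.compile(
    r"(?:\band\s+)?"                              # optional conjunction before the name
    r"([A-Za-z][A-Za-z\- ]*?)"                    # simple chemical name (no locants or brackets)
    r"\s*\(\s*(\d+(?:\.\d+)?)\s*(ml|mg|g)\s*\)"   # quantity and unit in parentheses
)


def extract_amounts(sentence):
    """Return (name, quantity, unit) tuples for each 'name (amount unit)' mention."""
    return [
        (name.strip(), float(quantity), unit)
        for name, quantity, unit in AMOUNT.findall(sentence)
    ]


# Fragment of the passage above; the whitespace inside the parentheses may vary.
sentence = (
    "thionyl chloride (5 ml) and benzene (50 ml) were charged "
    "into a glass reaction vessel"
)
reagents = extract_amounts(sentence)
```

Systematic names with locants and nested brackets, such as the urea starting material, are out of reach for a pattern this simple and would need a dedicated chemical name recognizer.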
ChemSpider Reactions
Text spectra? 
• 13C NMR (CDCl3, 100 MHz): δ = 14.12 (CH3), 30.11 (CH, benzylic methane), 30.77 (CH, benzylic methane), 66.12 (CH2), 68.49 (CH2), 117.72, 118.19, 120.29, 122.67, 123.37, 125.69, 125.84, 129.03, 130.00, 130.53 (ArCH), 99.42, 123.60, 134.69, 139.23, 147.21, 147.61, 149.41, 152.62, 154.88 (ArC)
• 1H NMR (CDCl3, 400 MHz): δ = 2.57 (m, 4H, Me, C(5a)H), 4.24 (d, 1H, J = 4.8 Hz, C(11b)H), 4.35 (t, 1H, Jb = 10.8 Hz, C(6)H), 4.47 (m, 2H, C(5)H), 4.57 (dd, 1H, J = 2.8 Hz, C(6)H), 6.95 (d, 1H, J = 8.4 Hz, ArH), 7.18–7.94 (m, 11H, ArH)
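A text spectrum like the 13C listing above is structured enough to be turned back into data. The Python sketch below is a minimal illustration under assumptions: the names `SHIFT` and `parse_shifts` are hypothetical, the input is assumed to use "δ =" before the peak list, and a parenthesised label is attached only to the shift immediately before it. It is meant for the 13C-style list only, since the 1H listing has nested brackets and coupling constants that this pattern does not handle.

```python
import re

# A chemical shift with an optional parenthesised assignment directly after it.
SHIFT = re.compile(r"(\d+\.\d+)(?:\s*\(([^)]*)\))?")


def parse_shifts(text):
    """Return (shift_ppm, label_or_None) pairs from a 13C text spectrum."""
    # Drop the header (nucleus, solvent, frequency) so "100 MHz" is never parsed.
    _, _, body = text.partition("δ =")
    return [(float(value), label or None) for value, label in SHIFT.findall(body)]


# Shortened version of the 13C listing above.
spectrum = (
    "13C NMR (CDCl3, 100 MHz): δ = 14.12 (CH3), 66.12 (CH2), "
    "117.72, 118.19, 130.53 (ArCH)"
)
peaks = parse_shifts(spectrum)
```

A fuller parser would propagate a group label such as "(ArCH)" back to every unlabeled shift before it, which this sketch leaves out.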
Turn “Figures” Into Data 
FIGURE 
EXTRACTED
Models published from data
Text-mining Data to compare 
Presently extracting property data from Google Patents as a test
National Compound Collection 
• Unlock chemistry data in PhD theses 
• Discover novel molecules for biosciences 
• Working together with industry and the academic community
• Build the data into RSC online platforms
• Perform virtual screening/modeling and access physical samples to screen
We should make sure thesis data are available in consumable formats: compounds, reactions, analytical data, etc.
What are we building? 
• We are building the “RSC Data Repository” 
• Containers for compounds, reactions, analytical data, tabular data
• Algorithms for data validation and standardization
• Flexible indexing and search technologies
• A platform for modeling data and hosting existing models and predictive algorithms
• Chemistry RESEARCH DATA MANAGEMENT
Compounds
Reactions
Analytical data
Crystallography data
With data in hand maybe it’s time 
• What’s the structure?
• Are they in our file?
• What’s similar?
• What’s the target?
• Pharmacology data?
• Known Pathways?
• Working On Now?
• Connections to disease?
• Expressed in right cell type?
• Competitors?
• IP?
Conclusions 
• We are building platforms that can support multiple communities, including MedChem
• We are working hard to extend our data and improve quality of online data
• Opening APIs to the platforms and data provides a powerful building block
• We are Open Sourcing components of our platforms to the community
Thank you 
Email: williamsa@rsc.org 
ORCID: 0000-0002-2668-4821 
Twitter: @ChemConnector 
Personal Blog: www.chemconnector.com 
SLIDES: www.slideshare.net/AntonyWilliams

More Related Content

What's hot

schema.org and biomedical ontologies
schema.org and biomedical ontologies schema.org and biomedical ontologies
schema.org and biomedical ontologies Simon Jupp
 
Facilitating semantic alignment.-biohackathon-jupp
Facilitating semantic alignment.-biohackathon-juppFacilitating semantic alignment.-biohackathon-jupp
Facilitating semantic alignment.-biohackathon-juppSimon Jupp
 
ACS 248th Paper 71 ChAMP Project
ACS 248th Paper 71 ChAMP ProjectACS 248th Paper 71 ChAMP Project
ACS 248th Paper 71 ChAMP ProjectStuart Chalk
 
Acs collaborative computational technologies for biomedical research an enabl...
Acs collaborative computational technologies for biomedical research an enabl...Acs collaborative computational technologies for biomedical research an enabl...
Acs collaborative computational technologies for biomedical research an enabl...Sean Ekins
 

What's hot (20)

The importance of the InChI identifier as a foundation technology for eScienc...
The importance of the InChI identifier as a foundation technology for eScienc...The importance of the InChI identifier as a foundation technology for eScienc...
The importance of the InChI identifier as a foundation technology for eScienc...
 
Why Chemistry and the Web Will Benefit from a ChemSpider
Why Chemistry and the Web Will Benefit from a ChemSpiderWhy Chemistry and the Web Will Benefit from a ChemSpider
Why Chemistry and the Web Will Benefit from a ChemSpider
 
schema.org and biomedical ontologies
schema.org and biomedical ontologies schema.org and biomedical ontologies
schema.org and biomedical ontologies
 
The UK National Chemical Database Service – an integration of commercial and ...
The UK National Chemical Database Service – an integration of commercial and ...The UK National Chemical Database Service – an integration of commercial and ...
The UK National Chemical Database Service – an integration of commercial and ...
 
ChemSpider - Building a Foundation for the Semantic Web by Hosting a Crowd So...
ChemSpider - Building a Foundation for the Semantic Web by Hosting a Crowd So...ChemSpider - Building a Foundation for the Semantic Web by Hosting a Crowd So...
ChemSpider - Building a Foundation for the Semantic Web by Hosting a Crowd So...
 
Facilitating semantic alignment.-biohackathon-jupp
Facilitating semantic alignment.-biohackathon-juppFacilitating semantic alignment.-biohackathon-jupp
Facilitating semantic alignment.-biohackathon-jupp
 
ACS 248th Paper 71 ChAMP Project
ACS 248th Paper 71 ChAMP ProjectACS 248th Paper 71 ChAMP Project
ACS 248th Paper 71 ChAMP Project
 
Cheminformatics and the Structure Elucidation of Natural Products
Cheminformatics and the Structure Elucidation of Natural ProductsCheminformatics and the Structure Elucidation of Natural Products
Cheminformatics and the Structure Elucidation of Natural Products
 
Taming The Wild West Of Internet Based Chemistry You Can Help
Taming The Wild West Of Internet Based Chemistry You Can HelpTaming The Wild West Of Internet Based Chemistry You Can Help
Taming The Wild West Of Internet Based Chemistry You Can Help
 
The needs for chemistry standards, database tools and data curation at the ch...
The needs for chemistry standards, database tools and data curation at the ch...The needs for chemistry standards, database tools and data curation at the ch...
The needs for chemistry standards, database tools and data curation at the ch...
 
Big data challenges associated with building a national data repository for c...
Big data challenges associated with building a national data repository for c...Big data challenges associated with building a national data repository for c...
Big data challenges associated with building a national data repository for c...
 
Acs collaborative computational technologies for biomedical research an enabl...
Acs collaborative computational technologies for biomedical research an enabl...Acs collaborative computational technologies for biomedical research an enabl...
Acs collaborative computational technologies for biomedical research an enabl...
 
eScience Resources for the Chemistry Community from the Royal Society of Chem...
eScience Resources for the Chemistry Community from the Royal Society of Chem...eScience Resources for the Chemistry Community from the Royal Society of Chem...
eScience Resources for the Chemistry Community from the Royal Society of Chem...
 
eScience at the Royal Society of Chemistry and our current initiatives
eScience at the Royal Society of Chemistry and our current initiativeseScience at the Royal Society of Chemistry and our current initiatives
eScience at the Royal Society of Chemistry and our current initiatives
 
Hosting a compound centric community resource for chemistry data
Hosting a compound centric community resource for chemistry dataHosting a compound centric community resource for chemistry data
Hosting a compound centric community resource for chemistry data
 
Our dire need to mandate data standards and expectations for scientific publi...
Our dire need to mandate data standards and expectations for scientific publi...Our dire need to mandate data standards and expectations for scientific publi...
Our dire need to mandate data standards and expectations for scientific publi...
 
Building a data repository to manage chemistry research data
Building a data repository to manage chemistry research dataBuilding a data repository to manage chemistry research data
Building a data repository to manage chemistry research data
 
RSC ChemSpider Science Commons Symposium Pacific Northwest #scspn
RSC ChemSpider Science Commons Symposium Pacific Northwest #scspnRSC ChemSpider Science Commons Symposium Pacific Northwest #scspn
RSC ChemSpider Science Commons Symposium Pacific Northwest #scspn
 
ChemSpider as a Foundation for Crowdsourcing and Collaborations in Open Chemi...
ChemSpider as a Foundation for Crowdsourcing and Collaborations in Open Chemi...ChemSpider as a Foundation for Crowdsourcing and Collaborations in Open Chemi...
ChemSpider as a Foundation for Crowdsourcing and Collaborations in Open Chemi...
 
The future of scientific information & communication
The future of scientific information & communicationThe future of scientific information & communication
The future of scientific information & communication
 

Similar to Serving the medicinal chemistry community with Royal Society of Chemistry cheminformatics platforms

How the InChI identifier is used to underpin our online chemistry databases a...
How the InChI identifier is used to underpin our online chemistry databases a...How the InChI identifier is used to underpin our online chemistry databases a...
How the InChI identifier is used to underpin our online chemistry databases a...Ken Karapetyan
 
ICIC 2013 Conference Proceedings Antony Williams Royal Society of Chemistry
ICIC 2013 Conference Proceedings Antony Williams Royal Society of ChemistryICIC 2013 Conference Proceedings Antony Williams Royal Society of Chemistry
ICIC 2013 Conference Proceedings Antony Williams Royal Society of ChemistryDr. Haxel Consult
 
Dealing with the complex challenge of managing diverse chemistry data online
Dealing with the complex challenge of managing diverse chemistry data onlineDealing with the complex challenge of managing diverse chemistry data online
Dealing with the complex challenge of managing diverse chemistry data onlineKen Karapetyan
 

Similar to Serving the medicinal chemistry community with Royal Society of Chemistry cheminformatics platforms (20)

ChemSpider as an integration hub for interlinked chemistry data
ChemSpider as an integration hub for interlinked chemistry dataChemSpider as an integration hub for interlinked chemistry data
ChemSpider as an integration hub for interlinked chemistry data
 
Hosting public domain chemicals data online for the community – the challenge...
Hosting public domain chemicals data online for the community – the challenge...Hosting public domain chemicals data online for the community – the challenge...
Hosting public domain chemicals data online for the community – the challenge...
 
Activities at the Royal Society of Chemistry to gather, extract and analyze b...
Activities at the Royal Society of Chemistry to gather, extract and analyze b...Activities at the Royal Society of Chemistry to gather, extract and analyze b...
Activities at the Royal Society of Chemistry to gather, extract and analyze b...
 
How the InChI identifier is used to underpin our online chemistry databases a...
How the InChI identifier is used to underpin our online chemistry databases a...How the InChI identifier is used to underpin our online chemistry databases a...
How the InChI identifier is used to underpin our online chemistry databases a...
 
Experiences in Hosting Big Chemistry Data Collections for the Community
Experiences in Hosting Big Chemistry Data Collections for the CommunityExperiences in Hosting Big Chemistry Data Collections for the Community
Experiences in Hosting Big Chemistry Data Collections for the Community
 
Introduction to Cheminformatics: Accessing data through the CompTox Chemicals...
Introduction to Cheminformatics: Accessing data through the CompTox Chemicals...Introduction to Cheminformatics: Accessing data through the CompTox Chemicals...
Introduction to Cheminformatics: Accessing data through the CompTox Chemicals...
 
ICIC 2013 Conference Proceedings Antony Williams Royal Society of Chemistry
ICIC 2013 Conference Proceedings Antony Williams Royal Society of ChemistryICIC 2013 Conference Proceedings Antony Williams Royal Society of Chemistry
ICIC 2013 Conference Proceedings Antony Williams Royal Society of Chemistry
 
ChemSpider – A Platform to Gather, Host and Integrate Structure Based Data Ac...
ChemSpider – A Platform to Gather, Host and Integrate Structure Based Data Ac...ChemSpider – A Platform to Gather, Host and Integrate Structure Based Data Ac...
ChemSpider – A Platform to Gather, Host and Integrate Structure Based Data Ac...
 
The expansive reach of ChemSpider as a resource for the chemistry community
The expansive reach of ChemSpider as a resource for the chemistry communityThe expansive reach of ChemSpider as a resource for the chemistry community
The expansive reach of ChemSpider as a resource for the chemistry community
 
Web Crawling Chemistry
Web Crawling ChemistryWeb Crawling Chemistry
Web Crawling Chemistry
 
Dealing with the complex challenge of managing diverse chemistry data online
Dealing with the complex challenge of managing diverse chemistry data onlineDealing with the complex challenge of managing diverse chemistry data online
Dealing with the complex challenge of managing diverse chemistry data online
 
Dealing with the complex challenge of managing diverse chemistry data online
Dealing with the complex challenge of managing diverse chemistry data onlineDealing with the complex challenge of managing diverse chemistry data online
Dealing with the complex challenge of managing diverse chemistry data online
 
RSC ChemSpider -- Managing and Integrating Chemistry on the Internet to Build...
RSC ChemSpider -- Managing and Integrating Chemistry on the Internet to Build...RSC ChemSpider -- Managing and Integrating Chemistry on the Internet to Build...
RSC ChemSpider -- Managing and Integrating Chemistry on the Internet to Build...
 
Accessing Environmental Chemistry Data via Data Dashboards
Accessing Environmental Chemistry Data via Data Dashboards Accessing Environmental Chemistry Data via Data Dashboards
Accessing Environmental Chemistry Data via Data Dashboards
 
Improving online chemistry one structure at a time
Improving online chemistry one structure at a timeImproving online chemistry one structure at a time
Improving online chemistry one structure at a time
 
Current initiatives in developing research data repositories at the Royal Soc...
Current initiatives in developing research data repositories at the Royal Soc...Current initiatives in developing research data repositories at the Royal Soc...
Current initiatives in developing research data repositories at the Royal Soc...
 
ChemSpider - Does Community Engagement work to Build a Quality Online Resourc...
ChemSpider - Does Community Engagement work to Build a Quality Online Resourc...ChemSpider - Does Community Engagement work to Build a Quality Online Resourc...
ChemSpider - Does Community Engagement work to Build a Quality Online Resourc...
 
Facilitating Scientific Discovery through Crowdsourcing and Distributed Parti...
Facilitating Scientific Discovery through Crowdsourcing and Distributed Parti...Facilitating Scientific Discovery through Crowdsourcing and Distributed Parti...
Facilitating Scientific Discovery through Crowdsourcing and Distributed Parti...
 
Chemspider Presentation at the ACS Meeting in New orleans
Chemspider Presentation at the ACS Meeting in New orleansChemspider Presentation at the ACS Meeting in New orleans
Chemspider Presentation at the ACS Meeting in New orleans
 
Integrating and curating internet based chemistry resources to serve life sci...
Integrating and curating internet based chemistry resources to serve life sci...Integrating and curating internet based chemistry resources to serve life sci...
Integrating and curating internet based chemistry resources to serve life sci...
 

Recently uploaded

zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzohaibmir069
 
Genomic DNA And Complementary DNA Libraries construction.
Genomic DNA And Complementary DNA Libraries construction.Genomic DNA And Complementary DNA Libraries construction.
Genomic DNA And Complementary DNA Libraries construction.k64182334
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PPRINCE C P
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfSELF-EXPLANATORY
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real timeSatoshi NAKAHIRA
 
Boyles law module in the grade 10 science
Boyles law module in the grade 10 scienceBoyles law module in the grade 10 science
Boyles law module in the grade 10 sciencefloriejanemacaya1
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSérgio Sacani
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxUmerFayaz5
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfnehabiju2046
 
The Black hole shadow in Modified Gravity
The Black hole shadow in Modified GravityThe Black hole shadow in Modified Gravity
The Black hole shadow in Modified GravitySubhadipsau21168
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...anilsa9823
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Luciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptxLuciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptxAleenaTreesaSaji
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.PraveenaKalaiselvan1
 

Recently uploaded (20)

zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistan
 
Genomic DNA And Complementary DNA Libraries construction.
Genomic DNA And Complementary DNA Libraries construction.Genomic DNA And Complementary DNA Libraries construction.
Genomic DNA And Complementary DNA Libraries construction.
 
Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C P
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real time
 
Boyles law module in the grade 10 science
Boyles law module in the grade 10 scienceBoyles law module in the grade 10 science
Boyles law module in the grade 10 science
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdf
 
The Black hole shadow in Modified Gravity
The Black hole shadow in Modified GravityThe Black hole shadow in Modified Gravity
The Black hole shadow in Modified Gravity
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Luciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptxLuciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptx
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
 

Serving the medicinal chemistry community with Royal Society of Chemistry cheminformatics platforms

  • 1. Serving the Medicinal Chemistry Community with RSC Cheminformatics Platforms Antony Williams Brazilian Medicinal Chemistry Conference, November 11th 2014
  • 3. Chemistry for the Community • The Royal Society of Chemistry as a provider of chemistry for the community: • As a charity • As a scientific publisher • As a host of commercial databases • As a partner in grant-based projects • As the host of ChemSpider • New: the RSC Data Repository for Chemistry
  • 5. So much online data…
  • 8. We model data – then lose it • What if we could share models and the underlying data via a central repository? • This is MOSTLY not a technology issue!!!
  • 9. Pharma Companies Repeat Work • Pre-competitive informatics: pharma companies are all accessing, processing, storing and re-processing external research data • Sources: literature, GenBank, PubChem, patents, databases, downloads • Steps: data integration, data analysis, firewalled databases • Repeated at each company
  • 11. When I finish this article…
  • 12. The data will be locked up…
  • 13. But what if we could navigate? • What’s the structure? • Are they in our file? • What’s similar? • What’s the target? • Pharmacology data? • Known pathways? • Working on now? • Connections to disease? • Expressed in right cell type? • Competitors? • IP?
  • 15. • ~30 million chemicals and growing • Data sourced from >500 different sources • Crowd sourced curation and annotation • Ongoing deposition of data from our journals and our collaborators • Structure centric hub for web-searching • It’s a really big dictionary!!!
  • 23. Vendors and data sources
  • 28. What did we learn??? • Data Quality is an enormous challenge • Crowd sourced annotation can help!
  • 29. Crowdsourced Enhancement • The community can clean and enhance the database by providing Feedback and direct curation • Tens of thousands of edits made
  • 30. But Software Can Help • SRS as guidance for standardization rules
  • 32. ChemSpider is a building block
  • 33. …for grant-based services • Use ChemSpider data slices and API/Web services to support grant-based projects • Multiple European consortium-based grants • PharmaSea (FP7 funded) • Open PHACTS (IMI funded) • Support Open Drug Discovery projects
  • 35. Over half of all drugs introduced between 1940 and 2006 were of natural origin or inspired by natural compounds
  • 41. • 3-year Innovative Medicines Initiative project • Integrating chemistry and biology data using semantic web technologies • Open code, open data, open standards • Academics, Pharma Companies, Publishers…
  • 42. The Open PHACTS community ecosystem
  • 45. But what about Biology???
  • 54. Open Sourcing Data and Code • Open PHACTS data licensed as Open Data – approx. 2 Million chemicals • Open Source code to release to community (from Open PHACTS github site) • Chemical Registration Service • Chemical Validation and Standardization Platform
  • 55. ChemSpider as a “dictionary” • Systematic name(s) • Trivial Name(s) • SMILES • InChI Strings • InChIKeys • Database IDs • Registry Number
  • 56. Valium on ChemSpider • With strong dictionaries, connections can be made…
  • 58. Linking Names to Structures
  • 59. …and providing more links… MedChemComm
  • 63. Links out from MedChemComm
  • 65. What about old publications? • We would LOVE to bring data out of our archive • Find chemical names and generate structures • Find chemical images and generate structures • Find reactions – and make a database! • Find data (MP, BP, LogP) and host. Build models! • Find figures and database them • Find spectra (and link to structures) • Validate the data algorithmically
  • 66. RSC Archive – since 1841
  • 68. Text Mining • The N-(β-hydroxyethyl)-N-methyl-N'-(2-trifluoromethyl-1,3,4-thiadiazol-5-yl)urea prepared in Example 6, thionyl chloride (5 ml) and benzene (50 ml) were charged into a glass reaction vessel equipped with a mechanical stirrer, thermometer and reflux condenser. The reaction mixture was heated at reflux with stirring, for a period of about one-half hour. After this time the benzene and unreacted thionyl chloride were stripped from the reaction mixture under reduced pressure to yield the desired product N-(β-chloroethyl)-N-methyl-N'-(2-trifluoromethyl-1,3,4-thiadiazol-5-yl)urea as a solid residue
  • 70. But names = structures • Systematic names can be generated FROM chemical structures algorithmically
  • 71. …and Context Gives Reactions • The N-(β-hydroxyethyl)-N-methyl-N'-(2-trifluoromethyl-1,3,4-thiadiazol-5-yl)urea prepared in Example 6, thionyl chloride (5 ml) and benzene (50 ml) were charged into a glass reaction vessel equipped with a mechanical stirrer, thermometer and reflux condenser. The reaction mixture was heated at reflux with stirring, for a period of about one-half hour. After this time the benzene and unreacted thionyl chloride were stripped from the reaction mixture under reduced pressure to yield the desired product N-(β-chloroethyl)-N-methyl-N'-(2-trifluoromethyl-1,3,4-thiadiazol-5-yl)urea as a solid residue
  • 73. Text spectra? • 13C NMR (CDCl3, 100 MHz): δ = 14.12 (CH3), 30.11 (CH, benzylic methane), 30.77 (CH, benzylic methane), 66.12 (CH2), 68.49 (CH2), 117.72, 118.19, 120.29, 122.67, 123.37, 125.69, 125.84, 129.03, 130.00, 130.53 (ArCH), 99.42, 123.60, 134.69, 139.23, 147.21, 147.61, 149.41, 152.62, 154.88 (ArC)
  • 74. 1H NMR (CDCl3, 400 MHz): δ = 2.57 (m, 4H, Me, C(5a)H), 4.24 (d, 1H, J = 4.8 Hz, C(11b)H), 4.35 (t, 1H, Jb = 10.8 Hz, C(6)H), 4.47 (m, 2H, C(5)H), 4.57 (dd, 1H, J = 2.8 Hz, C(6)H), 6.95 (d, 1H, J = 8.4 Hz, ArH), 7.18–7.94 (m, 11H, ArH)
  • 75. Turn “Figures” Into Data FIGURE EXTRACTED
  • 77. Text-mining Data to compare • Presently extracting property data from Google Patents as a test
  • 78. National Compound Collection • Unlock chemistry data in PhD theses • Discover novel molecules for biosciences • Working together with industry and the academic community • Build the data into RSC online platforms • Perform virtual screening/modeling and access physical samples to screen
  • 79. We should make sure thesis data are available in consumable formats – compounds, reactions, analytical data etc.
  • 80. What are we building? • We are building the “RSC Data Repository” • Containers for compounds, reactions, analytical data, tabular data • Algorithms for data validation and standardization • Flexible indexing and search technologies • A platform for modeling data and hosting existing models and predictive algorithms • Chemistry RESEARCH DATA MANAGEMENT
  • 85. With data in hand maybe it’s time • What’s the structure? • Are they in our file? • What’s similar? • What’s the target? • Pharmacology data? • Known pathways? • Working on now? • Connections to disease? • Expressed in right cell type? • Competitors? • IP?
  • 86. Conclusions • We are building platforms that can support multiple communities, including MedChem • We are working hard to extend our data and improve quality of online data • Opening APIs to the platforms and data provides a powerful building block • We are Open Sourcing components of our platforms to the community
  • 87. Thank you • Email: williamsa@rsc.org • ORCID: 0000-0002-2668-4821 • Twitter: @ChemConnector • Personal Blog: www.chemconnector.com • SLIDES: www.slideshare.net/AntonyWilliams
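
The "Text spectra?" example on slide 73 can be turned into structured data with a small parser. What follows is a minimal sketch under stated assumptions, not the RSC text-mining pipeline: the function name `parse_nmr` and both regular expressions are invented here for illustration, and they only handle strings laid out like the slide's 13C example (a header in the form "nucleus NMR (solvent, frequency MHz)" followed by "δ =" and a comma-separated peak list).

```python
import re

# Text of slide 73, copied verbatim from the transcript.
SPECTRUM = (
    "13C NMR (CDCl3, 100 MHz): δ = 14.12 (CH3), 30.11 (CH, benzylic methane), "
    "30.77 (CH, benzylic methane), 66.12 (CH2), 68.49 (CH2), 117.72, 118.19, "
    "120.29, 122.67, 123.37, 125.69, 125.84, 129.03, 130.00, 130.53 (ArCH), "
    "99.42, 123.60, 134.69, 139.23, 147.21, 147.61, 149.41, 152.62, 154.88 (ArC)"
)

# Hypothetical pattern for the header: nucleus, solvent and spectrometer frequency.
HEADER = re.compile(
    r"(?P<nucleus>\d+[A-Z][a-z]?) NMR "
    r"\((?P<solvent>[^,]+), (?P<freq>\d+(?:\.\d+)?) MHz\)"
)

# Hypothetical pattern for a shift: any decimal number in the peak list.
# Assignment labels such as "CH3" or "CH2" contain no decimal point,
# so they are not picked up as shifts.
SHIFT = re.compile(r"\d+\.\d+")


def parse_nmr(text):
    """Extract acquisition details and chemical shifts from a text spectrum."""
    header = HEADER.search(text)
    if header is None:
        raise ValueError("no NMR header found")
    # Everything after the delta sign is treated as the peak list.
    _, separator, peaks = text.partition("δ =")
    if not separator:
        raise ValueError("no peak list found")
    return {
        "nucleus": header.group("nucleus"),
        "solvent": header.group("solvent"),
        "frequency_mhz": float(header.group("freq")),
        "shifts_ppm": [float(value) for value in SHIFT.findall(peaks)],
    }


result = parse_nmr(SPECTRUM)
print(result["nucleus"], result["solvent"], result["frequency_mhz"])
print(len(result["shifts_ppm"]), "shifts extracted")
```

The sketch keeps only the shift values and drops the assignments in parentheses. Linking each shift to its assignment, or handling the multiplicities and coupling constants in the 1H example on slide 74, would need a richer grammar than two regular expressions.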