Crowdsourcing, Collaborations and Text-Mining in a World of Open Chemistry

Antony Williams

Imagine a time when…

The internet is searchable by chemical structure and
substructure (e.g. Wikipedia, Google Scholar)
Chemistry articles are indexed and searchable by a free
online service
The web is linked together through the “language of
chemistry”
Publicly funded research data can be shared and
discussed in the open, maybe as Open Notebook Science (ONS)?
Cheminformatics has as much of a public face as
bioinformatics

             Building a Structure Centric Community for Chemists
ChemSpider - A Search Engine for Chemists

Questions a chemist might ask…
  What is the melting point of n-butanol?
  What is the chemical structure of Xanax?
  Chemically, what is phenolphthalein?
  What are the stereocenters of cholesterol?
  Where can I find publications about xylene?
  What are the different trade names for Ketoconazole?
  What is the NMR spectrum of Aspirin?
  What are the safety handling issues for Thymol Blue?

  ChemSpider can answer all of these questions

What is a Structure?
     Ask a computer…ask a chemist




Tell Me About Glutathione




Link outs




Links out to KEGG
Kyoto Encyclopedia of Genes and Genomes




How many names does a compound have?




ChemSpider Data Content

Over 21.5 million unique chemical structures from ca. 150 data
sources
   Online Databases – PubChem, DrugBank, KEGG, Wikipedia
   Literature – PubMed, J Het Chem, Nature, RSC, Open Access
   Chemical Vendors – over 40 different vendors and growing
   Personal Depositions – individual contributions
   Content database vendors
   Analytical data collections
   Patents
   Web scraping

Content is linked back to the original data sources

Other Searches

What compounds have a mass of 300 ± 0.001?




or search a combination of intrinsic/predicted properties
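A mass-window query of this kind reduces to a range filter over a mass-sorted index. The sketch below is only an illustration of that idea, not ChemSpider's implementation; the record layout, the `mass_window` helper and the sample masses and identifiers are all placeholders made up for the example.

```python
from bisect import bisect_left, bisect_right

# Hypothetical records: (monoisotopic mass, identifier). Values are placeholders.
RECORDS = sorted([
    (299.9985, "cmpd-A"),
    (299.9995, "cmpd-B"),
    (300.0004, "cmpd-C"),
    (300.0020, "cmpd-D"),
])
MASSES = [mass for mass, _ in RECORDS]


def mass_window(target, tol):
    """Return identifiers whose mass lies within target +/- tol."""
    lo = bisect_left(MASSES, target - tol)   # first mass >= lower bound
    hi = bisect_right(MASSES, target + tol)  # one past last mass <= upper bound
    return [name for _, name in RECORDS[lo:hi]]


hits = mass_window(300.0, 0.001)
```

Because the masses are kept sorted, each query costs two binary searches rather than a scan of the whole collection, which matters at the scale of millions of structures.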
Complex Search




The Quality of Data Online…
Aggregating data opens up quality issues
Structure-identifier associations are “dirty”
Structures are COMMONLY incorrect
Manual curation of small databases is enough work – what
about millions of structures?
Structures are far from perfect. What is a “correct structure”?
  Full stereochemistry?
  Historical timeline of structure?
  Who is the authority?


Who holds THE Quality Authority?

Chemical Abstracts Service is the structural authority
today. 1400 employees, world standard in chemistry
information
101 years of knowledge, process and expertise.
How can an online, free access system peacefully
co-exist with the authority?




Quality is a Major Issue: Search Butanol
(Old example, now fixed)




Wikipedia Chemistry Curation project

Only ca. 5000 organic structures, 7000 total structures
Almost a year of work so far for a team of 6 people
Many errors removed in the process. Curation process is a daily event for users/depositors
Slow and torturous process

http://en.wikipedia.org/wiki/Talk:Tacrolimus#IUPAC_Name_and_structure


Wikipedia Curation

Looking for self-consistency across a Wikipedia page
Primary key is the article TITLE
The chemical shown needs to match the title
Cyclic self-consistency – and decisions must get made




Viagra or Sildenafil




Other issues…




Charges




Sugars – Machine Readable vs Aesthetics




Haworth                       Stereo                         Fischer

Wikipedia – Crowdsourcing Chemistry




Thymol Blue on ChemSpider

Data online includes:
  UV-vis spectrum
  Measured experimental properties
  Link to Wikipedia article
  Links to chromatography details
  Multiple identifiers/trade names etc.
  Links to vendors/suppliers/other databases
  Safety information

  http://www.chemspider.com/q/thymol%20blue
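
The query URL above simply percent-encodes the compound name after `/q/`. As a small sketch, assuming only the URL pattern shown on this slide (whether the service still honours it is not something the slides confirm), such links could be built like this; `chemspider_query_url` is a hypothetical helper name.

```python
from urllib.parse import quote


def chemspider_query_url(name):
    """Build a name-query URL following the pattern shown on the slide."""
    # quote() percent-encodes spaces as %20, matching the slide's example.
    return "http://www.chemspider.com/q/" + quote(name)


url = chemspider_query_url("thymol blue")
```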

Differences between ChemSpider/Wikipedia

ChemSpider                                          Wikipedia
>21 million unique structures                       ~5000 organics, 2000 others
Complex queries: properties, text,                  Text
  structure/substructure, OA publishers,
  data sources, …
Prediction of properties                            No
Analytical data                                     No, but links
Active depositors/curators: 30                      Active editors: >50 (?)
6000 people/day; 1900 registered                    ????
Compound monographs linked                          Detailed compound monographs

Differences between Wikipedia/ChemSpider

Wikipedia                                                   ChemSpider
Supported by the tried and tested MediaWiki platform        Primarily Microsoft .NET technologies with OS components
Established infrastructure and Wikimedia Foundation team    “Out of a basement” on three servers and 5 volunteers
Chemistry is a subset of the ‘Pedia                         Chemistry is the focus of ‘Spider
GFDL licensing for everything                               Mixed “licensing”
Strong team of WP:Chem advocates, curators and admins       Growing team of advocates, curators and users
Worldwide reputation as quality source – good and bad       Growing reputation as focused on quality

Crowd-sourcing Curation

How to curate data for millions of structures?
Robot processes can clean up depositions
  Search for Chloride and check molecular formula for Cl
  Check for stereochemistry and remove names with stereo
Provide a simple-to-use platform to curate, annotate
and tag data
Provide curator administration to prevent vandalism
(Veropedia)
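
The two robot checks listed above can be sketched as simple consistency rules between a record's names and its structure. This is a minimal illustration under stated assumptions, not the actual ChemSpider robots: the `flag_names` helper, the `has_stereo` flag and the stereodescriptor pattern are all hypothetical, and real names would need far more careful handling.

```python
import re

# Matches common stereodescriptors: (R), (2S,3R), (E), (Z), cis-, trans-, (+), (-).
STEREO = re.compile(r"\((?:\d*[RSEZ],?)+\)|\b(?:cis|trans)-|\([+-]\)")


def elements(formula):
    """Element symbols in a plain molecular formula such as 'C4H9Cl'."""
    return set(re.findall(r"[A-Z][a-z]?", formula))


def flag_names(names, formula, has_stereo):
    """Return names that look inconsistent with the structure record."""
    present = elements(formula)
    flagged = []
    for name in names:
        # Rule 1: a 'chloride' name needs Cl in the molecular formula.
        if "chloride" in name.lower() and "Cl" not in present:
            flagged.append(name)
        # Rule 2: a stereo-specific name needs a structure with defined stereo.
        elif not has_stereo and STEREO.search(name):
            flagged.append(name)
    return flagged


suspect = flag_names(
    ["butan-1-ol", "butyl chloride", "(R)-butan-2-ol"],
    formula="C4H10O",
    has_stereo=False,
)
```

Flagged names would then go to human curators rather than being deleted outright, in keeping with the curation workflow described on these slides.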


Post Comments
Anyone can “Post Comments” associated with a
structure. To curate data we require login to track




Multi-level Curation and Approval




Crowd-sourcing Chemistry

Crowd-sourced curation: identify and tag errors, edit
names, synonyms, identify records for deprecation

ALSO

Crowd-sourced deposition: anyone can deposit data
(structures, text, images, analytical data)



DailyMed




Quality of Structures




Quality of Structures!!!




Structure-Centric

We want to search “information” by structure, substructure,
similarity of structure
Specific focus on Open Chemistry at present
Standard approaches would be:
  Identify chemical names “entity extraction”
  Convert chemical names to structures and index
ChemSpider has a validated dictionary of structure-name
pairs
Use name extraction, name conversion and dictionary
look-up. THEN curate.

“Entity Extraction”

Rule-based recognition of systematic names:
  Use a lexicon of name fragments
  Rules for identifying bounds of a name


Look-up dictionary:
  Drug Names
  Trivial Names
  Numbers : Registry IDs, EINECS/ELINCS
  Massive look-up dictionary of validated identifiers on
  ChemSpider
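
The dictionary look-up half of entity extraction can be sketched as a longest-match search of the text against known names. The example below is an assumption-laden toy: the four-entry `NAME_TO_SMILES` dictionary stands in for a validated identifier dictionary, `find_names` is a hypothetical helper, and the rule-based recognition of systematic names is not covered at all.

```python
import re

# Tiny stand-in for a validated name-to-structure dictionary (SMILES values).
NAME_TO_SMILES = {
    "dichloromethane": "ClCCl",
    "ethanol": "CCO",
    "aspirin": "CC(=O)Oc1ccccc1C(=O)O",
    "acetic acid": "CC(=O)O",
}

# Longest names first so multi-word entries win over shorter overlaps.
PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(n) for n in sorted(NAME_TO_SMILES, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def find_names(text):
    """Return (matched text, start offset, SMILES) for each dictionary hit."""
    return [
        (m.group(1), m.start(), NAME_TO_SMILES[m.group(1).lower()])
        for m in PATTERN.finditer(text)
    ]


found = find_names(
    "The mixture was filtered and washed with dichloromethane, "
    "then recrystallized from ethanol."
)
```

The word-boundary anchors stop "ethanol" from matching inside longer names such as "ethanolamine", but a pure look-up like this will still tag ordinary words that happen to be trade names, which is why curation follows extraction.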

Name Recognition

Azo aldehyde 2 was synthesized according to a
reported method [17]. To a stirred solution of azo aldehyde
2 (1.08 g, 3.76 mmol) in dry CH2Cl2 (30.00 mL) at 0 °C
were successively added (3,4-diaminophenyl)phenyl
methanone 1 (0.40 g, 1.88 mmol) and an excess of anhydrous
MgSO4 (2.00 g, 16.67 mmol).
The resulting mixture was stirred for 6 hours at room
temperature [18]. The mixture was filtered and washed with
dichloromethane. Then the solvent was evaporated under
reduced pressure to give azo Schiff base 3 as a red solid which
was recrystallized from ethanol 95% (1.28 g, 91%)


How Many Chemical Names?
“She had the drive to derive success in any
venture and was well versed in Karate.
When the man in the tartan shirt
approached her with a dagger in his hand
she spat in his face, took the stance of a
commando and took advantage of his
shock to release the dagger from his grip
and causing him to recoil. He went home
and took an aspirin after the beating.”

ChemMantis

Chemical Markup And Nomenclature Transformation
Integrated System




Making Open Access Articles Searchable
                         Proof of Concept
Can we HOST chemistry Open Access articles on
ChemSpider and add value?
Can we identify chemical names in Open Access articles
in a user-friendly manner?
Can we convert names to structures in Open Access
articles, expand ChemSpider and provide structure
searching of Open Access chemistry articles?
Can we provide an environment for chemists to mark up
their own articles and crowd-source markup of an
archive?

Document markup

ChemSpider now hosting Open Access articles from
MDPI, Molecular Diversity Preservation International
Hosting the Molbank collection at present




A Standard for Document Markup?

NLM-DTD: National Library of Medicine; Document
Type Definition
Approved markup definitions to apply to journal
articles – extended as necessary for our purposes




NLM/DTD markup




Chemistry and Biology



Menus can be extended as necessary




Document markup




Markup – 3 seconds!




On the fly conversion




Shorthand Formulae Supported




One Click to more Info…




Structure Image Conversion




Two Seconds Later




Not Always Perfect….




A Platform for Markup

Can we provide a platform for document markup for
chemists?
Workflow:
  Upload Word docs, RTF files or point to HTML and load
  Apply entity extraction, convert names to structures, mark-up
  automatically and ask for user participation
  Publish final version with NLM-DTD markup
  Deposit all structures on ChemSpider under embargo and
  wait for article DOI to release
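
The automatic mark-up step in this workflow can be sketched as wrapping each recognized name in an XML element. The NLM DTD offers a generic `named-content` element with a `content-type` attribute; the slides do not say which elements or attribute values the tool actually emits, so the `content-type="chemical"` value, the two-name dictionary and the `mark_up` helper below are assumptions for illustration only.

```python
import re
from xml.sax.saxutils import escape

NAMES = ["dichloromethane", "ethanol"]  # placeholder dictionary entries
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(n) for n in sorted(NAMES, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)


def mark_up(text):
    """Escape text for XML and wrap each dictionary hit in a named-content tag."""
    out, pos = [], 0
    for m in PATTERN.finditer(text):
        out.append(escape(text[pos:m.start()]))  # untouched text before the hit
        out.append(
            '<named-content content-type="chemical">'
            + escape(m.group(0))
            + "</named-content>"
        )
        pos = m.end()
    out.append(escape(text[pos:]))  # trailing text after the last hit
    return "".join(out)


marked = mark_up("washed with dichloromethane & dried")
```

The user-participation step would then amount to reviewing these tags, accepting or rejecting each one before the final version is published.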



Challenges

Computer software can generate chemical names better
than the majority of chemists
The majority of chemical names are generated by
humans and are often incorrect – they convert to the wrong
structure or are ambiguous
One name, multiple structures




Names and Structures

Dichloroacetone




Trichloromethylsilane
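
One way to handle names like these is to let the dictionary return every candidate structure and refuse to pick one automatically. The sketch below is illustrative only: the candidate readings and SMILES strings are assumptions about how each name can be parsed (the slide's own structure images are not available in this transcript), and `resolve` is a hypothetical helper.

```python
# Candidate readings (SMILES) per ambiguous or unambiguous name.
CANDIDATES = {
    "dichloroacetone": {
        "1,1-dichloroacetone": "CC(=O)C(Cl)Cl",
        "1,3-dichloroacetone": "ClCC(=O)CCl",
    },
    "trichloromethylsilane": {
        "trichloro(methyl)silane": "C[Si](Cl)(Cl)Cl",
        "(trichloromethyl)silane": "[SiH3]C(Cl)(Cl)Cl",
    },
    "dichloromethane": {"dichloromethane": "ClCCl"},
}


def resolve(name):
    """Return (smiles, None) if unambiguous, else (None, sorted candidate readings)."""
    options = CANDIDATES.get(name.lower(), {})
    if len(options) == 1:
        return next(iter(options.values())), None
    # Zero or several readings: hand the decision to a curator.
    return None, sorted(options)


smiles, readings = resolve("Dichloroacetone")
```

Returning the list of readings instead of a guess keeps the wrong structure out of the index and gives the curator something concrete to choose from.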




Ambiguity




Ambiguity in Abbreviations - DPA




Ambiguity in Abbreviations - THF




Import is Easy

Make articles Public/Private (embargo date soon)
Auto-markup and check by user




IUPAC PAC Articles




Supports Word .DOC, HTML, RTF




Drexel University Documents




Patents




Single configuration file defines entities for markup
Algorithms can be built for certain entities but the majority are dictionaries – vendors, physical properties, analytical
We can extend our system to support your needs based on dictionaries – what does NPG need/not need?



Nature Publications




Entity Balloons

Structures are the language of chemistry
Show structures to chemists and search/link from there




Other Dictionaries - Species

We are considering
  Bacteria
  Fungi
  Enzymes
  Viruses
  PDB codes….




Integrations Out to Other Sources




Reactions




Manual Curation is Always Necessary




Text-Indexing and ChemSpider?

ChemSpider text-indexes almost 500,000 Open Access
and Free Access articles




Collection is growing and more publishers have already
agreed. Including theses in the future.

Open Access Literature Search




Conclusions

The quality of structure-based data online should
always be questioned – that includes ChemSpider
Data on ChemSpider are being added and curated on a
daily basis, but we always need more eyeballs helping
ChemSpider has a large validated structure-name
dictionary
Chemical name extraction and document markup are
very enabling



Oops…




Building a Structure Centric Community for Chemists

More Related Content

What's hot

Can machines understand the scientific literature
Can machines understand the scientific literatureCan machines understand the scientific literature
Can machines understand the scientific literaturepetermurrayrust
 
High throughput mining of the scholarly literature; talk at NIH
High throughput mining of the scholarly literature; talk at NIHHigh throughput mining of the scholarly literature; talk at NIH
High throughput mining of the scholarly literature; talk at NIHpetermurrayrust
 
Asking the scientific literature to tell us about metabolism
Asking the scientific literature to tell us about metabolismAsking the scientific literature to tell us about metabolism
Asking the scientific literature to tell us about metabolismpetermurrayrust
 
ContentMine (TDM) at JISC Digifest
ContentMine (TDM) at JISC DigifestContentMine (TDM) at JISC Digifest
ContentMine (TDM) at JISC Digifestpetermurrayrust
 
Text and Data Mining explained at FTDM
Text and Data Mining explained at FTDMText and Data Mining explained at FTDM
Text and Data Mining explained at FTDMpetermurrayrust
 
A Global Commons for Scientific Data: Molecules and Wikidata
A Global Commons for Scientific Data: Molecules and WikidataA Global Commons for Scientific Data: Molecules and Wikidata
A Global Commons for Scientific Data: Molecules and Wikidatapetermurrayrust
 
Ontology Web Services for Semantic Applications
Ontology Web Services for Semantic Applications Ontology Web Services for Semantic Applications
Ontology Web Services for Semantic Applications Trish Whetzel
 
High throughput mining of the plant-science literature
High throughput mining of the plant-science literatureHigh throughput mining of the plant-science literature
High throughput mining of the plant-science literaturepetermurrayrust
 
Content Mining of Science in Europe
Content Mining of Science in EuropeContent Mining of Science in Europe
Content Mining of Science in Europepetermurrayrust
 
Towards Responsible Content Mining: A Cambridge perspective
Towards Responsible Content Mining: A Cambridge perspectiveTowards Responsible Content Mining: A Cambridge perspective
Towards Responsible Content Mining: A Cambridge perspectivepetermurrayrust
 

What's hot (18)

Can machines understand the scientific literature
Can machines understand the scientific literatureCan machines understand the scientific literature
Can machines understand the scientific literature
 
Cochrane workshop2016
Cochrane workshop2016Cochrane workshop2016
Cochrane workshop2016
 
Navigating the Complex Web of Chemistry Using ChemSpider
Navigating the Complex Web of Chemistry Using ChemSpiderNavigating the Complex Web of Chemistry Using ChemSpider
Navigating the Complex Web of Chemistry Using ChemSpider
 
High throughput mining of the scholarly literature; talk at NIH
High throughput mining of the scholarly literature; talk at NIHHigh throughput mining of the scholarly literature; talk at NIH
High throughput mining of the scholarly literature; talk at NIH
 
Mining public domain data as a basis for drug repurposing
Mining public domain data as a basis for drug repurposingMining public domain data as a basis for drug repurposing
Mining public domain data as a basis for drug repurposing
 
Dealing with the complex challenge of managing diverse analytical chemistry d...
Dealing with the complex challenge of managing diverse analytical chemistry d...Dealing with the complex challenge of managing diverse analytical chemistry d...
Dealing with the complex challenge of managing diverse analytical chemistry d...
 
Asking the scientific literature to tell us about metabolism
Asking the scientific literature to tell us about metabolismAsking the scientific literature to tell us about metabolism
Asking the scientific literature to tell us about metabolism
 
Why open drug discovery needs four simple rules for licensing data and models
Why open drug discovery needs four simple rules for licensing data and modelsWhy open drug discovery needs four simple rules for licensing data and models
Why open drug discovery needs four simple rules for licensing data and models
 
ContentMine (TDM) at JISC Digifest
ContentMine (TDM) at JISC DigifestContentMine (TDM) at JISC Digifest
ContentMine (TDM) at JISC Digifest
 
E print servers
E print serversE print servers
E print servers
 
Structure representations in public chemistry databases: The challenges of va...
Structure representations in public chemistry databases: The challenges of va...Structure representations in public chemistry databases: The challenges of va...
Structure representations in public chemistry databases: The challenges of va...
 
Text and Data Mining explained at FTDM
Text and Data Mining explained at FTDMText and Data Mining explained at FTDM
Text and Data Mining explained at FTDM
 
A Global Commons for Scientific Data: Molecules and Wikidata
A Global Commons for Scientific Data: Molecules and WikidataA Global Commons for Scientific Data: Molecules and Wikidata
A Global Commons for Scientific Data: Molecules and Wikidata
 
Ontology Web Services for Semantic Applications
Ontology Web Services for Semantic Applications Ontology Web Services for Semantic Applications
Ontology Web Services for Semantic Applications
 
High throughput mining of the plant-science literature
High throughput mining of the plant-science literatureHigh throughput mining of the plant-science literature
High throughput mining of the plant-science literature
 
open access in Science
open access in Scienceopen access in Science
open access in Science
 
Content Mining of Science in Europe
Content Mining of Science in EuropeContent Mining of Science in Europe
Content Mining of Science in Europe
 
Towards Responsible Content Mining: A Cambridge perspective
Towards Responsible Content Mining: A Cambridge perspectiveTowards Responsible Content Mining: A Cambridge perspective
Towards Responsible Content Mining: A Cambridge perspective
 

Similar to Crowdsourcing Open Chemistry Research

Whitney Symposium Lecturejune 2008 1220331644496491 9
Whitney Symposium Lecturejune 2008 1220331644496491 9Whitney Symposium Lecturejune 2008 1220331644496491 9
Whitney Symposium Lecturejune 2008 1220331644496491 9Scott Conner
 

Similar to Crowdsourcing Open Chemistry Research (20)

Whitney Symposium Lecturejune 2008 1220331644496491 9
Whitney Symposium Lecturejune 2008 1220331644496491 9Whitney Symposium Lecturejune 2008 1220331644496491 9
Whitney Symposium Lecturejune 2008 1220331644496491 9
 
ChemSpider Overview SLides August 2007
ChemSpider Overview SLides August 2007ChemSpider Overview SLides August 2007
ChemSpider Overview SLides August 2007
 
ChemSpider as a Foundation for Crowdsourcing and Collaborations in Open Chemi...
ChemSpider as a Foundation for Crowdsourcing and Collaborations in Open Chemi...ChemSpider as a Foundation for Crowdsourcing and Collaborations in Open Chemi...
ChemSpider as a Foundation for Crowdsourcing and Collaborations in Open Chemi...
 
ChemSpider and How The Wisdom Of The Crowds Can Improve The Quality Of ...
ChemSpider  and How The Wisdom Of The  Crowds  Can  Improve The  Quality Of  ...ChemSpider  and How The Wisdom Of The  Crowds  Can  Improve The  Quality Of  ...
ChemSpider and How The Wisdom Of The Crowds Can Improve The Quality Of ...
 
Connecting Chemists to the Internet Through ChemSpider
Connecting Chemists to the Internet Through ChemSpiderConnecting Chemists to the Internet Through ChemSpider
Connecting Chemists to the Internet Through ChemSpider
 
Crawling Across the Web of Chemistry Using ChemSpider
Crawling Across the Web of Chemistry Using ChemSpider Crawling Across the Web of Chemistry Using ChemSpider
Crawling Across the Web of Chemistry Using ChemSpider
 
RSC ChemSpider -- Managing and Integrating Chemistry on the Internet to Build...
RSC ChemSpider -- Managing and Integrating Chemistry on the Internet to Build...RSC ChemSpider -- Managing and Integrating Chemistry on the Internet to Build...
RSC ChemSpider -- Managing and Integrating Chemistry on the Internet to Build...
 
AZ of Chemspider February 2011
AZ of Chemspider February 2011AZ of Chemspider February 2011
AZ of Chemspider February 2011
 
ChemSpider – The Vision and Challenges Associated with Building a Free Online...
ChemSpider – The Vision and Challenges Associated with Building a Free Online...ChemSpider – The Vision and Challenges Associated with Building a Free Online...
ChemSpider – The Vision and Challenges Associated with Building a Free Online...
 
Why Chemistry and the Web Will Benefit from a ChemSpider
Why Chemistry and the Web Will Benefit from a ChemSpiderWhy Chemistry and the Web Will Benefit from a ChemSpider
Why Chemistry and the Web Will Benefit from a ChemSpider
 
ChemSpider Presentation At University Of Toronto
ChemSpider Presentation At University Of TorontoChemSpider Presentation At University Of Toronto
ChemSpider Presentation At University Of Toronto
 
ChemSpider – A Community Platform for Chemistry and Resources Supporting the ...
ChemSpider – A Community Platform for Chemistry and Resources Supporting the ...ChemSpider – A Community Platform for Chemistry and Resources Supporting the ...
ChemSpider – A Community Platform for Chemistry and Resources Supporting the ...
 
Taming The Wild West Of Internet Based Chemistry You Can Help
Taming The Wild West Of Internet Based Chemistry You Can HelpTaming The Wild West Of Internet Based Chemistry You Can Help
Taming The Wild West Of Internet Based Chemistry You Can Help
 
How Community Crowdsourcing and Social Networking is Helping to Build a Quali...
How Community Crowdsourcing and Social Networking is Helping to Build a Quali...How Community Crowdsourcing and Social Networking is Helping to Build a Quali...
How Community Crowdsourcing and Social Networking is Helping to Build a Quali...
 
RSC ChemSpider – Building An Internet Based Community For Chemists
RSC ChemSpider – Building An Internet Based Community For ChemistsRSC ChemSpider – Building An Internet Based Community For Chemists
RSC ChemSpider – Building An Internet Based Community For Chemists
 
Integrating and curating internet based chemistry resources to serve life sci...
Integrating and curating internet based chemistry resources to serve life sci...Integrating and curating internet based chemistry resources to serve life sci...
Integrating and curating internet based chemistry resources to serve life sci...
 
Building A Community Resource For The Life Sciences
Building A Community Resource For The Life SciencesBuilding A Community Resource For The Life Sciences
Building A Community Resource For The Life Sciences
 
RSC ChemSpider is the online chemistry database where community contributions...
RSC ChemSpider is the online chemistry database where community contributions...RSC ChemSpider is the online chemistry database where community contributions...
RSC ChemSpider is the online chemistry database where community contributions...
 
ChemSpider -Connecting and Curating Online Chemistry Resources
ChemSpider -Connecting and Curating Online Chemistry ResourcesChemSpider -Connecting and Curating Online Chemistry Resources
ChemSpider -Connecting and Curating Online Chemistry Resources
 
ChemSpider – A Platform to Gather, Host and Integrate Structure Based Data Ac...
ChemSpider – A Platform to Gather, Host and Integrate Structure Based Data Ac...ChemSpider – A Platform to Gather, Host and Integrate Structure Based Data Ac...
ChemSpider – A Platform to Gather, Host and Integrate Structure Based Data Ac...
 

Recently uploaded

SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetEnjoy Anytime
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Crowdsourcing Open Chemistry Research

  • 1. Crowdsourcing, Collaborations and Text-Mining in a World of Open Chemistry Antony Williams
  • 2. Imagine a time when… The internet is searchable by chemical structure and substructure (e.g. Wikipedia, Google Scholar). Chemistry articles are indexed and searchable by a free online service. The web is linked together through the “language of chemistry”. Publicly funded research data can be shared and discussed in the Open, maybe as ONS? Cheminformatics has as much of a public face as bioinformatics. Building a Structure Centric Community for Chemists
  • 3. ChemSpider - A Search Engine for Chemists. Questions a chemist might ask… What is the melting point of n-butanol? What is the chemical structure of Xanax? Chemically, what is phenolphthalein? What are the stereocenters of cholesterol? Where can I find publications about xylene? What are the different trade names for Ketoconazole? What is the NMR spectrum of Aspirin? What are the safety handling issues for Thymol Blue? ChemSpider can answer all of these questions.
  • 4. What is a Structure? Ask a computer… ask a chemist
  • 5. Tell Me About Glutathione
  • 6. Tell Me About Glutathione
  • 7. Tell Me About Glutathione
  • 8. Tell Me About Glutathione
  • 9. Tell Me About Glutathione
  • 10. Tell Me About Glutathione
  • 11. Link outs
  • 12. Links out to KEGG: Kyoto Encyclopedia of Genes and Genomes
  • 13. How many names does a compound have?
  • 14. ChemSpider Data Content: Over 21.5 million unique chemical structures from ca. 150 data sources. Online Databases – PubChem, DrugBank, KEGG, Wikipedia; Literature – PubMed, J Het Chem, Nature, RSC, Open Access; Chemical Vendors – over 40 different vendors and growing; Personal Depositions – individual contributions; Content database vendors; Analytical data collections; Patents; Web scraping. Content is linked back to the original data sources.
  • 15. Other Searches: What compounds have a mass of 300 ± 0.001? Or search a combination of intrinsic/predicted properties.
  • 16. Other Searches
  • 17. Complex Search
  • 18. The Quality of Data Online… Aggregating data opens up quality issues. Structure-identifier associations are “dirty”. Structures are COMMONLY incorrect. Manual curation of small databases is enough work – what about millions of structures? Structures are far from perfect. What is a “correct structure”? Full stereochemistry? Historical timeline of structure? Who is the authority?
  • 19. Who holds THE Quality Authority? Chemical Abstracts Service is the structural authority today: 1400 employees, world standard in chemistry information, 101 years of knowledge, process and expertise. How can an online, free access system peacefully co-exist with the authority?
  • 20. Quality is a Major Issue - Search Butanol. OLD EXAMPLE… now fixed
  • 21. Wikipedia Chemistry Curation project: Only ca. 5000 organic structures, 7000 total structures. Almost a year of work so far for a team of 6 people. Many errors removed in the process. Curation process is a daily event for users/depositors. Slow and torturous process. http://en.wikipedia.org/wiki/Talk:Tacrolimus#IUPAC_Name_and_structure
  • 22. Wikipedia Curation: Looking for self-consistency across a Wikipedia Page. Primary key is the article TITLE. The chemical shown needs to match the title. Cyclic self-consistency – and decisions must get made.
  • 23. Viagra or Sildenafil
  • 24. Other issues…
  • 25. Charges
  • 26. Sugars – Machine Readable vs Aesthetics: Haworth, Stereo, Fischer
  • 27. Wikipedia – Crowdsourcing Chemistry
  • 28. Thymol Blue on ChemSpider. Data online includes: UV-vis spectrum; Measured experimental properties; Link to Wikipedia article; Links to chromatography details; Multiple identifiers/trade names etc.; Links to vendors/suppliers/other databases; Safety information. http://www.chemspider.com/q/thymol%20blue
  • 29. Differences between ChemSpider/Wikipedia
      ChemSpider | Wikipedia
      >21 million unique structures | ~5000 organics, 2000 others
      Complex queries – Properties, Text, structure/substructure, OA publishers, Data Sources, … | Text
      Prediction of properties | No
      Analytical Data | No, but links.
      Active depositors/curators – 30 | Active editors > 50 (?)
      6000 people/day; 1900 registered | ????
      Compound monographs linked | Detailed compound monographs
  • 30. Differences between Wikipedia/ChemSpider
      Wikipedia | ChemSpider
      Supported by tried and tested MediaWiki platform | Primarily Microsoft .NET technologies with OS components
      Established infrastructure and Wikimedia Foundation team | “Out of a basement” on three servers and 5 volunteers
      Chemistry is a subset of the ‘Pedia | Chemistry is the focus of ‘Spider
      GFDL licensing for everything | Mixed “licensing”
      Strong team of WP:Chem advocates, curators and admins | Growing team of advocates, curators and users
      Worldwide reputation as quality source – good and bad | Growing reputation as focused on quality
  • 31. Crowd-sourcing Curation: How to curate data for millions of structures? Robot processes can clean up depositions: search for Chloride and check molecular formula for Cl; check for stereochemistry and remove names with stereo. Provide a simple-to-use platform to curate, annotate and tag data. Provide curator administration to prevent vandalism (Veropedia).
  • 32. Post Comments: Anyone can “Post Comments” associated with a structure. To curate data we require login to track
  • 33. Multi-level Curation and Approval
  • 34. Crowd-sourcing Chemistry. Crowd-sourced curation: identify and tag errors, edit names, synonyms, identify records for deprecation. ALSO Crowd-sourced deposition: anyone can deposit data (structures, text, images, analytical data).
  • 35. DailyMed
  • 36. Quality of Structures
  • 37. Quality of Structures!!!
  • 38. Structure-Centric: We want to search “information” by structure, substructure, similarity of structure. Specific focus on Open Chemistry at present. Standard approaches would be: identify chemical names (“entity extraction”); convert chemical names to structures and index. ChemSpider has a validated dictionary of structure-name pairs. Use name extraction, name-conversion and dictionary look-up. THEN curate.
  • 39. “Entity Extraction”. Rule-based recognition of systematic names: use a lexeme of name fragments; rules for identifying bounds of a name. Look-up dictionary: Drug Names; Trivial Names; Numbers: Registry IDs, EINECS/ELINCS. Massive look-up dictionary of validated identifiers on ChemSpider.
  • 40. Building a Structure Centric Community for Chemists
  • 41. Name Recognition: Azo aldehyde 2 was synthesized according to a reported method [17]. To a stirred solution of azo aldehyde 2 (1.08 g, 3.76 mmol) in dry CH2Cl2 (30.00 mL) at 0 °C were successively added (3,4-diaminophenyl)phenyl methanone 1 (0.40 g, 1.88 mmol) and an excess of anhydrous MgSO4 (2.00 g, 16.67 mmol). The resulting mixture was stirred for 6 hours at room temperature [18]. The mixture was filtered and washed with dichloromethane. Then the solvent was evaporated under reduced pressure to give azo Schiff base 3 as a red solid which was recrystallized from ethanol 95% (1.28 g, 91%).
  • 42. Name Recognition: Azo aldehyde 2 was synthesized according to a reported method [17]. To a stirred solution of azo aldehyde 2 (1.08 g, 3.76 mmol) in dry CH2Cl2 (30.00 mL) at 0 °C were successively added (3,4-diaminophenyl)phenyl methanone 1 (0.40 g, 1.88 mmol) and an excess of anhydrous MgSO4 (2.00 g, 16.67 mmol). The resulting mixture was stirred for 6 hours at room temperature [18]. The mixture was filtered and washed with dichloromethane. Then the solvent was evaporated under reduced pressure to give azo Schiff base 3 as a red solid which was recrystallized from ethanol 95% (1.28 g, 91%).
  • 43. How Many Chemical Names? “She had the drive to derive success in any venture and was well versed in Karate. When the man in the tartan shirt approached her with a dagger in his hand she spat in his face, took the stance of a commando and took advantage of his shock to release the dagger from his grip and causing him to recoil. He went home and took an aspirin after the beating.”
  • 44. How Many Chemical Names? “She had the drive to derive success in any venture and was well versed in Karate. When the man in the tartan shirt approached her with a dagger in his hand she spat in his face, took the stance of a commando and took advantage of his shock to release the dagger from his grip and causing him to recoil. He went home and took an aspirin after the beating.”
  • 45. ChemMantis: Chemical Markup And Nomenclature Transformation Integrated System
  • 46. Making Open Access Articles Searchable – Proof of Concept. Can we HOST Chemistry Open Access articles on ChemSpider and add value? Can we identify chemical names in Open Access articles in a user-friendly manner? Can we convert names to structures in Open Access articles and expand ChemSpider and provide structure searching of Open Access chemistry articles? Can we provide an environment for chemists to mark-up their own articles and crowd-source markup of an archive?
  • 47. Document markup: ChemSpider now hosting Open Access articles from MDPI, Molecular Diversity Preservation International. Hosting the Molbank collection at present.
  • 48. A Standard for Document Markup? NLM-DTD: National Library of Medicine; Document Type Definition. Approved markup definitions to apply to journal articles – extended as necessary for our purposes.
  • 49. NLM/DTD markup
  • 50. Chemistry and Biology Menus can be extended as necessary
  • 51. Document markup
  • 52. Markup – 3 seconds!
  • 53. On the fly conversion
  • 54. Shorthand Formulae Supported
  • 55. One Click to more Info…
  • 56. Structure Image Conversion
  • 57. Two Seconds Later
  • 58. Not Always Perfect…
  • 59. A Platform for Markup: Can we provide a platform for document markup for chemists? Workflow: upload Word docs, RTF files or point to HTML and load; apply entity extraction, convert names to structures, mark-up automatically and ask for user participation; publish final version with NLM-DTD markup; deposit all structures on ChemSpider under embargo and wait for article DOI to release.
  • 60. Challenges: Computer software can generate chemical names better than the majority of chemists. The majority of chemical names are generated by humans, and: Incorrect – convert to the wrong structure or are ambiguous; One name, Multiple Structures.
  • 61. Names and Structures: Dichloroacetone, Trichloromethylsilane
  • 62. Ambiguity
  • 63. Ambiguity in Abbreviations - DPA
  • 64. Ambiguity in Abbreviations - THF
  • 65. Import is Easy: Make articles Public/Private (embargo date soon). Auto-markup and check by user.
  • 66. IUPAC PAC Articles
  • 67. Supports Word .DOC, HTML, RTF
  • 68. Drexel University Documents
  • 69. Drexel University Documents
  • 70. Drexel University Documents
  • 71. Patents
  • 72. Single Configuration File defines entities for markup. Algorithms can be built for certain entities but the majority are dictionaries – vendors, Phys Properties, Analytical. We can extend our system to support your needs based on dictionaries – what does NPG need/not need?
  • 73. Nature Publications
  • 74. Entity Balloons: Structures are the language of chemistry. Show structures to chemists and search/link from there.
  • 75. Other Dictionaries - Species. We are considering Bacteria, Fungi, Enzymes, Viruses, PDB codes…
  • 76. Integrations Out to Other Sources
  • 77. Integrations Out to Other Sources
  • 78. Reactions
  • 79. Manual Curation is Always Necessary
  • 80. Text-Indexing and ChemSpider? ChemSpider text-indexes almost 500,000 Open Access and Free Access articles. Collection is growing and more publishers have already agreed. Including theses in the future.
  • 81. Open Access Literature Search
  • 82. Conclusions: The quality of structure-based data online should always be questioned – that includes ChemSpider. Data on ChemSpider are being added and curated on a daily basis but we need more eyeballs helping always. ChemSpider has a large validated structure-name dictionary. Chemical name extraction and document markup is very enabling.
  • 83. Oops…
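
Slide 31 mentions two robot checks on depositions: a name containing "chloride" should have Cl in its molecular formula, and a name carrying stereo descriptors should not be attached to a structure without stereochemistry. The sketch below shows one way such checks might look. It is not ChemSpider's implementation: the record layout (`name`, `formula`, `has_stereo`), the stereo-descriptor pattern and the function names are all assumptions made for this illustration.

```python
import re

# Element symbols in a molecular formula: a capital letter plus an optional lowercase letter.
ELEMENT_RE = re.compile(r"[A-Z][a-z]?")

# Assumed stereo descriptors in a name: (R), (S), (2R,3S), (E), (Z), (+), (-).
STEREO_RE = re.compile(r"\((?:\d*[RSEZ](?:,\s*\d*[RSEZ])*|[+-])\)")


def elements(formula):
    """Return the set of element symbols that appear in a formula string."""
    return set(ELEMENT_RE.findall(formula))


def check_record(record):
    """Return the reasons (possibly none) why a record should go to a curator."""
    reasons = []
    name = record["name"]
    if "chloride" in name.lower() and "Cl" not in elements(record["formula"]):
        reasons.append("name contains 'chloride' but formula has no Cl")
    if STEREO_RE.search(name) and not record["has_stereo"]:
        reasons.append("name specifies stereo but structure has none")
    return reasons


def flag_for_curation(records):
    """Map each suspicious record name to the reasons it was flagged."""
    flagged = {}
    for record in records:
        reasons = check_record(record)
        if reasons:
            flagged[record["name"]] = reasons
    return flagged


if __name__ == "__main__":
    # Hypothetical depositions; the second and third are deliberately inconsistent.
    deposits = [
        {"name": "sodium chloride", "formula": "ClNa", "has_stereo": False},
        {"name": "benzyl chloride", "formula": "C7H8O", "has_stereo": False},
        {"name": "(S)-nicotine", "formula": "C10H14N2", "has_stereo": False},
    ]
    for name, reasons in flag_for_curation(deposits).items():
        print(name, "->", "; ".join(reasons))
```

The checks only flag records; whether a flagged name is removed or corrected is left to a curator, in line with the slides' point that manual curation is always necessary.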
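
The look-up side of "entity extraction" (slide 39) can be illustrated with a toy dictionary of names plus a pattern for registry numbers. This is a minimal sketch, not the ChemMantis algorithm: the three-entry dictionary, the function names and the output tuple layout are assumptions, and the rule-based recognition of systematic names from the slide is not attempted here. The check-digit rule for CAS Registry Numbers is the standard one (digits weighted 1, 2, 3, … from the right, sum taken modulo 10).

```python
import re

# Toy look-up dictionary (assumption): lower-cased name -> SMILES.
NAME_DICTIONARY = {
    "aspirin": "CC(=O)Oc1ccccc1C(=O)O",
    "dichloromethane": "ClCCl",
    "ethanol": "CCO",
}

# CAS Registry Number layout: 2-7 digits, 2 digits, 1 check digit.
CAS_RE = re.compile(r"\b(\d{2,7})-(\d{2})-(\d)\b")


def valid_cas(match):
    """Verify the CAS check digit for a CAS_RE match."""
    digits = match.group(1) + match.group(2)
    total = sum(int(d) * weight
                for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


def extract_entities(text):
    """Return (offset, matched text, kind) tuples sorted by position."""
    hits = []
    for name in NAME_DICTIONARY:
        pattern = r"\b" + re.escape(name) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((m.start(), m.group(0), "name"))
    for m in CAS_RE.finditer(text):
        if valid_cas(m):
            hits.append((m.start(), m.group(0), "cas"))
    return sorted(hits)


if __name__ == "__main__":
    sample = ("The solid was washed with dichloromethane and "
              "recrystallized from ethanol (CAS 64-17-5).")
    for offset, found, kind in extract_entities(sample):
        print(offset, kind, found)
```

Whole-word matching keeps "ethanol" from firing inside "methanol", but a dictionary alone cannot tell a trade name from an ordinary English word, which is the problem slides 43 and 44 illustrate.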
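
Slides 60 and 61 raise the "one name, multiple structures" problem. A hedged reading of the two examples: "dichloroacetone" does not say whether the chlorines sit on one carbon or two, and "trichloromethylsilane" can be parsed as trichloro(methyl)silane or as (trichloromethyl)silane. The sketch below shows one way a name-to-structure step could refuse to guess. The dictionary contents, the `resolve` function and its status strings are assumptions for illustration only.

```python
# Toy dictionary (assumption): lower-cased name -> list of (explicit name, SMILES).
CANDIDATES = {
    "dichloroacetone": [
        ("1,1-dichloroacetone", "CC(=O)C(Cl)Cl"),
        ("1,3-dichloroacetone", "ClCC(=O)CCl"),
    ],
    "trichloromethylsilane": [
        ("trichloro(methyl)silane", "C[Si](Cl)(Cl)Cl"),
        ("(trichloromethyl)silane", "[SiH3]C(Cl)(Cl)Cl"),
    ],
    "dichloromethane": [
        ("dichloromethane", "ClCCl"),
    ],
}


def resolve(name):
    """Return (status, candidates); status is 'unique', 'ambiguous' or 'unknown'."""
    candidates = CANDIDATES.get(name.strip().lower(), [])
    if not candidates:
        return ("unknown", [])
    if len(candidates) == 1:
        return ("unique", candidates)
    # More than one structure fits the name: leave the choice to a curator.
    return ("ambiguous", candidates)


if __name__ == "__main__":
    for query in ("Dichloromethane", "Dichloroacetone", "Trichloromethylsilane"):
        status, found = resolve(query)
        print(query, status, [smiles for _, smiles in found])
```

Returning every candidate rather than the first match keeps the ambiguity visible, so the markup step can ask the author or a curator which structure was meant.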