Innovative Research for a Sustainable Future
www.epa.gov/research
Integrating an Analytical Methods and Mass Spectral Database with
Cheminformatics Capabilities
Gregory Janesch¹, Erik Carr¹, Vicente Samano², Brian Meyer² and Antony Williams³
1. ORAU Student Services Contractor to Center for Computational Toxicology & Exposure, Office of Research & Development, U.S. Environmental Protection Agency, Research Triangle Park, NC, USA
2. Senior Environmental Employment Program, US Environmental Protection Agency, Research Triangle Park, USA
3. Center for Computational Toxicology & Exposure, Office of Research & Development, U.S. Environmental Protection Agency, Research Triangle Park, NC, USA
ACS West
San Francisco, CA
August 13-17, 2023
Data
There are three kinds of data contained within the database.
- Fact sheets are results-oriented documents with data associated
with one or more substances, ranging from basic descriptions of health
effects to monographs with NMR, Raman, and IR spectra.
- Methods document an end-to-end analytical procedure for one or
more substances, sometimes 100s of chemicals. The documents
are curated to extract the chemical compounds and then
annotated with information such as matrix and methodologies.
- Spectra, in the form of lists of m/z-intensity pairs and parameters.
In addition to the above information, records have assorted
metadata stored in the database. These data include information
such as experimental conditions, authors, a synopsis for the method
or fact sheet, and other data depending on what kind of record it is.
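The three record kinds and their metadata could be modeled roughly as follows. This is only a sketch: the class and field names are assumptions made for illustration and do not come from the AMOS schema.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Fields assumed to be shared by every record kind (hypothetical names)."""
    record_id: str
    source: str                      # e.g. a spectral database or vendor
    substances: list[str]            # substance identifiers covered by the record
    metadata: dict[str, str] = field(default_factory=dict)


@dataclass
class FactSheet(Record):
    synopsis: str = ""


@dataclass
class Method(Record):
    matrix: str = ""                 # annotated sample matrix
    methodology: str = ""            # annotated analytical methodology
    synopsis: str = ""


@dataclass
class Spectrum(Record):
    methodology: str = ""
    # Spectrum stored as a list of (m/z, intensity) pairs.
    peaks: list[tuple[float, float]] = field(default_factory=list)


example = Spectrum(
    record_id="spectrum-0001",       # placeholder values, not real data
    source="example source",
    substances=["substance-A"],
    methodology="LC-MS",
    peaks=[(100.0, 50.0), (150.0, 100.0)],
)
```

Keeping the shared fields on a common base class is one way to let a single substance search return all three record kinds in one table.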
Data are open access and are derived from a variety of sources.
These include online spectral databases, vendor methods, research
groups, EPA databases and other government agencies.
At the time of writing the database contains approximately:
- 165,000 spectra (plus 600,000 externally linked spectra)
- >700 fact sheets
- >3300 methods
Description
A large variety of sources for spectra, documented analytical
procedures and methods, and other associated documentation exist
and are, in theory, easily available through an ordinary web search.
In practice, these sources are largely isolated from each other, hard
to find via general searches because of inconsistencies in chemical
names and identifiers, and highly varied in format.
To address these challenges, the Analytical Methods and Open
Spectra (AMOS) web application has been developed. AMOS is a
database and associated web-based application containing several
types of records searchable by common identifiers known to
chemists (i.e., CASRNs, InChIKeys and chemical names).
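As an illustration of searching by several identifier types, a query string can be classified before lookup. The sketch below relies on two public conventions: the CASRN check digit and the 14-10-1 letter layout of a standard InChIKey. The function names are hypothetical, and nothing here is taken from the AMOS code.

```python
import re

# Standard InChIKey layout: 14 letters, 10 letters, 1 letter, hyphen-separated.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")
# CASRN layout: 2-7 digits, 2 digits, 1 check digit.
CASRN_RE = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_casrn(text: str) -> bool:
    """Check the CASRN format and its check digit."""
    match = CASRN_RE.match(text)
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    # Weight digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(
        int(digit) * weight
        for weight, digit in enumerate(reversed(digits), start=1)
    )
    return total % 10 == int(match.group(3))


def classify_identifier(query: str) -> str:
    """Return 'inchikey', 'casrn', or 'name' for a search string."""
    query = query.strip()
    if INCHIKEY_RE.match(query):
        return "inchikey"
    if is_valid_casrn(query):
        return "casrn"
    return "name"  # fall back to a chemical-name search
```

Anything that is neither a well-formed InChIKey nor a valid CASRN is treated as a name, so a mistyped CASRN fails the check digit and is searched as text rather than matching the wrong substance.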
Acknowledgements
The authors thank the data curation team for their rigorous work in
annotating and identifying information in the records. Chemical data
extraction, curation and annotation are an essential part of this work.
General Searching
The primary search functionality searches all records for a single
chemical substance. One half of the results page (Fig. 1) shows the
searched compound (assuming a match); the page also presents a table of
records containing that substance, listing each record's data source,
associated methodology, and a short description of the record itself.
Selecting a row in that table
allows for viewing the
contents of that record more
closely, whether opening an
analytical method or
displaying a spectrum.
Spectrum Search
For spectral data, an additional search option is available. If a mass
range, methodology, and spectrum (as x,y pairs) are supplied, spectra
matching that mass range and methodology are returned, ranked by their
similarity to the user-supplied spectrum (see Fig. 2).
The top table lists the associated substance for each spectrum found
(with its DTXSID), the similarity of that spectrum, and a description
of that spectrum. Below that table is an interactive plot of the
overlap of the two spectra.
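The poster does not state which similarity measure is used. As one common choice, the sketch below filters a library by mass range and methodology and then ranks the survivors by cosine similarity over unit-width m/z bins. The bin width, dictionary keys, and function names are all assumptions.

```python
import math


def bin_spectrum(peaks, bin_width=1.0):
    """Sum intensities of (m/z, intensity) pairs that fall in the same m/z bin."""
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(query, reference, bin_width=1.0):
    """Cosine of the angle between two binned spectra (0.0 if either is empty)."""
    a = bin_spectrum(query, bin_width)
    b = bin_spectrum(reference, bin_width)
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def spectrum_search(query_peaks, mass_range, methodology, library):
    """Filter by mass range and methodology, then rank by similarity (best first)."""
    low, high = mass_range
    hits = []
    for entry in library:  # each entry is a dict with hypothetical keys
        if entry["methodology"] != methodology:
            continue
        if not low <= entry["mass"] <= high:
            continue
        score = cosine_similarity(query_peaks, entry["peaks"])
        hits.append((score, entry["name"]))
    return sorted(hits, reverse=True)
```

Filtering before scoring keeps the ranking restricted to spectra that are comparable with the query in the first place.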
Method Searches
AMOS contains two functions for searching for methods. One is a simple
table that lists all methods in the database (not pictured). This list can be
filtered by several fields including matrix, analyte, and method name,
allowing for quick discovery of methods that cover a known topic.
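Filtering the method list by fields such as matrix, analyte, and method name can be sketched as a case-insensitive substring match on each requested field. The field names and sample records below are placeholders, not the application's actual columns or data.

```python
def filter_methods(methods, **criteria):
    """Keep methods whose named fields contain every given substring."""
    def field_text(method, name):
        value = method.get(name, "")
        # List-valued fields (e.g. analytes) are joined into one searchable string.
        if isinstance(value, (list, tuple)):
            value = "; ".join(value)
        return str(value).lower()

    return [
        method
        for method in methods
        if all(
            needle.lower() in field_text(method, name)
            for name, needle in criteria.items()
        )
    ]


methods = [  # placeholder records for illustration only
    {"name": "Example method A", "matrix": "Drinking water", "analytes": ["substance-A"]},
    {"name": "Example method B", "matrix": "Soil", "analytes": ["substance-A", "substance-B"]},
]
water_methods = filter_methods(methods, matrix="water")
```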
The other, shown below, is a search for methods containing similar
substances, which provides a starting point even for chemicals without
methods. A substance is searched for and, if methods exist for it, they
are returned. If no methods exist for that chemical, AMOS returns all
methods that contain at least one substance with a sufficiently high
Tanimoto structural similarity coefficient. In the example below (see
Fig. 3), the drug was only available starting in 2015, so there has been
relatively little time to develop and publish methods for it.
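The fallback logic can be sketched as follows, with structural fingerprints represented as sets of "on" bit positions. The Tanimoto coefficient is the size of the intersection divided by the size of the union. The fingerprint type, the 0.5 threshold, and all names are assumptions, since the poster does not specify them.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of bit positions."""
    if not fp_a and not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)


def find_methods(query_id, fingerprints, methods, threshold=0.5):
    """Return methods for the query substance, else methods with a similar substance."""
    exact = [m["name"] for m in methods if query_id in m["substances"]]
    if exact:
        return exact

    # No methods list the substance itself: fall back to structural similarity.
    query_fp = fingerprints[query_id]
    similar = []
    for method in methods:
        best = max(
            (tanimoto(query_fp, fingerprints[s]) for s in method["substances"]),
            default=0.0,
        )
        if best >= threshold:  # at least one sufficiently similar substance
            similar.append(method["name"])
    return similar
```

A method qualifies as soon as any one of its substances clears the threshold, which matches the "at least one substance" wording above.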
Disclaimers
This tool is currently internal to the US EPA and still under development.
Plans to release it to the public have not been finalized, but the process
is hoped to be complete by early 2024. The data used in this application
have not been thoroughly reviewed by the EPA, and users need to exercise
judgement in their use of the results.
The views expressed in this poster are those of the authors and do not
necessarily reflect the views or policies of the U.S. EPA.
Figure 1: The list of methods and LC-MS or GC-MS spectra associated
with perfluorooctanesulfonic acid (PFOS).
Figure 2: A spectral similarity search
result includes the similarity match for
spectra and the list of associated
chemical compounds.
Figure 3: When a search finds no methods for a chemical, its structure
is passed to a Tanimoto structural similarity search, which returns
methods that contain structurally similar substances.
