Link Analysis of Life Science Linked Data
Wei Hu¹, Honglei Qiu¹, and Michel Dumontier²
¹ State Key Laboratory for Novel Software Technology, Nanjing University, China
² Center for Biomedical Informatics Research, Stanford University
@micheldumontier::ISWC 2015
Linked Data offers links between
datasets, but they are often
incomplete and may contain
errors.
Network Analysis
• Network analysis has long been
used to study link structures
– The structure of the Web
– Network medicine: cellular
networks and implications
A power-law degree distribution is scale-free
A graph demonstrates the small world
phenomenon, if its clustering coefficient is
significantly higher than that of a random
graph on the same node set, and if the graph
has a shorter average distance.
BTC2010
The clustering coefficient of a node quantifies how close
its neighbors are to being a clique. The average
distance is the average shortest path length
between all pairs of nodes in the graph.
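The two measures defined above can be sketched in a few lines of standard-library Python. This is a minimal illustration on a toy undirected graph, not the code used for the study; the function names and the `toy` graph are assumptions made for this example.

```python
from collections import deque


def clustering_coefficient(adj, node):
    """Fraction of pairs of neighbors of `node` that are linked to each other."""
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0  # fewer than two neighbors: no pair to check
    linked = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in adj[nbrs[i]]
    )
    return 2.0 * linked / (k * (k - 1))


def average_clustering(adj):
    """Mean of the per-node clustering coefficients."""
    return sum(clustering_coefficient(adj, n) for n in adj) / len(adj)


def average_distance(adj):
    """Mean shortest-path length over all ordered pairs of reachable nodes."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from `source`
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Hypothetical toy graph: a triangle (a, b, c) plus a node d attached to c.
toy = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

print(average_clustering(toy))
print(average_distance(toy))
```

To test for the small-world phenomenon, the same two numbers would be computed for a random graph on the same node set and compared.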
Dataset link analysis
(using RDF data model)
Entity link analysis
(using cross-references)
Term link analysis
(using ontology matching)
Linked Data for the Life Sciences
Bio2RDF is an open source project to unify the
representation and interlinking of biological data using RDF.
Chemicals/drugs/formulations
Genomes/genes/proteins, domains
Interactions, complexes & pathways
Animal models and phenotypes
Diseases, genetic markers, treatments
Terminologies & publications
• Release 3 (June 2014)
• 35 datasets
• 11B RDF triples
• 1B entities
• 2K classes
• 4K properties
Dataset Links
Network Properties
1. Well linked
2. Hubs and authorities
3. Small-world phenomenon
Average distance = 2.77 vs 6
Clustering coefficient = 0.22 vs 0.13
4. Robust to systematic removal of nodes
Entity Link Analysis
How well do entities link to each other?
• 76% of entity links use a special kind of RDF triple
– e.g. <kegg:D03455, kegg:x-drugbank, drugbank:DB00002>
– x-relations have under-specified semantics
• May be truly identical, may refer to another related entity …
• Degree distribution
– Some do not follow power law
• Exponent is too large (close to 5)
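The slides do not say how the power-law exponent was fitted. As one hedged illustration, the sketch below uses the standard continuous maximum-likelihood estimator, alpha = 1 + n / sum(ln(d_i / d_min)), applied to node degrees. The function name, the `d_min` parameter, and the sample degrees are assumptions for this example.

```python
import math


def powerlaw_exponent(degrees, d_min=1):
    """Continuous maximum-likelihood estimate of a power-law exponent.

    Only degrees >= d_min are used; alpha = 1 + n / sum(ln(d / d_min)).
    """
    tail = [d for d in degrees if d >= d_min]
    log_sum = sum(math.log(d / d_min) for d in tail)
    if log_sum == 0:
        raise ValueError("need at least one degree strictly above d_min")
    return 1.0 + len(tail) / log_sum


# Hypothetical degree sequence, for illustration only.
sample_degrees = [1, 1, 1, 1, 2, 2, 4, 8]
print(powerlaw_exponent(sample_degrees))
```

Exponents reported for power-law networks typically fall between 2 and 3, so an estimate close to 5 signals a poor power-law fit.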
BTC2010
Symmetry of entity links varies
between different pairs of datasets
• Over 99% of links are reciprocated in DrugBank-PharmGKB and
OMIM-HGNC
– Suggests link sharing and synchronization
• Only 58% of DrugBank-KEGG links and 51% of OMIM-Orphanet
links are reciprocated
– Suggests incomplete mapping
• 28% of OMIM-Orphanet links are malposed
– Suggests variation in model (omim:Phenotype to orphanet:Disorder)
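One way to measure the symmetry described above is to count how many links from one dataset to another also exist in the reverse direction. The sketch below assumes links are (source, target) pairs of prefixed identifiers; the exact definition used in the study may differ, and all identifiers other than kegg:D03455 and drugbank:DB00002 are made up.

```python
def reciprocity(links, source_prefix, target_prefix):
    """Share of source->target links that also appear as target->source."""
    link_set = set(links)
    forward = [
        (s, t)
        for s, t in link_set
        if s.startswith(source_prefix) and t.startswith(target_prefix)
    ]
    if not forward:
        return 0.0
    mutual = sum(1 for s, t in forward if (t, s) in link_set)
    return mutual / len(forward)


example_links = [
    ("kegg:D03455", "drugbank:DB00002"),      # link taken from the slides
    ("drugbank:DB00002", "kegg:D03455"),      # hypothetical reverse link
    ("kegg:EXAMPLE1", "drugbank:EXAMPLE1"),   # hypothetical, not reciprocated
]

print(reciprocity(example_links, "kegg:", "drugbank:"))
print(reciprocity(example_links, "drugbank:", "kegg:"))
```

The measure is directional, so a dataset pair can look well synchronized from one side and incomplete from the other.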
Transitivity Analysis:
Find mismatches and discover new links
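The slide gives no detail on the procedure, so the following is only a sketch of one plausible reading. Two-step paths a -> b -> c suggest a new link a -> c when a and c come from different datasets, and flag a possible mismatch when they are two distinct entities of the same dataset. The `dataset` helper, the prefix convention, and the example identifiers are all assumptions.

```python
from collections import defaultdict


def dataset(entity):
    """Dataset prefix of an identifier such as 'drugbank:DB00002'."""
    return entity.split(":", 1)[0]


def transitive_candidates(links):
    """Return (new_links, mismatches) implied by two-step paths a -> b -> c."""
    out = defaultdict(set)
    for s, t in links:
        out[s].add(t)
    existing = set(links)
    new_links, mismatches = set(), set()
    for a, middles in list(out.items()):
        for b in middles:
            for c in out.get(b, ()):
                if c == a or (a, c) in existing:
                    continue  # trivial cycle or already-known link
                if dataset(a) == dataset(c):
                    # Two different entities of one dataset end up connected:
                    # at least one link on the path is probably not an identity.
                    mismatches.add((a, c))
                else:
                    new_links.add((a, c))
    return new_links, mismatches


# Hypothetical links between three made-up datasets x, y and z.
example = [("x:1", "y:1"), ("y:1", "z:1"), ("y:1", "x:2")]
found, suspicious = transitive_candidates(example)
print(found)
print(suspicious)
```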
Evaluation of Entity Matching
How accurate are current entity matching approaches?
• Built a benchmark from the reciprocal links between similarly-typed
entities
• Evaluated several entity matching approaches
– Label similarity: Levenshtein, Jaro-Winkler, N-gram, Jaccard
– Machine learning: Linear regression, logistic regression with 5 properties
• Many-to-one links are difficult to discover
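Two of the label similarity measures listed above can be sketched as follows: a normalized Levenshtein similarity and a Jaccard similarity over character n-grams. The normalization, the n-gram size, and the example labels are assumptions; the slides do not specify how the measures were configured.

```python
def levenshtein(a, b):
    """Edit distance: minimum number of insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def levenshtein_similarity(a, b):
    """Edit distance rescaled to [0, 1], where 1 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def ngrams(text, n=3):
    """Set of lower-cased character n-grams; short strings yield themselves."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)} or {text}


def jaccard(a, b, n=3):
    """Size of the n-gram intersection divided by the size of the union."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    return len(grams_a & grams_b) / len(grams_a | grams_b)


# Hypothetical labels for two entities that might describe the same drug.
print(levenshtein_similarity("cetuximab", "cetuximab injection"))
print(jaccard("cetuximab", "cetuximab injection"))
```

A matcher would accept a candidate pair when the chosen similarity exceeds a threshold, which is why relying on labels alone can fail for entities with very different names.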
Term Link Analysis
How similar are the topics in the data network?
• Use ontology matching to generate term link graph
– Falcon-AO (linguistic matchers + structural matcher + synonyms)
• Created 83K class mappings, 1.5K object property mappings, and 858 data
property mappings
– Similarity threshold = 0.9
– Top-5 popular labels for classes and properties
• Significant overlap in topics; the degree distribution does not follow a power law, unlike in the broader Semantic Web
Correlation of Link Graphs
To what degree are the three link graphs correlated?
• Spearman’s rank correlation coefficient:
– Entity link graph → dataset pairs: entity links / entities
– Term link graph → dataset pairs: term mappings / terms
– Dataset link graph → dataset pairs: shortest path length
• All positively correlated
– Closer datasets in distance have more linked entities and terms
– Number of linked entities contributes little to overlap of topics
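Spearman's rank correlation coefficient is the Pearson correlation of the ranks of two score lists, with tied values sharing an average rank. The sketch below implements it with the standard library only; the per-dataset-pair scores are invented for illustration and are not results from the study.

```python
def ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    std_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical scores for four dataset pairs (same order in both lists).
entity_link_ratio = [0.40, 0.10, 0.25, 0.05]
term_mapping_ratio = [0.30, 0.12, 0.08, 0.02]
print(spearman(entity_link_ratio, term_mapping_ratio))
```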
Summary of Findings
• Dataset, entity and term link graphs do not necessarily share the
characteristics of the Hypertext / Semantic Web
– Degree distribution of entity links does not follow a power law
– Data hubs
• A significant number of entities have been linked using x-relations, but
their intended semantics differs
– Classes are identical or equivalent → entity links represent logical equivalence
• Symmetric and transitive entity links do exist, but their utility is weakened
due to their small number
– Meanings of entity links may shift during transitive closure
• Matching only the labels of entities may fail, while combining different
properties and using simple learning algorithms achieves good accuracy
dumontierlab.com
michel.dumontier@stanford.edu
Website: http://dumontierlab.com
Presentations: http://slideshare.com/micheldumontier

 
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tantaDashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tantaPraksha3
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡anilsa9823
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 
Link Analysis of Life Sciences Linked Data

  • 1. Link Analysis of Life Science Linked Data. Wei Hu (1), Honglei Qiu (1), and Michel Dumontier (2). (1) State Key Laboratory for Novel Software Technology, Nanjing University, China; (2) Center for Biomedical Informatics Research, Stanford University. @micheldumontier::ISWC 2015
  • 2. Linked Data offers links between datasets, but they are often incomplete and may contain errors.
  • 3. Network Analysis
      • Network analysis has long been used to study link structures
          – The structure of the Web
          – Network medicine: cellular networks and implications
      • Power law is scale free
      • A graph demonstrates the small-world phenomenon if its clustering coefficient is significantly higher than that of a random graph on the same node set, and if it also has a short average distance.
      • The clustering coefficient of a node quantifies how close its neighbors are to being a clique. The average distance is the average shortest path length between all nodes in the graph.
      • [Figure: BTC2010]
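The two measures defined on slide 3 can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' tooling: the adjacency-dictionary representation, the function names, and the toy graph are all assumptions made for the example, and the graph is treated as undirected and unweighted.

```python
from collections import deque
from itertools import combinations


def local_clustering(adj, node):
    """Fraction of pairs of neighbours of `node` that are linked to each other."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # fewer than two neighbours: no pair to test
    linked = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2.0 * linked / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


def average_distance(adj):
    """Mean shortest-path length over all ordered pairs of connected nodes."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from `source`
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Hypothetical toy graph: a triangle a-b-c with a pendant node d attached to c.
toy = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

print(average_clustering(toy))
print(average_distance(toy))
```

To test for the small-world property as described on the slide, the same two functions would be applied to a random graph on the same node set and the values compared.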
  • 4. Three link analyses
      • Dataset link analysis (using RDF data model)
      • Entity link analysis (using cross-references)
      • Term link analysis (using ontology matching)
  • 5. Linked Data for the Life Sciences
      • Bio2RDF is an open source project to unify the representation and interlinking of biological data using RDF.
      • Chemicals/drugs/formulations; genomes/genes/proteins, domains; interactions, complexes & pathways; animal models and phenotypes; disease, genetic markers, treatments; terminologies & publications
      • Release 3 (June 2014)
          – 35 datasets
          – 11B RDF triples
          – 1B entities
          – 2K classes
          – 4K properties
  • 6. Dataset Links: Network Properties
      1. Well linked
      2. Hubs and authorities
      3. Small-world phenomenon
          – Average distance = 2.77 vs 6
          – Clustering coefficient = 0.22 vs 0.13
      4. Robust on systematic removal of nodes
  • 7. Entity Link Analysis: How well do entities link to each other?
      • 76% of entity links involve a special kind of RDF triple
          – e.g. <kegg:D03455, kegg:x-drugbank, drugbank:DB00002>
          – x-relations have under-specified semantics: the linked entities may be truly identical, or one may refer to another related entity …
      • Degree distribution
          – Some do not follow a power law: the exponent is too large (close to 5)
      • [Figure: BTC2010]
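The slide does not say how the power-law exponent was fitted. As an illustration only, the sketch below uses one common estimator, the continuous maximum-likelihood formula alpha = 1 + n / Σ ln(x_i / x_min). The function name, the `x_min` default, and the sample degrees are assumptions; for integer degree data this continuous formula is only an approximation.

```python
import math


def powerlaw_exponent(degrees, x_min=1.0):
    """Continuous maximum-likelihood estimate of a power-law exponent.

    Only values >= x_min are used. This is an illustrative estimator,
    not necessarily the fitting procedure behind the slide.
    """
    tail = [d for d in degrees if d >= x_min]
    log_sum = sum(math.log(d / x_min) for d in tail)
    if log_sum == 0:
        raise ValueError("all values equal x_min; exponent is undefined")
    return 1.0 + len(tail) / log_sum


# Hypothetical degree sequence for one dataset's entity links.
sample_degrees = [1, 1, 1, 2, 2, 3, 5, 8, 13, 40]
print(powerlaw_exponent(sample_degrees))
```

An estimate far above the 2 to 3 range usually reported for Web-like graphs would be consistent with the slide's remark that the exponent is too large.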
  • 8. Symmetry of entity links varies between different pairs of datasets
      • Over 99% of links are reciprocated in DrugBank-PharmGKB and OMIM-HGNC
          – Suggests link sharing and synchronization
      • Only 58% of DrugBank-KEGG links and 51% of OMIM-Orphanet links are reciprocal
          – Suggests incomplete mapping
      • 28% of OMIM-Orphanet links are malposed
          – Suggests variation in model (omim:Phenotype to orphanet:Disorder)
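A reciprocity figure such as those on slide 8 can be computed from a set of directed links. The sketch below assumes a link is a `(source, target)` pair and counts a link as reciprocated when the reverse pair also exists; the exact definition used in the talk is not given, and the identifiers are made up for the example.

```python
def reciprocity(links):
    """Share of directed links (s, t) whose reverse link (t, s) also exists."""
    link_set = set(links)
    if not link_set:
        return 0.0
    reciprocated = sum(1 for s, t in link_set if (t, s) in link_set)
    return reciprocated / len(link_set)


def missing_reverse_links(links):
    """Links with no reverse counterpart: candidates for completing a mapping."""
    link_set = set(links)
    return sorted((t, s) for s, t in link_set if (t, s) not in link_set)


# Hypothetical links between two datasets (identifiers are invented).
example_links = [
    ("drugbank:A", "kegg:1"), ("kegg:1", "drugbank:A"),
    ("drugbank:B", "kegg:2"), ("kegg:2", "drugbank:B"),
    ("drugbank:C", "kegg:3"),
    ("kegg:4", "drugbank:D"),
]

print(reciprocity(example_links))
print(missing_reverse_links(example_links))
```

The second function lists the reverse links that would have to be added for the mapping to become fully symmetric.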
  • 9. Transitivity Analysis: Find mismatches and discover new links
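Slide 9 gives only a title, so the following is one possible reading rather than the authors' procedure. It follows two-hop paths a → b → c: when c lies in a different dataset from a and no direct link exists, (a, c) is reported as a candidate new link; when c lies in the same dataset as a but is a different entity, the path is reported as a possible mismatch. The `prefix:id` naming convention, the function names, and the sample links are assumptions.

```python
from collections import defaultdict


def dataset_of(entity):
    """Dataset prefix of an identifier written as 'prefix:local_id' (assumed form)."""
    return entity.split(":", 1)[0]


def two_hop_analysis(links):
    """Return (candidate_new_links, possible_mismatches) from two-hop paths."""
    outgoing = defaultdict(set)
    for s, t in links:
        outgoing[s].add(t)
    existing = set(links)
    new_links, mismatches = set(), set()
    for a, middles in outgoing.items():
        for b in middles:
            for c in outgoing.get(b, ()):  # .get avoids adding keys while iterating
                if c == a:
                    continue  # a -> b -> a is just a reciprocal link
                if dataset_of(c) == dataset_of(a):
                    mismatches.add((a, b, c))  # two distinct entities of one dataset
                elif (a, c) not in existing:
                    new_links.add((a, c))
    return new_links, mismatches


# Hypothetical links (identifiers are invented).
sample_links = [
    ("kegg:D1", "drugbank:DB1"),
    ("drugbank:DB1", "pharmgkb:PA1"),
    ("drugbank:DB1", "kegg:D2"),
]

candidates, suspects = two_hop_analysis(sample_links)
print(candidates)
print(suspects)
```

Candidates produced this way still need review, since the meaning of a link may shift when links are chained.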
  • 10. Evaluation of Entity Matching: How accurate are current entity matching approaches?
      • Built a benchmark from the reciprocal links between similarly-typed entities
      • Evaluated several entity matching approaches
          – Label similarity: Levenshtein, Jaro-Winkler, N-gram, Jaccard
          – Machine learning: linear regression, logistic regression with 5 properties
      • Many-to-one links are difficult to discover
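Two of the label-similarity measures named on slide 10 are short enough to sketch directly. The normalisation of the edit distance by the longer label, the whitespace tokenisation for Jaccard, and the example labels are assumptions made for this illustration; the talk does not specify them.

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) by dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def levenshtein_similarity(a, b):
    """Edit distance rescaled to [0, 1], where 1 means identical labels."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def jaccard(a, b):
    """Jaccard overlap of the lower-cased, whitespace-separated tokens of two labels."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Hypothetical pair of entity labels.
label_1, label_2 = "Acetylsalicylic acid", "Salicylic acid"
print(levenshtein_similarity(label_1.lower(), label_2.lower()))
print(jaccard(label_1, label_2))
```

The two measures can disagree sharply on the same pair, which is one reason matching on labels alone may fail.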
  • 11. Term Link Analysis: How similar are the topics in the data network?
      • Use ontology matching to generate the term link graph
          – Falcon-AO (linguistic matchers + structural matcher + synonyms)
      • Created 83K class mappings, 1.5K object property mappings, and 858 data property mappings
          – Similarity threshold = 0.9
          – Top-5 popular labels for classes and properties
      • Significant overlap in topics; does not follow a power law as in the broader Semantic Web
  • 12. Correlation of Link Graphs: To what degree are the three link graphs correlated?
      • Spearman's rank correlation coefficient:
          – Entity link graph → dataset pairs: entity links / entities
          – Term link graph → dataset pairs: term mappings / terms
          – Dataset link graph → dataset pairs: shortest path length
      • All positively correlated
          – Datasets that are closer in distance have more linked entities and terms
          – Number of linked entities contributes little to overlap of topics
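Spearman's rank correlation, used on slide 12, is the Pearson correlation of the ranks of two score lists. The sketch below assigns average ranks to ties; the function names and the per-dataset-pair scores are hypothetical and do not reproduce the talk's data.

```python
def average_ranks(values):
    """1-based ranks of `values`, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the run of tied values
        shared = (i + j) / 2.0 + 1.0  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either list is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (spread_x * spread_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores, one value per dataset pair.
entity_link_ratio = [0.40, 0.10, 0.25, 0.05, 0.30]
term_mapping_ratio = [0.20, 0.08, 0.15, 0.02, 0.12]
print(spearman(entity_link_ratio, term_mapping_ratio))
```

Because only ranks matter, the measure is insensitive to the very different scales of the three per-pair scores listed on the slide.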
  • 13. Summary of Findings
      • Dataset, entity and term link graphs do not necessarily share the same characteristics with the Hypertext / Semantic Web
          – Degree distribution of entity links does not follow a power law
          – Data hubs
      • A significant number of entities have been linked using x-relations, but their intended semantics differ
          – Classes are identical or equivalent → entity links represent logical equivalence
      • Symmetric and transitive entity links do exist, but their utility is weakened by their small number
          – Meanings of entity links may shift during transitive closure
      • Matching only the labels of entities may fail, while combining different properties and using simple learning algorithms achieves good accuracy