BIO2RDF: A Semantic Web Atlas of
post-genomic knowledge about
Human and Mouse
François Belleau, Nicole Tourigny,
Benjamin Good and Jean Morissette
● Centre de Recherche du CHUL, Université Laval
● Département d'informatique et de génie logiciel, Université Laval
4. Evry, June 27, 2008 CHUL research center Laval University 4
Outline
Introduction
− Problem definition
− Proposed approach
− The 4 rules of linked data
− Related Work
Results
− Bio2RDF first knowledge map
− Semantic ranking
Paget query demo with SPARQL
Future work and Conclusion
Problem definition
● The objective of data integration is to
make data distributed over a number of
distinct, heterogeneous databases
accessible via a single interface
[Davidson 1995].
● We already use global text search engines
on the web (Google, Yahoo).
● There are many specialized integrated
search tools in bioinformatics (NCBI
Entrez, EBI search, KEGG GenomeNet).
Proposed approach
● Apply the semantic web model to data
integration in bioinformatics;
● Use a PageRank [Brin 1998] variation
adapted to semantic graphs, a method
analogous to the Aleman-Meza group's
work: LinkRank;
● Adopt standards (RDF, OWL) and use
existing software (Sesame, Virtuoso,
PiggyBank).
Rule #1: Use URIs as names for
things.
● Using normalized identifiers to name
concepts is already a reality in the
biology domain.
● Hexokinase is GO:0004396
● Definition :
− Catalysis of the reaction: ATP + D-hexose =
ADP + D-hexose 6-phosphate.
● Synonym of EC:2.7.1.1
Rule #2: Use HTTP URIs so that
people can look up those names.
● Dereferenceable URL
● The Banff Manifesto rule for URNs
− urn:bm:public_namespace:private_identifier
● Normalized URL according to Banff
Manifesto:
http://bio2rdf.org/public_namespace:private_identifier
● http://bio2rdf.org/go:0004396
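The naming rule above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and lower-casing the namespace is an assumption inferred from the GO:0004396 → go:0004396 example on the slides.

```python
BASE_URL = "http://bio2rdf.org/"  # base taken from the slide


def bio2rdf_urn(namespace, identifier):
    """URN form: urn:bm:public_namespace:private_identifier."""
    return f"urn:bm:{namespace.lower()}:{identifier}"


def bio2rdf_url(namespace, identifier):
    """URL form: http://bio2rdf.org/public_namespace:private_identifier."""
    # Assumption: namespaces are lower-cased, as in the go:0004396 example.
    return f"{BASE_URL}{namespace.lower()}:{identifier}"


def from_prefixed_id(prefixed_id):
    """Turn an identifier such as 'GO:0004396' into a normalized URL."""
    # Split only on the first colon so identifiers containing ':' survive.
    namespace, identifier = prefixed_id.split(":", 1)
    return bio2rdf_url(namespace, identifier)


if __name__ == "__main__":
    print(from_prefixed_id("GO:0004396"))
    print(bio2rdf_urn("GO", "0004396"))
```

Keeping the namespace and the provider's own identifier as two separate arguments mirrors the public_namespace:private_identifier split of the rule.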
Related work – Linked data map
● If we were to draw a map of the existing
relations between linked data from
bioinformatics database providers, what
would it look like?
● Could we measure the amount of
post-genomic knowledge available related
to a mouse or human genome sequence?
● Could it help answer the "what is known?"
question?
Bio2RDF Semantic Web Atlas
in numbers
● 30 different datasources, 30 different
namespaces
− go, geneid, uniprot, pubmed, pdb, reactome, omim,
etc.
● 195 namespaces referencing non-rdfized
datasources
− cog, genethon, tigr, cath, goa, etc.
● 8 million topics
● 65 million triples
● 973 MB, size of the compressed N3 data
− http://bio2rdf.org/download/bio2rdf-atlas-080414.n3.gz
Bio2RDF Semantic Web Atlas
in statistics
● Openness Ratio (OR) of 0.58
● Average Link Rank (ALR) of 4.7
● 8 million topics are connected by 19 million
relations within the graph
● 58 % of URIs reference the open world
outside the graph
● 19 % knowledge gain due to the mashup
effect
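The slides do not define the Openness Ratio. One plausible reading, consistent with the "58 % of URIs" bullet, is the share of referenced URIs that are not described as subjects inside the graph. The sketch below computes that quantity on a toy graph. The definition, the function name and the example.org triples are all assumptions made for illustration, not the authors' method or data.

```python
def openness_ratio(triples):
    """Share of referenced URIs that are not described in the graph.

    Assumed definition (not taken from the slides): a URI is "inside"
    the graph when it appears as the subject of at least one triple.
    """
    described = {s for s, _, _ in triples}
    # Only objects that look like URIs count as references; literals are skipped.
    referenced = {o for _, _, o in triples if o.startswith("http://")}
    if not referenced:
        return 0.0
    outside = referenced - described
    return len(outside) / len(referenced)


# Toy graph with made-up example.org URIs: a and b are described, c is not.
TOY_TRIPLES = [
    ("http://example.org/a", "xRef", "http://example.org/b"),
    ("http://example.org/a", "xRef", "http://example.org/c"),
    ("http://example.org/b", "xRef", "http://example.org/a"),
    ("http://example.org/b", "label", "a plain literal"),
]

if __name__ == "__main__":
    # Three URIs are referenced (a, b, c); only c lies outside the graph.
    print(round(openness_ratio(TOY_TRIPLES), 2))
```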
SPARQL query in a URL
http://bio2rdf.org:8890/sparql?default-graph-
uri=&query=CONSTRUCT+%7B%0D%0A%3Fs1+%3Fp1+%3Fo1+.%0D%0A
%3Fs1+%3Chttp%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-
syntax-ns%23type%3E+%3Ftype+.+%0D%0A%3Fs1+%3Chttp%3A%2F
%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23label%3E+
%3Flabel.+%0D%0A%3Fs1+%3Chttp%3A%2F%2Fbio2rdf.org
%2Fbio2rdf%23linkRank%3E+%3FlinkRank.+%0D%0A%7D%0D
%0AWHERE+%7B%0D%0A%3Fs1+%3Fp1+%3Fo1+.+%0D%0A%3Fo1+bif
%3Acontains+%22paget%22+.%0D%0A%3Fs1+%3Chttp%3A%2F
%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23type%3E+
%3Ftype+.+%0D%0A%3Fs1+%3Chttp%3A%2F%2Fwww.w3.org
%2F2000%2F01%2Frdf-schema%23label%3E+%3Flabel.+%0D%0A
%3Fs1+%3Chttp%3A%2F%2Fbio2rdf.org%2Fbio2rdf%23linkRank
%3E+%3FlinkRank.+%0D%0A%7D%0D%0A%0D%0A%0D%0A%0D
%0A&format=application%2Frdf%2Bxml&debug=on
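Percent-decoded, the URL above carries a CONSTRUCT query that keeps every triple whose object matches "paget" through Virtuoso's bif:contains, together with the subject's type, label and linkRank. A minimal Python sketch of how such a URL can be assembled follows. The function names are hypothetical, the query is rebuilt with plain line feeds instead of the CR/LF pairs and trailing blank lines of the slide, and the keyword is inserted without escaping, so it suits trusted input only.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://bio2rdf.org:8890/sparql"  # endpoint shown on the slide

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
LINK_RANK = "http://bio2rdf.org/bio2rdf#linkRank"


def build_query(keyword):
    """Rebuild the CONSTRUCT query that is encoded in the slide's URL."""
    shared = [
        f"?s1 <{RDF_TYPE}> ?type .",
        f"?s1 <{RDFS_LABEL}> ?label .",
        f"?s1 <{LINK_RANK}> ?linkRank .",
    ]
    lines = (
        ["CONSTRUCT {", "?s1 ?p1 ?o1 ."] + shared + ["}"]
        # bif:contains is Virtuoso's full-text search predicate.
        + ["WHERE {", "?s1 ?p1 ?o1 .", f'?o1 bif:contains "{keyword}" .']
        + shared + ["}"]
    )
    return "\n".join(lines)


def build_url(keyword):
    """Encode the query and the other parameters into a GET URL."""
    params = {
        "default-graph-uri": "",
        "query": build_query(keyword),
        "format": "application/rdf+xml",
        "debug": "on",
    }
    # urlencode percent-encodes reserved characters and turns spaces into '+'.
    return ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    url = build_url("paget")
    print(url)
    # Decoding the URL again recovers the readable query text.
    print(parse_qs(urlsplit(url).query, keep_blank_values=True)["query"][0])
```

Because the whole query travels in the URL, the result can be bookmarked or linked like any other web resource.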
Future work
● Create new rdfizers for public data sources;
● Build a community of users around the
Bio2RDF project (visit the Google group);
● Connect more datasources to Bio2RDF by
building collaborations between research
groups;
● Offer a public SPARQL endpoint based on
a Virtuoso server:
− http://bio2rdf.org:8890/sparql
Acknowledgments
Jean Morissette
Nicole Tourigny
Benjamin Good
Bioinformatics lab team at the CHUL Research Center:
Philippe Rigault
Marc-Alexandre Nolin
Thanks to the essential annotators and data providers,
and to the developers of the open source projects:
Sesame, Virtuoso and PiggyBank.
François Belleau was a recipient of a studentship from Génome Québec.
This work has been financed in part by the Atlas of Genomic Profiles of Steroid
Action, a Genome Canada project. BMG is funded by Pacific Century
and University of British Columbia Graduate Fellowships.
http://bio2rdf.org
Query the graph with SPARQL
http://bio2rdf.org:8890/sparql
Download our software
http://sourceforge.net/projects/bio2rdf/
Download the Atlas data in N3 format
http://bio2rdf.org/download
Join our group
http://groups.google.ca/group/bio2rdf
Contact us at bio2rdf@gmail.com