BioStor Next
(AKA: Text-mining BHL: towards new interfaces to the biodiversity literature)
@rdmpage
https://iphylo.blogspot.com
Biodiversity literature challenges
• Discovery (is it online?)
• Accessibility (can I read it?)
• Machine-friendly (can I process it?)
• Knowledge-friendly (is it linked?)
BHL makes stuff available… but where are the articles?
BHL Item 261937 before BioStor...
… and after BioStor
An article from BHL Item 261937 on BioStor
200K+ articles from BHL, running since 2009 (new version last week), https://biostor.org
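Turning a scanned BHL item into articles means finding where each article starts and ends among the scans. Below is a minimal sketch of one piece of that. It assumes each scan already carries a BHL PageID and an OCR'd printed page number, and that a citation supplies the start and end page. The function and the page IDs are hypothetical, and this is not BioStor's actual code.

```python
from typing import Optional

def locate_article(pages: list[tuple[int, Optional[int]]],
                   spage: int, epage: int) -> Optional[list[int]]:
    """Return the PageIDs covering printed pages spage..epage, or None.

    `pages` lists the item in scan order as (page_id, printed_number);
    printed_number is None for unnumbered scans (plates, blanks), which
    are kept when they fall inside the article's span.
    """
    start = end = None
    for i, (_, number) in enumerate(pages):
        if start is None and number == spage:
            start = i
        # Stop at the first end page after the start, in case numbers repeat.
        if start is not None and number == epage:
            end = i
            break
    if start is None or end is None:
        return None
    return [page_id for page_id, _ in pages[start:end + 1]]

# Hypothetical item: an unnumbered plate sits between printed pages 2 and 3.
item = [(1001, None), (1002, 1), (1003, 2), (1004, None), (1005, 3), (1006, 4)]
print(locate_article(item, 2, 3))  # [1003, 1004, 1005]
```

In practice OCR'd page numbers are often missing or misread, so a real matcher would need to tolerate gaps. The sketch only shows the core mapping from a citation to a span of scans.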
• Extract figures from PDF
• Upload to Zenodo, each with a DOI
• Searchable at ocellus.punkish.org
• Can we do this for BHL?
Beyond the PDF…
(see plazi.ch)
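The figure workflow on this slide gives each extracted figure its own Zenodo record and DOI. Below is a sketch of the metadata one might send for a single figure. It assumes Zenodo's legacy deposition REST API, where a figure is `upload_type` "image" with `image_type` "figure", and uses an `isPartOf` related identifier to point back to the source article. The function name is hypothetical and the HTTP calls are omitted. The example caption, illustrator and DOI come from the Begonia example in this deck.

```python
import json

def figure_deposition_metadata(caption: str, creators: list[str],
                               article_doi: str) -> dict:
    """Build a (hypothetical) Zenodo deposition payload for one figure."""
    # Truncate long captions for the title; keep the full caption as the description.
    title = caption if len(caption) <= 120 else caption[:117] + "..."
    return {
        "metadata": {
            "upload_type": "image",
            "image_type": "figure",
            "title": title,
            "description": caption,
            # Zenodo expects names in "Family, Given" form.
            "creators": [{"name": name} for name in creators],
            # Link the figure back to the article it was extracted from.
            "related_identifiers": [
                {"identifier": article_doi, "relation": "isPartOf"}
            ],
        }
    }

payload = figure_deposition_metadata(
    "Fig. 2 Begonia sumbawaensis Girm. A. Habit. B. Female flower.",
    ["Kusumawati, A."],
    "10.3850/s2382581216000041",
)
print(json.dumps(payload, indent=2))
```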
Fig 1 Distribution of Begonia sumbawaensis Girm. (triangles), B. brangbosangensis Girm. (circles) and B. jaranpusangensis Girm. (square).
Fig. 2 Begonia sumbawaensis Girm. A. Habit. B. Female flower. C. Style. D. Male flower. E. Stamen. F. Fruit in cross section. G. Fruit. H. Stipule. I. Seed. J. Bract. Drawn by A. Kusumawati.
Fig. 3 Begonia brangbosangensis Girm. A. Habit. B. Male flower. C. Stamen. D. Female flower. E. Style. F. Stipule. G. Seed. H. Fruit in cross section. I. Fruit. J. Ovary. Drawn by A. Kusumawati.
Fig. 4 Begonia jaranpusangensis Girm. A. Habit. B. Male flower. C. Male flower tepal. D. Stamen. E. Female flower. F. Female flower tepal. G. Fruit. H. Style. Drawn by A. Kusumawati & Wahyudi.
Three new species of Begonia (Begoniaceae) from Sumbawa Island, Indonesia
(extracted from “born digital” PDF doi:10.3850/s2382581216000041)
Born digital is “easy” 
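Once a caption is clean text, as in the born-digital case, it can be atomised further into figure number, taxon, panel labels and illustrator. Below is a minimal regex-based sketch. It assumes captions follow the "Fig. N Taxon Author. A. Part. B. Part. … Drawn by …" pattern seen on this slide. Real captions vary much more, and panel text containing its own abbreviations would break it. This is an illustration, not how any particular tool does it.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

FIGURE = re.compile(r"Fig\.?\s*(\d+)\.?\s+")        # "Fig. 2 " or "Fig 1 "
PANEL = re.compile(r"\b([A-Z])\.\s*([^.]+)\.")       # "A. Habit."
DRAWN_BY = re.compile(r"\s*Drawn by (.+?)\.?$")      # trailing credit line

@dataclass
class Caption:
    figure: int
    taxon: str
    panels: dict[str, str] = field(default_factory=dict)
    illustrator: Optional[str] = None

def parse_caption(text: str) -> Caption:
    text = text.strip()
    illustrator = None
    drawn = DRAWN_BY.search(text)
    if drawn:
        illustrator = drawn.group(1)
        text = text[:drawn.start()]          # drop the credit before finding panels
    fig = FIGURE.match(text)
    if not fig:
        raise ValueError(f"not a figure caption: {text!r}")
    rest = text[fig.end():]
    panels = list(PANEL.finditer(rest))
    # Everything before the first panel label is taken to be the taxon name.
    taxon = rest[:panels[0].start()].strip() if panels else rest.strip()
    return Caption(int(fig.group(1)), taxon,
                   {m.group(1): m.group(2).strip() for m in panels}, illustrator)
```

For Fig. 4 above, this yields figure 4, taxon "Begonia jaranpusangensis Girm.", eight panels (A to H) and illustrator "A. Kusumawati & Wahyudi".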
Three new species of Begonia (Begoniaceae) from Sumbawa Island, Indonesia
(extracted from BioStor PDF, ABBYY OCR)
Fig 1. Distribution of Begonia sumbawaensis Girm. (triangles), B. brangbosangensis Ginn,(circles) and B. jaranpusangensis Girm. (square).
Fig. 4. Begonia jaranpusangensis Ginn. A. Habit. B. Male flower. C. Male flower tepal. D.Stamen. E. Female flower. F. Female flower tepal. G. Fruit. H. Style. Drawn by A. Kusumawati& Wahyudi.
Fig. 3. Begonia brangbosangensis Ginn. A. Habit. B. Male flower. C. Stamen. D. Femaleflower. E. Style. F. Stipule. G. Seed. H. Fruit in cross section. I. Fruit. J. Ovary. Drawn by A.Kusumawati.
Scanned content not so easy 
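One way to put a number on "not so easy" is to compare the OCR text with the born-digital text using edit distance. Below is a minimal sketch: a plain dynamic-programming Levenshtein distance, plus a character error rate (CER) computed relative to the reference string. The example strings are copied from the two slides; the function names are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions turning a into b (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute or match
        prev = curr
    return prev[-1]

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edits needed divided by reference length."""
    return levenshtein(reference, hypothesis) / max(len(reference), 1)

print(levenshtein("Girm.", "Ginn,"))                # 3
print(levenshtein("Female flower", "Femaleflower")) # 1
print(cer("Girm.", "Ginn,"))                        # 0.6
```

Three edits in a five-character author abbreviation is enough to defeat exact string matching against a names index, which is why OCR'd text needs fuzzier linking than born-digital text.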
Map of localities extracted from BioStor article
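Mapping localities first requires pulling coordinates out of the article text. Below is a minimal sketch. It assumes coordinates are written as degrees with optional minutes and seconds followed by a hemisphere letter, for example 8°30'S 117°15'E. OCR'd text has further variants (degree signs read as "o" or "0", stray spaces) that this regex does not handle. The example string is illustrative and not taken from the article.

```python
import re
from typing import Optional

DMS = re.compile(
    r"(\d{1,3})\s*°\s*"                        # degrees
    r"(?:(\d{1,2}(?:\.\d+)?)\s*['′]\s*)?"      # optional minutes
    r'(?:(\d{1,2}(?:\.\d+)?)\s*["″]\s*)?'      # optional seconds
    r"([NSEW])"                                # hemisphere
)

def to_decimal(deg: str, minutes: Optional[str], seconds: Optional[str],
               hemi: str) -> float:
    value = int(deg) + float(minutes or 0) / 60 + float(seconds or 0) / 3600
    return -value if hemi in "SW" else value   # south and west are negative

def extract_points(text: str) -> list[tuple[float, float]]:
    """Pair each latitude (N/S) with the next longitude (E/W) found."""
    points: list[tuple[float, float]] = []
    lat = None
    for m in DMS.finditer(text):
        value = to_decimal(*m.groups())
        if m.group(4) in "NS":
            lat = value
        elif lat is not None:
            points.append((lat, value))
            lat = None
    return points

print(extract_points("8°30'S 117°15'E, 1200 m"))  # [(-8.5, 117.25)]
```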
H3: A Hexagonal Hierarchical Geospatial Indexing System
https://github.com/uber/h3
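H3 assigns every point on Earth to a hexagonal cell at one of 16 resolutions, so extracted localities can be indexed and searched by cell ID. To stay dependency-free, the sketch below swaps in a plain square latitude/longitude grid. It is not H3 and has none of H3's hexagons or parent/child hierarchy. It only shows the idea of binning points by cell and then asking which articles have localities in a given cell. The cell size and article identifiers are hypothetical.

```python
import math
from collections import defaultdict

def cell_id(lat: float, lon: float, size_deg: float = 1.0) -> tuple[int, int]:
    """Square grid cell containing (lat, lon); a stand-in for an H3 index."""
    return (math.floor(lat / size_deg), math.floor(lon / size_deg))

class LocalityIndex:
    """Maps grid cells to the set of articles with a locality in that cell."""

    def __init__(self, size_deg: float = 1.0):
        self.size_deg = size_deg
        self.cells: dict[tuple[int, int], set[str]] = defaultdict(set)

    def add(self, article_id: str, lat: float, lon: float) -> None:
        self.cells[cell_id(lat, lon, self.size_deg)].add(article_id)

    def search(self, lat: float, lon: float) -> set[str]:
        """Articles with at least one locality in the same cell as (lat, lon)."""
        return set(self.cells.get(cell_id(lat, lon, self.size_deg), set()))

index = LocalityIndex()
index.add("article-1", -8.5, 117.25)
index.add("article-2", -8.7, 117.9)
index.add("article-3", 51.5, -0.1)
print(sorted(index.search(-8.1, 117.5)))  # ['article-1', 'article-2']
```

Swapping `cell_id` for a real H3 call would keep the rest of the index unchanged, which is the appeal of a cell-based scheme.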
Biodiversity knowledge graph
Linking BHL
Begonia sumbawaensis
https://www.ipni.org/n/77157221-1
BHL doesn’t know that this is a new species…
… but IPNI knows that Begonia sumbawaensis is described on this page!
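IPNI's record for Begonia sumbawaensis points to the BHL page where the name is described, but BHL holds no link back. Below is a sketch of flipping such name-to-page links into a page-to-names index, so a BHL page could list the names described on it. The function name and page ID are hypothetical placeholders; only the IPNI URL comes from the slide.

```python
from collections import defaultdict

def invert_links(name_to_pages: dict[str, list[int]]) -> dict[int, list[str]]:
    """Turn name -> BHL PageID links into BHL PageID -> names."""
    page_to_names: dict[int, list[str]] = defaultdict(list)
    for name_id, page_ids in name_to_pages.items():
        for page_id in page_ids:
            page_to_names[page_id].append(name_id)
    return dict(page_to_names)

# 123456 is a placeholder, not the real BHL PageID for this name.
links = {"https://www.ipni.org/n/77157221-1": [123456]}
print(invert_links(links))  # {123456: ['https://www.ipni.org/n/77157221-1']}
```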
Time to join the dots…
• Linking BHL to databases (in both directions)
• Link BHL entities to Wikidata (e.g., authors, journals, articles) (happening already)
• Represent BHL content as linked data (e.g., annotations, cf. International Image Interoperability Framework, IIIF)
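Below is a sketch of what one such link might look like as linked data. It assumes the W3C Web Annotation data model, which IIIF also uses for annotations. The target is the BHL page, the body is the IPNI name record, and the motivation is "linking". The annotation `id` base URL and the PageID are hypothetical placeholders, and the choice of motivation is ours, not something the slides specify.

```python
import json

def name_annotation(page_id: int, ipni_url: str,
                    base: str = "https://example.org/annotations/") -> dict:
    """W3C Web Annotation linking a BHL page (target) to a name record (body).

    `base` is a hypothetical namespace for minting annotation IRIs.
    """
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": f"{base}{page_id}",
        "type": "Annotation",
        "motivation": "linking",
        "body": ipni_url,
        "target": f"https://www.biodiversitylibrary.org/page/{page_id}",
    }

# 123456 is a placeholder PageID.
print(json.dumps(name_annotation(123456, "https://www.ipni.org/n/77157221-1"), indent=2))
```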
Extract more articles
Geographic search
Atomise articles
Embed in knowledge graph

Editor's Notes

  1. 10 + 2 mins
  2. https://en.wikipedia.org/wiki/FAIR_data#/media/File:FAIR_data_principles.jpg
  3. https://biostor.org/reference/249060
  4. https://ocellus.punkish.org/images.html?q=begonia&size=30&page=1#carousel
  5. https://doi.org/10.3850/s2382581216000041
  6. https://biostor.org/reference/217669
  7. http://biostor.org/reference/252604
  8. https://github.com/uber/h3
  9. https://www.dropbox.com/s/7bd0zg0srz2ra9d/Screenshot%202019-10-17%2011.29.08.png?dl=0