This talk was given at Open Repositories 2017 in Brisbane, Australia. It discussed how digitised literature in the Biodiversity Heritage Library can be used in many ways: as a source of scientific data, as a source of beautiful historic artworks, and as a way to provide the taxonomic community with sometimes rare or inaccessible first descriptions of new species.
Creating a network of connections: how the Biodiversity Heritage Library adds social value to science
1. Creating a network
of connections
How the Biodiversity Heritage Library
adds social value to science
Elycia Wallis, Constance Rinaldo, Jane Smith
@elyw @BioDivLibrary @BHL_Au
Open Repositories 2017 | Brisbane, Australia
2. Natural history literature and archives contain
information that is critical to studying life on Earth
SPECIES
DESCRIPTIONS
DISTRIBUTION
RECORDS
HISTORY OF
SCIENTIFIC
DISCOVERY
CLIMATE
RECORDS
INFORMATION
ON EXTINCT
SPECIES
SCIENTIFIC
OBSERVATIONS
ECOSYSTEM
PROFILES
SCIENTIFIC
ILLUSTRATIONS
3. “The cultivation
of natural history
cannot be
efficiently carried
out without
reference to an
extensive library.”
Charles Darwin et al. (1847)
4. The Biodiversity Heritage Library
(www.biodiversitylibrary.org) is an open access digital
library for biodiversity literature and archives.
5. Inspiring Discovery through Free Access
to Biodiversity Knowledge
10 years of inspiring discovery
through free & open access
to biodiversity literature & archives
from the 15th-21st centuries
Mission
The Biodiversity Heritage Library improves research
methodology by collaboratively making biodiversity
literature openly available to the world as part of a
global biodiversity community.
10. Museums Victoria
Australian Museum
Western
Australian
Museum
Queensland Museum
Royal Society of
Western Australia
South Australian Museum
Geoscience Australia
Royal Botanic Gardens
VIC
Field Naturalists Club
of Victoria
Western
Australian
Herbarium*
Royal Society of Victoria*
*Newest contributors (2017)
Royal Society
of Queensland
Linnean Society of NSW
Australian Institute of
Marine Science
13. The Naturalist’s Miscellany
“Of all the Mammalia
yet known it seems
the most
extraordinary…”
“…at first view, it
naturally excites
the idea of some
deceptive
preparation by
artificial means.”
14.
15. Platypus apicalis White, 1846
Platypus apicalis, New Zealand Pinhole Borer.
Image from PaDIL website http://www.padil.gov.au/pests-and-diseases/pest/main/135769#
Photograph credit Simon Hinkley and Ken Walker, Museums Victoria
16.
17. A hand-book to the marsupialia and monotremata / by Richard Lydekker
http://www.biodiversitylibrary.org/bibliography/15228#/summary
http://dx.doi.org/10.5962/bhl.title.15228
20. Underutilised resource
Inaccessible in their current state
• Single hard copy
• Often uncatalogued
• Hand-written (in the field, on a
lap – doesn’t make for the
neatest script)
• Unsearchable
22. DATE: 26 September 1948
OBSERVATIONS
LOCATION:
Lake Corangamite
BEHAVIOUR: nesting
23. SILVER GULLS (26.9.48)
300 nests on 1 island
15 islands of similar size
Estimates 4500 nests
Nesting success
~ 1.5 eggs/nest
=7000 new gulls from this year
from this locality
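The arithmetic on this slide can be checked with a short sketch. The variable names are illustrative only, and the final figure assumes, as the slide appears to, that every egg counted yields a new gull.

```python
# Figures taken from the slide for the 26 September 1948 diary entry.
nests_per_island = 300   # nests counted on one island
islands = 15             # islands of similar size
eggs_per_nest = 1.5      # approximate nesting success

# Scale the single-island count up to the whole locality.
estimated_nests = nests_per_island * islands
estimated_new_gulls = estimated_nests * eggs_per_nest

print(estimated_nests)      # 4500
print(estimated_new_gulls)  # 6750.0, rounded on the slide to about 7000
```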
24. Historic data provides value today
2012 2014
Grampians
National Park
Images: Heath Warwick & Nicole Kearney / Museums Victoria
Historic observations
• past species’
abundance
and distribution
• future biological
surveys
• threatened
species
management
1931
34. First thing to talk about is finding original names in BHL. Particularly for
species described a long time ago, very useful
35. 110,500+ IMAGES IN FLICKR
249+ MILLION TOTAL VIEWS ON IMAGES
33,200+ TOTAL IMAGES TAGGED
30% OF TOTAL FLICKR COLLECTION TAGGED
18,000+ TAGGED IMAGES IN EOL
BHL FLICKR NAMED 1 OF WIRED’S 27 MUST-FOLLOW FEEDS IN THE WORLD OF SCIENCE
*Stats as of June 2017.
WWW.FLICKR.COM/BIODIVLIBRARY
36.
37.
38. Common name
Scientific name
Location
Source library
Tagged = discoverable (citizen scientists)
Original scientific name
taxonomy:binomial=
Artist name
artist:name=
Accepted scientific name
taxonomy:binomial=
Machine-readable tags
Common name
taxonomy:common=
48. Thank You!
Elycia Wallis
29 June 2017 | OR2017
Stay Connected with BHL!
Follow @BioDivLibrary and @BHL_Au on social media
Join our Mailing List: library.si.edu/bhl-newsletter-signup
@elyw
Editor's Notes
Today I’d like to talk to you about the Biodiversity Heritage Library – a project that I am the Australian lead for.
The Biodiversity Heritage Library is principally a digital full text library, that contains information that is critical to studying life on Earth. In BHL you’ll find:
Species descriptions
Distribution records, which can help researchers and conservationists examine past population distribution and abundance and determine how it has changed over time
Historic climate records that are important for modern-day climate change research
Records of our history of scientific discovery, including expeditions that document modern science’s first encounters with various regions, cultures, and ecosystems
Literature and archives may be the only remaining record for extinct species
valuable and beautiful scientific illustrations
And they document ecosystems, allowing researchers to identify the various components of those ecosystems and assess how those ecosystems have changed over time
Historically, much of this literature was only available in a few select libraries in the developed world. Lack of access to literature, for whatever reason, is a major impediment to the efficiency of scientific research.
The project I’ll talk to you about today is the Biodiversity Heritage Library, and how we are now using this resource to create a web of connections into the literature that gives scientists greater value than the literature alone. Simply put, BHL is an open access digital library for biodiversity literature and archives.
BHL has now been going for 10 years with its vision of “Inspiring discovery through free access to biodiversity knowledge.” BHL was born in 2006, and since then we have continuously provided free and open access to collections from the 15th-21st centuries. A majority of our collections are in the public domain, but we also work with rights holders to secure permission to digitize in-copyright content and make it freely and openly available in BHL under Creative Commons licenses.
BHL operates as a consortium of natural history and botanical institutions and libraries around the world that work together to develop the library, digitize their own natural history collections, and make them freely available in BHL. BHL participation is divided into Members, Affiliates, and Partners, each of which has varying degrees of administrative and governance privileges. As of May 2017, we have 18 Members, 15 Affiliates, and a total of over 60 Partners across every continent (except Antarctica) contributing to BHL.
To date, BHL’s collections include over 52 million pages, which comprise over 120,000 titles and over 203,000 volumes. As previously mentioned, we also work with rights holders to secure permission to digitize in-copyright content in BHL. To date, we’ve received permission for over 615 in-copyright titles, amounting to agreements with over 260 licensors.
We believe that inspiring discovery isn’t just about providing access to literature and archives. It’s also about providing tools and services that make it easy for users to locate material of interest and keep that content in a format that meets their needs. Additionally, our data can be freely accessed and downloaded through a variety of APIs (application programming interfaces) and data exports. We also support a variety of reference management tools, including an integration with Mendeley and bibliographic downloads in BibTeX and RIS formats. Users can also freely download our content, either by full PDF downloads or by selecting specific pages to create custom PDFs. We’ve generated over 594,000 custom PDFs to date. We are also working to index the articles in our collection to allow users to search by article, not just monograph or journal title. To date, we’ve indexed over 232,000 articles. We also work with CrossRef to assign DOIs to content in our collections, allowing users to easily cite our materials. To date, we’ve assigned over 113,000 DOIs to monographs and a few select articles.
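As a rough illustration of how a DOI makes a BHL title citable, the sketch below formats a minimal BibTeX entry for the Lydekker handbook shown on slide 17. The helper name bibtex_entry and the citation key are hypothetical. This is not BHL's export code, and a real BibTeX download from BHL would likely carry more fields.

```python
def bibtex_entry(key: str, title: str, author: str, doi: str) -> str:
    """Format a minimal BibTeX @book entry (hypothetical helper, not BHL's exporter)."""
    fields = {
        "title": title,
        "author": author,
        "doi": doi,
        "url": f"https://doi.org/{doi}",  # standard DOI resolver address
    }
    # Each field becomes one indented "name = {value}" line.
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@book{{{key},\n{body}\n}}"


# Title, author and DOI are taken from slide 17; the key is made up.
entry = bibtex_entry(
    key="lydekker_handbook",
    title="A hand-book to the marsupialia and monotremata",
    author="Lydekker, Richard",
    doi="10.5962/bhl.title.15228",
)
print(entry)
```

Because the DOI is persistent, a citation built this way should keep resolving even if the BHL page address changes.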
The Australian contribution to BHL is somewhat more modest but we are still proud of what a small team has been able to achieve.
The BHL Australia project is led by Museums Victoria in collaboration with the Atlas of Living Australia. We have contributors including museums, herbaria and societies. We have worked over the past couple of years to negotiate licensing agreements with publishers of Australian journals to allow us to upload in-copyright titles and have been very heartened by their willing cooperation and positive response.
In Australia Museums Victoria has invested in scanning equipment and workflow training and provides scanning services to other organisations who wish to be a part of BHL. In this way, we have enabled a number of smaller publishers to contribute.
The majority of scanning at Museums Victoria is done by highly trained volunteers who have been with the project for over 6 years now. I would like to publicly acknowledge their contribution, which has been invaluable.
Technical
The first example of BHL is the very ‘traditional’ use of this literature. Taxonomists have an annoying habit of changing their minds about what a species should be called – that is, about what evolutionary relationships a species has to others. What a species is called now is not necessarily what it has always been called.
One fascinating volume in BHL is called the Naturalist’s Miscellany written by an English zoologist called George Shaw in 1799.
Some of Australia’s most iconic species are first described in this work.
We all know what this is, right?
George Kearsley Shaw – https://en.Wikipedia.org/wiki/George_Shaw
And here is the first description – Platypus anatinus. Easy.
Or not.
It turns out that a German biologist called Herbst had gotten there first! He coined the name Platypus for a genus of wood boring insects in 1793, six years before Shaw used it for actual Platypuses. There are still many wood boring Platypus species – the ALA for example lists 24 known in Australia. They are commercially very important, as they are the animals that put holes in recently cut timbers.
So actual Platypuses were redesignated as Ornithorhynchus by a German scientist called JF Blumenbach in 1800. He erected the genus Ornithorhynchus but named the species paradoxus.
In the end the Platypus retained its species name of anatinus from Shaw and its genus name from Blumenbach. The name changes can be traced with a keen eye through the literature. This is the kind of ‘bread and butter’ use of BHL for many taxonomists.
The second example I’d like to give is for something out of our archives.
This box was discovered by one of our history curators in our archives. It was labelled “Estate of Graham Brown – note books”. Inside this box were 5 historic field diaries – the meticulous observations of an eminent Victorian ornithologist.
The box did have a digital record in our archives database, but this record contained no more information than what’s written on this box.
Why should we care about someone’s old diaries? Historic field diaries chronicle the scientific expeditions undertaken over time to explore, research and discover the natural history of our world. They are filled with descriptions of new discoveries and frontiers.
But, despite the wealth of information they contain, field diaries are a hugely underutilised resource. And this is because they are inaccessible in their current state. Unlike the published books, where there are at least multiple copies, field diaries usually only exist as a single hard copy, stored in a single location. And handwritten in the field, in historic scripts, they can be very hard to read. And as hand-written documents, they’re unsearchable.
And often field diaries in museum collections are uncatalogued, or if they are catalogued their electronic records contain insufficient information for researchers to be able to find them.
But field diaries are FULL of data!
And in diaries, historic observations are linked to the two key pieces of information that make an observation useful to science – the DATE the observation was made and the LOCATION of that observation. These are the observations made by Graham Brown on 26 September 1948 at Lake Corangamite.
And he also gathered information about behaviour.
And sometimes that contextual information is very rich and detailed.
Historic observations can provide invaluable insights into past species’ abundance and distribution. They can be used to plan future biological surveys and they can inform threatened species management.
These are bird observations in the Grampians National Park from 1931. The Grampians is an area of great conservation interest for Museums Victoria. Here are our scientists surveying the same area in 2012 and again in 2014.
Historic occurrence records are now more important than ever, as they can provide a critical baseline for climate change studies.
These are scientific papers of research conducted using data from historic field diaries. The authors have told me what an arduous task this was and how much easier their work – this critical research – would be if this data were more accessible.
But the biggest barrier to making these diaries searchable and their contents available to researchers is the handwriting. Machine recognition of handwriting is still poor – despite excellent research efforts still continuing. Field diaries are notorious for having particularly bad handwriting so there’s not yet any choice but to use a human to make the transcription.
Many organisations have developed tools to transcribe handwritten material and much of this work is being done online by crowd-sourced volunteers. Many of these tools, like the Smithsonian Transcription Center, are very simple. They consist of an image of the original page and a free-text box for the volunteers to transcribe into.
There is also active research into machine recognition of handwriting but that’s not yet close to being able to read scribbly field notes.
The Atlas of Living Australia also works with colleagues at the Australian Museum to provide a crowdsourced transcription tool, DigiVol.
It was originally designed for the transcription of handwritten specimen labels, but is now also used to transcribe survey sheets and diaries.
It was DigiVol’s flexibility that made it attractive to our project. We were able to create a custom template with a verbatim text field and a table for capturing our historic observation data – date, location, scientific name and common name – as well as a field for recording mentions of people and organisations.
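To make the shape of that template concrete, here is a minimal sketch of how one transcribed page might be modelled. The class and field names are hypothetical and are not DigiVol's actual schema. The sample values come from the Lake Corangamite slides, and the verbatim text is illustrative rather than a real transcript.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Observation:
    """One row of the observation table (field names are hypothetical)."""
    date: str                              # ISO 8601 date string
    location: str
    common_name: Optional[str] = None
    scientific_name: Optional[str] = None  # left empty if the diary gives none


@dataclass
class PageTranscription:
    """Everything captured for a single diary page."""
    verbatim_text: str
    observations: List[Observation] = field(default_factory=list)
    people_and_organisations: List[str] = field(default_factory=list)


page = PageTranscription(
    verbatim_text="SILVER GULLS (26.9.48) 300 nests on 1 island",  # illustrative
    observations=[
        Observation(date="1948-09-26", location="Lake Corangamite",
                    common_name="Silver Gull"),
    ],
)

# Structured rows can be filtered in ways that free text cannot.
at_lake = [o for o in page.observations if o.location == "Lake Corangamite"]
print(len(at_lake))  # 1
```

Keeping the verbatim text alongside the structured rows means nothing from the original page is lost, while the date and location become searchable.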
To cut a long story short, the transcription project was very successful. To make the information widely available, we have uploaded the field diaries to BHL.
Many people have suggested that the transcriptions should logically go into the place you’d usually look for the OCR of a printed book. However, that has some technical issues so at the moment the transcripts are uploaded as a separate file – still searchable but not inline with the handwritten version.
At the moment you get a catalogue record that looks a bit like this – with the page images as one ‘volume’ and the transcript as a separate volume. There’s lots of potential here for a better solution.
My final example involves beautiful images. We often think only about the text but this time we’re focusing on the images.
To date, we have over 110,500 images in Flickr, and these images have been viewed over 249 million times. Flickr also provides a citizen science opportunity for us. In order to make the images more discoverable, we ask our community to add tags to these images that indicate the species depicted. To date, over 33,200 images have been tagged by volunteers, amounting to 30% of the total Flickr collection. These tags not only mean that users can search for images of specific species, but they also allow us to share these resources with other databases.
I’ll illustrate this first using a publication by my museum, called the Prodromus of the Zoology of Victoria. Images from the Prodromus uploaded to BHL are extracted and collected together in an album in Flickr. The older scientific literature in particular is full of amazing images, made all the more impressive when they’re collected together.
Let’s use the example of the stunning Tasselled Anglerfish. Loaded into Flickr, this image is tagged back to BHL.
We can automatically tag images loaded into Flickr with a number of tags that might make the image more discoverable within Flickr.
But we can also add machine tags. These tags are done manually by an amazing group of citizen scientists who provide an invaluable service by finding scientific names, artist names, dates, and localities.
Having the machine tags then allows the images to be further shared to other sites. In this case sharing is to the Encyclopedia of Life (EOL). EOL is an online encyclopedia dedicated to creating a web page for every species. BHL has partnered with EOL since the beginning of our library. EOL automatically ingests any images in the BHL Flickr that have been tagged with a scientific name machine tag and associates them with the corresponding species page in EOL. To date, over 18,000 of our tagged images have been ingested into EOL. This allows our resources to be exposed to a whole new audience through EOL, and also enhances EOL with rich visual content.
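The sketch below shows, under stated assumptions, how machine tags of the form namespace:predicate=value could be read to pick out scientific names. The regular expression is a simplification of Flickr's machine tag syntax, the function names are hypothetical, and the selection rule is only a reading of the process described here, not EOL's actual harvester. The sample tags use the Platypus names from earlier in the talk and are made up for illustration.

```python
import re
from typing import List, Optional, Tuple

# Simplified pattern for namespace:predicate=value machine tags.
MACHINE_TAG = re.compile(
    r"^(?P<namespace>[A-Za-z][A-Za-z0-9_]*)"
    r":(?P<predicate>[A-Za-z][A-Za-z0-9_]*)"
    r"=(?P<value>.+)$"
)


def parse_machine_tag(tag: str) -> Optional[Tuple[str, str, str]]:
    """Return (namespace, predicate, value), or None for an ordinary tag."""
    match = MACHINE_TAG.match(tag.strip())
    if match is None:
        return None
    value = match.group("value").strip().strip('"')
    return match.group("namespace").lower(), match.group("predicate").lower(), value


def binomials(tags: List[str]) -> List[str]:
    """Collect the values of every taxonomy:binomial machine tag."""
    names = []
    for tag in tags:
        parsed = parse_machine_tag(tag)
        if parsed is not None and parsed[:2] == ("taxonomy", "binomial"):
            names.append(parsed[2])
    return names


# Hypothetical tags on one image: original name, accepted name, common name.
tags = [
    'taxonomy:binomial="Platypus anatinus"',
    'taxonomy:binomial="Ornithorhynchus anatinus"',
    "taxonomy:common=Platypus",
    "bookplate",
]
names = binomials(tags)
print(names)  # ['Platypus anatinus', 'Ornithorhynchus anatinus']
```

An image with at least one name in that list would be a candidate for matching to a species page, while an image carrying only ordinary tags would be skipped.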
And here’s that illustration now also appearing in EOL.
And if you go into that image, you can see all the reciprocal links back again. Realising the goal of a networked artefact.
Closer to home, we’ve recently been working on a similar project – this time sharing images through to the Atlas of Living Australia, Australia’s national natural sciences data aggregator.
This time we’re working with a series of botanical magazines produced by Kew Gardens since 1787. The Herbarium in NSW is planning an exhibition of images of Australian plants published over the years in this magazine and was keen to see the images in the Atlas of Living Australia, as well as in Flickr, where they can make their own collection of pictures that will also appear in the exhibition.
In this case the tag needed is a country tag. This flags the image to be harvested into the Atlas, and the taxonomy:binomial tag allows it to be matched to its species page.
And here’s the image appearing along with the others used to illustrate this species.
And again, reciprocal links back to Flickr and to BHL mean that this image is fully linked between different instances where it appears.
And then as a last note, putting the images on Flickr provides other opportunities for dissemination. This Tumblr, for example, is run by Michelle Marshall – one of the amazing citizen scientists who put vast amounts of time and effort into doing the taxon tagging. Michelle then takes the images and further republishes them to her Tumblr, as well as other social media – extending their reach even further.
https://histsciart.tumblr.com/