7. In a nutshell:
British Library Labs works with researchers on their specific problems, and tries to assess how widely each problem is felt.
With their help, we talk to communities of researchers and try to pinpoint what they need, as opposed to what they think they need to ask us for.
10. “Give me all of collection X!”
It is common for researchers to want all of a named collection.
It is also common for us to name a collection after whoever paid for it, or whichever project 'collated' it.
11. Farces...
A common plot mechanism: a conversation where the participants leave in agreement, but with two very different ideas of what was actually discussed.
17. Microsoft Books digitisation project
● Started in 2007, but stopped in 2009 due to the cancellation of the MS Book search project.
● Digitised approximately 49k works (~65k volumes).
● Online from 2012 via a “standard” page-turning interface, but with very low usage statistics.
20. Bias in digitisation
The tool was made to give a statistically valid sample.
Because so little has been digitised, it showed how skewed the digital corpus is compared to the overall holdings.
In “Where are the novels?”*, Allen B. Riddell estimates, using HathiTrust’s corpus, that:
“... about 58%—somewhere between 47% and 68%—of
the 2,903 novels [all publications in English between
1800 and 1836] have publicly accessible scans.”
* (2012) https://ariddell.org/where-are-the-novels.html
32. “Access”
The newspapers were accessible. We
had access to the newspapers but...
We didn't have access to them.
Keyword search fails miserably, and
bulk access is an issue.
33. Simple data structure would've helped!
All projects to date would have been far easier if:
• Every thing had a URL.
• That URL led to a page telling you all about the thing.
• The page linked to other, related things.
• The page was machine-readable – it never assumed a human would be reading it.
• All the underlying data – images, XML, etc. – was accessible.
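The principles above can be sketched in a few lines. This is a minimal illustration, not a real British Library API: the `example.org` URLs, field names, and file paths are all invented for the sketch.

```python
import json

# A toy sketch of "every thing has a URL": each item is a machine-readable
# record that links to related things and to the raw data itself.
# All URLs and identifiers here are hypothetical examples.
def describe_item(item_id: str) -> str:
    record = {
        "id": f"https://example.org/items/{item_id}",  # every thing has a URL
        "title": "Illustration from a digitised volume",
        "type": "Image",
        "related": [  # the page links to other, related things
            f"https://example.org/items/{item_id}/volume",
            f"https://example.org/items/{item_id}/scan.xml",
        ],
        # direct access to the underlying data, not just a viewer
        "data": f"https://example.org/items/{item_id}.jpg",
    }
    # JSON output: machine-readable, never assumes a human is reading it
    return json.dumps(record, indent=2)

print(describe_item("illus-0001"))
```

The point is not the particular format – JSON, XML, or RDF would all do – but that every record is addressable, linked, and parseable without a human in the loop.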
37. Uptake?
Hard to measure, but:
• 13–20 million hits on average every month; over 330,000,000 hits to date.
• Almost every image has been seen at least 20 times.
• Over 500,000 tags added by volunteers and machine algorithms.
• Iterative crowdsourcing is key.
38. Iterative crowdsourcing?
(The term is stolen, with permission, from Mia Ridge.)
1. Crowdsource broad facts, and subcollections of related items will emerge.
2. No 'one-size-fits-all': subcollections allow for more focussed curation.
Goto step 1.
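The loop above can be sketched as code. This is a toy illustration of the idea only – the items, tags, and grouping step are invented examples, not the actual workflow used.

```python
from collections import defaultdict

# Step 1 of iterative crowdsourcing: volunteers supply broad tags,
# and subcollections of related items emerge from the grouping.
def group_by_tag(tagged_items):
    """Broad facts in, subcollections of related items out."""
    subcollections = defaultdict(list)
    for item, tag in tagged_items:
        subcollections[tag].append(item)
    return dict(subcollections)

# Hypothetical output of a broad crowdsourcing pass.
broad_pass = [("img1", "map"), ("img2", "portrait"), ("img3", "map")]
subcollections = group_by_tag(broad_pass)

# Step 2: each emergent subcollection gets its own, more focused
# pass with questions suited to it ("goto step 1").
for tag, items in subcollections.items():
    print(tag, items)
```

Each iteration narrows the questions: a generic "what is this?" pass yields a "maps" subcollection, which can then be asked map-specific questions.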
39. Purposefully contextless
● Presenting them through Flickr removed the illustrations' context.
– Did this help or hinder?
● We wished to stimulate research with the illustrations themselves (linotypes, etchings, etc.). Computer-science research interest was primarily in 'Vision'.
41. It wasn't perfect, it was an experiment
“You know, the whole thing about perfectionism. The
perfectionism is very dangerous, because of course if
your fidelity to perfectionism is too high, you never
do anything.
Because doing anything results in— It’s actually kind of
tragic because it means you sacrifice how gorgeous and
perfect it is in your head for what it really is.”
- As told to Leonard Lopate on WNYC on March 4, 1996.
(emphasis my own)
http://blankonblank.org/interviews/david-foster-wallace-on-ambition/
42. Fear of imperfection
Encourages us to value the systems that provide access over the outcomes they could enable.
Adherence to a specification and 'hit' counts are easy to measure.
Once you've built one interface, people are loath to build any others to run in parallel.
44. Metaphors don't translate well
between media
Why do we assume that physical facsimiles
are anything but a comforting solution?
54. “Crowdsourcing”
Found lots of really bad assumptions attached to this term:
● A crowd of people, each doing a small bit
● You must have special software for it
● If you build it, they will come – free labour!
● It's totally untrustworthy
● It's easy
● It fixes all problems
● It's cheap
55. “Crowdsourcing”
● A crowd of people, each doing a small bit
[Chart: “% done” plotted against “Crowd”]
Zooniverse usage concurs with this distribution.
56. “Crowdsourcing”
● You must have special software for it
Capturing input, showing progress and engaging with volunteers is what is important.
Spreadsheets can be a wonderful thing!
59. Investigation into the unusual
● Can we avoid the keyboard and mouse?
● Can we make use of casual interaction, as opposed to the usual “group of experts”?
● Can useful games be made with this constraint?
● Can they be fun, as well as rewarding?
● Which age ranges understand what an arcade machine even is?
63. In Summary:
● Be careful with the words you use, especially those you think everyone understands.
● Things do not need to be catalogued or perfect to be useful to people.
● Wanting access to everything is the default.
● A singular presentation of a collection is a risky strategy – only mimicking the physical may not be the best idea.
● Experts are where you find them; look after them once you do!
● Make space to experiment, to fail, and to learn from your mistakes.