7. In a nutshell:
British Library Labs works with researchers on their specific problems, and tries to assess how widely each problem is felt.
With their help, we talk to communities of researchers and try to pinpoint what they need, as opposed to what they think they need to ask us for.
10. “Give me all of collection X!”
It is common for researchers to want all of a named collection.
It is also common for us to name a collection after whoever paid for it, or whichever project 'collated' it.
11. Farces...
A common plot mechanism: a conversation where the participants leave in agreement, but with two very different ideas of what was actually discussed.
17. Microsoft Books digitisation project
● Started in 2007, but stopped in 2009 due to the cancellation of the MS Book search project.
● Digitised approximately 49k works (~65k volumes).
● Online from 2012 via a “standard” page-turning interface, but with very low usage statistics.
20. Bias in digitisation
The tool was made to give a statistically valid sample.
Because so little has been digitised, it showed how skewed the digital corpus is compared to the overall holdings.
In “Where are the novels?”*, Allen B. Riddell estimates, using HathiTrust’s corpus, that:
“... about 58%—somewhere between 47% and 68%—of
the 2,903 novels [all publications in English between
1800 and 1836] have publicly accessible scans.”
* (2012) https://ariddell.org/where-are-the-novels.html
32. “Access”
The newspapers were accessible. We
had access to the newspapers but...
We didn't have access to them.
Keyword search fails miserably, and
bulk access is an issue.
33. Simple data structure would've helped!
All projects to date would have been far easier if:
• Every thing had a URL.
• That URL led to a page telling you all about the thing.
• The page linked to other, related things.
• The page was machine-readable – it never assumed a human would be reading it.
• All the underlying data – images, XML, etc. – was accessible.
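The principles above can be sketched in a few lines. This is a minimal illustration, not a real British Library API: the `example.org` URLs, field names, and file paths are all invented for the sketch.

```python
import json

# A toy sketch of "every thing has a URL": each item is a machine-readable
# record that links to related things and to the raw data itself.
# All URLs and identifiers here are hypothetical examples.
def describe_item(item_id: str) -> str:
    record = {
        "id": f"https://example.org/items/{item_id}",  # every thing has a URL
        "title": "Illustration from a digitised volume",
        "type": "Image",
        "related": [  # the page links to other, related things
            f"https://example.org/items/{item_id}/volume",
            f"https://example.org/items/{item_id}/scan.xml",
        ],
        # direct access to the underlying data, not just a viewer
        "data": f"https://example.org/items/{item_id}.jpg",
    }
    # JSON output: machine-readable, never assumes a human is reading it
    return json.dumps(record, indent=2)

print(describe_item("illus-0001"))
```

The point is not the particular format – JSON, XML, or RDF would all do – but that every record is addressable, linked, and parseable without a human in the loop.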
37. Uptake?
Hard to measure, but:
• 13–20 million hits on average every month; over 330,000,000 hits to date.
• Almost every image has been seen at least 20 times.
• Over 500,000 tags added by volunteers and machine algorithms.
• Iterative crowdsourcing is key.
38. Iterative crowdsourcing?
(The term is stolen, with permission, from Mia Ridge.)
1. Crowdsource broad facts, and subcollections of related items will emerge.
2. No 'one-size-fits-all': subcollections allow for more focussed curation.
Goto step 1.
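The loop above can be sketched as code. This is a toy illustration of the idea only – the items, tags, and grouping step are invented examples, not the actual workflow used.

```python
from collections import defaultdict

# Step 1 of iterative crowdsourcing: volunteers supply broad tags,
# and subcollections of related items emerge from the grouping.
def group_by_tag(tagged_items):
    """Broad facts in, subcollections of related items out."""
    subcollections = defaultdict(list)
    for item, tag in tagged_items:
        subcollections[tag].append(item)
    return dict(subcollections)

# Hypothetical output of a broad crowdsourcing pass.
broad_pass = [("img1", "map"), ("img2", "portrait"), ("img3", "map")]
subcollections = group_by_tag(broad_pass)

# Step 2: each emergent subcollection gets its own, more focused
# pass with questions suited to it ("goto step 1").
for tag, items in subcollections.items():
    print(tag, items)
```

Each iteration narrows the questions: a generic "what is this?" pass yields a "maps" subcollection, which can then be asked map-specific questions.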
39. Purposefully contextless
● Presenting them through Flickr removed the illustrations' context.
– Did this help or hinder?
● We wished to stimulate research with the illustrations themselves (linotypes, etchings, etc.). Computer-science research interest was primarily in 'Vision'.
41. It wasn't perfect, it was an experiment
“You know, the whole thing about perfectionism. The
perfectionism is very dangerous, because of course if
your fidelity to perfectionism is too high, you never
do anything.
Because doing anything results in— It’s actually kind of
tragic because it means you sacrifice how gorgeous and
perfect it is in your head for what it really is.”
- As told to Leonard Lopate on WNYC on March 4, 1996.
(emphasis my own)
http://blankonblank.org/interviews/david-foster-wallace-on-ambition/
42. Fear of imperfection
Encourages us to value the systems that provide access over the outcomes they could enable.
Adherence to a specification and 'hit' counts are easy to measure.
Once you've built one interface, people are loath to build any others to run in parallel.
44. Metaphors don't translate well
between media
Why do we assume that physical facsimiles
are anything but a comforting solution?
54. “Crowdsourcing”
Found lots of really bad assumptions attached to this term:
● A crowd of people, each doing a small bit
● You must have special software for it
● If you build it, they will come – free labour!
● It's totally untrustworthy
● It's easy
● It fixes all problems
● It's cheap
55. “Crowdsourcing”
● A crowd of people, each doing a small bit
[Chart: “% done” plotted against “Crowd”]
Zooniverse usage concurs with this distribution.
56. “Crowdsourcing”
● You must have special software for it
Capturing input, showing progress and engaging with volunteers is what is important.
Spreadsheets can be a wonderful thing!
59. Investigation into the unusual
● Can we avoid the keyboard and mouse?
● Can we make use of casual interaction, as opposed to the usual “group of experts”?
● Can useful games be made with this constraint?
● Can they be fun, as well as rewarding?
● Which age ranges understand what an arcade machine even is?
63. In Summary:
● Be careful with the words you use, especially those you think everyone understands.
● Things do not need to be catalogued or perfect to be useful to people.
● Wanting access to everything is the default.
● A singular presentation of a collection is a risky strategy – only mimicking the physical may not be the best idea.
● Experts are where you find them; look after them once you do!
● Make space to experiment, to fail, and to learn from your mistakes.