The future of cataloging needs to be understood through its past. This presentation describes cataloging and catalogs from the book and card catalog to the present. It highlights the problems that arise when working with data that was designed over 100 years ago for the card catalog. This data no longer meets the needs of users. No solutions are provided, but the presentation suggests that there is an urgency in finding some.
10. 20th century – things got faster
■ Increase in paper production, faster printing technology
■ Increased rate of publication -> increase in library size
■ Increase in literacy -> more and more diverse users
■ 1960's -> faster card production using computer typography (MARC)
11. Purpose of MAchine Readable
Cataloging
■ Produce printed cards identical to those produced before
■ Some minor sorting functions
■ A document mark-up language
12. Printed cards & shared cataloging still
meant local library work
14. Tada! Online cataloging (aka: OCLC,
RLIN, PICA)
■ MARC records + 1970's Ohio College Library Center, then others
■ Customized cards (with locations and call numbers)
■ Reprint of cards for correction of errors (no more erasing or white-out)
■ Production of cards increased greatly from 1970 to late 1980's
However….
■ More cards meant more filing
15. Card catalogs were huge
~6-8 cards per item X replacement cards for errors or changes
Library of Congress, 1937
16. Filled whole rooms
~6-8 cards per item X replacement cards for errors or changes
Yale, 1970's?
17. Filing became the problem
~6-8 cards per item X replacement cards for errors or changes
Yale, 1970's?
25. Author
Title
Subject
All entries are in context
based on their headings
in alphabetical order
All headings are a complete
name or subject
Bibliographic record is the focus
Context is not visible
29. Boats and boating--Erie, Lake--Maps.
Books and reading--Lake Erie region.
Lake Erie, Battle of, 1813.
Erie, Lake--Navigation
Cooking, French
Alps, French (France)
French--America--History
French American literature
De la Cruz, Melissa
Cervantes Saavedra, Miguel de
30. Results no longer ordered by
headings
1. Cat breeds
2. Cat breeds
3. Cat breeds – History
4. Cat breeds – Handbooks,
manuals, etc.
5. Cat breeds
6. Cat breeds – Thailand
Order of display:
Where did the headings go?
31. Cat breeds
Arco book of cats / by Grace Pond
Champion cats of the world / by Catherine Ing
…
Cat breeds – Handbooks, manuals, etc.
The complete cat owner's manual / Susie Page
…
Cat breeds – History
Fifty years of pedigree cats [by] May Eustace & Elizabeth Towe
...
Cat breeds – Thailand
Mǣo Thai / Sutthilak ʻAmphanwong
What catalogers wanted: heading-ordered display
32. • Canals and Rivers of Britain
• The Crimson Hair Murders
• Darwin
• Darwin; A Graphic Biography : the Really Exciting and Dramatic Story of A Man Who Mostly Stayed at Home and Wrote Some Books
• Darwin; Business Evolving in the Information Age
• Darwin's Radio
• Emma Darwin, A Century of Family Letters, 1792-1896
• Java Cookbook
Keyword search means non-coherent set
(kw = "darwin")
Titles:
33. • Darwin, Charles, 1809-1882 -- Influence
• Darwin, Charles, 1809-1882 -- Juvenile Literature
• Darwin, Charles, 1809-1882 -- Comic Books, Strips, Etc
• Darwin Family
• DNA Viruses -- Fiction
• Java (Computer program language)
• Mystery Fiction
• Rivers -- Great Britain
• Women Molecular Biologists -- Fiction
Subjects:
34. • Bear, Greg
• Byrne, Eugene
• Darwin, Charles, 1809-1882
• Darwin, Emma Wedgwood, 1808-1896
• Darwin, Ian F.
• Darwin, Andrew
• Teilhet, Darwin L.
Authors:
45. FRBR is not our
future
It is 20 years old this
year
Based on relational
database technology
■ 1990 – Stockholm meeting (IFLA)
■ 1992 – Terms of reference completed
■ 1994 – First draft for comment
■ 1998 – Final draft
■ 2009 – Current draft
■ 2013 – RDA implemented
■ 2017 – Library Reference Model
47. “FRBR is not a data model. FRBR is not a metadata scheme. FRBR is not a system design structure. It is a conceptual model of the bibliographic universe.” B. Tillett, 2005
Tom Delsey
52. LRM: based on FRBR, which is 20 years
old; adds "explore"
■ Find: To bring together information about one or more resources of interest by searching on any relevant criteria
■ Identify: To clearly understand the nature of the resources found and to distinguish between similar resources
■ Select: To determine the suitability of the resources found, and to be enabled to either accept or reject specific resources
■ Obtain: To access the content of the resource
■ Explore: To discover resources using the relationships between them and thus place the resources in a context
53. Implementation? What should the
catalog be?
Find: "To facilitate this task, the information system
seeks to enable effective searching by offering
appropriate search elements or functionality."
54. Implementation?
Find: "To facilitate this task, the
information system seeks to
enable effective searching by
offering appropriate search
elements or functionality."
56. FRBR/LRM
doesn't solve
our problems
Not a technology/Is a technology
No record design/Is a record design
Forget the diagrams – they have problems
It does not produce significantly different
bibliographic data!
RDA is the only implementation, and it isn’t a
technology or record design, it's cataloging rules
58. New set of problems the catalog must
solve
Then -> Now
■ Scarcity -> Abundance
■ Expert readers -> Everyone
■ Limited formats -> Multiple media
■ Limited access -> The Internet
■ Users were local -> Users are remote
Evolution
65. Not all
resources are
equal
Yet we treat them all the same.
Do we really want someone to select the
least important book on the topic?
Could ranking be based on importance?
Suitability to the user?
76. If we are to have a future: a manifesto
■ We have to address that our data is inappropriate for today's uses and users
– "Cataloging" needs to be an essential part of "creating the catalog", not separate
from the technology that makes use of it
– Catalog data has to be based on what users KNOW, not what we think they
SHOULD know
– We need to provide information ABOUT authors/works/subjects, not just headings
– We have to see resource abundance as a major issue to be addressed
– The catalog should place works in (intellectual) context
– The catalog must reflect what the user cares about (content) not what the library
cares about (purchase and inventory)
– The catalog needs to provide better subject access! Not just known item searching
■ [your ideas here]
77. We can't because …
■ It's too expensive
■ It won't be authoritative
■ We already have "too much" data
■ …
■ We can't change
■ We don't want to change
No!
We
Have
To
Change!
78. A 12-step program begins …
1. Admit you have a problem
(We have a problem)
79. In summary
■ We haven't had a major change in the content of our data in ~130-150 years
■ We have not accepted that technological change requires data designed for that
technology
■ We have not addressed the fact that our current challenge is resource abundance
■ We think our users are 19th century scholars
■ We treat the catalog technology and the catalog data as entirely separate operations
About a man, Scrooge, who needs to change his ways, because he is heading to disaster, and is harming others.
It's all technology – books, cards, computers, beaming into your brain …
This is a simple bibliographic example with just an author, a title, and a subject. In a card catalog, each of these headings becomes a separate card, and each card has all of the bibliographic information. EACH ONE PROVIDES A DIFFERENT CONTEXT. They get filed in the card catalog each in their own location in the alphabetical order, and have no connection to each other except for what the card says. If you pull one out, the others remain.
A search in a card catalog lands you in an alphabetically ordered list of headings, and each heading is associated with a single book. You can see the entries that follow, even if they don’t meet your criteria, and you can see the entries that precede it. Eventually, you run into entries that have nothing to do with your query.
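The card model described in these two notes can be sketched in a few lines of Python. This is only an illustration: the two records come from the cat-breeds slide, the heading forms and the function names (`make_cards`, `file_cards`, `browse`) are assumptions, and real filing rules were far more elaborate than a case-insensitive sort.

```python
from bisect import bisect_left

# Two records taken from the cat-breeds slide; the heading forms
# ("Pond, Grace", "Cat breeds -- History") are assumed for illustration.
RECORDS = [
    {"author": "Pond, Grace", "title": "Arco book of cats", "subject": "Cat breeds"},
    {"author": "Eustace, May", "title": "Fifty years of pedigree cats",
     "subject": "Cat breeds -- History"},
]

def make_cards(record):
    """One card per heading; every card repeats the full description."""
    description = f"{record['title']} / {record['author']}"
    return [(record[field], description) for field in ("author", "title", "subject")]

def file_cards(records):
    """File all cards into one alphabetical sequence, ignoring case."""
    cards = [card for record in records for card in make_cards(record)]
    return sorted(cards, key=lambda card: card[0].casefold())

def browse(drawer, query, before=1, after=2):
    """Land at the query's alphabetical position and show the neighbours."""
    keys = [heading.casefold() for heading, _ in drawer]
    position = bisect_left(keys, query.casefold())
    return drawer[max(0, position - before): position + after + 1]

drawer = file_cards(RECORDS)
for heading, description in browse(drawer, "Cat breeds"):
    print(f"{heading:30} {description}")
```

Browsing for "Cat breeds" returns the neighbouring cards as well, including a title card and an author card that have nothing to do with the subject. That is the behaviour the note describes: the user lands in a place in the alphabet, not in a result set.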
Sharing of catalog copy was the topic of a committee formed at the same time that the American Library Association was being created: the committee on cooperation. This means that it was already a topic in 1876.
Purpose of MARC – produce printed cards faster and cheaper
Library of Congress issued cards, but only a single card type: main entry. Libraries could buy as many copies of the Main Entry card as they wished, and LC indicated the additional cards that it had added to its own catalog.
If you didn't like what was there, you had to make changes on the cards; sometimes with great finesse, sometimes just taking a pen and scribbling on the card to show what was different.
Entry vocabulary access did not keep up with card production. For subjects, libraries left copies of the LC subject headings near the card catalog. Not everyone could make sense of these.
Catalogs in large libraries were huge. They filled whole rooms. Filing was tedious, error prone, and could not keep up with card production.
This was the problem in the late 1970's, when I was among those hired at the University of California to make use of the MARC records that were the by-products of the libraries' use of OCLC and RLIN to produce cards.
Large libraries were running 100K-150K cards behind in filing into their catalogs. That meant that books were on the shelf for 3 months before the cards could be found in the catalogs. Online cataloging created this problem when it used computers to speed up card production. Filing was the final "human" step that had not been automated.
The first step in using MARC records at the University of California was to create microfiche that imitated the alphabetical access of the card catalog. Basically, the computer was a filing and printing machine. The microfiche catalog only covered the most recently cataloged items, those for which there was a MARC record. So users had to use both the card catalog and the supplementary microfiche catalog to find items.
By the early 1980's it became possible to imagine making the machine-readable catalog available via networked computers. Remember, NO INTERNET YET beyond a few research sites. Networking was the big difficulty.
So we began creating an online catalog, even though none of us had ever seen one.
First, we were trying to put this (MARC) into this (DB relational design).
Computers were good at direct access but not so good at the kind of linear, alphabetical browsing that the card catalog excelled at.
In addition, for online access, sorting turned out to be one of the big problems, because computers of the time were slow, memory was expensive, and our previous sort routines over our files of probably no more than 350K MARC records took weeks to complete.
The card catalog, and the cataloging that created the data for the catalog, assumed that there would be multiple entries in the catalog for each item, one entry per heading. Users would “find” resources by approaching the catalog headings in alphabetical order. Under each heading, they would see a bibliographic description.
With the computer-based catalog, the author, title and subject do not result in separate copies of the bibliographic entity; they are all on a single record. That record is entered once into the database. To access the record, indexes are made (either full headings or keyword access, but most often the latter) that point to the single copy of the record. In most designs, no matter what headings you search on, the result is that the record is retrieved. Also, in most systems, if your search matches more than one keyword in the record, the record is retrieved only once. This is a normal function of database management systems, to retrieve each item only once.
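A minimal sketch of that design, assuming a simple in-memory keyword index rather than any particular library system: each record is stored once, every word from every field points back to its record number, and a search returns a set of record numbers, so a record that matches in three places still comes back once. The titles and names are taken from the "darwin" slides, but their pairing into records, the record numbers, and the function names are assumptions for illustration.

```python
import re
from collections import defaultdict

# Each record is stored exactly once, keyed by a hypothetical record number.
RECORDS = {
    1: {"author": "Darwin, Emma Wedgwood, 1808-1896",
        "title": "Emma Darwin, A Century of Family Letters, 1792-1896",
        "subject": "Darwin Family"},
    2: {"author": "Darwin, Ian F.",
        "title": "Java Cookbook",
        "subject": "Java (Computer program language)"},
    3: {"author": "Pond, Grace",
        "title": "Arco book of cats",
        "subject": "Cat breeds"},
}

def tokenize(text):
    """Split text into lowercase keywords, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(records):
    """Map each keyword to the set of record numbers that contain it."""
    index = defaultdict(set)
    for record_id, record in records.items():
        for value in record.values():
            for word in tokenize(value):
                index[word].add(record_id)
    return index

def search(index, query):
    """Return the records matching every query word, each one only once."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)

index = build_index(RECORDS)
print(search(index, "darwin"))       # record 1 matches in three fields
print(search(index, "darwin java"))
```

Note what the result no longer carries: the heading that caused the match. Record 1 contains "darwin" in its author, title and subject, but the search returns only the record number, with no indication of which heading the user landed on.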
We quit retrieving headings, and we started retrieving records.
Why? How did we get here?
Card catalog: one item at a time
Online catalog: 2.5 screens
Prior to computers there really wasn't SEARCH, just BROWSE
Users were thrilled
Catalogers were not. They were creating headings.
This is a typical result for a keyword search in an online catalog. The records are retrieved and sometimes placed in a particular order, such as alphabetical by author and then title. (That doesn’t seem to be the case here and I don’t know what the order is in this case.) Nowhere here do we see what caused these to be retrieved – although we might in a longer display - nor is there any context provided that relates to the search. Looking at the first six records retrieved, the subject headings are: … so you can see that the value added in providing an order of the headings has been lost.
What I think catalogers would like to see is something like this. However, even this isn’t right depending on the search that was done. Creating an organization within the retrieved set is not the same as creating an organization within the library collection. The retrieved set could be, heading-wise, incoherent.
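As a rough sketch of the heading-ordered display, the retrieved records can be regrouped under their subject headings after the search has run. The titles and headings below come from the cat-breeds slide; the arrival order of the records and the function name `group_by_heading` are assumptions for illustration.

```python
from collections import defaultdict

# (title, subject heading) pairs in an arbitrary retrieval order.
RETRIEVED = [
    ("Arco book of cats", "Cat breeds"),
    ("Fifty years of pedigree cats", "Cat breeds -- History"),
    ("The complete cat owner's manual", "Cat breeds -- Handbooks, manuals, etc."),
    ("Champion cats of the world", "Cat breeds"),
    ("Mǣo Thai", "Cat breeds -- Thailand"),
]

def group_by_heading(results):
    """Group retrieved titles under their heading, headings in alphabetical order."""
    groups = defaultdict(list)
    for title, heading in results:
        groups[heading].append(title)
    return sorted(groups.items(), key=lambda item: item[0].casefold())

for heading, titles in group_by_heading(RETRIEVED):
    print(heading)
    for title in titles:
        print("    " + title)
```

This only orders the headings that happen to be in the retrieved set. It works here because every record shares the "Cat breeds" stem; for a keyword search such as "darwin", the same grouping would place unrelated headings side by side, which is the incoherence the note warns about.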
Facets work for things that neatly divide the set, and preferably with a division that has a limited number of members and is coherent. This isn’t the case with authors…
What is it that we are trying to create? What is the catalog today?
Not subjects!
Lost subjects, kind of literature, character, and users
We’ve still got some major problems. One is that our data is still in separate bib and authority files, with no real links between them. Another is that we have a lot of redundancy in our data. But note that the problem that I’ve talked about up until now, performing database-like actions on data designed for physical collocation and alphabetical order, is not one that we have yet addressed.
20 years old, based on relational database technology
These three diagrams are what most people know about FRBR. However, it should be obvious that a few simple diagrams do not carry all of the information of the full text of the study group’s document. This all proves the power of 5-7 – the number of items that we can absorb in a single thought. If these hadn’t been in the document, we probably wouldn’t have even noticed FRBR, and by “we” I mean those of us in technology.
are not new – same as Cutter
are not user tasks, they are actions that you can perform on the catalog
leave out a whole host of things like downloading citations, APIs, etc.; no concept that users should act on the catalog – compute, add information, mash-up.
Half a page; compared to attributes in chapter 6
"information system" can only act on data that is there.
Data that is there determines what searching can be done.
And yet we continue with practices that are clearly no longer helping.
Although it may have made sense at one time to distinguish authors by date, not only are the dates not much of a distinction (multiple persons born in the same year) but the user does not have that information. Author's date of birth isn't available.
What we show the user. We care more about what we bought than what the user needs.
Bibliographic data is not what users are seeking. It may not even be useful or helpful. It may not speak to users at all. Yet this is what we give them, even though we know it is wrong. I want to give you some background that I hope will give you the confidence to say this out loud and in all the wrong places. Until we overcome this, we are not going to make much progress.
Where and when something was printed is no longer indicative of where it fits into
The FRBR user tasks completely break down if you have more than two screens of results, and you almost ALWAYS have more than two screens of results. How can anyone perform the tasks of identify or select with this much data?
Unfortunately faceting doesn't help either when you have a large result set.
We treat everything in the catalog equally, they all get the same metadata. There is no distinction between the most important book on a topic and one that is of little interest. There is also no distinction between those items that are suitable for beginners (people arriving at the topic for the first time) and experts in the subject area.
Instead we are using data that was appropriate to a catalog that existed about 50 years ago.
Catalog data determines what functions are available, regardless of what users need or want.
We are like taxis in a world of Uber. We missed the moments when we should have changed; when we should have changed our data, when we should have changed how we define our goals, how we provide our services. We have clung to our traditions while others innovated. So others have gone forward while we have resisted change.
The hardest thing to do today is to tell the people creating library data that it isn't what we need, and that we cannot use it in the way that they intend.
When we moved from the card to the computer, cataloging ceased to be a creation of the catalog. The catalog is no longer created through the act of cataloging, but by a technology that is superimposed and does not get to influence the content of the catalog data.
Obscure rules that are only understood by librarians are not helping users
FRBR doesn't help this (nor the LRM); RDA doesn't help this (same old description, no subject access); BIBFRAME doesn't help this (has become "MARC in RDF")