National Library of Scotland
Leabharlann Nàiseanta na h-Alba
MARC records for archived websites
on the Archive of Tomorrow Project
Mark Simon Haydn, Metadata Analyst, Archive of Tomorrow project, National Library of Scotland
Agnieszka Kurzeja, Metadata Co-ordinator, Cambridge University Library
CILIP Metadata & Discovery Group Conference 2023
#CILIPMDG2023
Archive of Tomorrow project
• 18-month NLS-led collaboration between Legal Deposit libraries to collect a
wide range of online health discourse, improving access to website captures
available through the UK Web Archive
• Initial collecting focus on a wide range of COVID-19 resources, expanding to
provide dedicated subcollections on wide-ranging health topics
• Project team included Web Archivists at NLS, CUL, Bodleian & University of
Edinburgh, as well as a Project Manager, Rights Officer and Metadata Analyst;
the project also appointed two AoT research fellows and collaborated with the
Cambridge University Library Metadata Co-ordinator
• Collection available at webarchive.org.uk/en/ukwa/collection/4028 and
data.nls.uk; research workshops held at NLS, Edinburgh University,
Cambridge University Library
webarchive.org.uk
Open Access
Onsite (LDL) access only
Collection (“Wellbeing”) & Target (“Adopting Positivity
Substack”) metadata in JSON format
- Derivative metadata for researchers (data.nls.uk)
- Repurposed to populate catalogue records
- Licensed for reuse
Wellbeing
Blogs and Social Media
Talking about Health
Health Organisations and Services
Medicine & Health
• No in-built ACT metadata export available;
first test records populated with TSV
exports manually generated by BL
• BL developed an API to enable metadata
requests on demand, standardising output
of ACT target and collection metadata
• Previous NLS experience crosswalking
volunteer ISBD input into minimum viable
bib record
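The crosswalk described above can be sketched in a few lines. This is a minimal illustration only, not the project's actual crosswalk: the JSON keys (`title`, `description`, `urls`) and the example URL are hypothetical, since the slides do not show the ACT API's output schema, and the choice of MARC fields 245, 520 and 856 is an assumption about what a minimum viable record might contain.

```python
import json

def crosswalk(target: dict) -> list:
    """Map one target's metadata to a minimal list of MARC-like fields.

    Each field is (tag, indicators, [(subfield code, value), ...]).
    The input keys are hypothetical, not the real ACT schema.
    """
    fields = [("245", "00", [("a", target["title"])])]
    # Only emit a summary note when a description is actually present.
    if target.get("description"):
        fields.append(("520", "  ", [("a", target["description"])]))
    # One 856 (electronic location) per URL recorded for the target.
    for url in target.get("urls", []):
        fields.append(("856", "40", [("u", url)]))
    return fields

# Hypothetical API response for a single target.
sample = json.loads(
    '{"title": "Adopting Positivity Substack",'
    ' "description": "",'
    ' "urls": ["https://example.org/"]}'
)
record = crosswalk(sample)
for tag, indicators, subfields in record:
    print(tag, indicators, subfields)
```

Keeping the output as plain tuples means the same structure could later be handed to a MARC library or written to a spreadsheet for review.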
HMSO crosswalk and normalization rule for volunteer cataloguing developed by Carol Hunter and Ian Horobin
AOT crosswalk and AOT normalization rule (Drools)
Excel transformations
(008)
https://www.oclc.org/content/dam/research/publications/2018/oclcresearch-wam-recommendations.pdf
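The slide mentions building the 008 field with Excel transformations; the same positional logic might be sketched in Python as follows. The function name and all default values are assumptions for illustration (date type "c", open end date "9999", place "xxk", form of item "o", type of computer file "j", cataloguing source "d"), laid out according to the MARC 21 008 definition for computer files rather than taken from the project's spreadsheet.

```python
from datetime import date

def build_008(entered: date, start_year: str, end_year: str = "9999",
              place: str = "xxk", lang: str = "eng") -> str:
    """Assemble a 40-character 008 for an online computer-file record.

    Default codes are illustrative assumptions, not the project's values.
    """
    field = (
        entered.strftime("%y%m%d")  # 00-05 date entered on file
        + "c"                       # 06 type of date (assumed: currently published)
        + start_year                # 07-10 date 1
        + end_year                  # 11-14 date 2
        + place.ljust(3)            # 15-17 place of publication
        + " " * 4                   # 18-21 undefined
        + " "                       # 22 target audience
        + "o"                       # 23 form of item: online
        + " " * 2                   # 24-25 undefined
        + "j"                       # 26 type of computer file: online system or service
        + " "                       # 27 undefined
        + " "                       # 28 government publication
        + " " * 6                   # 29-34 undefined
        + lang                      # 35-37 language
        + " "                       # 38 modified record
        + "d"                       # 39 cataloguing source
    )
    if len(field) != 40:
        raise ValueError(f"008 must be 40 characters, got {len(field)}")
    return field

f008 = build_008(date(2023, 9, 6), "2022")
print(repr(f008))
```

The length check is the useful part: a single mis-sized input (for example a two-letter language code) shifts every later position, which is easy to miss in a spreadsheet formula.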
Normalisation rules (Drools)
replaceControlContents "LDR.{6,1}" with "m"
replaceContents "041.a.EN" with "eng" if (exists "041.{0,*}.a.EN")
addControlField "007.cr#cnu###zznzz"
addField "040.{-,-}.a.StEdNL" if (not exists "040.a")
addField "336.{-,-}.a.text"
addSubField "336.{-,-}.b.txt"
removeField "362" if (exists "362.{-,-}.a.REF|N/A|VALUE!")
changeField "265" to "264"
Examples at https://developers.exlibrisgroup.com/blog/alma-normalization-rule-examples
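To make the effect of these rules easier to follow outside Alma, here is a rough Python emulation of a few of them. It is a sketch under stated assumptions, not Alma's rule engine: the record structure (a list of dicts) and the function name `normalise` are invented for illustration, and the 362 pattern is read as a regular expression that catches placeholder values such as spreadsheet errors.

```python
import re

# Assumed reading of the 362 rule: remove the field when $a holds a placeholder.
PLACEHOLDER = re.compile(r"REF|N/A|VALUE!")

def normalise(fields: list) -> list:
    """Apply an approximation of the slide's rules to a simple record.

    fields: [{"tag": "041", "subfields": [("a", "EN")]}, ...]
    """
    out = []
    for field in fields:
        tag = field["tag"]
        subs = list(field["subfields"])
        if tag == "041":
            # replaceContents "041.a.EN" with "eng"
            subs = [(c, "eng" if c == "a" and v == "EN" else v) for c, v in subs]
        if tag == "362" and any(c == "a" and PLACEHOLDER.search(v) for c, v in subs):
            continue  # removeField "362" when $a is a placeholder
        if tag == "265":
            tag = "264"  # changeField "265" to "264"
        out.append({"tag": tag, "subfields": subs})
    # addField 040 $a StEdNL only if no 040 $a exists
    has_040a = any(f["tag"] == "040" and any(c == "a" for c, _ in f["subfields"])
                   for f in out)
    if not has_040a:
        out.append({"tag": "040", "subfields": [("a", "StEdNL")]})
    # addField 336 $a text, then addSubField $b txt
    out.append({"tag": "336", "subfields": [("a", "text"), ("b", "txt")]})
    return sorted(out, key=lambda f: f["tag"])

record = [
    {"tag": "041", "subfields": [("a", "EN")]},
    {"tag": "265", "subfields": [("a", "Edinburgh")]},
    {"tag": "362", "subfields": [("a", "#REF!")]},
]
result = normalise(record)
for f in result:
    print(f["tag"], f["subfields"])
```

An emulation like this can be handy for testing a crosswalk on sample data before the rules are loaded into the library system.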
Enhancements
Variety in how creator organisations are described:
NHS, N.H.S., National Health Service
NLS Web Archivist Eilidh MacGlone assigns Wikidata QIDs
during QC workflow:
QID added to unused ACT field
↳ VIAF ID extracted from Wikidata entry
↳ Linked LC/NACO authority record
reconciled using OpenRefine
↳ Authorised name cropped and paired
with JSON URI, with ISNIs where
available
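The QID-to-identifier step above might look roughly like this in code. The sketch assumes the standard Wikidata entity JSON layout and the Wikidata properties P214 (VIAF ID), P213 (ISNI) and P244 (Library of Congress authority ID); the sample entity, its QID and its identifier values are placeholders, and the slides do not say how the project itself performed the extraction beyond naming OpenRefine for reconciliation.

```python
from typing import Optional

def first_claim(entity: dict, prop: str) -> Optional[str]:
    """Return the first string value recorded for a property, if any."""
    for claim in entity.get("claims", {}).get(prop, []):
        value = claim.get("mainsnak", {}).get("datavalue", {}).get("value")
        if isinstance(value, str):
            return value
    return None

def identifiers(entity: dict) -> dict:
    """Collect the authority identifiers useful for a catalogue record."""
    return {
        "viaf": first_claim(entity, "P214"),  # VIAF ID
        "isni": first_claim(entity, "P213"),  # ISNI
        "lc": first_claim(entity, "P244"),    # Library of Congress authority ID
    }

# Placeholder entity shaped like Wikidata JSON; all values are invented.
sample_entity = {
    "id": "Q0",
    "claims": {
        "P214": [{"mainsnak": {"datavalue": {"value": "000000000", "type": "string"}}}],
        "P244": [{"mainsnak": {"datavalue": {"value": "n00000000", "type": "string"}}}],
    },
}
ids = identifiers(sample_entity)
print(ids)
```

Returning `None` for a missing identifier mirrors the slide's "with ISNIs where available": absent values are expected and should not stop the workflow.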
LCSH & FAST analogues for ACT Collection and Subject terms
developed by Agnieszka Kurzeja, Metadata Co-Ordinator,
Cambridge University Libraries
Searching for Library of Congress Subject Headings
FAST Conversion
FAST dataset download @ OCLC
+
Prepared RDF files from
National Library of Wales
Short-form target descriptions
paired with target URIs
WARC -> WAT full text?
⚠️
👷🚧👷
Loading FAST.nt vocab
using Docker, Bash
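As an illustration of the vocabulary preparation step, the sketch below reduces N-Triples lines to a simple URI-and-label TSV of the kind Annif can load as a vocabulary. It rests on several assumptions: that labels in the FAST file are expressed with skos:prefLabel, that a regular expression is adequate for the lines of interest (a full RDF parser would be safer for real data, and escape sequences are not decoded here), and that the sample URIs and heading are placeholders rather than real FAST entries.

```python
import re

PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"

# Matches triples whose object is a literal, with optional language tag or datatype.
TRIPLE = re.compile(
    r'^<([^>]+)>\s+<([^>]+)>\s+"((?:[^"\\]|\\.)*)"'
    r'(?:@[A-Za-z0-9-]+|\^\^<[^>]+>)?\s*\.\s*$'
)

def nt_to_tsv(lines) -> list:
    """Keep only prefLabel triples and format them as '<uri>TABlabel' rows."""
    rows = []
    for line in lines:
        match = TRIPLE.match(line)
        if match and match.group(2) == PREF_LABEL:
            rows.append(f"<{match.group(1)}>\t{match.group(3)}")
    return rows

# Placeholder triples; the identifiers and heading are not real FAST data.
sample = [
    '<http://id.worldcat.org/fast/0000001> '
    '<http://www.w3.org/2004/02/skos/core#prefLabel> "Example heading" .',
    '<http://id.worldcat.org/fast/0000001> '
    '<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> '
    '<http://www.w3.org/2004/02/skos/core#Concept> .',
]
rows = nt_to_tsv(sample)
print("\n".join(rows))
```

Processing the file line by line keeps memory use flat, which matters for a vocabulary as large as the full FAST download.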
annif.info
Challenges
Two stumbling blocks for using Annif at scale:
- Requires a wide spread of technical skills to prepare vocabulary files, train the engine and run it at the
command line (eased by the Annif Google Group: https://groups.google.com/g/annif-users)
- Most effective use would involve easy access to target full text (WARC-derivative WAT);
currently only available at target level
Findings
Accessibility as priority, improving discovery of web archives through catalogue
Value of minimal viable records, data normalisation facilitating creation of RDA-compliant
MARC records at scale
ukwa.discourse.group
Mark Simon Haydn – Metadata Analyst, Archive of Tomorrow Project
m.haydn@nls.uk
Agnieszka Kurzeja – Metadata Co-ordinator
ak550@cam.ac.uk
Except for images or where otherwise stated this presentation is © National Library of Scotland and is licensed under the Creative Commons
Attribution 4.0 International Licence. To view a copy of this license, visit: http://creativecommons.org/licenses/by/4.0/


Editor's Notes

  1. AK/MH