SlideShare a Scribd company logo
1 of 127
Download to read offline
From Ambition to Go Live
The National Library Board of Singapore’s Journey
to an operational Linked Data Management
and Discovery System
Richard Wallis
Evangelist and Founder
Data Liberate
richard.wallis@dataliberate.com
@dataliberate
15th Semantic Web in Libraries Conference
SWIB23 - 12th September 2023 - Berlin
Independent Consultant, Evangelist & Founder
W3C Community Groups:
• Bibframe2Schema (Chair) – Standardised conversion path(s)
• Schema Bib Extend (Chair) - Bibliographic data
• Schema Architypes (Chair) - Archives
• Financial Industry Business Ontology – Financial schema.org
• Tourism Structured Web Data (Co-Chair)
• Schema Course Extension
• Schema IoT Community
• Educational & Occupational Credentials in Schema.org
richard.wallis@dataliberate.com — @dataliberate
40+ Years – Computing
30+ Years – Cultural Heritage technology
20+ Years – Semantic Web & Linked Data
Worked With:
• Google – Schema.org vocabulary, site, extensions. documentation and community
• OCLC – Global library cooperative
• FIBO – Financial Industry Business Ontology Group
• Various Clients – Implementing/understanding Linked Data, Schema.org:
National Library Board Singapore
British Library — Stanford University — Europeana
2
3
National Library Board Singapore
Public Libraries
Network of 28 Public Libraries,
including 2 partner libraries*
Reading Programmes and Initiatives
Programmes and Exhibitions
targeted at Singapore communities
*Partner libraries are libraries which are partner owned
and
funded but managed by NLB/NLB’s subsidiary Libraries
and Archives Solutions Pte Ltd. Library@Chinatown and
the Lifelong Learning Institute Library are Partner libraries.
National Archives
Transferred from NHB to NLB in Nov
2012
Custodian of Singapore’s
Collective Memory: Responsible
for Collection, Preservation and
Management of
Singapore’s Public and Private
Archival Records
Promotes Public Interest in our
Nation’s
History and Heritage
National Library
Preserving Singapore’s Print
and Literary Heritage, and
Intellectual memory
Reference Collections
Legal Deposit (including
electronic)
4
Over
560,000
Singapore &
SEA items
Over 147,000
Chinese, Malay &
Tamil Languages
items
Reference Collection
Over 62,000
Social Sciences
& Humanities
items
Over 39,000
Science &
Technology
items
Over
53,000
Arts items
Over 19,000
Rare Materials
items
Archival Materials
Over 290,000
Government files &
Parliament papers
Over 190,000
Audiovisual & sound
recordings
Over 70,000
Maps & building
plans
Over
1.14m
Photographs
Over 35,000
Oral history
interviews
Over 55,000
Speeches & press
releases
Over
7,000
Posters
National Library Board Singapore
Over 5m
print collection
Over 2.4m
music tracks
78
databases
Over 7,400
e-newspapers and
e-magazines titles
Over
8,000
e-learning
courses
Over 1.7m
e-books and
audio books
Lending Collection
5
National Library Board Online Services
7
The Ambition
• To enable the discovery & display of entitles from different sources
in a combined interface
• To bring together resources physical and digital
• To bring together diverse systems across the National Library,
National Archives, and Public Libraries in a Linked Data Environment
• To provide a staff interface to view and manage all entities, their
descriptions and relationships
8
The Ambition
• To enable the discovery & display of entitles from different sources
in a combined interface
• To bring together resources physical and digital
• To bring together diverse systems across the National Library,
National Archives, and Public Libraries in a Linked Data Environment
• To provide a staff interface to view and manage all entities, their
descriptions and relationships
9
The Ambition
• To enable the discovery & display of entitles from different sources
in a combined interface
• To bring together resources physical and digital
• To bring together diverse systems across the National Library,
National Archives, and Public Libraries in a Linked Data Environment
• To provide a staff interface to view and manage all entities, their
descriptions and relationships
10
The Ambition
• To enable the discovery & display of entitles from different sources
in a combined interface
• To bring together resources physical and digital
• To bring together diverse systems across the National Library,
National Archives, and Public Libraries in a Linked Data Environment
• To provide a staff interface to view and manage all entities, their
descriptions and relationships
11
The Journey Starts(March 2021)
A Linked Data Management System (LDMS)
• Cloud-based System
• Management Interface
• Source data curated at source
– ILS, Authorities, CMS, NAS
• Linked Data
• Public Interface – Google friendly
– Schema.org
• Shared standards
– Bibframe, Schema.org
12
The Journey Starts(March 2021)
A Linked Data Management System (LDMS)
• Cloud-based System
• Management Interface
• Source data curated at source
– ILS, Authorities, CMS, NAS
• Linked Data
• Public Interface – Google friendly
– Schema.org
• Shared standards
– Bibframe, Schema.org
13
The Journey Starts(March 2021)
A Linked Data Management System (LDMS)
• Cloud-based System
• Management Interface
• Source data curated at source
– ILS, Authorities, CMS, NAS
• Linked Data
• Public Interface – Google friendly
– Schema.org
• Shared standards
– Bibframe, Schema.org
14
The Journey Starts(March 2021)
A Linked Data Management System (LDMS)
• Cloud-based System
• Management Interface
• Source data curated at source
– ILS, Authorities, CMS, NAS
• Linked Data
• Public Interface – Google friendly
– Schema.org
• Shared standards
– Bibframe, Schema.org
15
The Journey Starts(March 2021)
A Linked Data Management System (LDMS)
• Cloud-based System
• Management Interface
• Source data curated at source
– ILS, Authorities, CMS, NAS
• Linked Data
• Public Interface – Google friendly
– Schema.org
• Shared standards
– Bibframe, Schema.org
16
The Journey Starts(March 2021)
A Linked Data Management System (LDMS)
• Cloud-based System
• Management Interface
• Source data curated at source
– ILS, Authorities, CMS, NAS
• Linked Data
• Public Interface – Google friendly
– Schema.org
• Shared standards
– Bibframe, Schema.org
17
The Journey Starts(March 2021)
A Linked Data Management System (LDMS)
• Cloud-based System
• Management Interface
• Source data curated at source
– ILS, Authorities, CMS, NAS
• Linked Data
• Public Interface – Google friendly
– Schema.org
• Shared standards
– Bibframe, Schema.org
18
Contract Awarded
metaphactory platform
Low-code knowledge graph platform
Semantic knowledge modeling
Semantic search & discovery
AWS Partner
Public sector partner
Singapore based
Linked Data, Structured data, Semantic
Web, bibliographic meta data, Schema.org
and management systems consultant
19
Initial Challenges
• Data – lots of it!
– Record-based formats: MARC-XML, DC-XML, CSV
• Entities described in the data – lots and lots!
• Regular updates – daily, weekly, monthly
• Automatic ingestion
• Not the single source of truth
• Manual update capability
• Entity-based search & discovery
• Web friendly – Schema.org output
20
Initial Challenges
• Data – lots of it!
– Record-based formats: MARC-XML, DC-XML, CSV
• Entities described in the data – lots and lots!
• Regular updates – daily, weekly, monthly
• Automatic ingestion
• Not the single source of truth
• Manual update capability
• Entity-based search & discovery
• Web friendly – Schema.org output
21
Initial Challenges
• Data – lots of it!
– Record-based formats: MARC-XML, DC-XML, CSV
• Entities described in the data – lots and lots!
• Regular updates – daily, weekly, monthly
• Automatic ingestion
• Not the single source of truth
• Manual update capability
• Entity-based search & discovery
• Web friendly – Schema.org output
22
Initial Challenges
• Data – lots of it!
– Record-based formats: MARC-XML, DC-XML, CSV
• Entities described in the data – lots and lots!
• Regular updates – daily, weekly, monthly
• Automatic ingestion
• Not the single source of truth
• Manual update capability
• Entity-based search & discovery
• Web friendly – Schema.org output
23
Initial Challenges
• Data – lots of it!
– Record-based formats: MARC-XML, DC-XML, CSV
• Entities described in the data – lots and lots!
• Regular updates – daily, weekly, monthly
• Automatic ingestion
• Not the single source of truth
• Manual update capability
• Entity-based search & discovery
• Web friendly – Schema.org output
24
Initial Challenges
• Data – lots of it!
– Record-based formats: MARC-XML, DC-XML, CSV
• Entities described in the data – lots and lots!
• Regular updates – daily, weekly, monthly
• Automatic ingestion
• Not the single source of truth
• Manual update capability
• Entity-based search & discovery
• Web friendly – Schema.org output
25
Initial Challenges
• Data – lots of it!
– Record-based formats: MARC-XML, DC-XML, CSV
• Entities described in the data – lots and lots!
• Regular updates – daily, weekly, monthly
• Automatic ingestion
• Not the single source of truth
• Manual update capability
• Entity-based search & discovery
• Web friendly – Schema.org output
26
Initial Challenges
• Data – lots of it!
– Record-based formats: MARC-XML, DC-XML, CSV
• Entities described in the data – lots and lots!
• Regular updates – daily, weekly, monthly
• Automatic ingestion
• Not the single source of truth
• Manual update capability
• Entity-based search & discovery
• Web friendly – Schema.org output
27
Basic Data Model
• Linked Data
– BIBFRAME to capture detail of bibliographic records
– Schema.org to deliver structured data for search engines
– Schema.org representation of CMS, NAS, TTE data
– Schema.org enrichment of BIBFRAME
• Schema.org as the ‘lingua franca’ vocabulary of the Knowledge graph
– All entities described using Schema.org as a minimum.
28
Basic Data Model
• Linked Data
– BIBFRAME to capture detail of bibliographic records
– Schema.org to deliver structured data for search engines
– Schema.org representation of CMS, NAS, TTE data
– Schema.org enrichment of BIBFRAME
• Schema.org as the ‘lingua franca’ vocabulary of the Knowledge graph
– All entities described using Schema.org as a minimum.
29
Basic Data Model
• Linked Data
– BIBFRAME to capture detail of bibliographic records
– Schema.org to deliver structured data for search engines
– Schema.org representation of CMS, NAS, TTE data
– Schema.org enrichment of BIBFRAME
• Schema.org as the ‘lingua franca’ vocabulary of the Knowledge graph
– All entities described using Schema.org as a minimum.
30
Basic Data Model
• Linked Data
– BIBFRAME to capture detail of bibliographic records
– Schema.org to deliver structured data for search engines
– Schema.org representation of CMS, NAS, TTE data
– Schema.org enrichment of BIBFRAME
• Schema.org as the ‘lingua franca’ vocabulary of the Knowledge graph
– All entities described using Schema.org as a minimum.
31
Basic Data Model
• Linked Data
– BIBFRAME to capture detail of bibliographic records
– Schema.org to deliver structured data for search engines
– Schema.org representation of CMS, NAS, TTE data
– Schema.org enrichment of BIBFRAME
• Schema.org as the ‘lingua franca’ vocabulary of the Knowledge graph
– All entities described using Schema.org as a minimum.
32
Data Data Data!
Data Source Source Records Entity Count Update Frequency
ILS 1.4m 56.8m Daily
CMS 82k 228k Weekly
NAS 1.6m 6.7m Monthly
TTE 3k 317k Monthly
3.1m 70.4m
33
Data Ingest Pipelines – ILS
• Source data in MARC-XML files
• Step 1: marc2bibframe2 scripts
– Open source – shared by Library of Congress
– “standard” approach
– Output BIBFRAME as individual RDF-XML files
• Step 2: bibframe2schema.org script
– Open source – Bibframe2Schema.org Community Group
– SPARQL-based script
– Output Schema.org enriched individual BIBFRAME RDF files for loading into
Knowledge graph triplestore
34
Data Ingest Pipelines – ILS
• Source data in MARC-XML files
• Step 1: marc2bibframe2 scripts
– Open source – shared by Library of Congress
– “standard” approach
– Output BIBFRAME as individual RDF-XML files
• Step 2: bibframe2schema.org script
– Open source – Bibframe2Schema.org Community Group
– SPARQL-based script
– Output Schema.org enriched individual BIBFRAME RDF files for loading into
Knowledge graph triplestore
35
Data Ingest Pipelines – ILS
• Source data in MARC-XML files
• Step 1: marc2bibframe2 scripts
– Open source – shared by Library of Congress
– “standard” approach
– Output BIBFRAME as individual RDF-XML files
• Step 2: bibframe2schema.org script
– Open source – Bibframe2Schema.org Community Group
– SPARQL-based script
– Output Schema.org enriched individual BIBFRAME RDF files for loading into
Knowledge graph triplestore
36
Data Ingestion Pipelines – TTE (Authorities)
• Source data in bespoke CSV format
• Exported in collection-based files
– People, Places, organisations, time periods, etc.
• Interrelated references so needed to be considered as a whole
• Bespoke python script
• Creating Schema.org entity descriptions
• Output RDF format file for loading into Knowledge graph
triplestore
37
Data Ingestion Pipelines – TTE (Authorities)
• Source data in bespoke CSV format
• Exported in collection-based files
– People, Places, organisations, time periods, etc.
• Interrelated references so needed to be considered as a whole
• Bespoke python script
• Creating Schema.org entity descriptions
• Output RDF format file for loading into Knowledge graph
triplestore
38
Data Ingestion Pipelines – TTE (Authorities)
• Source data in bespoke CSV format
• Exported in collection-based files
– People, Places, organisations, time periods, etc.
• Interrelated references so needed to be considered as a whole
• Bespoke python script
• Creating Schema.org entity descriptions
• Output RDF format file for loading into Knowledge graph
triplestore
39
Data Ingestion Pipelines – NAS & CMS
• Source data in DC-XML files
• Step 1: DC-XML to DC-Terms RDF
– Bespoke XSLT script to translate from DC record to DCT-RDF
• Step 2: TTE lookup
– Lookup identified values against TTE entities in Knowledge graph
• use URI if matched
• Step 3: Schema.org entity creation
– SPARQL script create schema representation of DCT entities
• Output individual RDF format files for loading into Knowledge graph
40
Data Ingestion Pipelines – NAS & CMS
• Source data in DC-XML files
• Step 1: DC-XML to DC-Terms RDF
– Bespoke XSLT script to translate from DC record to DCT-RDF
• Step 2: TTE lookup
– Lookup identified values against TTE entities in Knowledge graph
• use URI if matched
• Step 3: Schema.org entity creation
– SPARQL script create schema representation of DCT entities
• Output individual RDF format files for loading into Knowledge graph
41
Data Ingestion Pipelines – NAS & CMS
• Source data in DC-XML files
• Step 1: DC-XML to DC-Terms RDF
– Bespoke XSLT script to translate from DC record to DCT-RDF
• Step 2: TTE lookup
– Lookup identified values against TTE entities in Knowledge graph
• use URI if matched
• Step 3: Schema.org entity creation
– SPARQL script create schema representation of DCT entities
• Output individual RDF format files for loading into Knowledge graph
42
Data Ingestion Pipelines – NAS & CMS
• Source data in DC-XML files
• Step 1: DC-XML to DC-Terms RDF
– Bespoke XSLT script to translate from DC record to DCT-RDF
• Step 2: TTE lookup
– Lookup identified values against TTE entities in Knowledge graph
• use URI if matched
• Step 3: Schema.org entity creation
– SPARQL script create schema representation of DCT entities
• Output individual RDF format files for loading into Knowledge graph
43
Technical Architecture (simplified)
Hosted on Amazon Web Services
Batch Scripts
import control
Etc.
SOURCE DATA
IMPORT
44
Technical Architecture (simplified)
Hosted on Amazon Web Services
Pipeline
processing
Batch Scripts
import control
Etc.
SOURCE DATA
IMPORT
45
Technical Architecture (simplified)
Hosted on Amazon Web Services
GraphDB
Cluster
GraphDB
Cluster
GraphDB
Cluster
GraphDB
Cluster
Pipeline
processing
Batch Scripts
import control
Etc.
SOURCE DATA
IMPORT
46
PUBLIC ACCESS
Technical Architecture (simplified)
Hosted on Amazon Web Services
DI
GraphDB
Cluster
GraphDB
Cluster
GraphDB
Cluster
GraphDB
Cluster
Pipeline
processing
Batch Scripts
import control
Etc.
SOURCE DATA
IMPORT
47
CURATOR ACCESS
PUBLIC ACCESS
Technical Architecture (simplified)
Hosted on Amazon Web Services
DI
GraphDB
Cluster
GraphDB
Cluster
GraphDB
Cluster
GraphDB
Cluster
Pipeline
processing
Batch Scripts
import control
Etc.
SOURCE DATA
IMPORT
DMI
48
A need for entity reconciliation …..
• Lots (and lots and lots) of source entities – 70.4 million entities
• Lots of duplication
– Lee, Kuan Yew – 1st Prime Minister of Singapore
• 160 individual entities in ILS source data
– Singapore Art Museum
• Entities from source data
• 21 CMS, 1 NAS, 66 ILS, 1 TTE
• Users only want 1 of each!
49
A need for entity reconciliation …..
• Lots (and lots and lots) of source entities – 70.4 million entities
• Lots of duplication
– Lee, Kuan Yew – 1st Prime Minister of Singapore
• 160 individual entities in ILS source data
– Singapore Art Museum
• Entities from source data
• 21 CMS, 1 NAS, 66 ILS, 1 TTE
• Users only want 1 of each!
50
A need for entity reconciliation …..
• Lots (and lots and lots) of source entities – 70.4 million entities
• Lots of duplication
– Lee, Kuan Yew – 1st Prime Minister of Singapore
• 160 individual entities in ILS source data
– Singapore Art Museum
• Entities from source data
• 21 CMS, 1 NAS, 66 ILS, 1 TTE
• Users only want 1 of each!
51
A need for entity reconciliation …..
• Lots (and lots and lots) of source entities – 70.4 million entities
• Lots of duplication
– Lee, Kuan Yew – 1st Prime Minister of Singapore
• 160 individual entities in ILS source data
– Singapore Art Museum
• Entities from source data
• 21 CMS, 1 NAS, 66 ILS, 1 TTE
• Users only want 1 of each!
52
70.4(million) into 6.1(million) entities does go!
• We know that now - but how did we get there?
• Requirements hurdles to be cleared:
– Not a single source of truth
– Regular automatic updates – add / update / delete
– Manual management of combined entities
• Suppression of incorrect or ’private’ attributes from display
• Addition of attributes not in source data
• Creating / breaking relationships between entities
• Near real-time updates
53
70.4(million) into 6.1(million) entities does go!
• We know that now - but how did we get there?
• Requirements hurdles to be cleared:
– Not a single source of truth
– Regular automatic updates – add / update / delete
– Manual management of combined entities
• Suppression of incorrect or ’private’ attributes from display
• Addition of attributes not in source data
• Creating / breaking relationships between entities
• Near real-time updates
54
70.4(million) into 6.1(million) entities does go!
• We know that now - but how did we get there?
• Requirements hurdles to be cleared:
– Not a single source of truth
– Regular automatic updates – add / update / delete
– Manual management of combined entities
• Suppression of incorrect or ’private’ attributes from display
• Addition of attributes not in source data
• Creating / breaking relationships between entities
• Near real-time updates
55
70.4(million) into 6.1(million) entities does go!
• We know that now - but how did we get there?
• Requirements hurdles to be cleared:
– Not a single source of truth
– Regular automatic updates – add / update / delete
– Manual management of combined entities
• Suppression of incorrect or ’private’ attributes from display
• Addition of attributes not in source data
• Creating / breaking relationships between entities
• Near real-time updates
56
70.4(million) into 6.1(million) entities does go!
• We know that now - but how did we get there?
• Requirements hurdles to be cleared:
– Not a single source of truth
– Regular automatic updates – add / update / delete
– Manual management of combined entities
• Suppression of incorrect or ’private’ attributes from display
• Addition of attributes not in source data
• Creating / breaking relationships between entities
• Near real-time updates
57
Adaptive Data Model Concepts
• Source entitles
– Individual representation of source data
• Aggregation entities
– Tracking relationships between source entities for the same thing
– No copying of attributes
• Primary Entities
– Searchable by users
– Displayable to users
– Consolidation of aggregated source data & managed attributes
58
Adaptive Data Model Concepts
• Source entitles
– Individual representation of source data
• Aggregation entities
– Tracking relationships between source entities for the same thing
– No copying of attributes
• Primary Entities
– Searchable by users
– Displayable to users
– Consolidation of aggregated source data & managed attributes
59
Adaptive Data Model Concepts
• Source entitles
– Individual representation of source data
• Aggregation entities
– Tracking relationships between source entities for the same thing
– No copying of attributes
• Primary Entities
– Searchable by users
– Displayable to users
– Consolidation of aggregated source data & managed attributes
60
61
62
63
64
65
66
67
68
69
Entity Matching
• Candidate matches based on schema:names
– Lucene indexes matching
– Levenshtein similarity refined
– Same entity type
– Entity type specific rules eg:
• Work: name + creator / contributor / author / sameAs
• Person: name + birthDate / deathDate / sameAs
70
Entity Matching
• Candidate matches based on schema:names
– Lucene indexes matching
– Levenshtein similarity refined
– Same entity type
– Entity type specific rules eg:
• Work: name + creator / contributor / author / sameAs
• Person: name + birthDate / deathDate / sameAs
71
Entity Matching
• Candidate matches based on schema:names
– Lucene indexes matching
– Levenshtein similarity refined
– Same entity type
– Entity type specific rules eg:
• Work: name + creator / contributor / author / sameAs
• Person: name + birthDate / deathDate / sameAs
72
Entity Matching
• Candidate matches based on schema:names
– Lucene indexes matching
– Levenshtein similarity refined
– Same entity type
– Entity type specific rules eg:
• Work: name + creator / contributor / author / sameAs
• Person: name + birthDate / deathDate / sameAs
73
Entity Matching
• Candidate matches based on schema:names
– Lucene indexes matching
– Levenshtein similarity refined
– Same entity type
– Entity type specific rules eg:
• Work: name + creator / contributor / author / sameAs
• Person: name + birthDate / deathDate / sameAs
74
The entity iceberg
75
The entity iceberg
Primary
76
The entity iceberg
Primary
Discovery
77
The entity iceberg
Primary
Aggregation
Discovery
78
The entity iceberg
Primary
Aggregation
Source
Ingestion
Pipelines
Discovery
79
The entity iceberg
Primary
Aggregation
Source
Ingestion
Pipelines
Discovery
Management
80
Data Management Interface
• Rapidly developed – easily modified
• Using mataphactory’s semantic visual interface
• A collaboration environment for data curators
• Pre-publish entity management
• Search & discovery – emulates DI
• Dashboard view
• Entity management / Creation
81
Data Management Interface
• Rapidly developed – easily modified
• Using mataphactory’s semantic visual interface
• A collaboration environment for data curators
• Pre-publish entity management
• Search & discovery – emulates DI
• Dashboard view
• Entity management / Creation
82
Data Management Interface
• Rapidly developed – easily modified
• Using mataphactory’s semantic visual interface
• A collaboration environment for data curators
• Pre-publish entity management
• Search & discovery – emulates DI
• Dashboard view
• Entity management / Creation
83
Data Management Interface
• Rapidly developed – easily modified
• Using mataphactory’s semantic visual interface
• A collaboration environment for data curators
• Pre-publish entity management
• Search & discovery – emulates DI
• Dashboard view
• Entity management / Creation
84
Data Management Interface
• Rapidly developed – easily modified
• Using mataphactory’s semantic visual interface
• A collaboration environment for data curators
• Pre-publish entity management
• Search & discovery – emulates DI
• Dashboard view
• Entity management / Creation
85
Data Management Interface
• Rapidly developed – easily modified
• Using mataphactory’s semantic visual interface
• A collaboration environment for data curators
• Pre-publish entity management
• Search & discovery – emulates DI
• Dashboard view
• Entity management / Creation
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
Discovery Interface (DI)
104
105
106
107
108
109
110
111
LDMS System Live – December 2022
• Live – data curation team using DMI
– Entity curation
– Manually merging & splitting aggregations
• Live – data ingest pipelines
– Daily / Weekly / Monthly
– Delta update exports from source systems
• Live – Discovery Interface ready for deployment
• Live – Support
– Issue resolution & updates
112
LDMS System Live – December 2022
• Live – data curation team using DMI
– Entity curation
– Manually merging & splitting aggregations
• Live – data ingest pipelines
– Daily / Weekly / Monthly
– Delta update exports from source systems
• Live – Discovery Interface ready for deployment
• Live – Support
– Issue resolution & updates
113
LDMS System Live – December 2022
• Live – data curation team using DMI
– Entity curation
– Manually merging & splitting aggregations
• Live – data ingest pipelines
– Daily / Weekly / Monthly
– Delta update exports from source systems
• Live – Discovery Interface ready for deployment
• Live – Support
– Issue resolution & updates
114
LDMS System Live – December 2022
• Live – data curation team using DMI
– Entity curation
– Manually merging & splitting aggregations
• Live – data ingest pipelines
– Daily / Weekly / Monthly
– Delta update exports from source systems
• Live – Discovery Interface ready for deployment
• Live – Support
– Issue resolution & updates
115
Discovery Interface
• Live and ready for deployment
– Performance, load and user tested
• Designed & developed by NLB Data Team to demonstrate
characteristics and benefits of a Linked Data service
• NLB services rolled-out by National Library & Public Library divisions
– Recently identified a service for Linked Data integration
• Currently undergoing fine tuning to meet service’s UI requirements
116
Discovery Interface
• Live and ready for deployment
– Performance, load and user tested
• Designed & developed by NLB Data Team to demonstrate
characteristics and benefits of a Linked Data service
• NLB services rolled-out by National Library & Public Library divisions
– Recently identified a service for Linked Data integration
• Currently undergoing fine tuning to meet service’s UI requirements
117
Discovery Interface
• Live and ready for deployment
– Performance, load and user tested
• Designed & developed by NLB Data Team to demonstrate
characteristics and benefits of a Linked Data service
• NLB services rolled-out by National Library & Public Library divisions
– Recently identified a service for Linked Data integration
• Currently undergoing fine tuning to meet service’s UI requirements
118
Daily Delta Ingestion & Reconciliation
119
• Inaccurate source data
– Not apparent in source systems
– Very obvious when aggregated in LDMS
• Eg. Subject and Person URIs duplicated
– Identified and analysed in DMI
– Reported to source curators and fixed
– Updated in next delta update
– Data issues identified in LDMS – source data quality enhanced
Some Challenges Addressed on the Way
120
• Inaccurate source data
– Not apparent in source systems
– Very obvious when aggregated in LDMS
• Eg. Subject and Person URIs duplicated
– Identified and analysed in DMI
– Reported to source curators and fixed
– Updated in next delta update
– Data issues identified in LDMS – source data quality enhanced
Some Challenges Addressed on the Way
121
• Asian name matching
– Often consisting of several short 3-4 letter names
– Lucene matches not of much help
– Introduction of Levenshtein similarity matching helped
– Tuning is challenging – trading off European & Asian names
Some Challenges Addressed on the Way
122
• Asian name matching
– Often consisting of several short 3-4 letter names
– Lucene matches not of much help
– Introduction of Levenshtein similarity matching helped
– Tuning is challenging – trading off European & Asian names
Some Challenges Addressed on the Way
137
From Ambition to Go Live – A Summary
• Ambition based on previous years of trials, experimentation, and understanding of
Linked Data potential
• Ambition to deliver a production LDMS to provide Linked Data curation and
management capable of delivering a public discovery service
• With experienced commercial partners & tools as part of a globally distributed team
– Benefiting from Open Source community developments where appropriate
• Unique and challenging requirements
– Distributed sources of truth in disparate data formats
– Auto updating, consolidated entity view – plus individual entity management
– Web friendly – providing structured data for search engines
• Live and operational actively delivering benefits from December 2022
138
From Ambition to Go Live – A Summary
• Ambition based on previous years of trials, experimentation, and understanding of
Linked Data potential
• Ambition to deliver a production LDMS to provide Linked Data curation and
management capable of delivering a public discovery service
• With experienced commercial partners & tools as part of a globally distributed team
– Benefiting from Open Source community developments where appropriate
• Unique and challenging requirements
– Distributed sources of truth in disparate data formats
– Auto updating, consolidated entity view – plus individual entity management
– Web friendly – providing structured data for search engines
• Live and operational actively delivering benefits from December 2022
139
From Ambition to Go Live – A Summary
• Ambition based on previous years of trials, experimentation, and understanding of
Linked Data potential
• Ambition to deliver a production LDMS to provide Linked Data curation and
management capable of delivering a public discovery service
• With experienced commercial partners & tools as part of a globally distributed team
– Benefiting from Open Source community developments where appropriate
• Unique and challenging requirements
– Distributed sources of truth in disparate data formats
– Auto updating, consolidated entity view – plus individual entity management
– Web friendly – providing structured data for search engines
• Live and operational actively delivering benefits from December 2022
140
From Ambition to Go Live – A Summary
• Ambition based on previous years of trials, experimentation, and understanding of
Linked Data potential
• Ambition to deliver a production LDMS to provide Linked Data curation and
management capable of delivering a public discovery service
• With experienced commercial partners & tools as part of a globally distributed team
– Benefiting from Open Source community developments where appropriate
• Unique and challenging requirements
– Distributed sources of truth in disparate data formats
– Auto updating, consolidated entity view – plus individual entity management
– Web friendly – providing structured data for search engines
• Live and operational actively delivering benefits from December 2022
141
From Ambition to Go Live – A Summary
• Ambition based on previous years of trials, experimentation, and understanding of
Linked Data potential
• Ambition to deliver a production LDMS to provide Linked Data curation and
management capable of delivering a public discovery service
• With experienced commercial partners & tools as part of a globally distributed team
– Benefiting from Open Source community developments where appropriate
• Unique and challenging requirements
– Distributed sources of truth in disparate data formats
– Auto updating, consolidated entity view – plus individual entity management
– Web friendly – providing structured data for search engines
• Live and operational actively delivering benefits from December 2022
From Ambition to Go Live
The National Library Board of Singapore’s Journey
to an operational Linked Data Management
and Discovery System
Richard Wallis
Evangelist and Founder
Data Liberate
richard.wallis@dataliberate.com
@dataliberate
15th Semantic Web in Libraries Conference
SWIB23 - 12th September 2023 - Berlin

More Related Content

What's hot

BẢNG KIỂM ĐẶT SONDE HẬU MÔN
BẢNG KIỂM ĐẶT SONDE HẬU MÔNBẢNG KIỂM ĐẶT SONDE HẬU MÔN
BẢNG KIỂM ĐẶT SONDE HẬU MÔNSoM
 
TIẾP CẬN CHẨN ĐOÁN TRẺ NGHI NGỜ MẮC BỆNH THẬN
TIẾP CẬN CHẨN ĐOÁN TRẺ NGHI NGỜ MẮC BỆNH THẬNTIẾP CẬN CHẨN ĐOÁN TRẺ NGHI NGỜ MẮC BỆNH THẬN
TIẾP CẬN CHẨN ĐOÁN TRẺ NGHI NGỜ MẮC BỆNH THẬNSoM
 
Bệnh án tim mạch
Bệnh án tim mạchBệnh án tim mạch
Bệnh án tim mạchVien Do
 
Khám phản xạ cam giác
Khám phản xạ cam giácKhám phản xạ cam giác
Khám phản xạ cam giácangTrnHong
 
ẢNH HƯỞNG HUYẾT ĐỘNG CỦA THÔNG KHĨ ÁP LỰC DƯƠNG
ẢNH HƯỞNG HUYẾT ĐỘNG CỦA THÔNG KHĨ ÁP LỰC DƯƠNGẢNH HƯỞNG HUYẾT ĐỘNG CỦA THÔNG KHĨ ÁP LỰC DƯƠNG
ẢNH HƯỞNG HUYẾT ĐỘNG CỦA THÔNG KHĨ ÁP LỰC DƯƠNGSoM
 
Kajian puisi mignonne allons voir si la rose
Kajian puisi mignonne allons voir si la roseKajian puisi mignonne allons voir si la rose
Kajian puisi mignonne allons voir si la roseMaulida Hannah
 

What's hot (6)

BẢNG KIỂM ĐẶT SONDE HẬU MÔN
BẢNG KIỂM ĐẶT SONDE HẬU MÔNBẢNG KIỂM ĐẶT SONDE HẬU MÔN
BẢNG KIỂM ĐẶT SONDE HẬU MÔN
 
TIẾP CẬN CHẨN ĐOÁN TRẺ NGHI NGỜ MẮC BỆNH THẬN
TIẾP CẬN CHẨN ĐOÁN TRẺ NGHI NGỜ MẮC BỆNH THẬNTIẾP CẬN CHẨN ĐOÁN TRẺ NGHI NGỜ MẮC BỆNH THẬN
TIẾP CẬN CHẨN ĐOÁN TRẺ NGHI NGỜ MẮC BỆNH THẬN
 
Bệnh án tim mạch
Bệnh án tim mạchBệnh án tim mạch
Bệnh án tim mạch
 
Khám phản xạ cam giác
Khám phản xạ cam giácKhám phản xạ cam giác
Khám phản xạ cam giác
 
ẢNH HƯỞNG HUYẾT ĐỘNG CỦA THÔNG KHĨ ÁP LỰC DƯƠNG
ẢNH HƯỞNG HUYẾT ĐỘNG CỦA THÔNG KHĨ ÁP LỰC DƯƠNGẢNH HƯỞNG HUYẾT ĐỘNG CỦA THÔNG KHĨ ÁP LỰC DƯƠNG
ẢNH HƯỞNG HUYẾT ĐỘNG CỦA THÔNG KHĨ ÁP LỰC DƯƠNG
 
Kajian puisi mignonne allons voir si la rose
Kajian puisi mignonne allons voir si la roseKajian puisi mignonne allons voir si la rose
Kajian puisi mignonne allons voir si la rose
 

Similar to From Ambition to Go Live

Contextual Computing: Laying a Global Data Foundation
Contextual Computing: Laying a Global Data FoundationContextual Computing: Laying a Global Data Foundation
Contextual Computing: Laying a Global Data FoundationRichard Wallis
 
Contextual Computing - Knowledge Graphs & Web of Entities
Contextual Computing - Knowledge Graphs & Web of EntitiesContextual Computing - Knowledge Graphs & Web of Entities
Contextual Computing - Knowledge Graphs & Web of EntitiesRichard Wallis
 
Schema.org: Where did that come from!
Schema.org: Where did that come from!Schema.org: Where did that come from!
Schema.org: Where did that come from!Richard Wallis
 
CILIP Conference - x metadata evolution the final mile - Richard Wallis
CILIP Conference - x metadata evolution the final mile - Richard WallisCILIP Conference - x metadata evolution the final mile - Richard Wallis
CILIP Conference - x metadata evolution the final mile - Richard WallisCILIP
 
Marc and beyond: 3 Linked Data Choices
 Marc and beyond: 3 Linked Data Choices  Marc and beyond: 3 Linked Data Choices
Marc and beyond: 3 Linked Data Choices Richard Wallis
 
LD4L OCLC Data Strategy
LD4L OCLC Data StrategyLD4L OCLC Data Strategy
LD4L OCLC Data StrategyRichard Wallis
 
Linked Data: from Library Entities to the Web of Data
Linked Data: from Library Entities to the Web of DataLinked Data: from Library Entities to the Web of Data
Linked Data: from Library Entities to the Web of DataRichard Wallis
 
Linked Open Data and The Digital Archaeological Workflow at the Swedish Natio...
Linked Open Data and The Digital Archaeological Workflow at the Swedish Natio...Linked Open Data and The Digital Archaeological Workflow at the Swedish Natio...
Linked Open Data and The Digital Archaeological Workflow at the Swedish Natio...Marcus Smith
 
Scaling up Linked Data
Scaling up Linked DataScaling up Linked Data
Scaling up Linked DataEUCLID project
 
Telling the World and Our Users What We Have
Telling the World and Our Users What We HaveTelling the World and Our Users What We Have
Telling the World and Our Users What We HaveRichard Wallis
 
Three Linked Data choices for Libraries
Three Linked Data choices for LibrariesThree Linked Data choices for Libraries
Three Linked Data choices for LibrariesRichard Wallis
 
Scaling up Linked Data
Scaling up Linked DataScaling up Linked Data
Scaling up Linked DataMarin Dimitrov
 
WorldCat, Works, and Schema.org
WorldCat, Works, and Schema.orgWorldCat, Works, and Schema.org
WorldCat, Works, and Schema.orgRichard Wallis
 
Schema.org where did that come from?
Schema.org where did that come from?Schema.org where did that come from?
Schema.org where did that come from?Richard Wallis
 
Entification: The Route to 'Useful' Library Data
Entification: The Route to 'Useful' Library DataEntification: The Route to 'Useful' Library Data
Entification: The Route to 'Useful' Library DataRichard Wallis
 
The Web of Data is Our Opportunity
The Web of Data is Our OpportunityThe Web of Data is Our Opportunity
The Web of Data is Our OpportunityRichard Wallis
 
Introduction to APIs and Linked Data
Introduction to APIs and Linked DataIntroduction to APIs and Linked Data
Introduction to APIs and Linked DataAdrian Stevenson
 
Achille Felicetti "Introduction to the Ariadne winter school and to the ARIAD...
Achille Felicetti "Introduction to the Ariadne winter school and to the ARIAD...Achille Felicetti "Introduction to the Ariadne winter school and to the ARIAD...
Achille Felicetti "Introduction to the Ariadne winter school and to the ARIAD...ariadnenetwork
 
Structured Data: It's All about the Graph | Richard Wallis, Data Liberate
Structured Data: It's All about the Graph | Richard Wallis, Data LiberateStructured Data: It's All about the Graph | Richard Wallis, Data Liberate
Structured Data: It's All about the Graph | Richard Wallis, Data LiberateClick Consult (Part of Ceuta Group)
 

Similar to From Ambition to Go Live (20)

Contextual Computing: Laying a Global Data Foundation
Contextual Computing: Laying a Global Data FoundationContextual Computing: Laying a Global Data Foundation
Contextual Computing: Laying a Global Data Foundation
 
Contextual Computing - Knowledge Graphs & Web of Entities
Contextual Computing - Knowledge Graphs & Web of EntitiesContextual Computing - Knowledge Graphs & Web of Entities
Contextual Computing - Knowledge Graphs & Web of Entities
 
Schema.org: Where did that come from!
Schema.org: Where did that come from!Schema.org: Where did that come from!
Schema.org: Where did that come from!
 
CILIP Conference - x metadata evolution the final mile - Richard Wallis
CILIP Conference - x metadata evolution the final mile - Richard WallisCILIP Conference - x metadata evolution the final mile - Richard Wallis
CILIP Conference - x metadata evolution the final mile - Richard Wallis
 
Marc and beyond: 3 Linked Data Choices
 Marc and beyond: 3 Linked Data Choices  Marc and beyond: 3 Linked Data Choices
Marc and beyond: 3 Linked Data Choices
 
LD4L OCLC Data Strategy
LD4L OCLC Data StrategyLD4L OCLC Data Strategy
LD4L OCLC Data Strategy
 
Linked Data: from Library Entities to the Web of Data
Linked Data: from Library Entities to the Web of DataLinked Data: from Library Entities to the Web of Data
Linked Data: from Library Entities to the Web of Data
 
Linked Open Data and The Digital Archaeological Workflow at the Swedish Natio...
Linked Open Data and The Digital Archaeological Workflow at the Swedish Natio...Linked Open Data and The Digital Archaeological Workflow at the Swedish Natio...
Linked Open Data and The Digital Archaeological Workflow at the Swedish Natio...
 
Scaling up Linked Data
Scaling up Linked DataScaling up Linked Data
Scaling up Linked Data
 
Telling the World and Our Users What We Have
Telling the World and Our Users What We HaveTelling the World and Our Users What We Have
Telling the World and Our Users What We Have
 
Three Linked Data choices for Libraries
Three Linked Data choices for LibrariesThree Linked Data choices for Libraries
Three Linked Data choices for Libraries
 
Scaling up Linked Data
Scaling up Linked DataScaling up Linked Data
Scaling up Linked Data
 
WorldCat, Works, and Schema.org
WorldCat, Works, and Schema.orgWorldCat, Works, and Schema.org
WorldCat, Works, and Schema.org
 
Schema.org where did that come from?
Schema.org where did that come from?Schema.org where did that come from?
Schema.org where did that come from?
 
NISO Webinar: Library Linked Data: From Vision to Reality
NISO Webinar: Library Linked Data: From Vision to RealityNISO Webinar: Library Linked Data: From Vision to Reality
NISO Webinar: Library Linked Data: From Vision to Reality
 
Entification: The Route to 'Useful' Library Data
Entification: The Route to 'Useful' Library DataEntification: The Route to 'Useful' Library Data
Entification: The Route to 'Useful' Library Data
 
The Web of Data is Our Opportunity
The Web of Data is Our OpportunityThe Web of Data is Our Opportunity
The Web of Data is Our Opportunity
 
Introduction to APIs and Linked Data
Introduction to APIs and Linked DataIntroduction to APIs and Linked Data
Introduction to APIs and Linked Data
 
Achille Felicetti "Introduction to the Ariadne winter school and to the ARIAD...
Achille Felicetti "Introduction to the Ariadne winter school and to the ARIAD...Achille Felicetti "Introduction to the Ariadne winter school and to the ARIAD...
Achille Felicetti "Introduction to the Ariadne winter school and to the ARIAD...
 
Structured Data: It's All about the Graph | Richard Wallis, Data Liberate
Structured Data: It's All about the Graph | Richard Wallis, Data LiberateStructured Data: It's All about the Graph | Richard Wallis, Data Liberate
Structured Data: It's All about the Graph | Richard Wallis, Data Liberate
 

More from Richard Wallis

Structured Data: It's All About the Graph!
Structured Data: It's All About the Graph!Structured Data: It's All About the Graph!
Structured Data: It's All About the Graph!Richard Wallis
 
Schema.org Structured data the What, Why, & How
Schema.org Structured data the What, Why, & HowSchema.org Structured data the What, Why, & How
Schema.org Structured data the What, Why, & HowRichard Wallis
 
Structured data: Where did that come from & why are Google asking for it
Structured data: Where did that come from & why are Google asking for itStructured data: Where did that come from & why are Google asking for it
Structured data: Where did that come from & why are Google asking for itRichard Wallis
 
Schema.org - An Extending Influence
Schema.org - An Extending InfluenceSchema.org - An Extending Influence
Schema.org - An Extending InfluenceRichard Wallis
 
Schema.org - Extending Benefits
Schema.org - Extending BenefitsSchema.org - Extending Benefits
Schema.org - Extending BenefitsRichard Wallis
 
Identifying The Benefit of Linked Data
Identifying The Benefit of Linked DataIdentifying The Benefit of Linked Data
Identifying The Benefit of Linked DataRichard Wallis
 
Web Driven Revolution For Library Data
Web Driven Revolution For Library DataWeb Driven Revolution For Library Data
Web Driven Revolution For Library DataRichard Wallis
 
The Web of Data is Our Oyster
The Web of Data is Our OysterThe Web of Data is Our Oyster
The Web of Data is Our OysterRichard Wallis
 
Linked Data in Libraries
Linked Data in LibrariesLinked Data in Libraries
Linked Data in LibrariesRichard Wallis
 
Schema.org: What It Means For You and Your Library
Schema.org: What It Means For You and Your LibrarySchema.org: What It Means For You and Your Library
Schema.org: What It Means For You and Your LibraryRichard Wallis
 
Designing Linked Data Software & Services for Libraries
Designing Linked Data Software & Services for LibrariesDesigning Linked Data Software & Services for Libraries
Designing Linked Data Software & Services for LibrariesRichard Wallis
 
The Power of Sharing Linked Data: Bibliothekartag 2014
The Power of Sharing Linked Data: Bibliothekartag 2014The Power of Sharing Linked Data: Bibliothekartag 2014
The Power of Sharing Linked Data: Bibliothekartag 2014Richard Wallis
 
The Power of Sharing Linked Data - ELAG 2014 Workshop
The Power of Sharing Linked Data - ELAG 2014 WorkshopThe Power of Sharing Linked Data - ELAG 2014 Workshop
The Power of Sharing Linked Data - ELAG 2014 WorkshopRichard Wallis
 
The Simple Power of the Link - ELAG 2014 Workshop
The Simple Power of the Link - ELAG 2014 WorkshopThe Simple Power of the Link - ELAG 2014 Workshop
The Simple Power of the Link - ELAG 2014 WorkshopRichard Wallis
 
Why schema.org for Libraries
Why schema.org for LibrariesWhy schema.org for Libraries
Why schema.org for LibrariesRichard Wallis
 

More from Richard Wallis (18)

Structured Data: It's All About the Graph!
Structured Data: It's All About the Graph!Structured Data: It's All About the Graph!
Structured Data: It's All About the Graph!
 
Schema.org Structured data the What, Why, & How
Schema.org Structured data the What, Why, & HowSchema.org Structured data the What, Why, & How
Schema.org Structured data the What, Why, & How
 
Structured data: Where did that come from & why are Google asking for it
Structured data: Where did that come from & why are Google asking for itStructured data: Where did that come from & why are Google asking for it
Structured data: Where did that come from & why are Google asking for it
 
FIBO & Schema.org
FIBO & Schema.orgFIBO & Schema.org
FIBO & Schema.org
 
Schema.org - An Extending Influence
Schema.org - An Extending InfluenceSchema.org - An Extending Influence
Schema.org - An Extending Influence
 
Schema.org - Extending Benefits
Schema.org - Extending BenefitsSchema.org - Extending Benefits
Schema.org - Extending Benefits
 
Identifying The Benefit of Linked Data
Identifying The Benefit of Linked DataIdentifying The Benefit of Linked Data
Identifying The Benefit of Linked Data
 
Web Driven Revolution For Library Data
Web Driven Revolution For Library DataWeb Driven Revolution For Library Data
Web Driven Revolution For Library Data
 
The Web of Data is Our Oyster
The Web of Data is Our OysterThe Web of Data is Our Oyster
The Web of Data is Our Oyster
 
Linked Data in Libraries
Linked Data in LibrariesLinked Data in Libraries
Linked Data in Libraries
 
Links and Entities
Links and EntitiesLinks and Entities
Links and Entities
 
Schema.org: What It Means For You and Your Library
Schema.org: What It Means For You and Your LibrarySchema.org: What It Means For You and Your Library
Schema.org: What It Means For You and Your Library
 
Extending Schema.org
Extending Schema.orgExtending Schema.org
Extending Schema.org
 
Designing Linked Data Software & Services for Libraries
Designing Linked Data Software & Services for LibrariesDesigning Linked Data Software & Services for Libraries
Designing Linked Data Software & Services for Libraries
 
The Power of Sharing Linked Data: Bibliothekartag 2014
The Power of Sharing Linked Data: Bibliothekartag 2014The Power of Sharing Linked Data: Bibliothekartag 2014
The Power of Sharing Linked Data: Bibliothekartag 2014
 
The Power of Sharing Linked Data - ELAG 2014 Workshop
The Power of Sharing Linked Data - ELAG 2014 WorkshopThe Power of Sharing Linked Data - ELAG 2014 Workshop
The Power of Sharing Linked Data - ELAG 2014 Workshop
 
The Simple Power of the Link - ELAG 2014 Workshop
The Simple Power of the Link - ELAG 2014 WorkshopThe Simple Power of the Link - ELAG 2014 Workshop
The Simple Power of the Link - ELAG 2014 Workshop
 
Why schema.org for Libraries
Why schema.org for LibrariesWhy schema.org for Libraries
Why schema.org for Libraries
 

Recently uploaded

04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...shivangimorya083
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknowmakika9823
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改atducpo
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 

Recently uploaded (20)

04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 

From Ambition to Go Live

  • 1. From Ambition to Go Live The National Library Board of Singapore’s Journey to an operational Linked Data Management and Discovery System Richard Wallis Evangelist and Founder Data Liberate richard.wallis@dataliberate.com @dataliberate 15th Semantic Web in Libraries Conference SWIB23 - 12th September 2023 - Berlin
  • 2. Independent Consultant, Evangelist & Founder W3C Community Groups: • Bibframe2Schema (Chair) – Standardised conversion path(s) • Schema Bib Extend (Chair) - Bibliographic data • Schema Architypes (Chair) - Archives • Financial Industry Business Ontology – Financial schema.org • Tourism Structured Web Data (Co-Chair) • Schema Course Extension • Schema IoT Community • Educational & Occupational Credentials in Schema.org richard.wallis@dataliberate.com — @dataliberate 40+ Years – Computing 30+ Years – Cultural Heritage technology 20+ Years – Semantic Web & Linked Data Worked With: • Google – Schema.org vocabulary, site, extensions. documentation and community • OCLC – Global library cooperative • FIBO – Financial Industry Business Ontology Group • Various Clients – Implementing/understanding Linked Data, Schema.org: National Library Board Singapore British Library — Stanford University — Europeana 2
  • 3. 3 National Library Board Singapore Public Libraries Network of 28 Public Libraries, including 2 partner libraries* Reading Programmes and Initiatives Programmes and Exhibitions targeted at Singapore communities *Partner libraries are libraries which are partner owned and funded but managed by NLB/NLB’s subsidiary Libraries and Archives Solutions Pte Ltd. Library@Chinatown and the Lifelong Learning Institute Library are Partner libraries. National Archives Transferred from NHB to NLB in Nov 2012 Custodian of Singapore’s Collective Memory: Responsible for Collection, Preservation and Management of Singapore’s Public and Private Archival Records Promotes Public Interest in our Nation’s History and Heritage National Library Preserving Singapore’s Print and Literary Heritage, and Intellectual memory Reference Collections Legal Deposit (including electronic)
  • 4. 4 Over 560,000 Singapore & SEA items Over 147,000 Chinese, Malay & Tamil Languages items Reference Collection Over 62,000 Social Sciences & Humanities items Over 39,000 Science & Technology items Over 53,000 Arts items Over 19,000 Rare Materials items Archival Materials Over 290,000 Government files & Parliament papers Over 190,000 Audiovisual & sound recordings Over 70,000 Maps & building plans Over 1.14m Photographs Over 35,000 Oral history interviews Over 55,000 Speeches & press releases Over 7,000 Posters National Library Board Singapore Over 5m print collection Over 2.4m music tracks 78 databases Over 7,400 e-newspapers and e-magazines titles Over 8,000 e-learning courses Over 1.7m e-books and audio books Lending Collection
  • 5. 5 National Library Board Online Services
  • 6. 7 The Ambition • To enable the discovery & display of entitles from different sources in a combined interface • To bring together resources physical and digital • To bring together diverse systems across the National Library, National Archives, and Public Libraries in a Linked Data Environment • To provide a staff interface to view and manage all entities, their descriptions and relationships
  • 7. 8 The Ambition • To enable the discovery & display of entitles from different sources in a combined interface • To bring together resources physical and digital • To bring together diverse systems across the National Library, National Archives, and Public Libraries in a Linked Data Environment • To provide a staff interface to view and manage all entities, their descriptions and relationships
  • 8. 9 The Ambition • To enable the discovery & display of entitles from different sources in a combined interface • To bring together resources physical and digital • To bring together diverse systems across the National Library, National Archives, and Public Libraries in a Linked Data Environment • To provide a staff interface to view and manage all entities, their descriptions and relationships
  • 9. 10 The Ambition • To enable the discovery & display of entitles from different sources in a combined interface • To bring together resources physical and digital • To bring together diverse systems across the National Library, National Archives, and Public Libraries in a Linked Data Environment • To provide a staff interface to view and manage all entities, their descriptions and relationships
  • 10. 11 The Journey Starts(March 2021) A Linked Data Management System (LDMS) • Cloud-based System • Management Interface • Source data curated at source – ILS, Authorities, CMS, NAS • Linked Data • Public Interface – Google friendly – Schema.org • Shared standards – Bibframe, Schema.org
  • 11. 12 The Journey Starts(March 2021) A Linked Data Management System (LDMS) • Cloud-based System • Management Interface • Source data curated at source – ILS, Authorities, CMS, NAS • Linked Data • Public Interface – Google friendly – Schema.org • Shared standards – Bibframe, Schema.org
  • 12. 13 The Journey Starts(March 2021) A Linked Data Management System (LDMS) • Cloud-based System • Management Interface • Source data curated at source – ILS, Authorities, CMS, NAS • Linked Data • Public Interface – Google friendly – Schema.org • Shared standards – Bibframe, Schema.org
  • 13. 14 The Journey Starts(March 2021) A Linked Data Management System (LDMS) • Cloud-based System • Management Interface • Source data curated at source – ILS, Authorities, CMS, NAS • Linked Data • Public Interface – Google friendly – Schema.org • Shared standards – Bibframe, Schema.org
  • 14. 15 The Journey Starts(March 2021) A Linked Data Management System (LDMS) • Cloud-based System • Management Interface • Source data curated at source – ILS, Authorities, CMS, NAS • Linked Data • Public Interface – Google friendly – Schema.org • Shared standards – Bibframe, Schema.org
  • 15. 16 The Journey Starts(March 2021) A Linked Data Management System (LDMS) • Cloud-based System • Management Interface • Source data curated at source – ILS, Authorities, CMS, NAS • Linked Data • Public Interface – Google friendly – Schema.org • Shared standards – Bibframe, Schema.org
  • 16. 17 The Journey Starts(March 2021) A Linked Data Management System (LDMS) • Cloud-based System • Management Interface • Source data curated at source – ILS, Authorities, CMS, NAS • Linked Data • Public Interface – Google friendly – Schema.org • Shared standards – Bibframe, Schema.org
  • 17. 18 Contract Awarded metaphactory platform Low-code knowledge graph platform Semantic knowledge modeling Semantic search & discovery AWS Partner Public sector partner Singapore based Linked Data, Structured data, Semantic Web, bibliographic meta data, Schema.org and management systems consultant
  • 18. 19 Initial Challenges • Data – lots of it! – Record-based formats: MARC-XML, DC-XML, CSV • Entities described in the data – lots and lots! • Regular updates – daily, weekly, monthly • Automatic ingestion • Not the single source of truth • Manual update capability • Entity-based search & discovery • Web friendly – Schema.org output
  • 19. 20 Initial Challenges • Data – lots of it! – Record-based formats: MARC-XML, DC-XML, CSV • Entities described in the data – lots and lots! • Regular updates – daily, weekly, monthly • Automatic ingestion • Not the single source of truth • Manual update capability • Entity-based search & discovery • Web friendly – Schema.org output
  • 20. 21 Initial Challenges • Data – lots of it! – Record-based formats: MARC-XML, DC-XML, CSV • Entities described in the data – lots and lots! • Regular updates – daily, weekly, monthly • Automatic ingestion • Not the single source of truth • Manual update capability • Entity-based search & discovery • Web friendly – Schema.org output
  • 21. 22 Initial Challenges • Data – lots of it! – Record-based formats: MARC-XML, DC-XML, CSV • Entities described in the data – lots and lots! • Regular updates – daily, weekly, monthly • Automatic ingestion • Not the single source of truth • Manual update capability • Entity-based search & discovery • Web friendly – Schema.org output
  • 22. 23 Initial Challenges • Data – lots of it! – Record-based formats: MARC-XML, DC-XML, CSV • Entities described in the data – lots and lots! • Regular updates – daily, weekly, monthly • Automatic ingestion • Not the single source of truth • Manual update capability • Entity-based search & discovery • Web friendly – Schema.org output
  • 23. 24 Initial Challenges • Data – lots of it! – Record-based formats: MARC-XML, DC-XML, CSV • Entities described in the data – lots and lots! • Regular updates – daily, weekly, monthly • Automatic ingestion • Not the single source of truth • Manual update capability • Entity-based search & discovery • Web friendly – Schema.org output
  • 24. 25 Initial Challenges • Data – lots of it! – Record-based formats: MARC-XML, DC-XML, CSV • Entities described in the data – lots and lots! • Regular updates – daily, weekly, monthly • Automatic ingestion • Not the single source of truth • Manual update capability • Entity-based search & discovery • Web friendly – Schema.org output
  • 25. 26 Initial Challenges • Data – lots of it! – Record-based formats: MARC-XML, DC-XML, CSV • Entities described in the data – lots and lots! • Regular updates – daily, weekly, monthly • Automatic ingestion • Not the single source of truth • Manual update capability • Entity-based search & discovery • Web friendly – Schema.org output
  • 26. 27 Basic Data Model • Linked Data – BIBFRAME to capture detail of bibliographic records – Schema.org to deliver structured data for search engines – Schema.org representation of CMS, NAS, TTE data – Schema.org enrichment of BIBFRAME • Schema.org as the ‘lingua franca’ vocabulary of the Knowledge graph – All entities described using Schema.org as a minimum.
  • 27. 28 Basic Data Model • Linked Data – BIBFRAME to capture detail of bibliographic records – Schema.org to deliver structured data for search engines – Schema.org representation of CMS, NAS, TTE data – Schema.org enrichment of BIBFRAME • Schema.org as the ‘lingua franca’ vocabulary of the Knowledge graph – All entities described using Schema.org as a minimum.
  • 28. 29 Basic Data Model • Linked Data – BIBFRAME to capture detail of bibliographic records – Schema.org to deliver structured data for search engines – Schema.org representation of CMS, NAS, TTE data – Schema.org enrichment of BIBFRAME • Schema.org as the ‘lingua franca’ vocabulary of the Knowledge graph – All entities described using Schema.org as a minimum.
  • 29. 30 Basic Data Model • Linked Data – BIBFRAME to capture detail of bibliographic records – Schema.org to deliver structured data for search engines – Schema.org representation of CMS, NAS, TTE data – Schema.org enrichment of BIBFRAME • Schema.org as the ‘lingua franca’ vocabulary of the Knowledge graph – All entities described using Schema.org as a minimum.
  • 30. 31 Basic Data Model • Linked Data – BIBFRAME to capture detail of bibliographic records – Schema.org to deliver structured data for search engines – Schema.org representation of CMS, NAS, TTE data – Schema.org enrichment of BIBFRAME • Schema.org as the ‘lingua franca’ vocabulary of the Knowledge graph – All entities described using Schema.org as a minimum.
  • 31. 32 Data Data Data! Data Source Source Records Entity Count Update Frequency ILS 1.4m 56.8m Daily CMS 82k 228k Weekly NAS 1.6m 6.7m Monthly TTE 3k 317k Monthly 3.1m 70.4m
  • 32. 33 Data Ingest Pipelines – ILS • Source data in MARC-XML files • Step 1: marc2bibframe2 scripts – Open source – shared by Library of Congress – “standard” approach – Output BIBFRAME as individual RDF-XML files • Step 2: bibframe2schema.org script – Open source – Bibframe2Schema.org Community Group – SPARQL-based script – Output Schema.org enriched individual BIBFRAME RDF files for loading into Knowledge graph triplestore
  • 33. 34 Data Ingest Pipelines – ILS • Source data in MARC-XML files • Step 1: marc2bibframe2 scripts – Open source – shared by Library of Congress – “standard” approach – Output BIBFRAME as individual RDF-XML files • Step 2: bibframe2schema.org script – Open source – Bibframe2Schema.org Community Group – SPARQL-based script – Output Schema.org enriched individual BIBFRAME RDF files for loading into Knowledge graph triplestore
  • 34. 35 Data Ingest Pipelines – ILS • Source data in MARC-XML files • Step 1: marc2bibframe2 scripts – Open source – shared by Library of Congress – “standard” approach – Output BIBFRAME as individual RDF-XML files • Step 2: bibframe2schema.org script – Open source – Bibframe2Schema.org Community Group – SPARQL-based script – Output Schema.org enriched individual BIBFRAME RDF files for loading into Knowledge graph triplestore
  • 35. 36 Data Ingestion Pipelines – TTE (Authorities) • Source data in bespoke CSV format • Exported in collection-based files – People, Places, organisations, time periods, etc. • Interrelated references so needed to be considered as a whole • Bespoke python script • Creating Schema.org entity descriptions • Output RDF format file for loading into Knowledge graph triplestore
  • 36. 37 Data Ingestion Pipelines – TTE (Authorities) • Source data in bespoke CSV format • Exported in collection-based files – People, Places, organisations, time periods, etc. • Interrelated references so needed to be considered as a whole • Bespoke python script • Creating Schema.org entity descriptions • Output RDF format file for loading into Knowledge graph triplestore
  • 37. 38 Data Ingestion Pipelines – TTE (Authorities) • Source data in bespoke CSV format • Exported in collection-based files – People, Places, organisations, time periods, etc. • Interrelated references so needed to be considered as a whole • Bespoke python script • Creating Schema.org entity descriptions • Output RDF format file for loading into Knowledge graph triplestore
  • 38. 39 Data Ingestion Pipelines – NAS & CMS • Source data in DC-XML files • Step 1: DC-XML to DC-Terms RDF – Bespoke XSLT script to translate from DC record to DCT-RDF • Step 2: TTE lookup – Lookup identified values against TTE entities in Knowledge graph • use URI if matched • Step 3: Schema.org entity creation – SPARQL script create schema representation of DCT entities • Output individual RDF format files for loading into Knowledge graph
  • 39. 40 Data Ingestion Pipelines – NAS & CMS • Source data in DC-XML files • Step 1: DC-XML to DC-Terms RDF – Bespoke XSLT script to translate from DC record to DCT-RDF • Step 2: TTE lookup – Lookup identified values against TTE entities in Knowledge graph • use URI if matched • Step 3: Schema.org entity creation – SPARQL script create schema representation of DCT entities • Output individual RDF format files for loading into Knowledge graph
  • 40. 41 Data Ingestion Pipelines – NAS & CMS • Source data in DC-XML files • Step 1: DC-XML to DC-Terms RDF – Bespoke XSLT script to translate from DC record to DCT-RDF • Step 2: TTE lookup – Lookup identified values against TTE entities in Knowledge graph • use URI if matched • Step 3: Schema.org entity creation – SPARQL script create schema representation of DCT entities • Output individual RDF format files for loading into Knowledge graph
  • 41. 42 Data Ingestion Pipelines – NAS & CMS • Source data in DC-XML files • Step 1: DC-XML to DC-Terms RDF – Bespoke XSLT script to translate from DC record to DCT-RDF • Step 2: TTE lookup – Lookup identified values against TTE entities in Knowledge graph • use URI if matched • Step 3: Schema.org entity creation – SPARQL script create schema representation of DCT entities • Output individual RDF format files for loading into Knowledge graph
  • 42. 43 Technical Architecture (simplified) Hosted on Amazon Web Services Batch Scripts import control Etc. SOURCE DATA IMPORT
  • 43. 44 Technical Architecture (simplified) Hosted on Amazon Web Services Pipeline processing Batch Scripts import control Etc. SOURCE DATA IMPORT
  • 44. 45 Technical Architecture (simplified) Hosted on Amazon Web Services GraphDB Cluster GraphDB Cluster GraphDB Cluster GraphDB Cluster Pipeline processing Batch Scripts import control Etc. SOURCE DATA IMPORT
  • 45. 46 PUBLIC ACCESS Technical Architecture (simplified) Hosted on Amazon Web Services DI GraphDB Cluster GraphDB Cluster GraphDB Cluster GraphDB Cluster Pipeline processing Batch Scripts import control Etc. SOURCE DATA IMPORT
  • 46. 47 CURATOR ACCESS PUBLIC ACCESS Technical Architecture (simplified) Hosted on Amazon Web Services DI GraphDB Cluster GraphDB Cluster GraphDB Cluster GraphDB Cluster Pipeline processing Batch Scripts import control Etc. SOURCE DATA IMPORT DMI
  • 47. 48 A need for entity reconciliation ….. • Lots (and lots and lots) of source entities – 70.4 million entities • Lots of duplication – Lee, Kuan Yew – 1st Prime Minister of Singapore • 160 individual entities in ILS source data – Singapore Art Museum • Entities from source data • 21 CMS, 1 NAS, 66 ILS, 1 TTE • Users only want 1 of each!
  • 48. 49 A need for entity reconciliation ….. • Lots (and lots and lots) of source entities – 70.4 million entities • Lots of duplication – Lee, Kuan Yew – 1st Prime Minister of Singapore • 160 individual entities in ILS source data – Singapore Art Museum • Entities from source data • 21 CMS, 1 NAS, 66 ILS, 1 TTE • Users only want 1 of each!
  • 49. 50 A need for entity reconciliation ….. • Lots (and lots and lots) of source entities – 70.4 million entities • Lots of duplication – Lee, Kuan Yew – 1st Prime Minister of Singapore • 160 individual entities in ILS source data – Singapore Art Museum • Entities from source data • 21 CMS, 1 NAS, 66 ILS, 1 TTE • Users only want 1 of each!
  • 50. 51 A need for entity reconciliation ….. • Lots (and lots and lots) of source entities – 70.4 million entities • Lots of duplication – Lee, Kuan Yew – 1st Prime Minister of Singapore • 160 individual entities in ILS source data – Singapore Art Museum • Entities from source data • 21 CMS, 1 NAS, 66 ILS, 1 TTE • Users only want 1 of each!
  • 51. 52 70.4(million) into 6.1(million) entities does go! • We know that now - but how did we get there? • Requirements hurdles to be cleared: – Not a single source of truth – Regular automatic updates – add / update / delete – Manual management of combined entities • Suppression of incorrect or ’private’ attributes from display • Addition of attributes not in source data • Creating / breaking relationships between entities • Near real-time updates
  • 52. 53 70.4(million) into 6.1(million) entities does go! • We know that now - but how did we get there? • Requirements hurdles to be cleared: – Not a single source of truth – Regular automatic updates – add / update / delete – Manual management of combined entities • Suppression of incorrect or ’private’ attributes from display • Addition of attributes not in source data • Creating / breaking relationships between entities • Near real-time updates
  • 53. 54 70.4(million) into 6.1(million) entities does go! • We know that now - but how did we get there? • Requirements hurdles to be cleared: – Not a single source of truth – Regular automatic updates – add / update / delete – Manual management of combined entities • Suppression of incorrect or ’private’ attributes from display • Addition of attributes not in source data • Creating / breaking relationships between entities • Near real-time updates
  • 54. 55 70.4(million) into 6.1(million) entities does go! • We know that now - but how did we get there? • Requirements hurdles to be cleared: – Not a single source of truth – Regular automatic updates – add / update / delete – Manual management of combined entities • Suppression of incorrect or ’private’ attributes from display • Addition of attributes not in source data • Creating / breaking relationships between entities • Near real-time updates
  • 55. 56 70.4(million) into 6.1(million) entities does go! • We know that now - but how did we get there? • Requirements hurdles to be cleared: – Not a single source of truth – Regular automatic updates – add / update / delete – Manual management of combined entities • Suppression of incorrect or ’private’ attributes from display • Addition of attributes not in source data • Creating / breaking relationships between entities • Near real-time updates
  • 56. 57 Adaptive Data Model Concepts • Source entitles – Individual representation of source data • Aggregation entities – Tracking relationships between source entities for the same thing – No copying of attributes • Primary Entities – Searchable by users – Displayable to users – Consolidation of aggregated source data & managed attributes
  • 57. 58 Adaptive Data Model Concepts • Source entitles – Individual representation of source data • Aggregation entities – Tracking relationships between source entities for the same thing – No copying of attributes • Primary Entities – Searchable by users – Displayable to users – Consolidation of aggregated source data & managed attributes
  • 58. 59 Adaptive Data Model Concepts • Source entitles – Individual representation of source data • Aggregation entities – Tracking relationships between source entities for the same thing – No copying of attributes • Primary Entities – Searchable by users – Displayable to users – Consolidation of aggregated source data & managed attributes
  • 59. 60
  • 60. 61
  • 61. 62
  • 62. 63
  • 63. 64
  • 64. 65
  • 65. 66
  • 66. 67
  • 67. 68
  • 68. 69 Entity Matching • Candidate matches based on schema:names – Lucene indexes matching – Levenshtein similarity refined – Same entity type – Entity type specific rules eg: • Work: name + creator / contributor / author / sameAs • Person: name + birthDate / deathDate / sameAs
  • 69. 70 Entity Matching • Candidate matches based on schema:names – Lucene indexes matching – Levenshtein similarity refined – Same entity type – Entity type specific rules eg: • Work: name + creator / contributor / author / sameAs • Person: name + birthDate / deathDate / sameAs
  • 70. 71 Entity Matching • Candidate matches based on schema:names – Lucene indexes matching – Levenshtein similarity refined – Same entity type – Entity type specific rules eg: • Work: name + creator / contributor / author / sameAs • Person: name + birthDate / deathDate / sameAs
  • 71. 72 Entity Matching • Candidate matches based on schema:names – Lucene indexes matching – Levenshtein similarity refined – Same entity type – Entity type specific rules eg: • Work: name + creator / contributor / author / sameAs • Person: name + birthDate / deathDate / sameAs
  • 72. 73 Entity Matching • Candidate matches based on schema:names – Lucene indexes matching – Levenshtein similarity refined – Same entity type – Entity type specific rules eg: • Work: name + creator / contributor / author / sameAs • Person: name + birthDate / deathDate / sameAs
  • 79. 80 Data Management Interface • Rapidly developed – easily modified • Using mataphactory’s semantic visual interface • A collaboration environment for data curators • Pre-publish entity management • Search & discovery – emulates DI • Dashboard view • Entity management / Creation
  • 80. 81 Data Management Interface • Rapidly developed – easily modified • Using mataphactory’s semantic visual interface • A collaboration environment for data curators • Pre-publish entity management • Search & discovery – emulates DI • Dashboard view • Entity management / Creation
  • 81. 82 Data Management Interface • Rapidly developed – easily modified • Using mataphactory’s semantic visual interface • A collaboration environment for data curators • Pre-publish entity management • Search & discovery – emulates DI • Dashboard view • Entity management / Creation
  • 82. 83 Data Management Interface • Rapidly developed – easily modified • Using mataphactory’s semantic visual interface • A collaboration environment for data curators • Pre-publish entity management • Search & discovery – emulates DI • Dashboard view • Entity management / Creation
  • 83. 84 Data Management Interface • Rapidly developed – easily modified • Using mataphactory’s semantic visual interface • A collaboration environment for data curators • Pre-publish entity management • Search & discovery – emulates DI • Dashboard view • Entity management / Creation
  • 84. 85 Data Management Interface • Rapidly developed – easily modified • Using mataphactory’s semantic visual interface • A collaboration environment for data curators • Pre-publish entity management • Search & discovery – emulates DI • Dashboard view • Entity management / Creation
  • 85. 86
  • 86. 87
  • 87. 88
  • 88. 89
  • 89. 90
  • 90. 91
  • 91. 92
  • 92. 93
  • 93. 94
  • 94. 95
  • 95. 96
  • 96. 97
  • 97. 98
  • 98. 99
  • 99. 100
  • 100. 101
  • 101. 102
  • 103. 104
  • 104. 105
  • 105. 106
  • 106. 107
  • 107. 108
  • 108. 109
  • 109. 110
  • 110. 111 LDMS System Live – December 2022 • Live – data curation team using DMI – Entity curation – Manually merging & splitting aggregations • Live – data ingest pipelines – Daily / Weekly / Monthly – Delta update exports from source systems • Live – Discovery Interface ready for deployment • Live – Support – Issue resolution & updates
  • 111. 112 LDMS System Live – December 2022 • Live – data curation team using DMI – Entity curation – Manually merging & splitting aggregations • Live – data ingest pipelines – Daily / Weekly / Monthly – Delta update exports from source systems • Live – Discovery Interface ready for deployment • Live – Support – Issue resolution & updates
  • 112. 113 LDMS System Live – December 2022 • Live – data curation team using DMI – Entity curation – Manually merging & splitting aggregations • Live – data ingest pipelines – Daily / Weekly / Monthly – Delta update exports from source systems • Live – Discovery Interface ready for deployment • Live – Support – Issue resolution & updates
  • 113. 114 LDMS System Live – December 2022 • Live – data curation team using DMI – Entity curation – Manually merging & splitting aggregations • Live – data ingest pipelines – Daily / Weekly / Monthly – Delta update exports from source systems • Live – Discovery Interface ready for deployment • Live – Support – Issue resolution & updates
  • 114. 115 Discovery Interface • Live and ready for deployment – Performance, load and user tested • Designed & developed by NLB Data Team to demonstrate characteristics and benefits of a Linked Data service • NLB services rolled-out by National Library & Public Library divisions – Recently identified a service for Linked Data integration • Currently undergoing fine tuning to meet service’s UI requirements
  • 115. 116 Discovery Interface • Live and ready for deployment – Performance, load and user tested • Designed & developed by NLB Data Team to demonstrate characteristics and benefits of a Linked Data service • NLB services rolled-out by National Library & Public Library divisions – Recently identified a service for Linked Data integration • Currently undergoing fine tuning to meet service’s UI requirements
  • 116. 117 Discovery Interface • Live and ready for deployment – Performance, load and user tested • Designed & developed by NLB Data Team to demonstrate characteristics and benefits of a Linked Data service • NLB services rolled-out by National Library & Public Library divisions – Recently identified a service for Linked Data integration • Currently undergoing fine tuning to meet service’s UI requirements
  • 117. 118 Daily Delta Ingestion & Reconciliation
  • 118. 119 • Inaccurate source data – Not apparent in source systems – Very obvious when aggregated in LDMS • Eg. Subject and Person URIs duplicated – Identified and analysed in DMI – Reported to source curators and fixed – Updated in next delta update – Data issues identified in LDMS – source data quality enhanced Some Challenges Addressed on the Way
  • 119. 120 • Inaccurate source data – Not apparent in source systems – Very obvious when aggregated in LDMS • Eg. Subject and Person URIs duplicated – Identified and analysed in DMI – Reported to source curators and fixed – Updated in next delta update – Data issues identified in LDMS – source data quality enhanced Some Challenges Addressed on the Way
  • 120. 121 • Asian name matching – Often consisting of several short 3-4 letter names – Lucene matches not of much help – Introduction of Levenshtein similarity matching helped – Tuning is challenging – trading off European & Asian names Some Challenges Addressed on the Way
  • 121. 122 • Asian name matching – Often consisting of several short 3-4 letter names – Lucene matches not of much help – Introduction of Levenshtein similarity matching helped – Tuning is challenging – trading off European & Asian names Some Challenges Addressed on the Way
  • 122. 137 From Ambition to Go Live – A Summary • Ambition based on previous years of trials, experimentation, and understanding of Linked Data potential • Ambition to deliver a production LDMS to provide Linked Data curation and management capable of delivering a public discovery service • With experienced commercial partners & tools as part of a globally distributed team – Benefiting from Open Source community developments where appropriate • Unique and challenging requirements – Distributed sources of truth in disparate data formats – Auto updating, consolidated entity view – plus individual entity management – Web friendly – providing structured data for search engines • Live and operational actively delivering benefits from December 2022
  • 123. 138 From Ambition to Go Live – A Summary • Ambition based on previous years of trials, experimentation, and understanding of Linked Data potential • Ambition to deliver a production LDMS to provide Linked Data curation and management capable of delivering a public discovery service • With experienced commercial partners & tools as part of a globally distributed team – Benefiting from Open Source community developments where appropriate • Unique and challenging requirements – Distributed sources of truth in disparate data formats – Auto updating, consolidated entity view – plus individual entity management – Web friendly – providing structured data for search engines • Live and operational actively delivering benefits from December 2022
  • 124. 139 From Ambition to Go Live – A Summary • Ambition based on previous years of trials, experimentation, and understanding of Linked Data potential • Ambition to deliver a production LDMS to provide Linked Data curation and management capable of delivering a public discovery service • With experienced commercial partners & tools as part of a globally distributed team – Benefiting from Open Source community developments where appropriate • Unique and challenging requirements – Distributed sources of truth in disparate data formats – Auto updating, consolidated entity view – plus individual entity management – Web friendly – providing structured data for search engines • Live and operational actively delivering benefits from December 2022
  • 125. 140 From Ambition to Go Live – A Summary • Ambition based on previous years of trials, experimentation, and understanding of Linked Data potential • Ambition to deliver a production LDMS to provide Linked Data curation and management capable of delivering a public discovery service • With experienced commercial partners & tools as part of a globally distributed team – Benefiting from Open Source community developments where appropriate • Unique and challenging requirements – Distributed sources of truth in disparate data formats – Auto updating, consolidated entity view – plus individual entity management – Web friendly – providing structured data for search engines • Live and operational actively delivering benefits from December 2022
  • 126. 141 From Ambition to Go Live – A Summary • Ambition based on previous years of trials, experimentation, and understanding of Linked Data potential • Ambition to deliver a production LDMS to provide Linked Data curation and management capable of delivering a public discovery service • With experienced commercial partners & tools as part of a globally distributed team – Benefiting from Open Source community developments where appropriate • Unique and challenging requirements – Distributed sources of truth in disparate data formats – Auto updating, consolidated entity view – plus individual entity management – Web friendly – providing structured data for search engines • Live and operational actively delivering benefits from December 2022
  • 127. From Ambition to Go Live The National Library Board of Singapore’s Journey to an operational Linked Data Management and Discovery System Richard Wallis Evangelist and Founder Data Liberate richard.wallis@dataliberate.com @dataliberate 15th Semantic Web in Libraries Conference SWIB23 - 12th September 2023 - Berlin