Meemoo manages a large quantity of mainly audiovisual material from more than 170 partners in cultural heritage and media. More than 6 million objects are currently stored, including digitised newspapers, photos, video and audio. In addition, a number of access platforms make the digitised content available to specific target groups, such as teachers, students, professional re-users and the general public.
Metadata is a key element in all of meemoo’s processes. An important part of our activities is to collect, integrate, manage and search a large variety of heterogeneous metadata across the archived content. The scale of this work has grown enormously, so we need an integrated approach that can handle the volume of metadata, stays flexible, and keeps the content easy to find. One specific challenge is modelling and storing the output of machine learning algorithms (speech recognition, face detection and entity recognition) so that it can be reused.
In this talk, we will discuss the key points and lessons learned from implementing meemoo’s new metadata roadmap, which centres on a Knowledge Graph-based infrastructure. The goal of the roadmap is to establish better data practices within the organisation and to offer uniform, application-independent access to (meta)data that is spread across various systems and formats.
20230525_mmc_seminar.pdf
1. From metadata to Knowledge Graph
Miel Vander Sande - MMC Seminar 2023
2. From metadata to Knowledge Graph
Who is meemoo?
Drivers for a new metadata roadmap
Knowledge Graph-based infrastructure
Modelling a heterogeneous archive
Lessons learned & way forward
3. At meemoo we’re here for the archive. We help cultural, media and government organisations with advice and practical support, and want to make archival materials accessible and usable.
4. Service provision
Digitisation, digital archiving and management of archival materials
Make content accessible and usable
Actively gather and share expertise on digital archive operations
Advise on digital heritage processes
6. Content partners in different sectors
performing arts: 50
museums: 45
archives: 24
heritage societies: 19
regional broadcasters: 10
government institutions: 12
heritage libraries: 7
national broadcasters: 3
sector institutes: 2
These figures are from 31 December 2022.
10. In figures
nearly 170,000 user accounts at The Archive for Education at the end of the 2021-2022 academic year
> 540,000 audiovisual carriers transferred to our archive system
> 6 million objects in our archive system
All these figures except for education are from 31 December 2022.
11. Metadata is key in all processes
Diagnostics & operations: for finding out what went wrong or where things are at
Preservation & digitisation: such as digital format deprecation and AV carrier characteristics for inventory
Search & exploration: by platform users, but also internally
13. MAM-centered infrastructure
[Diagram: the Media Asset Management System (MAM) at the centre. Content partners (CRS, DAMS, …) deliver metadata & media; the CRM, internal tools and other data sources are also connected, and the connected systems use different data models (A, B, C, D, E-Z). Applications of content partners, internal tools and meemoo’s platforms (The Archive for Education, hetarchief.be, News of the great war, Catalogus Pro, Art in Flanders) access the data via OAI-PMH, REST API, GraphQL and search.]
Metadata integration was implicit, which gave the MAM an implicit, but demanding role as metadata integrator
14. It works, but…
Our metadata practice had become outdated and was reaching its limits
too many domains with specific needs
a one-size-fits-all data model cannot deal with the heterogeneity of the data
The metadata (model) was underspecified
no clear definitions, labels or documentation of concepts and properties
the lack of a shared terminology leads to miscommunication
15. and we still have plans
Adding new analogue carriers or new media (e.g. 3D objects, glass plates)
Catching up with (new) content partners and with AI / machine learning
Speech-to-text, face recognition and named-entity recognition
Connecting to external sources (e.g. Wikidata) or standardised vocabularies, controlled lists, thesauri and taxonomies (e.g. GTAA, VIAF)
Providing extra useful services on and with metadata (e.g. IIIF, ...)
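One way to make machine learning output reusable is to store each hit as its own annotation node that keeps its provenance. The sketch below assumes a hypothetical `EntityMention` record and an `https://example.org/` namespace with made-up property names; it is not meemoo’s model, only an illustration of keeping the producing model and its confidence attached to each claim, with an optional link to Wikidata.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical namespaces and property names, for illustration only.
EX = "https://example.org/"
WD = "http://www.wikidata.org/entity/"

# IRIs and literals are both plain strings here; an RDF library would distinguish them.
Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class EntityMention:
    """One named-entity recognition hit in a media item (hypothetical shape)."""
    media_id: str
    label: str          # surface text as recognised
    start_s: float      # start of the mention in the audio/video, in seconds
    end_s: float        # end of the mention, in seconds
    confidence: float   # model score between 0 and 1
    model: str          # identifier of the algorithm that produced the hit
    wikidata_id: Optional[str] = None  # optional link to an external source


def to_triples(m: EntityMention, n: int) -> List[Triple]:
    """Turn mention number n into (subject, predicate, object) triples.

    The annotation gets its own node, so provenance (model, confidence)
    stays attached to the claim instead of being flattened into the record.
    """
    node = f"{EX}annotation/{m.media_id}/{n}"
    triples = [
        (node, f"{EX}target", f"{EX}media/{m.media_id}"),
        (node, f"{EX}label", m.label),
        (node, f"{EX}start", str(m.start_s)),
        (node, f"{EX}end", str(m.end_s)),
        (node, f"{EX}confidence", str(m.confidence)),
        (node, f"{EX}generatedBy", f"{EX}model/{m.model}"),
    ]
    if m.wikidata_id:
        triples.append((node, f"{EX}refersTo", WD + m.wikidata_id))
    return triples
```

Because every hit is a separate node, low-confidence results can later be filtered, re-scored or replaced by a newer model without touching the descriptive record of the media item.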
16. Roadmap: Five ambitious horizons
1. Measuring and validating the quality of metadata (2020-...): framework for data quality assessment
2. Thorough revision of the metadata model (2021): data models
3. Tackling data integration with suitable fundamental infrastructure (2022-2023): Knowledge Graph
4. Creating new ways for inflow and outflow (2023-...): access and use of the Knowledge Graph
5. Active collaboration with and about metadata (2023-...): Linked Data with external sources and partners
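The first horizon, measuring and validating metadata quality, can be illustrated with a constraint check in the style of SHACL. This is a minimal sketch, not a SHACL engine and not meemoo’s framework: the field names are hypothetical, `min`/`max` mirror `sh:minCount`/`sh:maxCount`, and a Python predicate stands in for value constraints. The share of conformant records gives one simple quality indicator.

```python
from datetime import date
from typing import Any, Dict, List


def _is_iso_date(v: Any) -> bool:
    """True for strings such as '2023-05-25'."""
    if not isinstance(v, str):
        return False
    try:
        date.fromisoformat(v)
        return True
    except ValueError:
        return False


# SHACL-inspired constraints per property (hypothetical field names).
Rule = Dict[str, Any]
SHAPE: Dict[str, Rule] = {
    "title": {"min": 1, "max": 1, "check": lambda v: isinstance(v, str) and v.strip() != ""},
    "date": {"min": 1, "max": 1, "check": _is_iso_date},
    "creator": {"min": 0, "max": None, "check": lambda v: isinstance(v, str)},
    "duration_s": {"min": 0, "max": 1, "check": lambda v: isinstance(v, (int, float)) and v >= 0},
}


def validate(record: Dict[str, List[Any]], shape: Dict[str, Rule] = SHAPE) -> List[str]:
    """Return a list of human-readable violations (empty means conformant)."""
    violations: List[str] = []
    for prop, rule in shape.items():
        values = record.get(prop, [])
        if len(values) < rule["min"]:
            violations.append(f"{prop}: expected at least {rule['min']} value(s), found {len(values)}")
        if rule["max"] is not None and len(values) > rule["max"]:
            violations.append(f"{prop}: expected at most {rule['max']} value(s), found {len(values)}")
        for v in values:
            if not rule["check"](v):
                violations.append(f"{prop}: invalid value {v!r}")
    return violations


def quality_score(records: List[Dict[str, List[Any]]]) -> float:
    """Share of records without any violation."""
    if not records:
        return 1.0
    return sum(1 for r in records if not validate(r)) / len(records)
```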
21. Knowledge Graph?
Archive metadata are represented and queried as nodes connected by edges
Intuitive navigation
Supports discovery by exploration
Flexible data structure & schema, but data semantics, schema and constraints are still essential!
[Diagram: example graph connecting nodes such as VRT, Newsitem 25/05, 2nd grade English and Wikidata]
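To make "nodes connected by edges" and "discovery by exploration" concrete, here is a tiny in-memory graph with a breadth-first path search. The edge labels between the slide’s example nodes are hypothetical (the slide does not name them), and `wikidata:VRT` is a placeholder rather than a real Wikidata identifier.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical edges between the example nodes; the real relationships
# in meemoo's Knowledge Graph are not specified on the slide.
TRIPLES: List[Triple] = [
    ("Newsitem 25/05", "publishedBy", "VRT"),
    ("Newsitem 25/05", "usedInLesson", "2nd grade English"),
    ("VRT", "sameAs", "wikidata:VRT"),
]


class Graph:
    """A tiny in-memory graph: nodes connected by labelled edges."""

    def __init__(self, triples: Iterable[Triple]) -> None:
        self.out: Dict[str, List[Tuple[str, str]]] = defaultdict(list)  # subject -> (predicate, object)
        self.inc: Dict[str, List[Tuple[str, str]]] = defaultdict(list)  # object -> (predicate, subject)
        for s, p, o in triples:
            self.out[s].append((p, o))
            self.inc[o].append((p, s))

    def neighbours(self, node: str) -> List[str]:
        """Nodes one edge away, following edges in either direction."""
        return [o for _, o in self.out.get(node, [])] + [s for _, s in self.inc.get(node, [])]

    def path(self, start: str, goal: str) -> Optional[List[str]]:
        """Breadth-first search for the shortest chain of nodes from start to goal."""
        prev: Dict[str, Optional[str]] = {start: None}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if node == goal:
                chain: List[str] = []
                cur: Optional[str] = node
                while cur is not None:
                    chain.append(cur)
                    cur = prev[cur]
                return chain[::-1]
            for nxt in self.neighbours(node):
                if nxt not in prev:
                    prev[nxt] = node
                    queue.append(nxt)
        return None
```

`Graph(TRIPLES).path("2nd grade English", "wikidata:VRT")` returns `["2nd grade English", "Newsitem 25/05", "VRT", "wikidata:VRT"]`: a lesson leads to a news item, its broadcaster and an external source without any join tables having been designed for that route.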
23. [Diagram: Knowledge Graph-based infrastructure. Sources (Media Asset Management System, CRM, internal tools, other data sources) supply media and metadata. A central Knowledge Graph provides universal, application-independent access to (meta)data via OAI-PMH, REST API, GraphQL, IIIF 3.0, ... Applications of content partners, internal tools and meemoo’s interactive platforms build on it. Layers: sources, metadata management, integration, interaction (platforms); labels distinguish general-purpose and single-purpose components.]
24. [Diagram: the same architecture, with labels now distinguishing general-purpose, single-purpose and multi-purpose components]
25. [Diagram: the same Knowledge Graph-based architecture, annotated with the needs it has to balance]
User needs: presentation, focused, simple, no surprises
Data needs: flexibility, semantics, context, relationships, expressive data models and querying
Application needs: performance (caching), developer-friendly, interoperable
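A developer-friendly access layer over the graph can hide SPARQL behind simple field selections. The sketch below is not GRASP’s API: `build_query` and its field mapping are hypothetical, the property IRIs are Dublin Core terms, and `functools.lru_cache` memoises the generated query text as a stand-in for the caching mentioned above.

```python
from functools import lru_cache
from typing import Tuple

# Hypothetical mapping from developer-facing field names to RDF properties
# (the property IRIs are Dublin Core terms).
FIELD_TO_PROPERTY = {
    "title": "http://purl.org/dc/terms/title",
    "created": "http://purl.org/dc/terms/created",
    "publisher": "http://purl.org/dc/terms/publisher",
}


@lru_cache(maxsize=256)
def build_query(rdf_class: str, fields: Tuple[str, ...], limit: int = 10) -> str:
    """Translate a flat field selection into a SPARQL SELECT query.

    Each field becomes an OPTIONAL triple pattern, so items with missing
    values are still returned. Arguments must be hashable (hence the tuple)
    because lru_cache keys its cache on them.
    """
    variables = " ".join("?" + name for name in fields)
    lines = [f"SELECT ?item {variables} WHERE {{", f"  ?item a <{rdf_class}> ."]
    for name in fields:
        lines.append(f"  OPTIONAL {{ ?item <{FIELD_TO_PROPERTY[name]}> ?{name} . }}")
    lines.append(f"}} LIMIT {limit}")
    return "\n".join(lines)
```

Caching the query text only saves rebuilding it; caching query results would need a separate layer that is invalidated whenever the graph changes.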
27. Metadata modelling methodology
1. Knowledge capture
Business working groups → relevant business questions
Existing models, (meta)data, documentation and (functional) analyses
External standards (DC, EBUCore, PREMIS, CIDOC)
Other pain points, wishes & use cases
2. Knowledge implementation
Thematic working group (per domain): diagram, formalise & document, proof-of-concept
3. Model evaluation
External working groups (partners)
Test business questions & intake procedures
[Diagram: outputs include a knowledge inventory, specifications, open problems, and a vocabulary & schema and/or thesauri & lists of terms]
31. Basic object structure (PREMIS OWL)
Metadata about the content, e.g. a film
Metadata about the reproduction, e.g. the archive master
Metadata about the carrier or physical (art)work, e.g. the nitrate film
Technical metadata, e.g. the .mov
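The four levels above can be sketched as nested records. The class names loosely follow the PREMIS 3 object categories (intellectual entity, representation, file); `PhysicalCarrier` and all attribute names are hypothetical stand-ins, not the PREMIS OWL ontology or meemoo’s actual model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class File:
    """Technical metadata, e.g. the .mov (attribute names are illustrative)."""
    filename: str
    file_format: str
    size_bytes: int


@dataclass
class PhysicalCarrier:
    """Hypothetical class for the carrier or physical (art)work, e.g. the nitrate film."""
    carrier_type: str
    identifier: str


@dataclass
class Representation:
    """The reproduction, e.g. the archive master, with its files and source carrier."""
    role: str
    files: List[File] = field(default_factory=list)
    derived_from: Optional[PhysicalCarrier] = None


@dataclass
class IntellectualEntity:
    """The content, e.g. a film, which may have several representations."""
    title: str
    representations: List[Representation] = field(default_factory=list)

    def files(self) -> List[File]:
        """All files across all representations of this content."""
        return [f for r in self.representations for f in r.files]
```

Keeping the levels apart means one film can carry an archive master and access copies, each traced back to the same carrier, without duplicating descriptive metadata.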
34. Composing a good data toolchain
Public procurement procedure to purchase a graph storage and RDF mapping solution: TriplyDB
Adopting & contributing to open source tooling
Workflow and ETL automation with Prefect
GraphQL-over-SPARQL framework GRASP
SKOS editing and management tool, possibly Atramhasis
Many smaller tools and libraries
Custom tooling: shacl2md to generate data model documentation
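To give an idea of what generating documentation from shapes involves, here is a minimal sketch in the spirit of shacl2md. It is not that tool’s code: it works on a hypothetical, simplified `PropertyShape` record whose fields mirror the SHACL terms `sh:path`, `sh:name`, `sh:description`, `sh:minCount`, `sh:maxCount` and `sh:datatype`, and renders one Markdown table per class.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PropertyShape:
    """Simplified stand-in for a SHACL property shape (hypothetical record)."""
    path: str                        # sh:path, the property being described
    name: str                        # sh:name, a human-readable label
    description: str                 # sh:description, the definition
    min_count: int = 0               # sh:minCount
    max_count: Optional[int] = None  # sh:maxCount; None means unbounded
    datatype: str = ""               # sh:datatype, if any


def cardinality(p: PropertyShape) -> str:
    """Render the count constraints as 'min..max', with '*' for unbounded."""
    upper = "*" if p.max_count is None else str(p.max_count)
    return f"{p.min_count}..{upper}"


def to_markdown(class_name: str, shapes: List[PropertyShape]) -> str:
    """Render one class and its property shapes as a Markdown table.

    Pipe characters inside descriptions are not escaped in this sketch.
    """
    rows = [
        f"## {class_name}",
        "",
        "| Property | Name | Cardinality | Datatype | Description |",
        "|---|---|---|---|---|",
    ]
    for p in shapes:
        rows.append(
            f"| `{p.path}` | {p.name} | {cardinality(p)} | {p.datatype or '-'} | {p.description} |"
        )
    return "\n".join(rows)
```

Generating the documentation from the same shapes that validate the data keeps definitions and constraints from drifting apart, which addresses the underspecified model described earlier.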
35. Invest in good data management
It takes time, effort, budget and know-how to do this right
Data modelling is a lost art, but it is still essential
Figuring out the shape and meaning of things pays off
Try new data technologies: metadata is a graph
The right foundation for data integration, unknown use cases and sparse data
The RDF ecosystem gives a head start, also in AV archiving
PREMIS, ODRL, EBUCore, SKOS, … are powerful, especially combined
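As a small example of why SKOS helps in search, the sketch below expands a concept to every concept beneath it, which corresponds to following `skos:narrower` transitively (`skos:narrowerTransitive`). The thesaurus fragment is hypothetical, each concept has a single broader concept (a simplification; SKOS allows several), and a plain dictionary stands in for the graph.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical thesaurus fragment: each concept mapped to its skos:broader concept.
BROADER: Dict[str, str] = {
    "newspaper": "textual document",
    "film": "moving image",
    "television programme": "moving image",
    "news bulletin": "television programme",
}


def narrower_index(broader: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert skos:broader into skos:narrower (parent -> children)."""
    index: Dict[str, List[str]] = defaultdict(list)
    for child, parent in broader.items():
        index[parent].append(child)
    return index


def expand(concept: str, broader: Dict[str, str] = BROADER) -> Set[str]:
    """Return the concept plus every concept below it, at any depth."""
    index = narrower_index(broader)
    result: Set[str] = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        if current not in result:  # also guards against cycles in messy thesauri
            result.add(current)
            stack.extend(index.get(current, []))
    return result
```

With this fragment, `expand("moving image")` yields `{"moving image", "film", "television programme", "news bulletin"}`, so a search on moving images also finds items tagged only as news bulletins.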
36. Current state and the way forward
What have we done?
Inventory of existing data and knowledge domains
Creating and formalising new data models
Setting up an RDF Knowledge Graph platform
What are we working on?
Mapping between the archive data and the data models
Implementing ETLs to generate RDF data
Developing a GraphQL framework and building APIs
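Mapping archive data to the data models and generating RDF can be pictured as a small extract-transform-load step. The sketch assumes a flat source record with hypothetical keys (`id`, `Title`, `CreationDate`, `Publisher`), a hypothetical `https://example.org/id/` base IRI and a hand-written mapping to Dublin Core terms, and serialises the result as N-Triples. Orchestration (scheduling, retries), which the toolchain slide assigns to Prefect, is left out.

```python
from typing import Dict, Iterable, Iterator, Tuple

BASE = "https://example.org/id/"  # hypothetical base IRI for archive objects
DCT = "http://purl.org/dc/terms/"

# Hypothetical mapping from flat source-record keys to Dublin Core terms.
MAPPING: Dict[str, str] = {
    "Title": DCT + "title",
    "CreationDate": DCT + "created",
    "Publisher": DCT + "publisher",
}


def escape_literal(value: str) -> str:
    """Escape a string for an N-Triples literal (backslash first)."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def transform(record: Dict[str, str]) -> Iterator[Tuple[str, str, str]]:
    """Map one source record to N-Triples terms; assumes 'id' is IRI-safe."""
    subject = f"<{BASE}{record['id']}>"
    for key, prop in MAPPING.items():
        value = record.get(key)
        if value:  # skip missing or empty fields instead of emitting empty literals
            yield subject, f"<{prop}>", f'"{escape_literal(value)}"'


def load(triples: Iterable[Tuple[str, str, str]]) -> str:
    """Serialise triples as N-Triples text, one statement per line."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)
```

Keeping the mapping as data rather than code makes it easier to review with the thematic working groups and to change when the data model evolves.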