Aggregation workflow
Cécile Devarenne
Operations Officer
Metadata training, Europeana Sounds project
Athens, 23rd/24th of October 2014
Content
• Europeana's aggregation team
• Europeana Publication Policy
• Aggregation workflow
• Submission deadlines
• Ingestion processes and tools
• Acceptance criteria and Europeana validation of data
• Guidance and help – Europeana pro
• Future plans for aggregation workflow
Europeana’s aggregation team:
who are we?
Europeana’s aggregation team
• Partner relationships, business development, administration
• Henning Scholz, Joris Pekel, Gina Van der Linden
• Technical support
• Operations officers: content@europeana.eu
• Data support, feedback and ingestion of your collections into
Europeana portal and API
Europeana Publication Policy
Clear criteria for accepting or declining metadata for publication, and for taking down legacy metadata from the Europeana database:
• Ingestion workflow (deadlines, timelines, prioritization)
• Content scope (what is a digital object? what content does Europeana aggregate?)
• Technical validation of metadata quality (expected values)
• Metadata licensing (CC0)
• Rights Statements for digital objects
• All digital objects with valid edm:rights chosen from http://pro.europeana.eu/web/guest/available-rights-statements
• Public Domain material labelled with the Public Domain mark in edm:rights
• edm:rights & dc:rights not in contradiction
Aggregation workflow and
submission deadlines: how does it
work?
Submission of data: preliminary steps
for your project
• (1) Data Exchange Agreement to Europeana (DEA)
• The Europeana Sounds project needs to submit a signed Data Exchange Agreement for each contributing data provider
• The Europeana Data Exchange Agreement establishes the terms under which Europeana can make use of the previews and descriptive metadata provided by cultural institutions
• More information can be found here: http://pro.europeana.eu/ensuring-permissions-for-aggregators
• (2) Data contribution form
• One form for the whole project
• General information on data to be submitted to Europeana
• Schedule of data delivery: ingestion planning
• (3) Submission of data samples, with feedback taken into account
Submission of data: (4) publication cycles
• Operations officers work on a monthly cycle
• Data is submitted in the form of datasets. A dataset is a coherent batch of records; for the Europeana Sounds project, probably one dataset for each of your data providers
• A dataset takes on average 40 minutes to process
• Around 200 datasets are processed by the Operations officers for each cycle of publication
• Datasets go through a full flow of operations before they are production-ready
• Datasets need to be submitted on time in order for this production cycle to
work
• Datasets are submitted by the technical/content coordinators of your
project
• The earlier you submit datasets the more feedback we can give!
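As a rough illustration of why on-time submission matters, the two figures above can be multiplied out. This is back-of-the-envelope arithmetic only: it assumes the 40-minute average applies to every dataset and ignores feedback rounds and any work done in parallel.

```python
# Figures taken from the slide; the rest is simple arithmetic.
AVG_MINUTES_PER_DATASET = 40
DATASETS_PER_CYCLE = 200

total_minutes = AVG_MINUTES_PER_DATASET * DATASETS_PER_CYCLE
total_hours = total_minutes / 60

print(f"{total_minutes} minutes, about {total_hours:.0f} hours of processing per cycle")
```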
Submission of data: new provider timeline
Submission of data: regular ingestion
cycle timeline
Ingestion processes and tools:
what happens to your data when
submitted to Europeana?
Europeana’s set of ingestion tools
• Unified Ingestion Manager (UIM): orchestrator of data flows triggered in
various tools and plugins
• SugarCRM (Customer Relationship Management): reference entries for
datasets and organisations
• REPOX: harvester to get the collections uploaded into Europeana
• Europeana’s instance of Mint (Metadata INTeroperability): mapping and
editing tool for ingested datasets
• Data plugins
• Itemization, Europeana identifiers generation
• Dereferencing
• Enrichment
• Redirects
• Extraction of hierarchies
• Thumbnails caching
Europeana ingestion data flows
Steps to get data ingested
From the moment your data is submitted:
• Checks on raw XML (Browser)
• Prior to harvesting
• Identification of key issues
• Creation/update of dataset information, checks on validity of the supplied
harvesting information (SugarCRM - REPOX)
• Harvesting (REPOX)
• Mapping/editing of datasets (Europeana Mint)
• Mapping tool for all datasets
• Adapted for Europeana in order to process multiple formats (EDM,
ESE, any metadata standard with provided XSLT)
• Drag and drop appropriate elements
• Quality checks and data cleaning if necessary
• Transformation and validation of records according to the EDM schema and Schematron rules
• EDM Internal data: Europeana-ready material
• Operations on data following transformation:
• Itemization and creation/management of Europeana identifiers for
permalinks to your records in Europeana
• Extraction of hierarchies for datasets including EDM hierarchies
• Thumbnails caching
• Enrichments of data:
• Dereferencing: generation of additional contextual data from links to ontologies exposed as linked data
• Europeana enrichment: automated semantic enrichment based on analysis of the provided data
• If necessary (when a change of identifiers was communicated to
Europeana), creation of redirections between previous and newly
generated identifiers
• Data ready! Monthly deployment to the Europeana portal and API
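The redirect step can be pictured as a lookup from previous identifiers to newly generated ones. The sketch below is not Europeana's implementation; the function name and the identifiers are hypothetical, and it only shows how a chain of redirects could be followed to the current identifier.

```python
def resolve(identifier: str, redirects: dict[str, str]) -> str:
    """Follow redirects from a previous identifier to the current one."""
    seen = set()
    while identifier in redirects:
        if identifier in seen:
            # Guard against a redirect pointing back at an earlier identifier.
            raise ValueError(f"redirect loop at {identifier}")
        seen.add(identifier)
        identifier = redirects[identifier]
    return identifier


# Hypothetical identifiers, for illustration only.
redirects = {
    "/old_dataset/record_1": "/new_dataset/record_1",
    "/new_dataset/record_1": "/newer_dataset/record_1",
}

print(resolve("/old_dataset/record_1", redirects))
```

An identifier with no redirect entry is returned unchanged, so records whose identifiers never changed need no special handling.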
Acceptance criteria: how exactly is
the Publication Policy
implemented?
Acceptance criteria
• Data Exchange Agreement to Europeana
• Datasets submitted via OAI-PMH protocol, FTP or file
• Metadata are accepted for publication after the feedback of the
Europeana Operations Officers
• EDM schema and guidelines
• Rights labeling
• Datasets are prioritized for publication if the edm:rights in the majority of
the metadata of the dataset is PDM, CC0, CC BY or CC BY-SA
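The prioritization rule above can be sketched as a simple count over the edm:rights values of a dataset. Two assumptions are made here: "majority" is read as strictly more than half of the records, and the four rights statements are recognised by their Creative Commons URI prefixes regardless of version. The function name is hypothetical.

```python
# URI prefixes for PDM, CC0, CC BY and CC BY-SA (any version).
OPEN_RIGHTS_PREFIXES = (
    "http://creativecommons.org/publicdomain/mark/",
    "http://creativecommons.org/publicdomain/zero/",
    "http://creativecommons.org/licenses/by/",
    "http://creativecommons.org/licenses/by-sa/",
)


def is_prioritized(rights_values: list[str]) -> bool:
    """True if more than half of the records carry one of the four statements."""
    if not rights_values:
        return False
    open_count = sum(1 for r in rights_values if r.startswith(OPEN_RIGHTS_PREFIXES))
    return open_count > len(rights_values) / 2


dataset_rights = [
    "http://creativecommons.org/publicdomain/zero/1.0/",
    "http://creativecommons.org/licenses/by/4.0/",
    "http://creativecommons.org/licenses/by-nc/4.0/",
]
print(is_prioritized(dataset_rights))
```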
Automatic validation:
• Validation according to the EDM schema
• Validation of the mandatory properties
• Unique identifiers within a dataset
• Metadata records that do not pass this validation are invalidated or discarded
• Providers can fix issues first and resubmit, or let Europeana ingest the records that are valid and fix the invalid records at a later stage
• Validation of URLs for thumbnail creation (ImageMagick)
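The uniqueness check and the valid/invalid split described above could look roughly like this. It is a sketch under assumptions: records are plain dictionaries with a hypothetical "id" key, and every record sharing a duplicated identifier is treated as invalid, which may differ from what the actual tools do.

```python
from collections import Counter


def split_by_unique_id(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Separate records with a unique identifier from those without one."""
    counts = Counter(r.get("id") for r in records)
    valid, invalid = [], []
    for record in records:
        record_id = record.get("id")
        if record_id and counts[record_id] == 1:
            valid.append(record)
        else:
            # Missing or duplicated identifier within the dataset.
            invalid.append(record)
    return valid, invalid


sample = [{"id": "a"}, {"id": "b"}, {"id": "a"}, {}]
valid, invalid = split_by_unique_id(sample)
print(len(valid), "valid,", len(invalid), "invalid")
```

Keeping the two lists separate mirrors the choice offered to providers: ingest the valid records now and fix the invalid ones later.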
Europeana validation
Mandatory properties

Applicable class    Mandatory properties (or alternatives)
Aggregation         edm:dataProvider
Aggregation         edm:isShownAt or edm:isShownBy
Aggregation         edm:provider
Aggregation         edm:rights
Aggregation         edm:aggregatedCHO
Aggregation         edm:ugc (when applicable)
ProvidedCHO         dc:title or dc:description
ProvidedCHO         dc:language for text objects
ProvidedCHO         dc:subject or dc:type or dc:coverage or dcterms:spatial
ProvidedCHO         edm:type
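The mandatory properties listed above can be expressed as groups of alternatives, where at least one property per group must be present. The sketch below is not the Europeana validator: it assumes each class is available as a dictionary keyed by property name, it leaves out edm:ugc because that property is only required when applicable, and it assumes text objects are those with edm:type set to TEXT.

```python
# Each tuple is a group of alternatives; one of them must have a value.
AGGREGATION_RULES = [
    ("edm:dataProvider",),
    ("edm:isShownAt", "edm:isShownBy"),
    ("edm:provider",),
    ("edm:rights",),
    ("edm:aggregatedCHO",),
]
CHO_RULES = [
    ("dc:title", "dc:description"),
    ("dc:subject", "dc:type", "dc:coverage", "dcterms:spatial"),
    ("edm:type",),
]


def missing_properties(aggregation: dict, cho: dict) -> list[str]:
    """Return a description of every unmet mandatory-property group."""
    problems = []
    checks = (
        (AGGREGATION_RULES, aggregation, "Aggregation"),
        (CHO_RULES, cho, "ProvidedCHO"),
    )
    for rules, values, label in checks:
        for alternatives in rules:
            if not any(values.get(prop) for prop in alternatives):
                problems.append(f"{label}: " + " or ".join(alternatives))
    # dc:language is only mandatory for text objects.
    if cho.get("edm:type") == "TEXT" and not cho.get("dc:language"):
        problems.append("ProvidedCHO: dc:language")
    return problems


# Hypothetical record values, for illustration only.
aggregation = {
    "edm:dataProvider": "Example Sound Archive",
    "edm:isShownBy": "http://example.org/audio/1.mp3",
    "edm:provider": "Europeana Sounds",
    "edm:rights": "http://creativecommons.org/publicdomain/zero/1.0/",
    "edm:aggregatedCHO": "#cho1",
}
cho = {"dc:title": "Field recording", "dc:type": "sound recording", "edm:type": "SOUND"}
print(missing_properties(aggregation, cho))
```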
Validation by the Operations officers:
• Feedback is according to the EDM schema and guidelines
• Checks on the connections between the EDM classes and the general
structure of the data
• Correct use of vocabularies, recommendations to include geolocations
• Checks on the types of values: literals vs resources (e.g. a thumbnail always needs to be a valid URL)
• Checks on links to digital representations of the objects; if direct links to
a file, check that they are of reasonable size
• Provision of thumbnails highly recommended
• Feedback on (near) duplicate records
• Feedback on rights statements in edm:rights and dc:rights
• Feedback on any other metadata quality related matters (duplication of
properties, encoding in the data, wrongly mapped properties, etc.)
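The literal-versus-resource check can be approximated before submission with a simple syntactic test. This is a minimal sketch with a hypothetical function name; it assumes only http and https links are expected and it does not check that the link actually resolves.

```python
from urllib.parse import urlparse


def looks_like_url(value: str) -> bool:
    """True if the value is syntactically an http(s) URL with a host."""
    parsed = urlparse(value.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


print(looks_like_url("http://example.org/thumbs/1.jpg"))
print(looks_like_url("Photograph of a lute"))
```

A value that fails this test in a property that expects a resource, such as a thumbnail link, is most likely a literal that was mapped to the wrong property.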
Europeana validation
• The data is represented according to expectations for both sides
• Users can search and retrieve rich content
• Developers can make the best use of the API
• Objects are clicked through and re-used from the Europeana portal
Happy ingestion :-)
Guidance and help
• Europeana Professional: http://pro.europeana.eu/provide-data
• Content inbox, for all ingestion & metadata related matters: content@europeana.eu
Questions?
Future plans for aggregation
workflow
• Future plans to open up part of the Europeana ingestion workflow to
providers
• Providers can log in to the Europeana ingestion suite and identify the aggregator/project they work for
• Providers can select the datasets they want to update, or add new
datasets
• Providers can upload their data (OAI-PMH and FTP protocols)
• Providers can map their data to EDM, or edit data that is already EDM
• Providers can validate the data against the EDM schema and preview it prior to submission
• Other processes being considered for refactoring: semantic validation, link
checking, thumbnail caching, enrichment
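For the OAI-PMH route mentioned above, a harvester retrieves records by sending requests to the provider's endpoint. The sketch below builds a ListRecords request using the standard OAI-PMH parameters (verb, metadataPrefix, set). The endpoint address, the "edm" metadata prefix, and the set name are assumptions for illustration; the values to use depend on how a provider's repository is configured.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str, metadata_prefix: str,
                     set_spec: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Selective harvesting: restrict the request to one set.
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and set name.
print(list_records_url("http://example.org/oai", "edm", "sounds"))
```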
• Benefits for providers:
• Possibility to map to EDM
• Validation according to the EDM schema (with schematron rules we
implemented)
• Preview before publication
• Self service, less dependent on Europeana, saving time (you can do
many steps yourself, and you spot errors earlier)
• Benefits for Europeana:
• Operations scaled up: the number of projects, aggregators and therefore datasets has grown exponentially in recent years
• More focus on EDM modeling and metadata related questions
• Ingestion process transparent and more connected to the process on the aggregators' side
Thank you!
Cécile Devarenne
cecile.devarenne@europeana.eu or content@europeana.eu

  • 4. Europeana’s aggregation team • Partner relationships, business development, administration • Henning Scholz, Joris Pekel, Gina Van der Linden • Technical support • Operations officers: content@europeana.eu • Data support, feedback and ingestion of your collections into Europeana portal and API
  • 6. Europeana Publication Policy: clear criteria for acceptance or decline of metadata for publication and for take down of legacy metadata from the Europeana database • Ingestion workflow (deadlines, timelines, prioritization) • Content scope (what is a digital object? what content does Europeana aggregate?) • Technical validation of metadata quality (expected values) • Metadata licensing (CC0) • Rights Statements for digital objects • All digital objects with valid edm:rights chosen from http://pro.europeana.eu/web/guest/available-rights-statements • Public Domain material labelled with the Public Domain mark in edm:rights • edm:rights & dc:rights not in contradiction
  • 7. Aggregation workflow and submission deadlines: how does it work?
  • 9. Submission of data: preliminary steps for your project • (1) Data Exchange Agreement to Europeana (DEA) • Europeana Sounds project needs to submit the signed Data Exchange Agreements for each contributing data provider • The Europeana Data Exchange Agreement establishes the terms under which Europeana can make use of the previews and descriptive metadata provided by cultural institutions • More information to be found here: http://pro.europeana.eu/ensuring-permissions-for-aggregators • (2) Data contribution form • One form for the whole project • General information on data to be submitted to Europeana • Schedule of data delivery: ingestion planning • (3) Submission of data samples and feedback taken into account
  • 10. Submission of data: (4) publication cycles • Operations officers work on a monthly cycle • Submission of data in the form of datasets: a coherent batch of records, for the Europeana Sounds project, probably one dataset for each of your data providers • A dataset takes on average 40 mins to process • Around 200 datasets are processed by the Operations officers for each cycle of publication • Datasets go through a full flow of operations before they are production ready • Datasets need to be submitted on time in order for this production cycle to work • Datasets are submitted by the technical/content coordinators of your project • The earlier you submit datasets the more feedback we can give!
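As a rough worked example of why submission deadlines matter, the two figures on this slide (40 minutes per dataset on average, around 200 datasets per cycle) can be multiplied out. This is only a back-of-the-envelope sketch: the function name is hypothetical, and the result says nothing about how the work is split between Operations officers or how much of it is automated.

```python
# Figures taken from the slide: average processing time and datasets per cycle.
MINUTES_PER_DATASET = 40
DATASETS_PER_CYCLE = 200


def cycle_processing_hours(datasets: int = DATASETS_PER_CYCLE,
                           minutes_each: int = MINUTES_PER_DATASET) -> float:
    """Total processing time for one publication cycle, in hours."""
    return datasets * minutes_each / 60


total_hours = cycle_processing_hours()
print(f"{total_hours:.1f} hours of processing per monthly cycle")
```

With the slide's figures this comes to roughly 133 hours of processing per monthly cycle, which is why late datasets are hard to fit in.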
  • 11. Submission of data: new provider timeline
  • 12. Submission of data: regular ingestion cycle timeline
  • 13. Ingestion processes and tools: what happens to your data when submitted to Europeana?
  • 14. Europeana’s set of ingestion tools • Unified Ingestion Manager (UIM): orchestrator of data flows triggered in various tools and plugins • SugarCRM (Customer Relationship Management): reference entries for datasets and organisations • REPOX: harvester to get the collections uploaded into Europeana • Europeana’s instance of Mint (Metadata INTeroperability): mapping and editing tool for ingested datasets • Data plugins • Itemization, Europeana identifiers generation • Dereferencing • Enrichment • Redirects • Extraction of hierarchies • Thumbnails caching
  • 16. Steps to get data ingested From the moment your data was submitted: • Checks on raw xml (Browser) • Prior to harvesting • Identification of key issues • Creation/update of dataset information, checks on validity of the supplied harvesting information (SugarCRM - REPOX) • Harvesting (REPOX) • Mapping/editing of datasets (Europeana Mint) • Mapping tool for all datasets • Adapted for Europeana in order to process multiple formats (EDM, ESE, any metadata standard with provided XSLT) • Drag and drop appropriate elements • Quality checks and data cleaning if necessary • Transformation and validation of records according to EDM schema and schematron rules • EDM Internal data: Europeana ready material
  • 17. Steps to get data ingested • Operations on data following transformation: • Itemization and creation/management of Europeana identifiers for permalinks to your records in Europeana • Extraction of hierarchies for datasets including EDM hierarchies • Thumbnails caching • Enrichments of data: • From links to linked data exposed ontologies, generation of additional contextual data (dereferencing) • From analysis of the provided data, automated semantic enrichment (Europeana enrichment) • If necessary (when a change of identifiers was communicated to Europeana), creation of redirections between previous and newly generated identifiers • Data ready! monthly deploy on Europeana portal and API
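The harvesting step above can be illustrated with a minimal sketch of an OAI-PMH `ListRecords` request URL. The `verb`, `metadataPrefix` and `set` parameters are part of the OAI-PMH protocol itself; the base URL, the `edm` prefix and the set name are hypothetical placeholders, since the values a given repository exposes are not stated in the slides. This is not how REPOX is configured, only what a harvester ends up requesting.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix, set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # 'set' is optional in OAI-PMH; it restricts the harvest to one set.
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository endpoint and set name, for illustration only.
url = list_records_url("http://repository.example.org/oai", "edm", "sounds_01")
print(url)
```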
  • 18. Acceptance criteria: how exactly is the Publication Policy implemented?
  • 19. Acceptance criteria • Data Exchange Agreement to Europeana • Datasets submitted via OAI-PMH protocol, FTP or file • Metadata are accepted for publication after the feedback of the Europeana Operations Officers • EDM schema and guidelines • Rights labeling • Datasets are prioritized for publication if the edm:rights in the majority of the metadata of the dataset is PDM, CC0, CC BY or CC BY-SA
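The prioritization rule on this slide can be sketched as a simple majority check over the edm:rights values of a dataset. Two assumptions are made here that the slide does not spell out: "majority" is read as strictly more than half of the records, and the four statements (PDM, CC0, CC BY, CC BY-SA) are counted together. The function name is hypothetical; the URI prefixes are the standard Creative Commons ones.

```python
# Creative Commons URI prefixes for the four statements named on the slide.
OPEN_RIGHTS_PREFIXES = (
    "http://creativecommons.org/publicdomain/mark/",  # PDM
    "http://creativecommons.org/publicdomain/zero/",  # CC0
    "http://creativecommons.org/licenses/by/",        # CC BY
    "http://creativecommons.org/licenses/by-sa/",     # CC BY-SA
)


def is_prioritized(rights_values):
    """Return True if more than half of the edm:rights values are open.

    Assumption: 'majority' means strictly more than 50% of the records.
    """
    values = list(rights_values)
    if not values:
        return False
    open_count = sum(1 for v in values if v.startswith(OPEN_RIGHTS_PREFIXES))
    return open_count * 2 > len(values)


sample = [
    "http://creativecommons.org/publicdomain/mark/1.0/",
    "http://creativecommons.org/licenses/by-sa/4.0/",
    "http://creativecommons.org/licenses/by-nc/4.0/",
]
print(is_prioritized(sample))
```

Note that the trailing slash in the `by/` prefix keeps CC BY-NC and similar licences from being counted as CC BY.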
  • 20. Europeana validation: automatic validation • Validation according to the EDM schema • Validation of the mandatory properties • Unique identifiers within a dataset • Metadata records that don’t meet this validation are invalidated or discarded • Providers can fix issues first and resubmit or let Europeana ingest the records that are valid, and fix the invalid records at a later stage • Validation of urls for thumbnail creation (ImageMagick)
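The "unique identifiers within a dataset" check, and the option of ingesting the valid records while the invalid ones are fixed later, can be sketched as a partition of the dataset. The record shape (a dict with an `id` key) and the choice to keep the first occurrence of a duplicated identifier are assumptions made for the example; the slides do not say which duplicate Europeana rejects.

```python
def split_by_unique_id(records):
    """Partition records into (valid, invalid) by identifier uniqueness.

    Assumptions: each record is a dict with an 'id' key, and the first
    record seen with a given identifier is the one that is kept.
    """
    seen = set()
    valid, invalid = [], []
    for record in records:
        record_id = record.get("id")
        if record_id is None or record_id in seen:
            invalid.append(record)
        else:
            seen.add(record_id)
            valid.append(record)
    return valid, invalid


dataset = [{"id": "a1"}, {"id": "a2"}, {"id": "a1"}, {"title": "no id"}]
valid, invalid = split_by_unique_id(dataset)
print(len(valid), "valid,", len(invalid), "to fix and resubmit")
```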
  • 21. Mandatory properties, listed as applicable class: mandatory properties (or alternatives) • Aggregation: edm:dataProvider • Aggregation: edm:isShownAt or edm:isShownBy • Aggregation: edm:provider • Aggregation: edm:rights • Aggregation: edm:aggregatedCHO • Aggregation: edm:ugc (when applicable) • ProvidedCHO: dc:title or dc:description • ProvidedCHO: dc:language for text objects • ProvidedCHO: dc:subject or dc:type or dc:coverage or dcterms:spatial • ProvidedCHO: edm:type
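The mandatory properties listed on this slide can be expressed as groups of alternatives, where at least one property in each group must be present. The sketch below is not Europeana's validator (which works against the EDM schema and schematron rules); the dict-based record shape and the function name are hypothetical, "text objects" is assumed to mean records whose edm:type is `TEXT`, and edm:ugc is left out because it is only required when applicable.

```python
# Each tuple is a group of alternatives: at least one must be present.
AGGREGATION_RULES = [
    ("edm:dataProvider",),
    ("edm:isShownAt", "edm:isShownBy"),
    ("edm:provider",),
    ("edm:rights",),
    ("edm:aggregatedCHO",),
]
PROVIDED_CHO_RULES = [
    ("dc:title", "dc:description"),
    ("dc:subject", "dc:type", "dc:coverage", "dcterms:spatial"),
    ("edm:type",),
]


def missing_properties(aggregation, provided_cho):
    """Return a list of unmet mandatory-property groups for one record."""
    problems = []
    for alternatives in AGGREGATION_RULES:
        if not any(aggregation.get(p) for p in alternatives):
            problems.append("Aggregation: " + " or ".join(alternatives))
    for alternatives in PROVIDED_CHO_RULES:
        if not any(provided_cho.get(p) for p in alternatives):
            problems.append("ProvidedCHO: " + " or ".join(alternatives))
    # Assumption: 'text objects' are records with edm:type equal to TEXT.
    if provided_cho.get("edm:type") == "TEXT" and not provided_cho.get("dc:language"):
        problems.append("ProvidedCHO: dc:language")
    return problems


aggregation = {
    "edm:dataProvider": "Example Sound Archive",
    "edm:isShownBy": "http://archive.example.org/audio/1.mp3",
    "edm:provider": "Europeana Sounds",
    "edm:aggregatedCHO": "#cho-1",
}
provided_cho = {"dc:title": "Field recording", "dc:type": "sound", "edm:type": "SOUND"}
print(missing_properties(aggregation, provided_cho))
```

In this example the only reported problem is the missing edm:rights on the Aggregation.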
  • 22. Europeana validation: validation by the Operations officers • Feedback is according to the EDM schema and guidelines • Checks on the connections between the EDM classes and the general structure of the data • Correct use of vocabularies, recommendations to include geolocations • Checks on the types of values: literals vs resources (e.g. a thumbnail always needs to be a valid url) • Checks on links to digital representations of the objects; if direct links to a file, check that they are of reasonable size • Provision of thumbnails highly recommended • Feedback on (near) duplicate records • Feedback on rights statements in edm:rights and dc:rights • Feedback on any other metadata quality related matters (duplication of properties, encoding in the data, wrongly mapped properties, etc.)
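The literal-versus-resource check mentioned above (a thumbnail must be a URL, not free text) can be approximated with a syntactic test. This is a minimal sketch under stated assumptions: only `http` and `https` are treated as acceptable schemes, the function name is hypothetical, and the test does not fetch the link, so it says nothing about whether the file exists or is of reasonable size.

```python
from urllib.parse import urlparse


def looks_like_url(value: str) -> bool:
    """Syntactic check that a value is an http(s) URL rather than a literal."""
    parsed = urlparse(value.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


print(looks_like_url("http://archive.example.org/thumbs/1.jpg"))
print(looks_like_url("Photograph of the original tape"))
```

The first call reports a resource and the second a literal, which is the kind of mismatch the Operations officers flag in their feedback.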
  • 23. Happy ingestion :-) • The data is represented according to expectations for both sides • Users can search and retrieve rich content • Developers can make the best use of the API • Objects are clicked through and re-used from the Europeana portal
  • 26. Guidance and help • Europeana Professional: http://pro.europeana.eu/provide-data • Content inbox, for all ingestion & metadata related matters: content@europeana.eu
  • 28. Future plans for aggregation workflow
  • 29. Future plans for aggregation workflow • Future plans to open up part of the Europeana ingestion workflow to providers • Providers can log in to the Europeana ingestion suite and identify the aggregator/project they work for • Providers can select the datasets they want to update, or add new datasets • Providers can upload their data (OAI-PMH and FTP protocols) • Providers can map their data to EDM, or edit data that is already EDM • Providers can validate the data against the EDM schema and preview them prior to submission • Other processes being considered for refactoring: semantic validation, link checking, thumbnail caching, enrichment
  • 30. Future plans for aggregation workflow • Benefits for providers: • Possibility to map to EDM • Validation according to the EDM schema (with schematron rules we implemented) • Preview before publication • Self service, less dependent on Europeana, saving time (you can do many steps yourself, and you spot errors earlier) • Benefits for Europeana: • Operations scaled up: the number of projects, aggregators and therefore datasets has grown exponentially in recent years • More focus on EDM modeling and metadata related questions • Ingestion process transparent and more connected to the process on the aggregators' side