Aggregation workflow
Cécile Devarenne
Operations Officer
Metadata training, Europeana Sounds project
Athens, 23rd/24th of October 2014
Content
• Europeana's aggregation team
• Europeana Publication Policy
• Aggregation workflow
• Submission deadlines
• Ingestion processes and tools
• Acceptance criteria and Europeana validation of data
• Guidance and help – Europeana pro
• Future plans for aggregation workflow
Europeana’s aggregation team: who are we?
• Partner relationships, business development, administration
  • Henning Scholz, Joris Pekel, Gina Van der Linden
• Technical support
  • Operations officers: content@europeana.eu
  • Data support, feedback and ingestion of your collections into the Europeana portal and API
Europeana Publication Policy
Clear criteria for accepting or declining metadata for publication, and for taking down legacy metadata from the Europeana database:
• Ingestion workflow (deadlines, timelines, prioritization)
• Content scope (what is a digital object? what content does Europeana aggregate?)
• Technical validation of metadata quality (expected values)
• Metadata licensing (CC0)
• Rights statements for digital objects
  • All digital objects need a valid edm:rights value chosen from http://pro.europeana.eu/web/guest/available-rights-statements
  • Public Domain material is labelled with the Public Domain Mark in edm:rights
  • edm:rights and dc:rights must not contradict each other
Aggregation workflow and submission deadlines: how does it work?
Submission of data: preliminary steps for your project
• (1) Data Exchange Agreement (DEA) with Europeana
  • The Europeana Sounds project needs to submit a signed Data Exchange Agreement for each contributing data provider
  • The Europeana Data Exchange Agreement establishes the terms under which Europeana can make use of the previews and descriptive metadata provided by cultural institutions
  • More information: http://pro.europeana.eu/ensuring-permissions-for-aggregators
• (2) Data contribution form
  • One form for the whole project
  • General information on the data to be submitted to Europeana
  • Schedule of data delivery: ingestion planning
• (3) Submission of data samples, with feedback taken into account
Submission of data: (4) publication cycles
• Operations officers work on a monthly cycle
• Data is submitted as datasets: a dataset is a coherent batch of records; for the Europeana Sounds project, probably one dataset for each of your data providers
• A dataset takes on average 40 minutes to process
• Around 200 datasets are processed by the Operations officers in each publication cycle
• Datasets go through a full flow of operations before they are production-ready
• Datasets need to be submitted on time for this production cycle to work
• Datasets are submitted by the technical/content coordinators of your project
• The earlier you submit datasets, the more feedback we can give!
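As a rough back-of-the-envelope check of why on-time submission matters, the two figures above (40 minutes per dataset, around 200 datasets per cycle) can be combined into a processing load per cycle. This is only a sketch: it assumes datasets are processed one after another, which the slides do not state.

```python
# Figures taken from the slide; sequential processing is an assumption.
MINUTES_PER_DATASET = 40
DATASETS_PER_CYCLE = 200


def cycle_processing_hours(datasets: int = DATASETS_PER_CYCLE,
                           minutes_each: int = MINUTES_PER_DATASET) -> float:
    """Total processing time for one publication cycle, in hours."""
    return datasets * minutes_each / 60


print(round(cycle_processing_hours(), 1))  # 133.3
```

Under that assumption, one monthly cycle represents roughly 133 hours of processing.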
Submission of data: new provider timeline
Submission of data: regular ingestion cycle timeline
Ingestion processes and tools: what happens to your data when submitted to Europeana?
Europeana’s set of ingestion tools
• Unified Ingestion Manager (UIM): orchestrator of data flows triggered in various tools and plugins
• SugarCRM (Customer Relationship Management): reference entries for datasets and organisations
• REPOX: harvester to get the collections uploaded into Europeana
• Europeana’s instance of Mint (Metadata INTeroperability): mapping and editing tool for ingested datasets
• Data plugins
  • Itemization, generation of Europeana identifiers
  • Dereferencing
  • Enrichment
  • Redirects
  • Extraction of hierarchies
  • Thumbnail caching
Europeana ingestion data flows
Steps to get data ingested
From the moment your data is submitted:
• Checks on raw XML (browser)
  • Prior to harvesting
  • Identification of key issues
• Creation/update of dataset information, checks on validity of the supplied harvesting information (SugarCRM, REPOX)
• Harvesting (REPOX)
• Mapping/editing of datasets (Europeana Mint)
  • Mapping tool for all datasets
  • Adapted for Europeana in order to process multiple formats (EDM, ESE, any metadata standard with provided XSLT)
  • Drag and drop appropriate elements
  • Quality checks and data cleaning if necessary
  • Transformation and validation of records according to the EDM schema and Schematron rules
• EDM Internal data: Europeana-ready material
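Providers can run a similar first check on raw XML themselves before submitting. The sketch below is not Europeana's tooling: the function name `check_raw_xml` and the "no child records" rule are illustrative assumptions. It only tests that a file is well-formed and that the root element contains something, using Python's standard library.

```python
import xml.etree.ElementTree as ET


def check_raw_xml(xml_bytes: bytes) -> list[str]:
    """Return a list of problems found; an empty list means none detected."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        # The parser reports the line and column of the first error.
        return [f"not well-formed: {exc}"]
    issues = []
    if len(root) == 0:
        # Hypothetical "key issue": a wrapper element with no records in it.
        issues.append("root element has no child records")
    return issues


print(check_raw_xml(b"<records><record/></records>"))  # []
```

A well-formedness check says nothing about EDM validity; schema and Schematron validation happen later in the flow.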
Steps to get data ingested
• Operations on data following transformation:
  • Itemization and creation/management of Europeana identifiers for permalinks to your records in Europeana
  • Extraction of hierarchies for datasets including EDM hierarchies
  • Thumbnail caching
• Enrichment of data:
  • From links to ontologies exposed as linked data, generation of additional contextual data (dereferencing)
  • From analysis of the provided data, automated semantic enrichment (Europeana enrichment)
• If necessary (when a change of identifiers was communicated to Europeana), creation of redirects between previous and newly generated identifiers
• Data ready! Monthly deployment to the Europeana portal and API
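To illustrate the redirect step, here is a minimal sketch of how a table of identifier changes could be turned into redirects. It does not describe Europeana's redirect plugin: the function name, the identifier strings and the handling of repeated changes (following a chain to the latest identifier) are all assumptions made for the example.

```python
def resolve_redirects(changes: dict[str, str]) -> dict[str, str]:
    """Map every previous identifier to its latest identifier.

    `changes` maps an old identifier to the identifier that replaced it.
    Chains (a -> b, b -> c) are followed so that a points straight to c.
    """
    resolved = {}
    for old in changes:
        seen = {old}
        target = changes[old]
        while target in changes:
            if target in seen:
                raise ValueError(f"redirect loop involving {target!r}")
            seen.add(target)
            target = changes[target]
        resolved[old] = target
    return resolved


# Hypothetical identifiers, for illustration only.
print(resolve_redirects({"/set1/a": "/set1/b", "/set1/b": "/set1/c"}))
```

Resolving chains up front keeps old permalinks working with a single hop, whatever the number of identifier changes in between.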
Acceptance criteria: how exactly is the Publication Policy implemented?
Acceptance criteria
• Data Exchange Agreement to Europeana
• Datasets submitted via OAI-PMH protocol, FTP or file
• Metadata are accepted for publication after the feedback of the Europeana Operations Officers
• EDM schema and guidelines
• Rights labelling
• Datasets are prioritized for publication if the edm:rights value in the majority of the dataset's metadata records is PDM, CC0, CC BY or CC BY-SA
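The prioritization rule can be sketched as a simple majority test over the edm:rights values of a dataset. This is an illustration, not Europeana's implementation: the function name is hypothetical, "majority" is read here as strictly more than half, and rights statements are matched by the Creative Commons URI prefix regardless of licence version.

```python
# URI prefixes of the four statements named on the slide.
OPEN_RIGHTS_PREFIXES = (
    "http://creativecommons.org/publicdomain/mark/",  # PDM
    "http://creativecommons.org/publicdomain/zero/",  # CC0
    "http://creativecommons.org/licenses/by/",        # CC BY
    "http://creativecommons.org/licenses/by-sa/",     # CC BY-SA
)


def is_prioritized(rights_values: list[str]) -> bool:
    """True if more than half of the edm:rights values are PDM, CC0, CC BY or CC BY-SA."""
    if not rights_values:
        return False
    open_count = sum(value.startswith(OPEN_RIGHTS_PREFIXES) for value in rights_values)
    return open_count * 2 > len(rights_values)


dataset_rights = [
    "http://creativecommons.org/publicdomain/zero/1.0/",
    "http://creativecommons.org/licenses/by-sa/4.0/",
    "http://creativecommons.org/licenses/by-nc/4.0/",
]
print(is_prioritized(dataset_rights))  # True
```

Note that the trailing slash in the CC BY prefix keeps licences such as CC BY-NC from being counted as open.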
Europeana validation
Automatic validation:
• Validation according to the EDM schema
• Validation of the mandatory properties
• Unique identifiers within a dataset
• Metadata records that don’t meet this validation are invalidated or discarded
• Providers can fix issues first and resubmit, or let Europeana ingest the valid records and fix the invalid records at a later stage
• Validation of URLs for thumbnail creation (ImageMagick)
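The unique-identifier check in the automatic validation could be sketched as follows. Records are modelled as plain dictionaries with an "id" key, which is an assumption for the example, as is the choice to treat every copy of a duplicated identifier as invalid (the slides do not say which copy, if any, is kept).

```python
from collections import Counter


def split_by_unique_id(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Separate records with a unique, non-empty identifier from the rest."""
    counts = Counter(record.get("id") for record in records)
    valid, invalid = [], []
    for record in records:
        identifier = record.get("id")
        if identifier and counts[identifier] == 1:
            valid.append(record)
        else:
            # Missing identifier, or identifier shared with another record.
            invalid.append(record)
    return valid, invalid


records = [{"id": "rec1"}, {"id": "rec2"}, {"id": "rec2"}, {"title": "no id"}]
valid, invalid = split_by_unique_id(records)
print(len(valid), len(invalid))  # 1 3
```

Splitting rather than rejecting the whole dataset mirrors the option of ingesting the valid records first and fixing the invalid ones later.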
Mandatory properties
Applicable class    Mandatory properties (or alternatives)
Aggregation         edm:dataProvider
Aggregation         edm:isShownAt or edm:isShownBy
Aggregation         edm:provider
Aggregation         edm:rights
Aggregation         edm:aggregatedCHO
Aggregation         edm:ugc (when applicable)
ProvidedCHO         dc:title or dc:description
ProvidedCHO         dc:language for text objects
ProvidedCHO         dc:subject or dc:type or dc:coverage or dcterms:spatial
ProvidedCHO         edm:type
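The mandatory properties listed for the Aggregation and ProvidedCHO classes translate naturally into a small rule list, where each rule is a group of alternatives and at least one of them must be present. The sketch below assumes a record flattened into a single dictionary of property names to values, which is a simplification of real EDM; edm:ugc is left out because it only applies in some cases.

```python
# Each tuple is one rule: at least one of its properties must have a value.
MANDATORY = [
    ("edm:dataProvider",),
    ("edm:isShownAt", "edm:isShownBy"),
    ("edm:provider",),
    ("edm:rights",),
    ("edm:aggregatedCHO",),
    ("dc:title", "dc:description"),
    ("dc:subject", "dc:type", "dc:coverage", "dcterms:spatial"),
    ("edm:type",),
]


def missing_properties(record: dict) -> list[str]:
    """List the mandatory rules that a flattened record does not satisfy."""
    missing = [" or ".join(group) for group in MANDATORY
               if not any(record.get(name) for name in group)]
    # dc:language is only required for text objects (edm:type TEXT).
    if record.get("edm:type") == "TEXT" and not record.get("dc:language"):
        missing.append("dc:language")
    return missing


record = {
    "edm:dataProvider": "Example Sound Archive",  # hypothetical provider
    "edm:isShownBy": "http://example.org/audio/1.mp3",
    "edm:provider": "Europeana Sounds",
    "edm:rights": "http://creativecommons.org/publicdomain/zero/1.0/",
    "edm:aggregatedCHO": "#cho1",
    "dc:title": "Field recording",
    "dc:type": "sound recording",
    "edm:type": "SOUND",
}
print(missing_properties(record))  # []
```

Keeping the alternatives together in one tuple means a record with only dc:description, or only dcterms:spatial, still passes the corresponding rule.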
Validation by the Operations officers:
• Feedback is given according to the EDM schema and guidelines
• Checks on the connections between the EDM classes and the general structure of the data
• Correct use of vocabularies, recommendations to include geolocations
• Checks on the types of values: literals vs resources (e.g. a thumbnail always needs to be a valid URL)
• Checks on links to digital representations of the objects; if they are direct links to a file, check that the file is of reasonable size
• Provision of thumbnails is highly recommended
• Feedback on (near) duplicate records
• Feedback on rights statements in edm:rights and dc:rights
• Feedback on any other metadata quality related matters (duplication of properties, encoding in the data, wrongly mapped properties, etc.)
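The literal-versus-resource check made by the Operations officers can be approximated before submission. A minimal sketch, assuming that a "resource" means an absolute http or https URL; the function name is hypothetical and the check is purely syntactic (it does not test whether the link resolves).

```python
from urllib.parse import urlparse


def is_resource(value: str) -> bool:
    """True if the value looks like an absolute http(s) URL rather than a literal."""
    parts = urlparse(value.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


print(is_resource("http://example.org/thumbs/1.jpg"))  # True
print(is_resource("thumbs/1.jpg"))                     # False
```

Running such a check over thumbnail and object link fields catches relative paths and free-text values early, before the feedback round.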
Happy ingestion :-)
• The data is represented according to the expectations of both sides
• Users can search and retrieve rich content
• Developers can make the best use of the API
• Objects are clicked through and re-used from the Europeana portal
Guidance and help
Europeana Professional: http://pro.europeana.eu/provide-data
Content inbox, for all ingestion and metadata related matters: content@europeana.eu
Questions?
Future plans for aggregation workflow
• Future plans to open up part of the Europeana ingestion workflow to providers
  • Providers can log in to the Europeana ingestion suite and identify the aggregator/project they work for
  • Providers can select the datasets they want to update, or add new datasets
  • Providers can upload their data (OAI-PMH and FTP protocols)
  • Providers can map their data to EDM, or edit data that is already EDM
  • Providers can validate the data against the EDM schema and preview it prior to submission
• Other processes being considered for refactoring: semantic validation, link checking, thumbnail caching, enrichment
Future plans for aggregation workflow
• Benefits for providers:
  • Possibility to map to EDM
  • Validation according to the EDM schema (with the Schematron rules we implemented)
  • Preview before publication
  • Self-service: less dependent on Europeana and time-saving (you can do many steps yourself, and you spot errors earlier)
• Benefits for Europeana:
  • Operations scaled up: the number of projects, aggregators and therefore datasets has grown exponentially in recent years
  • More focus on EDM modelling and metadata-related questions
  • Ingestion process transparent and more connected to the process on the aggregators' side
Thank you!
Cécile Devarenne
cecile.devarenne@europeana.eu or content@europeana.eu

  • 2. Content • Europeana's aggregation team • Europeana Publication Policy • Aggregation workflow • Submission deadlines • Ingestion processes and tools • Acceptance criteria and Europeana validation of data • Guidance and help – Europeana pro • Future plans for aggregation workflow
  • 4. Europeana’s aggregation team • Partner relationships, business development, administration • Henning Scholz, Joris Pekel, Gina Van der Linden • Technical support • Operations officers: content@europeana.eu • Data support, feedback and ingestion of your collections into Europeana portal and API
  • 6. Europeana Publication Policy: clear criteria for acceptance or decline of metadata for publication and for takedown of legacy metadata from the Europeana database • Ingestion workflow (deadlines, timelines, prioritization) • Content scope (what is a digital object? what content does Europeana aggregate?) • Technical validation of metadata quality (expected values) • Metadata licensing (CC0) • Rights Statements for digital objects • All digital objects with valid edm:rights chosen from http://pro.europeana.eu/web/guest/available-rights-statements • Public Domain material labelled with the Public Domain mark in edm:rights • edm:rights & dc:rights not in contradiction
  • 7. Aggregation workflow and submission deadlines: how does it work?
  • 9. Submission of data: preliminary steps for your project • (1) Data Exchange Agreement to Europeana (DEA) • The Europeana Sounds project needs to submit the signed Data Exchange Agreements for each contributing data provider • The Europeana Data Exchange Agreement establishes the terms under which Europeana can make use of the previews and descriptive metadata provided by cultural institutions • More information can be found here: http://pro.europeana.eu/ensuring-permissions-for-aggregators • (2) Data contribution form • One form for the whole project • General information on data to be submitted to Europeana • Schedule of data delivery: ingestion planning • (3) Submission of data samples and feedback taken into account
  • 10. Submission of data: (4) publication cycles • Operations officers work on a monthly cycle • Submission of data in the form of datasets: a coherent batch of records, for the Europeana Sounds project, probably one dataset for each of your data providers • A dataset takes on average 40 mins to process • Around 200 datasets are processed by the Operations officers for each cycle of publication • Datasets go through a full flow of operations before they are production ready • Datasets need to be submitted on time in order for this production cycle to work • Datasets are submitted by the technical/content coordinators of your project • The earlier you submit datasets the more feedback we can give!
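The two figures on the slide above (about 40 minutes per dataset, around 200 datasets per cycle) can be combined into a rough workload estimate, which helps explain why on-time submission matters. This is only a back-of-the-envelope sketch: the slide does not say whether the 40 minutes is staff time or machine time, and the team size used below is a hypothetical value, not a figure from the deck.

```python
def cycle_workload_hours(datasets: int = 200, minutes_per_dataset: float = 40.0) -> float:
    """Total processing time for one publication cycle, in hours."""
    return datasets * minutes_per_dataset / 60.0


def hours_per_officer(total_hours: float, officers: int) -> float:
    """Split the cycle workload evenly across a (hypothetical) number of officers."""
    if officers < 1:
        raise ValueError("at least one officer is required")
    return total_hours / officers


total = cycle_workload_hours()
print(f"{total:.1f} hours per cycle")  # 133.3 hours per cycle
# The team size of 4 is an assumption made purely for illustration.
print(f"{hours_per_officer(total, 4):.1f} hours each for a hypothetical team of 4")
```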
  • 11. Submission of data: new provider timeline
  • 12. Submission of data: regular ingestion cycle timeline
  • 13. Ingestion processes and tools: what happens to your data when submitted to Europeana?
  • 14. Europeana’s set of ingestion tools • Unified Ingestion Manager (UIM): orchestrator of data flows triggered in various tools and plugins • SugarCRM (Customer Relationship Management): reference entries for datasets and organisations • REPOX: harvester to get the collections uploaded into Europeana • Europeana’s instance of Mint (Metadata INTeroperability): mapping and editing tool for ingested datasets • Data plugins • Itemization, Europeana identifiers generation • Dereferencing • Enrichment • Redirects • Extraction of hierarchies • Thumbnails caching
  • 16. Steps to get data ingested From the moment your data was submitted: • Checks on raw xml (Browser) • Prior to harvesting • Identification of key issues • Creation/update of dataset information, checks on validity of the supplied harvesting information (SugarCRM - REPOX) • Harvesting (REPOX) • Mapping/editing of datasets (Europeana Mint) • Mapping tool for all datasets • Adapted for Europeana in order to process multiple formats (EDM, ESE, any metadata standard with provided XSLT) • Drag and drop appropriate elements • Quality checks and data cleaning if necessary • Transformation and validation of records according to EDM schema and schematron rules • EDM Internal data: Europeana ready material
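For providers who deliver via OAI-PMH, the harvesting step above boils down to repeated `ListRecords` requests that follow a resumption token until the repository has no more pages. The sketch below only builds the request URLs and reads the token from a response; it is not how REPOX itself is implemented. The base URL, the set name and the `edm` metadata prefix are assumptions, since each repository advertises its own values.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix="edm", set_spec=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    A resumptionToken is an exclusive argument in OAI-PMH, so when one is
    given, metadataPrefix and set are left out of the request.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def next_token(response_xml):
    """Return the resumption token of a ListRecords response, or None on the last page."""
    root = ET.fromstring(response_xml)
    token = root.find(f".//{OAI_NS}resumptionToken")
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()
```

A harvester would fetch `list_records_url(...)`, pass the response body to `next_token`, and repeat with the returned token until it gets `None`.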
  • 17. Steps to get data ingested • Operations on data following transformation: • Itemization and creation/management of Europeana identifiers for permalinks to your records in Europeana • Extraction of hierarchies for datasets including EDM hierarchies • Thumbnails caching • Enrichments of data: • From links to linked data exposed ontologies, generation of additional contextual data (dereferencing) • From analysis of the provided data, automated semantic enrichment (Europeana enrichment) • If necessary (when a change of identifiers was communicated to Europeana), creation of redirections between previous and newly generated identifiers • Data ready! monthly deploy on Europeana portal and API
  • 18. Acceptance criteria: how exactly is the Publication Policy implemented?
  • 19. Acceptance criteria • Data Exchange Agreement to Europeana • Datasets submitted via OAI-PMH protocol, FTP or file • Metadata are accepted for publication after the feedback of the Europeana Operations Officers • EDM schema and guidelines • Rights labeling • Datasets are prioritized for publication if the edm:rights in the majority of the metadata of the dataset is PDM, CC0, CC BY or CC BY-SA
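The prioritization rule in the last bullet above can be checked by a provider before submission. The following is a minimal sketch, assuming that "majority" means strictly more than half of the records and that rights are given as Creative Commons URIs; the function names are hypothetical and this is not Europeana's own check.

```python
# URI prefixes for the four statements named on the slide. The trailing slash
# keeps "by/" from matching "by-nc/" or "by-sa/".
OPEN_RIGHTS_PREFIXES = (
    "http://creativecommons.org/publicdomain/mark/",  # Public Domain Mark (PDM)
    "http://creativecommons.org/publicdomain/zero/",  # CC0
    "http://creativecommons.org/licenses/by/",        # CC BY
    "http://creativecommons.org/licenses/by-sa/",     # CC BY-SA
)


def is_open_rights(rights_uri: str) -> bool:
    """True if the edm:rights value is PDM, CC0, CC BY or CC BY-SA."""
    return rights_uri.strip().startswith(OPEN_RIGHTS_PREFIXES)


def is_prioritized(rights_values) -> bool:
    """True if more than half of the dataset's edm:rights values are open.

    Assumption: "majority" is read as strictly more than 50% of records.
    """
    values = list(rights_values)
    if not values:
        return False
    open_count = sum(1 for value in values if is_open_rights(value))
    return open_count > len(values) / 2
```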
  • 20. Europeana validation: automatic validation • Validation according to the EDM schema • Validation of the mandatory properties • Unique identifiers within a dataset • Metadata records that don’t meet this validation are invalidated or discarded • Providers can fix issues first and resubmit, or let Europeana ingest the records that are valid and fix the invalid records at a later stage • Validation of URLs for thumbnail creation (ImageMagick)
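The "unique identifiers within a dataset" check above can be approximated locally before submission. This sketch assumes each record is a dictionary with an `identifier` key (a hypothetical field name) and that the first record with a given identifier is kept while later duplicates are set aside; the slide does not say how Europeana treats the duplicates themselves.

```python
def split_by_unique_identifier(records, id_key="identifier"):
    """Separate records with a fresh identifier from those missing one or repeating one."""
    seen = set()
    valid, rejected = [], []
    for record in records:
        record_id = record.get(id_key)
        if not record_id or record_id in seen:
            # Missing or already-used identifier: set aside for fixing.
            rejected.append(record)
        else:
            seen.add(record_id)
            valid.append(record)
    return valid, rejected
```

This mirrors the choice described on the slide: the valid records can go ahead while the rejected ones are fixed and resubmitted later.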
  • 21. Mandatory properties
      Applicable class | Mandatory properties (or alternatives)
      Aggregation | edm:dataProvider
      Aggregation | edm:isShownAt or edm:isShownBy
      Aggregation | edm:provider
      Aggregation | edm:rights
      Aggregation | edm:aggregatedCHO
      Aggregation | edm:ugc (when applicable)
      ProvidedCHO | dc:title or dc:description
      ProvidedCHO | dc:language for text objects
      ProvidedCHO | dc:subject or dc:type or dc:coverage or dcterms:spatial
      ProvidedCHO | edm:type
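The mandatory properties listed above can be expressed as a small rule set and run against a record before submission. The sketch below is an illustration only: it assumes each class is represented as a plain dictionary from property name to value (a hypothetical in-memory shape, not EDM's RDF/XML), treats "text objects" as records whose `edm:type` is `TEXT`, and leaves out `edm:ugc` because it is required only when applicable.

```python
# Each tuple lists alternatives; at least one of them must have a value.
AGGREGATION_RULES = [
    ("edm:dataProvider",),
    ("edm:isShownAt", "edm:isShownBy"),
    ("edm:provider",),
    ("edm:rights",),
    ("edm:aggregatedCHO",),
]
PROVIDED_CHO_RULES = [
    ("dc:title", "dc:description"),
    ("dc:subject", "dc:type", "dc:coverage", "dcterms:spatial"),
    ("edm:type",),
]


def missing_properties(aggregation: dict, provided_cho: dict) -> list:
    """Return a description of every mandatory rule the record does not satisfy."""
    problems = []
    for alternatives in AGGREGATION_RULES:
        if not any(aggregation.get(prop) for prop in alternatives):
            problems.append("Aggregation: " + " or ".join(alternatives))
    for alternatives in PROVIDED_CHO_RULES:
        if not any(provided_cho.get(prop) for prop in alternatives):
            problems.append("ProvidedCHO: " + " or ".join(alternatives))
    # dc:language is mandatory only for text objects.
    if provided_cho.get("edm:type") == "TEXT" and not provided_cho.get("dc:language"):
        problems.append("ProvidedCHO: dc:language")
    return problems
```

An empty list means the record passes these rules; schema validation and the officers' manual checks still apply on top.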
  • 22. Europeana validation: validation by the Operations officers • Feedback follows the EDM schema and guidelines • Checks on the connections between the EDM classes and the general structure of the data • Correct use of vocabularies, recommendations to include geolocations • Checks on the types of values: literals vs resources (e.g. a thumbnail always needs to be a valid URL) • Checks on links to digital representations of the objects; if they are direct links to a file, check that the file is of reasonable size • Provision of thumbnails highly recommended • Feedback on (near) duplicate records • Feedback on rights statements in edm:rights and dc:rights • Feedback on any other metadata quality related matters (duplication of properties, encoding in the data, wrongly mapped properties, etc.)
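The "literals vs resources" check above can be partly automated on the provider side. This is a rough sketch, assuming an aggregation is held as a dictionary of property names to string values and that `edm:object`, `edm:isShownAt` and `edm:isShownBy` are the properties expected to hold URLs; it tests only the shape of the value, not whether the link actually resolves.

```python
from urllib.parse import urlparse

# Properties assumed here to require a resource (URL) rather than a literal.
RESOURCE_PROPERTIES = ("edm:object", "edm:isShownAt", "edm:isShownBy")


def looks_like_url(value: str) -> bool:
    """True if the value has an http(s) scheme and a host part."""
    parsed = urlparse(value.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def literals_in_resource_slots(aggregation: dict) -> list:
    """List the resource properties whose supplied value is not a well-formed URL."""
    return [
        prop
        for prop in RESOURCE_PROPERTIES
        if prop in aggregation and not looks_like_url(aggregation[prop])
    ]
```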
  • 23. Happy ingestion :-) • The data is represented according to expectations for both sides • Users can search and retrieve rich content • Developers can make the best use of the API • Objects are clicked through and re-used from the Europeana portal
  • 26. Guidance and help • Europeana Professional: http://pro.europeana.eu/provide-data • Content inbox – for all ingestion & metadata related matters: content@europeana.eu
  • 28. Future plans for aggregation workflow
  • 29. Future plans for aggregation workflow • Future plans to open up part of the Europeana ingestion workflow to providers • Providers can log in to the Europeana ingestion suite and identify the aggregator/project they work for • Providers can select the datasets they want to update, or add new datasets • Providers can upload their data (OAI-PMH and FTP protocols) • Providers can map their data to EDM, or edit data that is already EDM • Providers can validate the data against the EDM schema and preview it prior to submission • Other processes being considered for refactoring: semantic validation, link checking, thumbnail caching, enrichment
  • 30. Future plans for aggregation workflow • Benefits for providers: • Possibility to map to EDM • Validation according to the EDM schema (with the schematron rules we implemented) • Preview before publication • Self-service, less dependent on Europeana, saving time (you can do many steps yourself, and you spot errors earlier) • Benefits for Europeana: • Operations scaled up – the number of projects, aggregators and therefore datasets has grown exponentially in recent years • More focus on EDM modeling and metadata related questions • Ingestion process transparent and more connected to the process on the aggregators' side