FDO as building block for digitization technology stacks
1. This project has received funding from the European research infrastructures
(including e-Infrastructures) under the European Union's Horizon 2020 research
and innovation programme under grant agreement No 101017501
Research Lifecycle Management technologies for
Earth Science Communities and Copernicus users in EOSC
FAIR digital objects as a building block for
digitalization technology stacks
Raul Palma
RELIANCE Project Coordinator
Head of Data Analytics and Semantics Department
Poznan Supercomputing and Networking Center (PSNC)
4th EMMC International Workshop 2023
27th April 2023
2. A continuous, iterative and dynamic process followed by scientists for conducting, validating and
disseminating scientific knowledge
The research and Information Lifecycle
3. Motivation for FAIR digital objects & Open Science
A. Fouilloux et al. FAIR Research Objects for realising Open Science with RELIANCE EOSC Project
Need: mechanisms to manage data, methods and other resources which could: i) enhance the visibility of scientific
breakthroughs; ii) encourage reuse; and iii) foster broader accessibility of research
4. Goal: Account, describe and share everything about your
research, including how those things are related
Research objects
http://www.researchobject.org
5. Research outcomes and related resources
Each object has its own metadata and repositories
All are first class citizens and are required to make research FAIR
[source RO-Crate: A framework for packaging research products into
FAIR Research Objects]
7. Research objects: Self-describing, chiefly
metadata, objects
[source RO-Crate: A framework for packaging research products into
FAIR Research Objects]
9. From FAIR data to FAIR Digital Objects
C. Goble, S. Soiland-Reyes. RO-Crate: A framework for packaging research products into FAIR Research Objects
10. RO-Crate as a FAIR Digital Object
C. Goble, S. Soiland-Reyes. RO-Crate: A framework for packaging
research products into FAIR Research Objects
Files: physical/links
RO: linked data/ZIP
Implementation of FAIR Digital Objects using RO-Crate
12. Summary: RO-Crate in a nutshell
[source RO-Crate: A framework for packaging research products into
FAIR Research Objects]
RO-Crate (Research Object Crate): Practical lightweight approach
to packaging research data entities (any object) with metadata
Aggregate files and/or any URI-addressable content, with
contextual information to aid decisions about re-use: Who What
When Where Why How.
Web native. Machine readable. Human readable. Search engine
friendly. Familiar.
Extensible and Incremental: add additional metadata; nested
and typed by their profile.
Open Community effort
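As a rough illustration of the packaging idea summarised on this slide, the sketch below builds a minimal RO-Crate metadata document (the `ro-crate-metadata.json` file) with the Python standard library only. It assumes the RO-Crate 1.1 context and layout; the file names, the remote URL, the title and the `#anne` person entity are hypothetical placeholders, not taken from the talk.

```python
import json

# Minimal RO-Crate 1.1 metadata document built as a plain dict.
# All names, file paths and URLs below are illustrative placeholders.
crate = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {
            # Metadata descriptor: states that this file describes the root dataset "./"
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {
            # Root dataset: aggregates local files and URI-addressable content
            "@id": "./",
            "@type": "Dataset",
            "name": "Example atmospheric analysis",
            "description": "Notebook and data used in an example analysis (placeholder text)",
            "datePublished": "2023-04-27",
            "author": {"@id": "#anne"},
            "hasPart": [
                {"@id": "analysis.ipynb"},
                {"@id": "https://example.org/data/cube.nc"},
            ],
        },
        # Data entities: one local file, one remote resource
        {"@id": "analysis.ipynb", "@type": "File", "name": "Analysis notebook"},
        {"@id": "https://example.org/data/cube.nc", "@type": "File", "name": "Remote data cube"},
        # Contextual entity: who created the crate
        {"@id": "#anne", "@type": "Person", "name": "Anne"},
    ],
}


def entity(graph, entity_id):
    """Return the entity with the given @id, or None if it is not described."""
    return next((e for e in graph if e["@id"] == entity_id), None)


# Serialised form, as it would be written to ro-crate-metadata.json
metadata_json = json.dumps(crate, indent=2)
```

Because every entry in `hasPart` is also described as its own entity in `@graph`, a consumer can walk from the root dataset to each aggregated resource and its context (who, what, when) without any RO-Crate-specific tooling.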
13. • Holistic solution for the management of Research
Objects
• Reference platform
• natively implements the RO-Crate model and paradigm
• supports different stakeholders, with the primary focus on scientists,
researchers, students and enthusiasts
• provides the backbone to a wealth of RO-centric applications and
interfaces across different scientific communities
ROHub overview
Timeline: 2010-2013, 2014-2019, 2020+
https://reliance.rohub.org/
Onboarded and
integrated in EOSC
14. ROHUB enables:
• to create and manage high-quality ROs that can be interpreted and reproduced in the future
• to reference, share and preserve resources related to scientific studies, campaigns and observations, including internal
ones, links to external ones as well as other ROs (nested ROs)
• to collaborate with colleagues and to discover new knowledge via advanced exploratory search interfaces that exploit RO
metadata (both explicitly provided and automatically extracted from its content), as well as via a standard search API
(OpenSearch with Geo extensions)
• to manage the RO evolution including the ability to generate snapshots and releases and to allow others to fork the RO
to reuse it and extend it.
• to publish the associated work and assign it a DOI to allow its citation in scholarly communications
• to monitor and follow a particular RO, getting notifications about its progress or quality changes
• researchers to build reputation by enabling users to rate and favorite ROs created by others
• to find related works or researchers in a domain, e.g., for possible collaborations or reviews
High-level features
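The list above mentions a standard search API based on OpenSearch with Geo extensions. The sketch below shows how a client might assemble such a query with a bounding box. The endpoint URL and the query parameter names (`q`, `count`, `bbox`) are assumptions for illustration; in a real OpenSearch service they are defined by the URL template in the service's description document. The only part taken from the Geo extension itself is the west, south, east, north ordering of the box.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_search_url(endpoint, terms, bbox=None, count=10):
    """Build a search URL with an optional geographic bounding box.

    bbox is (west, south, east, north) in decimal degrees, the order used
    by the OpenSearch Geo box parameter. Parameter names are hypothetical.
    """
    params = {"q": terms, "count": count}
    if bbox is not None:
        west, south, east, north = bbox
        # West may be greater than east when the box crosses the antimeridian,
        # so only ranges and the south/north ordering are checked.
        if not (-180 <= west <= 180 and -180 <= east <= 180):
            raise ValueError("longitude out of range")
        if not (-90 <= south <= north <= 90):
            raise ValueError("latitude out of range or south > north")
        params["bbox"] = ",".join(str(v) for v in bbox)
    return f"{endpoint}?{urlencode(params)}"


# Example: research objects about sea ice within a box around southern Norway.
# https://example.org/search is a placeholder, not a real ROHub endpoint.
url = build_search_url("https://example.org/search", "sea ice", (4.0, 58.0, 12.0, 62.0))
query = parse_qs(urlsplit(url).query)
```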
15. ROHub and added value services
• Semantic enrichment: readability, discoverability, reuse
• Recommendation: content-based, concentric spheres
• Research lifecycle & scholarly communication: collaboration, publication, citation, validation
• Completeness assessment: HQ monitoring & preservation
• Impact: sharing, rating
• Publish RO-Crate in EOSC
• Publish as PDF
• FAIR assessment: RO and components level
16. ROHub connections with EOSC Core and other
Exchange services
Notebook
Binder
AAI
check-in
EOSC Resource Catalogue
17. • Anne, a scientist from Oslo, and her team
want to carry out climate change research
from an atmospheric perspective.
• Anne goes to the EOSC Resource Catalogue, where
she searches for existing results and finds an
executable RO
• She opens the RO and finds a Jupyter
notebook that was used to analyze the data.
• Anne clicks on the notebook and it is opened in EGI
notebooks. She uses Data Cubes to exploit EO data
provided by the Copernicus program, and saves
results in EGI DataHub
• Anne creates her own RO (forking the reused RO)
and starts to work on it, i.e., aggregates the new
notebook, the data cube, and other resources. The
new RO will also appear in EOSC explore
• Anne invites colleagues to contribute; the shared RO
will keep track of the provenance of contributions.
They can be notified when the RO is modified.
• Before publishing, they make a self-
assessment of the FAIR-ness of their
research, and check the quality of the RO
• Once the RO is ready, Anne makes a snapshot and
publishes it in Zenodo with a DOI. This new RO will
then also appear in the EOSC catalogue
Find research work, access and reproduce it, reuse it
in new research, collaborate, assess quality and
publish it leveraging different EOSC services
EOSC services in support of OS
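The fork, aggregate and snapshot steps of the user story can be pictured with a small, self-contained model. This is only a sketch of the idea under simplifying assumptions: the `ResearchObject` class, its fields and the three helper functions are hypothetical and are not the ROHub API, and the DOI is a placeholder value.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class ResearchObject:
    identifier: str
    title: str
    resources: Tuple[str, ...] = ()
    status: str = "live"                 # "live" (editable) or "snapshot" (frozen)
    derived_from: Optional[str] = None   # provenance link to the source RO
    doi: Optional[str] = None


def fork(ro, new_id):
    """Create an editable copy that records which RO it was derived from."""
    return replace(ro, identifier=new_id, status="live",
                   derived_from=ro.identifier, doi=None)


def aggregate(ro, *resources):
    """Add resources to a live RO; snapshots cannot be modified."""
    if ro.status != "live":
        raise ValueError("snapshots are immutable")
    return replace(ro, resources=ro.resources + tuple(resources))


def snapshot(ro, snapshot_id, doi=None):
    """Freeze the current state, optionally attaching a DOI on publication."""
    return replace(ro, identifier=snapshot_id, status="snapshot",
                   derived_from=ro.identifier, doi=doi)


# Anne reuses an existing RO, extends it, then publishes a snapshot.
original = ResearchObject("ro-1", "Existing analysis", ("notebook.ipynb",))
mine = aggregate(fork(original, "ro-2"), "new-notebook.ipynb", "datacube.nc")
released = snapshot(mine, "ro-2-v1", doi="10.5072/placeholder")
```

Using immutable records means that every step returns a new object and leaves the source untouched, which mirrors the way a fork or snapshot preserves the state of the RO it was taken from.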
18. EOSC architecture overview
365 Resources onboarded in EOSC as of end
of October 2022 (source EOSC Metrics Portal)
[Figure: EOSC architecture overview. Labelled elements: EOSC Exchange, EOSC Core, EOSC Interoperability Framework, Execution Framework, EOSC Support Activities, core coordination functions, core technical platform, horizontal services, regional clusters, thematic clusters; Community A, Cluster B, Community C and Regional D, each with thematic or regional services and resources.]
• Cluster services onboarded
• eInfrastructures, INFRAEOSC-07: EOSC provisioning / expansion
• Horizontal services and resources from 07 projects and e-infras onboarded
• EOSC Horizontal services are delivered (e.g. data, compute and other research enabling services)
• Ability to create thematic execution environments / VREs based on integration of compliant thematic, horizontal, and core resources
19. Thanks!
Raul Palma
rpalma@man.poznan.pl
20. • ROHub in EOSC marketplace: https://marketplace.eosc-portal.eu/services/psnc.rohub
• ROHub portal https://reliance.rohub.org/
• ROHub tutorial: https://reliance-eosc.github.io/ROHUB-API_documentation/html/tutorials.html
• ROHub portal documentation: https://reliance-eosc.github.io/rohub-portal-documentation/
• ROHub API library documentation: https://reliance-eosc.github.io/ROHUB-API_documentation/html/index.html
• ROHub API library example Jupyter Notebooks: https://github.com/RELIANCE-EOSC/sample-notebooks
• ROHub helpdesk: https://support.pcss.pl/servicedesk/customer/portal/27 or support email: support@rohub.org
Onboarding and support resources
Editor's Notes
I would like to start the motivation for research objects with an overview of the research lifecycle. Of course, we can think of or find more granular representations for different domains/applications, but in general the research lifecycle starts with…
Science is incrementally built on results which can be reused and therefore reproduced for validation
[The “Earth Science Research and Information Lifecycle” can be defined as the continuous, iterative and on-going process used by scientists for conducting, validating and disseminating scientific knowledge. It can undergo an unlimited number of iterations which lead to the development of new and innovative ideas, concepts, techniques and technologies which ultimately benefit both science and society.
The life cycle can be summarized into four main phases that involve different categories of stakeholders:
Scientists access information (e.g. raw data or added value products generated by colleagues) and share results; this is reliant on researchers and data providers giving access to the data and related knowledge;
Shared results and information are analysed, interpretative models are generated and discussed with other colleagues (within the team and/or the wider community, which can include external stakeholders), and may require the use of visualisation tools and data analytics;
Discussion leads to novel ideas and concepts which might need validation through further experimentation or data acquisition; this requires access to additional data sets held by other data providers;
New results are validated and shared (together with the workflow and processes used to generate them) for further discussion, including dissemination to external stakeholders (e.g. general public, policy makers). This provides stimulus to new research, bringing the process back to step 1. ]
incl. information of the underlying context & relations between resources,
RO-Crate provides a straightforward and lightweight implementation of FDOs, which are part of the long-term vision of EOSC.
An FDO is defined as a sequence of bits that represents an informational unit and is presented according to the FAIR principles.
An FDO is a unit of data that is able to interact with automated data processing systems. FDOs are accessed through their PID. They may receive requests for operations, which they may inherit from their type, as known from object-oriented programming. Through operations, their metadata can be accessed, which in turn describes the enclosed data content (a bit sequence).
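The interaction pattern described in this note (resolve a PID, send an operation request, let the object's type decide how to answer) can be sketched in a few lines. Everything here is a toy model under stated assumptions: the class names, the in-memory `registry` standing in for a PID resolver, the operation names and the `pid:example/1` identifier are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class FDOType:
    """A type that carries the operations its instances can answer."""
    name: str
    operations: Dict[str, Callable[["FDO"], object]]


@dataclass
class FDO:
    pid: str            # persistent identifier
    fdo_type: FDOType   # operations are inherited from the type
    metadata: dict      # describes the enclosed content
    content: bytes      # the bit sequence itself

    def request(self, operation):
        """Dispatch an operation request to the handler defined by the type."""
        try:
            handler = self.fdo_type.operations[operation]
        except KeyError:
            raise ValueError(
                f"{self.fdo_type.name} does not support {operation!r}"
            ) from None
        return handler(self)


# In-memory stand-in for a PID resolution service (assumption for the sketch)
registry: Dict[str, FDO] = {}


def resolve(pid):
    return registry[pid]


dataset_type = FDOType("Dataset", {
    "get_metadata": lambda obj: obj.metadata,
    "size": lambda obj: len(obj.content),
})
registry["pid:example/1"] = FDO(
    "pid:example/1", dataset_type, {"title": "Example"}, b"\x00\x01\x02"
)

obj = resolve("pid:example/1")
```

A client only needs the PID and the name of an operation; the content stays behind the object's interface and the metadata is reached through `request("get_metadata")` rather than read directly.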
A trusted and open virtual environment for the scientific community with seamless access to services (with highest TRLs) addressing the whole research lifecycle:
EOSC aims to provide 1.7 million EU researchers with an environment offering free, open services for data storage, management, analysis and re-use across disciplines
A web of FAIR data and services
Federation of eInfra and Research Infrastructures (RIs)
EOSC Core: the set of enabling services needed to operate the EOSC
EOSC Exchange: registers resources and services from research infrastructures, other EOSC projects and science clusters in the EOSC and integrates them with the EOSC Core functionalities
EOSC Interoperability Framework: will provide guidelines for providers that want to integrate services or data into EOSC
It recommends beginning with a first iteration to establish a Minimum Viable EOSC (MVE) addressing the needs of publicly funded researchers exploiting openly available data.
The content of a DO is encoded as a structured bit sequence and stored in repositories. It is assigned a globally unique, persistent and resolvable identifier (PID), as well as rich metadata (descriptive, scientific, system, provenance, rights, etc.). Metadata descriptions are themselves DOs. Moreover, DOs can be aggregated into collections, which are also DOs whose content consists of references to their components.
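Two points in this note lend themselves to a short sketch: metadata descriptions are themselves DOs, and a collection is a DO whose content is a list of references to its members. The model below is a minimal illustration, assuming an in-memory `store` in place of a repository and JSON as the content encoding; the helper names and the `pid:example/...` identifiers are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class DigitalObject:
    pid: str                            # globally unique, resolvable identifier
    content: bytes                      # structured bit sequence
    metadata_pid: Optional[str] = None  # metadata is referenced as another DO


# Stand-in for a repository indexed by PID (assumption for the sketch)
store: Dict[str, DigitalObject] = {}


def register(pid, content, metadata_pid=None):
    obj = DigitalObject(pid, content, metadata_pid)
    store[pid] = obj
    return obj


def describe(obj):
    """Resolve the metadata DO and decode its content, if any is attached."""
    if obj.metadata_pid is None:
        return None
    return json.loads(store[obj.metadata_pid].content)


def make_collection(pid, member_pids, metadata_pid=None):
    """A collection is a DO whose content is the list of member PIDs."""
    return register(pid, json.dumps(list(member_pids)).encode(), metadata_pid)


def members(collection):
    return [store[p] for p in json.loads(collection.content)]


meta = register("pid:example/meta-1", json.dumps({"title": "Sea ice extent"}).encode())
data = register("pid:example/data-1", b"\x01\x02", metadata_pid=meta.pid)
coll = make_collection("pid:example/coll-1", [data.pid, meta.pid])
```

Since the collection stores references rather than copies, the same DO can appear in several collections, and a collection can itself be a member of another one.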
e.g., scientific investigations, campaigns and operational processes
including latest RO-crate specification
exposing RO functionalities that can be used by other applications or by data scientists via (Jupyter) notebooks
Comprises:
• backend service exposing a set of APIs
• IAM component integrated with EOSC AAI
• reference web client application
• Python library
• EOSC integration (AAI, publishing, storage)
• External RO added value services: semantic enrichment & recommendation, checklist evaluation, quality monitoring, PDF generation, data cubes