ROHub-Argos integration
1. This project has received funding from the European research infrastructures
(including e-Infrastructures) under the European Union's Horizon 2020 research
and innovation programme under grant agreement No 101017501
Research Lifecycle Management technologies for
Earth Science Communities and Copernicus users in EOSC
ARGOS integration in ROHub
Raul Palma
RELIANCE Project Coordinator
Head of Data Analytics and Semantics Department
Poznan Supercomputing and Networking Center (PSNC)
Community Call for ARGOS and Data Management Plans (DMPs)
22nd February 2023
2. • A continuous, iterative and dynamic
process involving the different stages
research data go through before, during,
and after a research project
Motivation: the Research Data Lifecycle
Source: https://www.reading.ac.uk/research-services/research-data-management/about-research-data-management/the-research-data-lifecycle
3. Motivation: challenges and vision
Goals
o Support creation and maintenance of machine-actionable DMPs
o Support data lifecycle stakeholders to actively manage DMPs and
related data from a single point
o Enable linking resources related to the data lifecycle (e.g.,
analyze/produce the data, publications, etc.)
o Enable a FAIRness assessment of DMP and related data
o Make DMPs and related resources discoverable in EOSC RG
Challenge
o support the research data lifecycle mgmt. from the perspective of the various
related stakeholders (e.g., researchers, data providers), who
o define how data will be handled during and after a research project,
o produce/collect this data, process and analyze it, preserve it, and make sure the data is
discoverable, accessible and reusable when appropriate
4. • Holistic solution for the management of ROs
• storage, lifecycle mgmt. & preservation of scientific outcomes
• share and make these resources available to others
• publish and release resources through a DOI
• discover and reuse pre-existing scientific knowledge.
• Reference platform
• natively implements the RO-Crate model and paradigm
• supports different stakeholders, with the primary focus on scientists,
researchers, students and enthusiasts
• provides the backbone to a wealth of RO-centric applications and interfaces
across different scientific communities
ROHub overview
[Timeline figure: 2010-2013, 2014-2019, 2020+; onboarded and integrated in EOSC]
https://reliance.rohub.org/
5. Goal: Account, describe and share everything about your
research, including how those things are related
Research objects
http://www.researchobject.org
6. Research outcomes and related resources
Each object has its own metadata and repositories
All are first class citizens and are required to make research FAIR
[source RO-Crate: A framework for packaging research products into
FAIR Research Objects]
8. Research objects: Self-describing, chiefly
metadata, objects
[source RO-Crate: A framework for packaging research products into
FAIR Research Objects]
9. Research objects: Self-describing, chiefly
metadata, objects
[source RO-Crate: A framework for packaging research products into
FAIR Research Objects]
10. Summary: RO-Crate in a nutshell
[source RO-Crate: A framework for packaging research products into
FAIR Research Objects]
RO-Crate (Research Object Crate): Practical lightweight approach
to packaging research data entities (any object) with metadata
Aggregate files and/or any URI-addressable content, with
contextual information to aid decisions about re-use: Who What
When Where Why How.
Web Native Machine readable. Human readable. Search engine
friendly. Familiar.
Extensible and Incremental: add additional metadata; nested
and typed by their profile.
Open Community effort
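To make the packaging idea concrete, here is a minimal sketch of the `ro-crate-metadata.json` document that sits at the root of a crate. It follows the general RO-Crate 1.1 layout (a metadata descriptor plus a root dataset listing its parts). The function name, file path and sample values are illustrative assumptions, and properties such as `license` are left out for brevity.

```python
import json


def build_ro_crate_metadata(name, description, date_published, files):
    """Return a minimal RO-Crate 1.1 style metadata document as a dict.

    `files` is a list of (relative_path, title) pairs; all values passed in
    below are made up for illustration.
    """
    graph = [
        {
            # Metadata descriptor: points at the root dataset of the crate.
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {
            # Root dataset: the research object itself.
            "@id": "./",
            "@type": "Dataset",
            "name": name,
            "description": description,
            "datePublished": date_published,
            "hasPart": [{"@id": path} for path, _ in files],
        },
    ]
    # One data entity per aggregated file.
    for path, title in files:
        graph.append({"@id": path, "@type": "File", "name": title})
    return {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": graph}


crate = build_ro_crate_metadata(
    "Example research object",
    "Illustrative crate with one data file",
    "2023-02-22",
    [("data/observations.csv", "Observations")],
)
print(json.dumps(crate, indent=2))
```

Contextual entities (people, organisations, licenses) would be added to the same `@graph` list and linked from the root dataset by `@id`.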
11.
12. ROHUB enables:
• to create and manage high-quality ROs that can be interpreted and reproduced in the future
• to reference, share and preserve scientific studies, campaigns, and observations related resources, including internal
ones, links to external ones as well as other ROs (nested ROs)
• to collaborate with colleagues and to discover new knowledge via advanced exploratory search interfaces that exploit
RO metadata (both explicitly provided and automatically extracted from its content), as well as via a standard search
API (OpenSearch with Geo extensions)
• to manage the RO evolution including the ability to generate snapshots and releases and to allow others to fork the
RO to reuse it and extend it.
• to publish the associated work and assign it a DOI to allow its citation in scholarly communications
• to monitor and follow a particular RO, getting notifications about its progress or quality changes
• researchers to build reputation by enabling users to rate and favorite ROs created by others
• to find related works or researchers in a domain, e.g., for possible collaborations or reviews
High-level features
13. ROHub and added value services
Semantic enrichment
readability, discoverability, reuse
Recommendation
content-based, concentric spheres
Research lifecycle & scholarly communication
collaboration, publication, citation, validation
FAIR/Quality assessment
HQ monitoring & preservation
Social Impact
Sharing, quality
Publish
in EOSC
Publish
as PDF
EOSC integration
AAI
14. ROHub connections with EOSC Core and other
Exchange services
• All RELIANCE services are
onboarded in the EOSC Marketplace
• RELIANCE services integrate with and
rely on different EOSC Core and
other EOSC Exchange services
Notebook
Binder
AAI
check-in
EOSC Resource Catalogue
17. Argos Integration
• Argos DMPs can be exported in different formats including XML and JSON.
• ROHub can import Argos DMPs in XML (+ JSON)
• The imported DMP generates a research object
18. Argos Integration
• The imported RO includes
• All the information from the DMP in the form of human-readable (a subset)
and machine-readable metadata (reusing standard vocabularies)
• the datasets themselves (physically or by reference) or, if they
are not created/collected at that point in time, a reserved space in
the research object to upload them when they become available.
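The structure described above can be sketched as a small data model: one research object per DMP, one folder per dataset, and an empty folder acting as the reserved space for data that does not exist yet. The class names, field names and the `dct:` keys are assumptions made for illustration; they are not ROHub's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class Folder:
    """Hypothetical RO folder standing for one DMP dataset."""
    name: str
    metadata: dict
    resources: list = field(default_factory=list)  # empty list = reserved space


@dataclass
class ResearchObject:
    """Hypothetical research object generated from a DMP."""
    title: str
    metadata: dict
    folders: list = field(default_factory=list)


def ro_from_dmp(dmp):
    """Build a research object from a DMP given as a plain dict (assumed layout)."""
    ro = ResearchObject(
        title=dmp["title"],
        metadata={"dct:description": dmp.get("description", "")},
    )
    for dataset in dmp.get("datasets", []):
        folder = Folder(name=dataset["title"], metadata={"dct:title": dataset["title"]})
        # Datasets that already exist are attached by reference;
        # the others keep an empty folder until the data is uploaded.
        if dataset.get("url"):
            folder.resources.append(dataset["url"])
        ro.folders.append(folder)
    return ro


example_dmp = {
    "title": "Example DMP",
    "description": "Illustrative plan",
    "datasets": [
        {"title": "Field measurements", "url": "https://example.org/data/measurements"},
        {"title": "Model output"},  # not produced yet
    ],
}
example_ro = ro_from_dmp(example_dmp)
for f in example_ro.folders:
    print(f.name, "->", f.resources or "reserved (empty)")
```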
19. Argos Integration
• New DMP versions can be propagated to the research object, generating a new version of the research
object itself:
• A snapshot of the research object is generated (before importing new version)
• The updated DMP replaces all the
metadata in the research object
21. Argos Integration – Templates
1. ROHub can process an Argos
template (XML), extracting and
returning the questions and answer
IDs as JSON.
2. The domain expert maps the answers
to predicates reusing existing
vocabularies, i.e., DMP ontology, Dublin
Core, Schema.org. If no predicate from
those vocabularies is suitable, a new
predicate must be created.
3. The filled-in mapping (JSON) is imported
into ROHub
Admin operations
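The three template steps can be sketched as follows. The XML layout (`<field id="...">` with a `<question>` child) and the field IDs are invented for this example, since the real Argos template schema is not shown here; the two predicate URIs are existing Dublin Core and Schema.org terms chosen only as sample mappings.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical template layout; the real Argos template XML will differ.
TEMPLATE_XML = """
<template>
  <field id="q-title"><question>What is the dataset title?</question></field>
  <field id="q-license"><question>Under which license is the data shared?</question></field>
</template>
"""


def extract_questions(template_xml):
    """Step 1: list each question with its answer ID and an empty predicate slot."""
    root = ET.fromstring(template_xml)
    return [
        {"id": f.get("id"), "question": f.findtext("question"), "predicate": None}
        for f in root.iter("field")
    ]


mapping = extract_questions(TEMPLATE_XML)

# Step 2: the domain expert chooses a predicate for each answer ID.
chosen = {
    "q-title": "http://purl.org/dc/terms/title",
    "q-license": "https://schema.org/license",
}
for entry in mapping:
    entry["predicate"] = chosen.get(entry["id"])

# Answers still without a predicate would need a newly created one.
unmapped = [entry["id"] for entry in mapping if entry["predicate"] is None]

# Step 3: the filled-in mapping is serialised as JSON for import.
print(json.dumps(mapping, indent=2))
```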
22. Argos Integration – DMPs
1. ROHub imports a DMP (based on a supported
template). This operation expects the DMP in
XML. Optionally, the URL of the original DMP
and the DMP in JSON (to complement the XML)
can also be provided
2. It is possible to retrieve the folders that
correspond to the DMP datasets for a given
RO
3. It is possible to return all machine-readable
metadata for a given folder (DMP
dataset)
23. Argos Integration – DMP upgrades
1. To propagate a new version of a DMP, it is
necessary to upgrade the DMP RO,
sending the same information as for a new
import
2. Considered scenarios:
• A new dataset is added
• An existing dataset is updated
• An existing dataset is deleted
3. For the last case there is a constraint: the
dataset (RO folder) to be deleted cannot
contain any data
4. If the conditions are met, an RO snapshot is
generated, then all annotations and metadata of the
original RO are replaced with those from the
new DMP
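The upgrade rules can be sketched as a single function over plain dictionaries: check the deletion constraint, take a snapshot, then replace the metadata and adjust the dataset folders. The dictionary layout and the function name are assumptions for illustration, not the ROHub implementation.

```python
import copy


def upgrade_dmp_ro(ro, new_dmp):
    """Apply a new DMP version to a research object held as a plain dict.

    Assumed layout: ro = {"metadata": {...}, "folders": {name: [resources]},
    "snapshots": [...]}; new_dmp = {"title": ..., "datasets": [{"title": ...}]}.
    """
    old_names = set(ro["folders"])
    new_names = {d["title"] for d in new_dmp["datasets"]}
    removed = old_names - new_names

    # Constraint: a dataset folder can only be deleted if it holds no data.
    blocked = sorted(name for name in removed if ro["folders"][name])
    if blocked:
        raise ValueError(f"cannot delete non-empty dataset folders: {blocked}")

    # Conditions met: snapshot the current state before changing anything.
    ro["snapshots"].append(
        copy.deepcopy({"metadata": ro["metadata"], "folders": ro["folders"]})
    )

    for name in removed:
        del ro["folders"][name]
    for name in new_names - old_names:
        ro["folders"][name] = []  # reserved space for the new dataset

    # All previous metadata is replaced by what the new DMP provides.
    ro["metadata"] = {
        "title": new_dmp["title"],
        "datasets": {d["title"]: d for d in new_dmp["datasets"]},
    }
    return ro


ro = {
    "metadata": {"title": "DMP v1"},
    "folders": {"Dataset A": ["a.csv"], "Dataset B": []},
    "snapshots": [],
}
new_dmp = {"title": "DMP v2", "datasets": [{"title": "Dataset A"}, {"title": "Dataset C"}]}
upgrade_dmp_ro(ro, new_dmp)
print(sorted(ro["folders"]))
```

Running the check before the snapshot means a rejected upgrade leaves the research object and its snapshot history untouched.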
24. Argos Integration – ongoing and future work
1. Propagate changes in ROHub to Argos
2. Export existing RO to Argos
• Export RO to XML
• Import generated XML in Argos
3. Define Argos templates aligned with DMP
requirements at the national level
4. Extend Argos GUI to enable direct export to ROHub
5. Extend ROHub GUI to enable direct export to Argos
6. Provide automatic suggestions for mappings based
on NLP methods
7. Integrate national repositories in ROHub to allow
search/connect existing datasets to DMP or to add
new datasets from DMP
8. Improve/test/evaluate at national level, and potential
integration/alignment with national funding agencies
DMPs
9. Assess the possibility of using an Argos API, if one exists
25. Argos Integration – clarification/issues found
• Exporting a DMP as an XML file:
• funder, grant and project IDs are internal IDs, while in the JSON file (the DMP
exported as RDA JSON) they are Zenodo IDs
• dataset descriptions are missing
• contact data is missing (it is present in the JSON file)
• there is a problem with profile (template) IDs: they do not correspond to profiles
between users
• some data is missing compared to the JSON file, e.g., in a DMP based on the
National Science Center Poland template the JSON file contains:
"commentFieldValue116df9c2-441d-36a3-1560-a5638c7c5778" : "user answer",
which is not present in the XML counterpart
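A discrepancy such as the last one can be spotted with a simple comparison of the two exports. The sketch below collects every key in the JSON export and every element name and attribute value in the XML export, then reports the JSON keys with no XML counterpart. Both sample documents are invented and far simpler than real Argos exports; only the `commentFieldValue...` key is taken from the issue above.

```python
import json
import xml.etree.ElementTree as ET


def json_keys(node, found=None):
    """Collect every key that appears anywhere in a parsed JSON document."""
    found = set() if found is None else found
    if isinstance(node, dict):
        for key, value in node.items():
            found.add(key)
            json_keys(value, found)
    elif isinstance(node, list):
        for item in node:
            json_keys(item, found)
    return found


def xml_names(xml_text):
    """Collect element names and attribute values from an XML document."""
    names = set()
    for element in ET.fromstring(xml_text).iter():
        names.add(element.tag)
        names.update(element.attrib.values())
    return names


def missing_in_xml(json_text, xml_text):
    """Return the JSON keys that have no counterpart in the XML export."""
    return sorted(json_keys(json.loads(json_text)) - xml_names(xml_text))


# Invented sample exports, for illustration only.
sample_json = json.dumps({
    "title": "Demo DMP",
    "commentFieldValue116df9c2-441d-36a3-1560-a5638c7c5778": "user answer",
})
sample_xml = "<dmp><title>Demo DMP</title></dmp>"

missing = missing_in_xml(sample_json, sample_xml)
print(missing)
```

This only compares names, not values, so it is a first-pass check rather than a full validation of the two exports.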
26. • ROHub in EOSC marketplace: https://marketplace.eosc-portal.eu/services/psnc.rohub
• ROHub portal https://reliance.rohub.org/
• ROHub tutorial: https://reliance-eosc.github.io/ROHUB-API_documentation/html/tutorials.html
• ROHub portal documentation: https://reliance-eosc.github.io/rohub-portal-documentation/
• ROHub API library documentation: https://reliance-eosc.github.io/ROHUB-API_documentation/html/index.html
• ROHub API library example Jupyter Notebooks: https://github.com/RELIANCE-EOSC/sample-notebooks
• ROHub helpdesk: https://support.pcss.pl/servicedesk/customer/portal/27 or support
email:support@rohub.org
Onboarding and support resources
27. Thanks!
Raul Palma
rpalma@man.poznan.pl
Editor's Notes
[The “Earth Science Research and Information Lifecycle” can be defined as the continuous, iterative and on-going process used by scientists for conducting, validating and disseminating scientific knowledge. It can undergo an unlimited number of iterations which lead to the development of new and innovative ideas, concepts, techniques and technologies which ultimately benefit both science and society.
The life cycle can be summarized into four main phases that involve different categories of stakeholders:
1. Scientists access information (e.g. raw data or added value products generated by colleagues) and share results; this is reliant on researchers and data providers giving access to the data and related knowledge;
2. Shared results and information are analysed, interpretative models are generated and discussed with other colleagues (within the team and/or the wider community, which can include external stakeholders), and may require the use of visualisation tools and data analytics;
3. Discussion leads to novel ideas and concepts which might need validation through further experimentation or data acquisition; this requires access to additional data sets held by other data providers;
4. New results are validated and shared (together with the workflow and processes used to generate them) for further discussion, including dissemination to external stakeholders (e.g. general public, policy makers). This provides stimulus to new research, bringing the process back to step 1. ]
updating them as needs evolve during the research project
incl. information of the underlying context & relations between resources,
e.g., scientific investigations, campaigns and operational processes
including latest RO-crate specification
exposing RO functionalities that can be used by other applications or by data scientists via (jupyter) notebooks
Comprises
backend service exposing a set of APIs
IAM component integrated with EOSC AAI
reference web client application
Python library
EOSC integration (AAI, publishing, storage)
External RO added value services
Semantic enrichment & recommendation
Checklist evaluation
Quality monitoring
PDF generation
Data cubes