Presentation about the usage of Research Objects to improve scientific experiment sharing and reproducibility, given at the Dagstuhl Perspective Workshop on the intersection between Computer Sciences and Psychology (July 2015)
Research Objects for improved sharing and reproducibility
1. Research Objects for improved
sharing and reproducibility
Dagstuhl Perspective Workshop on the intersection
between Computer Sciences and Psychology
Oscar Corcho
@ocorcho, http://slideshare.net/ocorcho
Ontology Engineering Group
Universidad Politécnica de Madrid
(and the Research Object community group)
3. Some memos from our futuristic scenario
• Don’t publish,
release (ack: Carole
Goble), reloaded
(ack. Paul Groth)
• Don’t just read a
paper, but also view
it, play with it, and
whatever else
• Convert passive
papers into active
scientific storytellers
and alert systems
4. A few quotes from this week
• Data (and method) sharing
• Dietrich: The method for investigation is not clearly
described
• Eric: Provide links between articles and datasets
(interlinking of scholarly content)
• William: methods are normally reduced to a tiny
piece of text
• Reproducibility
• Working group on “the present”: Crisis of
replicability is driving increased concern and
interest
• Eric: 70% of science articles are not reproducible
6. One of the many origins of “Don’t Publish, Release”
• A day in Granada… (January, 2012)
• Let’s get some of the interesting discussions on the Force11
Dagstuhl meeting into practice
7. One of the origins of “Don’t Publish, Release”
[Diagram: research object (RO) lifecycle]
• Scientist: works on a Live RO
• My supervisor calls me to report my work: RO snapshot <<copy>>, identified by a URI, with some metadata and some curation, mostly private (for my group)
• My supervisor calls me again and we decide to publish our RO+paper: RO snapshot <<copy>>, identified by a URI, with some metadata and some curation, mostly private (for my group and for paper reviewers)
• Reviews received and final version published: Archived RO <<copy, filter and curate>>, handled by a librarian/curator, identified by a URI, with good metadata and curation, mostly public
• A new PhD student continues my work: new Live RO <<copy>>
• <<versionOf>> links connect the successive versions
8. How do you usually structure your experiment?
• In a set of folders?
• These could be profiles for how you normally
structure your research
• Dropbox? Google Drive? GitHub?
• Overleaf+figshare? Whatever???
10. Multi-various products, platforms,
resources
First class citizens - id, manage, credit, track,
profile, focus
A Framework to Bundle, Port and Link (scattered) resources, related
experiments. Metadata Objects that carry Research Context.
Units of exchange.
Research Objects
http://www.researchobject.org
11. Identity
Aggregation
Interpretation:
The objects
How they are
linked together
RO main principles
manifest
Refer to
aggregations
and their
contents
Describe group
& constituents
External ids
Local files
Attribution:
Who, when,
where, why?
Metadata
Description
12. RO main principles: technologies
• Aggregation: OAI-ORE (aggregations, resource maps, proxies)
• Annotation: W3C OADM (annotation first class and stand-off)
• Identity: DOIs, URIs, Handles, ORCID (identity persistence and resolution, names, citation)
• manifest: point of extensibility
13. RO Model Ontology
• Defines core concepts
of research objects,
identity, aggregation,
annotation. Used in
the manifest
• http://w3id.org/ro/
15. Export, archive, publish and transfer ROs.
File format for storage and distribution of
ROs as a ZIP archive
Includes an RO’s manifest, annotations and
some or all of its aggregated resources
Basis for more specific file formats
Backwards compatible: it’s a ZIP file
Programmatic access: JSON and JSON-
LD manifest, API
https://researchobject.github.io/specifications/bundle/
https://w3id.org/bundle/ doi:10.5281/zenodo.10440
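The bundle idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the reference implementation: the media type string, the `.ro/manifest.json` location, and the `aggregates` key are written from memory of the RO Bundle specification linked above and should be checked against it. `build_bundle` and the example file name are hypothetical.

```python
import io
import json
import zipfile

# Media type assumed for RO Bundles (verify against the specification).
BUNDLE_MIMETYPE = "application/vnd.wf4ever.robundle+zip"


def build_bundle(resources: dict[str, bytes]) -> bytes:
    """Pack resources plus a minimal JSON manifest into an in-memory ZIP."""
    manifest = {
        "@context": ["https://w3id.org/bundle/context"],
        "id": "/",
        # One aggregation entry per bundled file.
        "aggregates": [{"uri": "/" + name} for name in sorted(resources)],
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # The mimetype entry is written first and stored uncompressed.
        archive.writestr(
            zipfile.ZipInfo("mimetype"),
            BUNDLE_MIMETYPE,
            compress_type=zipfile.ZIP_STORED,
        )
        # Assumed manifest location inside the bundle.
        archive.writestr(".ro/manifest.json", json.dumps(manifest, indent=2))
        for name, payload in sorted(resources.items()):
            archive.writestr(name, payload)
    return buffer.getvalue()


if __name__ == "__main__":
    data = build_bundle({"data/input.csv": b"a,b\n1,2\n"})
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        print(archive.namelist())
```

Because the result is an ordinary ZIP file, any archive tool can open it, which is the backwards compatibility the slide refers to.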
19. Publishing may be as easy as…
• Providing the URL
of the Research
Object to the
publisher, with a
release tag, to start
the review process
(if extra review
needed)
26. The Research Method in different disciplines
INPUT DATA SCIENTIFIC PROCEDURE EQUIPMENT
IN VIVO/IN VITRO
IN SILICO
27. The Research Method in different disciplines
Lab book
Digital
Log
Laboratory Protocol
(recipe)
Workflow
Experiment
28. The Research Method in different disciplines
INPUT DATA SCIENTIFIC PROCEDURE EQUIPMENT
IN VIVO/IN VITRO
IN SILICO
29. Some problems in lab protocols
Some protocols present insufficient granularity, and
the instructions can be imprecise or ambiguous due to
the use of natural language. Examples:
• Incubate the
centrifuge tubes in a
water bath.
• Incubate the samples
for 5 min with gentle
shaking.
• Rinse DNA briefly in
1-2 ml of wash.
• Incubate at -20C
overnight.
31. SMART Protocols - document
The Protocol as a document
[Diagram: sp:experimental protocol modelled as a document made of sections]
• sp:introduction section: sp:application of the protocol, sp:advantage of the protocol, sp:limitation of the protocol, sp:provenance of the protocol, sp:purpose of the protocol
• sp:materials section: sp:buffer list, sp:equipment and supplies list, sp:kit list, sp:primer list, sp:reagent list, sp:software list, sp:solution list
• sp:methods section: exact:caution, sp:critical step, sp:hint, sp:pause point, sp:storage condition, sp:timing, sp:troubleshooting
• Other classes shown: iao:document, iao:document part, iao:textual entity, iao:data set, exact:alert message
• Relations: owl:subClassOf, ro:hasPart, ro:partOf
Rhetorical and structural components (e.g. introduction, materials, and methods).
Information such as the application of the protocol, advantages and limitations, list of reagents, and critical steps.
32. SMART Protocols - wf
Basic Steps of DNA Extraction
[Diagram: workflow view of a DNA extraction protocol]
• Steps (p-plan:Step, sp:basic step of DNA extraction): sp:cell disruption, sp:digestion reaction, sp:DNA purification
• Variables (p-plan:Variable): sp:plant tissue, sp:powdered tissue, sp:digested contaminant, obi:DNA extract
• Relations: p-plan:hasInputVariable, p-plan:hasOutputVariable, bfo:isPrecededBy, owl:subClassOf
Representation of the workflow aspects in protocols: the implicit order in the instructions, following the input-output structure.
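The idea that the order of steps is implicit in their inputs and outputs can be illustrated with a short sketch. The wiring below (which variable each step consumes and produces) is an assumption made for illustration; the slide only names the steps, the variables, and the p-plan properties. `order_steps` is a hypothetical helper.

```python
from graphlib import TopologicalSorter

# Hypothetical wiring: step -> (input variables, output variables).
# Deliberately listed out of order to show that order is recovered.
STEPS = {
    "sp:DNA purification": ({"sp:digested contaminant"}, {"obi:DNA extract"}),
    "sp:cell disruption": ({"sp:plant tissue"}, {"sp:powdered tissue"}),
    "sp:digestion reaction": ({"sp:powdered tissue"}, {"sp:digested contaminant"}),
}


def order_steps(steps):
    """Order steps so that each one follows the step producing its inputs."""
    # Map every output variable to the step that produces it.
    producers = {
        var: step for step, (_, outputs) in steps.items() for var in outputs
    }
    # A step is preceded by the producers of its input variables.
    preceded_by = {
        step: {producers[var] for var in inputs if var in producers}
        for step, (inputs, _) in steps.items()
    }
    return list(TopologicalSorter(preceded_by).static_order())


if __name__ == "__main__":
    for step in order_steps(STEPS):
        print(step)
```

The `preceded_by` mapping plays the role of bfo:isPrecededBy here: it is derived from the variables rather than stated by hand.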
33. SMART Protocols documentation
• SMART Protocols ontology is available here:
• http://vocab.linkeddata.es/SMARTProtocols/
• Giraldo O, García-Castro A, Corcho O. SMART
Protocols: SeMAntic RepresenTation for
Experimental Protocols. LISC2014
34. SMART Protocols in action
Prefixes: sp = SMART Protocols, ro = Relation Ontology
[Diagram: example description of a DNA extraction protocol]
• Classes: sp:experimental protocol, sp:DNA extraction protocol, sp:title of the protocol, sp:author entry, sp:application of the protocol, sp:advantages, sp:sample
• Relations: owl:subClassOf, rdf:type, sp:hasAuthor, sp:hasTitle, ro:partOf
36. The Research Method in different disciplines
INPUT DATA SCIENTIFIC PROCEDURE EQUIPMENT
IN VIVO/IN VITRO
IN SILICO
37. Vocabularies and methodologies for representing and publishing
workflows
Methodology for workflow publishing
[Diagram: three stages, Publication, Share, and Reuse]
• Wings workflow generation: WINGS on a local laptop, on a shared host, or on a web server (each with Core, Portal, Workflow Template, Workflow Instance, and PROV export)
• OPM/PROV conversion
• Linked Data publication: RDF triple store holding the Workflow Plan and the Workflow Provenance
• Interactive browsing (Pubby frontend)
• Programmatic access (external apps)
• Consumers: users and other workflow environments
Repository of linked workflows:
http://www.opmw.org/sparql
http://purl.org/net/p-plan
http://www.opmw.org/ontology/
Daniel Garijo and Yolanda Gil. 2011. A new approach for publishing workflows: abstractions, standards, and linked data. (WORKS '11). ACM, New York, NY, USA, 47-56.
Daniel Garijo and Yolanda Gil. Augmenting PROV with Plans in P-PLAN: Scientific Processes as Linked Data. In Proceedings of the 2nd International Workshop on Linked
Science 2012, Boston, 2012.
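Programmatic access to the repository of linked workflows could look roughly like the sketch below, which only builds the request URL and sends nothing. It assumes the endpoint accepts the standard SPARQL protocol `query` parameter over GET and that workflow templates are typed with an `opmw:WorkflowTemplate` class; both are assumptions to verify against the OPMW ontology and the endpoint itself. `build_request_url` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint taken from the slide.
ENDPOINT = "http://www.opmw.org/sparql"

# Assumed class name (opmw:WorkflowTemplate); check the ontology before use.
QUERY = """\
PREFIX opmw: <http://www.opmw.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?template ?label WHERE {
  ?template a opmw:WorkflowTemplate .
  OPTIONAL { ?template rdfs:label ?label }
} LIMIT 10
"""


def build_request_url(endpoint: str, query: str) -> str:
    """Encode a SPARQL query as a GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    url = build_request_url(ENDPOINT, QUERY)
    print(url)
    # Round-trip check: the decoded parameter is the original query.
    print(parse_qs(urlsplit(url).query)["query"][0] == QUERY)
```

An external application would fetch this URL with an `Accept` header such as `application/sparql-results+json` and parse the bindings it gets back.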
38. Definition of workflow abstractions
Catalog of common independent
workflow abstractions (motifs)
Data-oriented motifs: What kind of
manipulations does the workflow have?
Workflow-oriented motifs: How does the
workflow perform its operations?
Analysis of 260 workflows
from 10 domains, belonging to
5 different workflow systems
http://purl.org/net/wf-motifs#
Daniel Garijo, Pinar Alper, Khalid Belhajjame, Oscar Corcho, Yolanda Gil, Carole Goble, Common motifs in
scientific workflows: An empirical analysis, Future Generation Computer Systems, Volume 36, July 2014,
Pages 338-351
39. Finding and evaluating common abstractions
https://github.com/dgarijo/FragFlow
http://purl.org/net/wf-fd
Graph mining techniques
Workflow fragment
representation
and linkage
Workflow fragment
Filtering techniques
Daniel Garijo, Oscar Corcho, Yolanda Gil, Boris A. Gutman, Ivo D. Dinov, Paul Thompson, and Arthur W. Toga. FragFlow: Automated Fragment Detection in Scientific
Workflows. In The 10th IEEE International Conference on e-Science, Guaruja, 2014
40. How to preserve Workflows/Research Objects?
Three main ways/levels:
• Descriptive reproducibility
• Documentation
• Workflow execution reproducibility
• Can we run the workflow?
• Workflow results reproducibility
• Can we get the same results?
Checklists!
• Corcho et al: Checklist for workflow conservation.
• http://dx.doi.org/10.6084/m9.figshare.1285011
• 40 different aspects
• Documentation
• Goals
• Results
• Metadata
• Corcho et al: Checklist for a workflow conservation plan
• http://dx.doi.org/10.6084/m9.figshare.1285012
• Based on the DCC’s data management plan
44. Some results
• Pegasus Montage Workflow
• Astronomy workflow
• Construct large image mosaics of the sky
• Montage Software distribution
• 59 binaries
• Target IaaS Cloud Providers
• Amazon EC2 & FutureGrid
• Vagrant
RO available at http://pegasus.isi.edu/publications/reppar
45. Lessons learned for Anna
• Research Objects as a
concept
• Identity, annotation,
aggregation
• Adapted to the
tools/infrastructure for each
domain
• With some tooling available
already
• It’s not just data preservation
but also methods
• Lab protocols
• Computational workflows
• Understand what
reproducibility means for you
47. Acknowledgements
• The Semantic e-Science team at UPM
• Carlos Badenes
• Daniel Garijo
• Olga Giraldo
• Rafael González-Cabero
• Idafen Santana
• The Wf4Ever team
• Carole Goble, José Manuel Gómez Pérez, Raúl Palma, Jun Zhao,
Stian Soiland-Reyes, Khalid Belhajjame, José Enrique Ruíz, Marco
Roos, Lourdes Verdes-Montenegro, Norman Morrison, Sean
Bechhofer, Graham Klyne, Matt Gamble, and many others
• The Research Object community group
• http://www.researchobject.org/
Editor's Notes
We will now illustrate the research object lifecycle through a small example that shows how all the resources contained in a research object are bundled as the scientific experiment progresses. This example lifecycle is summarized graphically on the slide.
A research object normally starts its life as an empty Live Research Object, with a first design of the experiments to be performed (which determines what workflows and resources will be added, by either retrieving them from an existing platform or creating them from scratch). Then the research object is filled incrementally by aggregating such workflows that are being created, reused or re-purposed, datasets, documents, etc. Any of these components can be changed at any point in time, removed, etc.
In our scenario, we observe several points in time when this Live Research Object gets copied and kept into a Research Object snapshot, which aims to reflect the status of the research object at a given point in time. Such a snapshot may be useful to release the current version of the research outcome of an experiment, submit it to be peer reviewed or to be published (with the appropriate access control mechanisms), share it with supervisors or collaborators, or for acknowledgement and citation purposes.
A snapshot may also contain a paper describing the research object in general and the experiment in particular, depending on the policies of the corresponding scientific communication channel, e.g., workshop, conference or journal. Such snapshots have their own identifiers, and may even be preserved, since it may be useful to be able to track the evolution of the research object over time, so as to allow, for example, retrieval of a previous state of the research object, reporting to funding agencies the evolution of the research conducted, etc.
At some point in time, the research object may get published and archived, in what we know as an Archived Research Object, with a permanent identifier. Such a version of our research object may be the result of copying completely our Live Research Object, or it may be the result of some filtering or curation process where only some parts of the information available in the aggregation are actually published for others to reuse.
As illustrated in Figure 4, a user can use an existing Archived Research Object as a starting point for his or her research, e.g., to repurpose it or its parts, in which case a new Live Research Object is created based on the existing Archived Research Object. This is only one of the many potential scenarios that could be foreseen for the lifecycle of a workflow-centric research object, and we are currently defining different storyboards for their evolution. One important aspect to highlight is that, during its whole lifecycle, the research object keeps aggregating new objects. The annotation process during the lifecycle of experimentation allows the generation of sufficient metadata about the research objects to support preservation and sharing. Therefore, when a scientist decides to preserve it, most of the annotations needed for that preservation process will already be available inside the research object.
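The lifecycle described in these notes can be summarised as a small sketch. The class, function names, and URIs below are hypothetical, and recording `version_of` as a link from a snapshot or archived copy back to the live object is an assumption about how <<versionOf>> is meant to be read.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class ResearchObject:
    uri: str
    kind: str  # "live", "snapshot" or "archived"
    resources: frozenset[str] = frozenset()
    version_of: Optional[str] = None


def aggregate(ro: ResearchObject, resource: str) -> ResearchObject:
    """Add a resource; only a Live RO keeps changing."""
    if ro.kind != "live":
        raise ValueError("only a live research object can be modified")
    return replace(ro, resources=ro.resources | {resource})


def snapshot(live: ResearchObject, uri: str) -> ResearchObject:
    """Copy the current state of a Live RO under its own identifier."""
    return ResearchObject(uri, "snapshot", live.resources, version_of=live.uri)


def archive(live: ResearchObject, uri: str,
            keep: Callable[[str], bool]) -> ResearchObject:
    """Copy, filter and curate: publish only the resources that pass `keep`."""
    kept = frozenset(r for r in live.resources if keep(r))
    return ResearchObject(uri, "archived", kept, version_of=live.uri)


def continue_from(archived: ResearchObject, uri: str) -> ResearchObject:
    """Start a new Live RO from an Archived RO (the new PhD student case)."""
    return ResearchObject(uri, "live", archived.resources)


if __name__ == "__main__":
    live = ResearchObject("urn:example:ro/live", "live")
    live = aggregate(aggregate(live, "workflow.t2flow"), "notes-private.txt")
    first = snapshot(live, "urn:example:ro/snapshot-1")
    live = aggregate(live, "paper.pdf")
    public = archive(live, "urn:example:ro/archived",
                     keep=lambda r: "private" not in r)
    print(sorted(first.resources), sorted(public.resources))
```

Keeping snapshots immutable while the live object keeps aggregating is what makes it possible to retrieve an earlier state later on.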
Packaging – physical and logical containers
Open Archives Initiative Object Reuse and Exchange (OAI-ORE) is a standard for describing aggregations of web resources
http://www.openarchives.org/ore/
Uses a Resource Map to describe the aggregated resources
Proxies allow for statements about the resources within the aggregation
Capturing context and viewpoints
Several concrete serialisations
RDF/XML, Atom, RDFa
Open Annotation specification is a community developed data model for annotation of web resources
http://www.openannotation.org/spec/core/
Developed by the W3C Open Annotation Community Group
Allows for “stand-off” annotations
Annotation as a first class citizen
Developed to fit with Web Architecture
Capture a Research Object to a single file or byte-stream by including its manifest, annotations and some or all of its aggregated resources for the purposes of exporting, archiving, publishing and transferring research objects.
Not everyone has the means to set up RESTful semantic web servers. In particular, we’ve run into this with desktop applications: users just want to save files and then decide where they are stored. So we decided to write a serialization format for Research Objects, which we call the RO Bundle.
We wanted this to be accessible for application developers, so we’ve adopted ZIP and JSON, and in a way this lets you create research objects and make annotations without ever seeing any RDF.
Preservation
Keep it in a perfect/unaltered condition.
Preserving the integrity and authenticity.
Conservation
Action of prolonging the existence of significant objects.
Researching, recording and retaining all information related to the object.
Documenting
Restoration
Return something to an earlier condition
Reconstruction
Forming again, with improvements or removal of defects
http://en.wikipedia.org/wiki/Wilderness#Conservation_vs._preservation: “Two opposing factions had emerged within the environmental movement by the early 20th century: the conservationists and the preservationists. The conservationists (such as Gifford Pinchot) focused on the proper use of nature, whereas the preservationists sought the protection of nature from use. Put another way, conservation sought to regulate human use while preservation sought to eliminate human impact altogether.”
This is the What: detect common groups of tasks. vs How: exact and inexact FGM techniques vs Why? T.
The ontologies are available here, and a paper describing the ontology design was recently accepted at the Linked Science 2014 workshop.
So far, we have covered a way to formally report a lab protocol.
This is an overview of the system we propose. WICUS stands for Workflow Infrastructure Conservation Using Semantics…