Capturing Context in Scientific Experiments:
Towards Computer-Driven Science
Daniel Garijo
Information Sciences Institute and
Department of Computer Science
https://w3id.org/people/dgarijo
@dgarijov
dgarijo@isi.edu
A prediction of the future… from the past
Useful for:
• Everyday tasks
  • Organize agenda
  • Calls
  • Look for information
• Research features
  • Summarize related work
  • Reuse and comparison of work
  • Highlights
  • Do new data analyses
Source: https://www.businessinsider.com.au/apple-future-computer-knowledge-navigator-john-sculley-george-lucas-2017-10,
https://www.youtube.com/watch?v=QRH8eimU_20
The knowledge navigator (Apple, 1987)
Meeting expectations…
• In terms of Data
  • Open datasets
  • Open metadata portals
• In terms of Software
  • Open Source repositories
  • Containers and virtual machines
• In terms of Publications
  • Open journals
  • Open methods/protocols
What are we missing?
• Methods in publications are not designed for intelligent systems
• Objectives, hypotheses, methodology and conclusions are tailored for humans
• The link between data, software and publications is not clear (if it exists at all)
• Functionality and instructions for executing software require specific domain expertise
• Publications are difficult to reuse and reproduce
Retracted Scientific Studies: A Growing List - NYTimes.com (captured 5/29/15, 1:49 AM)
The retraction by Science of a study of changing attitudes about gay marriage is
the latest prominent withdrawal of research results from scientific literature.
And it very likely won't be the last. A 2011 study in Nature found a 10-fold
increase in retraction notices during the preceding decade.
Many retractions barely register outside of the scientific field. But in some
instances, the studies that were clawed back made major waves in societal
discussions of the issues they dealt with. This list recounts some prominent
retractions that have occurred since 1980.
Vaccines and Autism
In 1998, The Lancet, a British medical journal, published a study by Dr. Andrew Wakefield that suggested that autism in children was caused by the combined vaccine for measles, mumps and rubella. In 2010, The Lancet retracted the study following a review of Dr. Wakefield's scientific methods and financial conflicts.
Despite challenges to the study, Dr. Wakefield's research had a strong effect on many parents. Vaccination rates tumbled in Britain, and measles cases grew. American antivaccine groups also seized on the research. The United States had more cases of measles in the first month of 2015 than the number that is typically diagnosed in a full year.
The Cost of Reproducibility
• Necessary to fill in the gaps
• 2 months of effort in reproducing a published method [Kinnings et al, PLOS 2010]
• Authors' expertise was required
Figure (method steps): Comparison of ligand binding sites; Comparison of dissimilar protein structures; Graph network generation; Molecular Docking
[Garijo et al PLOS]
Collaboration with UCSD
Scientist-Driven Science
Scientist
Scientists:
• Keep their own records
• Write their own software
• Data cleaning
• Reformatting
• Analysis
• Run the experiments
• Manually analyze results and compare to state of the art
Scientist + Automated Tools
Automated Tools help:
• Searching
• Setting up execution
• Visualizing
• Sharing
Requirements:
• Data/Dataset metadata
• Software/Software metadata
• Method description
• User/domain expertise
Scientist + Intelligent System
Intelligent Systems help:
• Comparing
• Reusing/Repurposing
• Testing new hypotheses
• Explaining results
Requirements:
• Functionality
• Relations between data, software and method
• Provenance
Context of a computational experiment
Outline
• Capturing and publishing context of computational experiments
  • From scientific workflows to Linked Data
  • Capturing software functionality
  • Representing software metadata
• Using context to facilitate reusability and exploration of experiments
  • Detecting commonalities among experiments
  • Explaining computational results
• Using context in Intelligent Systems
  • Hypothesis testing
  • Environmental sciences modeling
• A vision for context capture in computer-driven science
Introduction
Background: Computational Experiments
• Experiment: Lab book; Laboratory Protocol (recipe)
• In silico experiment: Digital Log; Scientific Workflow
Workflow representation: Structures interchanged in the workflow lifecycle
Workflow Lifecycle: Workflow Template (Design) -> Workflow Instance (Instantiation) -> Workflow Execution Trace (Execution)
• Workflow Template: Dataset -> Stemmer algorithm -> Result -> Term weighting algorithm -> FinalResult
• Workflow Instance (example 1): File: Dataset123 -> LovinsStemmer algorithm -> Id:resultaa1 -> IDF algorithm -> Id:fresultaa2
• Workflow Instance (example 2): File: Dataset124 -> PorterStemmer algorithm -> Id:resultaa1 -> IDF algorithm -> Id:fresultaa2
• Workflow Execution Trace (example 1): File: Dataset123 -> LovinsStemmer execution -> Id:resultaa1 -> IDF execution -> Id:fresultaa2
• Workflow Execution Trace (example 2): File: Dataset124 -> PorterStemmer execution -> Id:resultaa1 -> IDF execution -> Id:fresultaa2
• One template can have many instances, and one instance can have many execution traces
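The three lifecycle structures can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and function names (`Template`, `Instance`, `instantiate`, `execute`) and the generated result identifiers are hypothetical, and no real component is run; the trace is simulated to show how each execution record points back to a template step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    """Design: abstract, ordered steps (hypothetical structure)."""
    steps: tuple


@dataclass(frozen=True)
class Instance:
    """Instantiation: a template bound to a dataset and concrete components."""
    template: Template
    dataset: str
    bindings: dict


def instantiate(template, dataset, bindings):
    # Every abstract step must be bound to a concrete component.
    missing = [s for s in template.steps if s not in bindings]
    if missing:
        raise ValueError(f"unbound steps: {missing}")
    return Instance(template, dataset, dict(bindings))


def execute(instance):
    """Execution: return a simulated trace, one record per step.

    Each record keeps the link back to the template step it implements.
    """
    trace = []
    current = instance.dataset
    for index, step in enumerate(instance.template.steps, start=1):
        output = f"result{index}"  # hypothetical identifier
        trace.append({
            "template_step": step,
            "component": instance.bindings[step],
            "used": current,
            "generated": output,
        })
        current = output
    return trace


template = Template(("Stemmer", "Term weighting"))
instance = instantiate(
    template, "Dataset123",
    {"Stemmer": "LovinsStemmer", "Term weighting": "IDF"},
)
trace = execute(instance)
```

Binding the same template to `Dataset124` and `PorterStemmer` would give the second instance shown on the slide without touching the template.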
Requirements for workflow representation
[Garijo et al., 2017 FGCS]
• Workflow template description: Plan: P-Plan [Garijo et al 2012], http://purl.org/net/p-plan
• Workflow execution trace description: Provenance: PROV (W3C) [Lebo et al 2013], http://www.w3.org/ns/prov#
• Workflow attribution and workflow metadata: Dublin Core, PROV (W3C)
• Link between templates and executions
OPMW: Extending provenance standards and plan models
A Vocabulary for Workflow Representation: OPMW
http://www.opmw.org/ontology/
P-Plan extension (workflow templates):
• opmw:WorkflowTemplate extends p-plan:Plan
• opmw:WorkflowTemplateProcess extends p-plan:Step
• opmw:WorkflowTemplateArtifact extends p-plan:Variable
PROV, OPM extension (workflow executions):
• opmw:WorkflowExecutionProcess extends prov:Activity and opmv:Process
• opmw:WorkflowExecutionArtifact extends prov:Entity and opmv:Artifact
• opmw:WorkflowExecutionAccount extends prov:Bundle and opmo:Account
Object properties: opmw:isStepOfTemplate, opmw:isVariableOfTemplate, p-plan:hasInputVar, p-plan:isOutputVarOf, opmw:correspondsToTemplate, opmw:correspondsToTemplateProcess, opmw:correspondsToTemplateArtifact, prov:used, prov:wasGeneratedBy, opmo:account
Example (figure): template1 contains the step Term Weighting, with input variable Input Dataset and output variable Topics; execution1 contains the process IDF (java), which used File: Dataset123 and generated File: FResultaa2
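As a rough illustration of how the example in the figure might be written down as triples, the sketch below builds plain Python tuples and prints them in N-Triples syntax using only the standard library. The resource names under `RES` (for example `WorkflowTemplate/TEMPLATE1`) are hypothetical, and the `#` at the end of the P-Plan namespace is an assumption; the class and property names are the ones listed on the slide.

```python
OPMW = "http://www.opmw.org/ontology/"
PPLAN = "http://purl.org/net/p-plan#"   # assumed namespace form
PROV = "http://www.w3.org/ns/prov#"
RES = "http://www.opmw.org/export/resource/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical resource identifiers for the figure's example.
template = RES + "WorkflowTemplate/TEMPLATE1"
step = RES + "WorkflowTemplateProcess/TEMPLATE1_TERMWEIGHTING"
var_in = RES + "WorkflowTemplateArtifact/TEMPLATE1_INPUTDATASET"
var_out = RES + "WorkflowTemplateArtifact/TEMPLATE1_TOPICS"
account = RES + "WorkflowExecutionAccount/EXECUTION1"
process = RES + "WorkflowExecutionProcess/EXECUTION1_IDF"
data_in = RES + "WorkflowExecutionArtifact/DATASET123"
data_out = RES + "WorkflowExecutionArtifact/FRESULTAA2"

triples = [
    # Template level (P-Plan extension)
    (template, RDF_TYPE, OPMW + "WorkflowTemplate"),
    (step, RDF_TYPE, OPMW + "WorkflowTemplateProcess"),
    (step, OPMW + "isStepOfTemplate", template),
    (var_in, OPMW + "isVariableOfTemplate", template),
    (var_out, OPMW + "isVariableOfTemplate", template),
    (step, PPLAN + "hasInputVar", var_in),
    (var_out, PPLAN + "isOutputVarOf", step),
    # Execution level (PROV extension), linked back to the template
    (account, RDF_TYPE, OPMW + "WorkflowExecutionAccount"),
    (account, OPMW + "correspondsToTemplate", template),
    (process, RDF_TYPE, OPMW + "WorkflowExecutionProcess"),
    (process, OPMW + "correspondsToTemplateProcess", step),
    (process, PROV + "used", data_in),
    (data_out, PROV + "wasGeneratedBy", process),
    (data_out, OPMW + "correspondsToTemplateArtifact", var_out),
]


def to_ntriples(rows):
    """Serialize (subject, predicate, object) URI triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in rows)


if __name__ == "__main__":
    print(to_ntriples(triples))
```

The `correspondsTo...` triples are what connect an execution trace to the template it came from, which is the requirement the earlier slide calls "link between templates and executions".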
Publishing scientific workflows as Linked Data
Why Linked Data?
• Facilitates exploitation of workflow resources in a homogeneous manner
Adapted methodology from [Villazón-Terrazas et al 2011]
Tested it for the WINGS workflow system
1. Specification
Base URI = http://www.opmw.org/
Ontology URI = http://www.opmw.org/ontology/
Assertion URI = http://www.opmw.org/export/resource/ClassName/instanceName
Examples:
http://www.opmw.org/export/resource/WorkflowTemplate/ABSTRACTSUBWFDOCKING
http://www.opmw.org/export/resource/WorkflowExecutionAccount/ACCOUNT1348629350796
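The assertion URI pattern above can be captured in a small helper. This is a minimal sketch: the function name `assertion_uri` is hypothetical, and percent-encoding the two path segments is an assumption about how unusual characters would be handled, not something the slides specify.

```python
from urllib.parse import quote

BASE_URI = "http://www.opmw.org/"


def assertion_uri(class_name, instance_name):
    """Build BASE_URI + export/resource/ClassName/instanceName.

    Each segment is percent-encoded so that spaces or slashes in a name
    cannot change the path structure (an assumption of this sketch).
    """
    return (
        f"{BASE_URI}export/resource/"
        f"{quote(class_name, safe='')}/{quote(instance_name, safe='')}"
    )


if __name__ == "__main__":
    print(assertion_uri("WorkflowTemplate", "ABSTRACTSUBWFDOCKING"))
```

Keeping the class name in the path makes each URI self-describing, which is convenient when browsing resources by hand.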
2. Modeling
Vocabularies used: OPMW, P-Plan, OPM, DC, PROV
3. Generation
Workflow system (Workflow Template, Workflow execution) -> OPMW export -> OPMW RDF
4. Publication
OPMW RDF -> RDF Upload Interface -> RDF Triple store and permanent web-accessible file store -> SPARQL Endpoint
5. Exploitation
Curl, Linked Data Browser, SPARQL endpoint, Workflow explorer
Capturing software functionality
[Garijo et al 2014a] (Collaboration with U. of Manchester)
Is it possible to generalize workflow steps based on their functionality in an experiment?
• What kind of data manipulations are performed in a workflow?
  • E.g.: Data retrieval, Data preparation, Data curation, Data visualization, etc.
Analyzed software steps of 260 workflows from 4 different workflow systems
• Workflows per system: 89 + 125 + 26 + 20 = 260 workflows
Created a catalog of workflow step functionalities (motifs)
Guidelines for annotating workflows
Catalog available at: http://purl.org/net/wf-motifs#
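One way to picture the annotation exercise is as a mapping from workflow steps to motifs, followed by a count of how many workflows use each motif. The sketch below is illustrative only: the workflow and step names are made up, the motif labels are the four examples named on the slide, and `motif_frequency` is a hypothetical helper, not the analysis code behind the catalog.

```python
from collections import Counter

# Hypothetical annotations: workflow -> {step name -> motif}.
annotations = {
    "wf1": {
        "fetch_records": "Data retrieval",
        "normalize_table": "Data preparation",
        "plot_results": "Data visualization",
    },
    "wf2": {
        "download_corpus": "Data retrieval",
        "fix_identifiers": "Data curation",
        "strip_headers": "Data preparation",
        "merge_columns": "Data preparation",
    },
}


def motif_frequency(annotated):
    """Fraction of workflows in which each motif appears at least once."""
    counts = Counter()
    for steps in annotated.values():
        # A set so that a motif is counted once per workflow.
        counts.update(set(steps.values()))
    total = len(annotated)
    return {motif: n / total for motif, n in counts.items()}


if __name__ == "__main__":
    for motif, share in sorted(motif_frequency(annotations).items()):
        print(f"{motif}: {share:.0%} of workflows")
```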
Capturing Software Metadata
[Gil et al 2015]
• Scientific workflows capture some software metadata
• A large amount of software is not used in scientific workflows
• Software in open repositories often has missing metadata
  • How to use it?
  • What can I use it with?
  • What are the dependencies?
  • Is it still maintained?
  • How can I contribute?
  • …
• Ontology for scientific software metadata
• Described with scientists in mind:
  • How can scientists contribute to populate it?
  • What do scientists need in terms of software?
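The questions above can be read as metadata fields that a software entry either answers or leaves empty. The sketch below models that idea with a dataclass and reports which fields are missing. The field names and the example entry are hypothetical and are not the terms of the actual ontology.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class SoftwareEntry:
    """One registry entry; field names are illustrative assumptions."""
    name: str
    usage_instructions: Optional[str] = None   # How to use it?
    compatible_with: Optional[str] = None      # What can I use it with?
    dependencies: Optional[str] = None         # What are the dependencies?
    maintained: Optional[bool] = None          # Is it still maintained?
    contribution_guide: Optional[str] = None   # How can I contribute?


def missing_metadata(entry):
    """Names of the fields that have not been filled in."""
    return [f.name for f in fields(entry) if getattr(entry, f.name) is None]


def completion(entry):
    """Share of fields that are filled in, between 0 and 1."""
    total = len(fields(entry))
    return (total - len(missing_metadata(entry))) / total


if __name__ == "__main__":
    entry = SoftwareEntry(name="ExampleModel", dependencies="Python 3")
    print(missing_metadata(entry))
    print(f"{completion(entry):.0%} complete")
```

A registry could use a score like `completion` to highlight which entries still need attention from their authors.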
Software Metadata: Categories
Used in the OntoSoft metadata registry:
http://ontosoft.org/portals
http://ontosoft.org/software
Using the ontology in the OntoSoft software registry
• Software entries from distributed repositories are readily accessible
• Semantic search
• Comparison matrix of software entries (PIHM, PIHMgis, DrEICH, TauDEM, WBMsed)
• Metadata completion highlighted
• Software is contrasted by property
Detecting commonalities in computational experiments
[Garijo et al 2014b]
PROBLEMS to address:
• Workflows have many detailed steps and may be difficult to understand
• The general method may not be apparent
• How are different workflows related?
• What steps do they have in common?
Figure: Common workflow fragments across Workflow 1, Workflow 2 and Workflow 3 (steps labeled A to H)
A method for detecting reusable workflow fragments
[Garijo et al 2014b]
1. Data preparation
  • Duplicated workflows are removed
  • Single-step workflows are removed
  • Figure: the workflow Dataset -> Stemmer algorithm -> Result -> Term weighting algorithm -> FinalResult, shown next to its steps only (Stemmer algorithm -> Term weighting algorithm)
2. Popular graph mining techniques (frequent subgraph mining, FSM)
  • Inexact FSM: uses heuristics to calculate the similarity between two graphs. The solution might not be complete
  • Exact FSM: delivers all the possible fragments to be found in the dataset
3. Remove redundant fragments
4. Link fragments back to the workflows where they were found
http://purl.org/net/wf-fd
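The four steps can be sketched end to end on a toy corpus. This sketch makes a strong simplification, stated plainly: workflows are treated as linear sequences of steps and fragments as contiguous sub-sequences, whereas real workflows are graphs and need a graph mining algorithm. The function names, the `min_support` threshold and the redundancy rule (drop a fragment when a longer fragment contains it and occurs in exactly the same workflows) are assumptions of this example.

```python
from collections import defaultdict


def prepare(workflows):
    """Step 1: remove duplicated and single-step workflows."""
    seen, kept = set(), {}
    for name, steps in workflows.items():
        key = tuple(steps)
        if len(key) > 1 and key not in seen:
            seen.add(key)
            kept[name] = key
    return kept


def mine(workflows, min_support=2):
    """Step 2: exact mining of contiguous step sequences (length >= 2).

    Returns fragment -> set of workflows containing it, which also
    provides the link back to the workflows (step 4).
    """
    found = defaultdict(set)
    for name, steps in workflows.items():
        for i in range(len(steps)):
            for j in range(i + 2, len(steps) + 1):
                found[steps[i:j]].add(name)
    return {f: ws for f, ws in found.items() if len(ws) >= min_support}


def contains(big, small):
    """True if `small` occurs as a contiguous run inside `big`."""
    n = len(small)
    return any(big[i:i + n] == small for i in range(len(big) - n + 1))


def filter_redundant(fragments):
    """Step 3: drop fragments subsumed by a longer one with equal support."""
    return {
        f: ws for f, ws in fragments.items()
        if not any(
            len(g) > len(f) and contains(g, f) and fragments[g] == ws
            for g in fragments
        )
    }


corpus = {
    "wf1": ["A", "B", "C"],
    "wf2": ["A", "B", "F"],
    "wf3": ["A", "B", "C"],       # duplicate of wf1, removed in step 1
    "wf4": ["D"],                 # single step, removed in step 1
    "wf5": ["G", "A", "B", "C"],
}
fragments = filter_redundant(mine(prepare(corpus)))
```

On this corpus the fragment `B, C` is dropped because `A, B, C` covers it in the same workflows, while `A, B` is kept because it also appears in `wf2`.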
Workflow fragment assessment
Research question: Are our proposed workflow fragments useful?
• A fragment is useful if it has been designed and (re)used by a user.
• Comparison between proposed fragments and user-designed fragments (groupings) and workflows
Metrics: Precision and recall, computed over Fragments (F), Workflows (W) and Groupings (G)
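The slides name the metrics but not their exact definitions, so the sketch below uses the standard set-based formulas as an assumption: precision is the share of proposed fragments that match a user-designed grouping or workflow, and recall is the share of user-designed groupings and workflows that were proposed. Matching by exact equality is also an assumption; the function name and the sample data are hypothetical.

```python
def precision_recall(proposed, reference):
    """Set-based precision and recall with exact matching.

    proposed:  fragments suggested by the method (F)
    reference: user-designed groupings and workflows (G and W)
    """
    proposed, reference = set(proposed), set(reference)
    hits = proposed & reference
    precision = len(hits) / len(proposed) if proposed else 0.0
    recall = len(hits) / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    F = {("A", "B"), ("A", "B", "C"), ("B", "F")}
    G_and_W = {("A", "B"), ("C", "D")}
    p, r = precision_recall(F, G_and_W)
    print(f"precision={p:.2f} recall={r:.2f}")
```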
Workflow corpora
User Corpus 1 (WC1)
• Designed mostly by a single user
• 790 workflows (475 after data preparation)
User Corpus 2 (WC2)
• Created by a user, with collaborations of others
• 113 workflows (96 after data preparation)
Multi User Corpus 3 (WC3)
• Workflows submitted by 62 users during the month of Jan 2014
• 5859 workflows (357 after data preparation)
User Corpus 4 (WC4)
• Designed mostly by a single user
• 53 workflows (50 after data preparation)
Result assessment
• 30%-60% of proposed fragments are equal to user-defined groupings or workflows
• 40%-80% of proposed fragments are equal or similar to user-defined groupings or workflows
Commonly occurring patterns are potentially useful for users designing workflows
What about the rest of the fragments? Are those useful?
User feedback: user survey
Q1: Would you consider the proposed fragment a valuable grouping?
• I would not select it as a grouping (0)
• I would use it as a grouping with major changes (i.e., adding/removing more than 30% of the steps) (1)
• I would use it as a grouping with minor changes (i.e., adding/removing less than 30% of the steps) (2)
• I would use it as a grouping as it is (3)
Q2: What do you think about the complexity of the fragment?
• The fragment is too simple (0)
• The fragment is fine as it is (1)
• The fragment has too many steps (2)
Not enough evidence to state that all proposed workflow fragments are useful
Using captured context to explain results
[Gil and Garijo 2016]
Methods as currently described in papers are ambiguous, incomplete and described at inconsistent levels of detail
Figure (method steps): Comparison of ligand binding sites; Comparison of dissimilar protein structures; Graph network generation; Molecular Docking
Example method description:
The SMAP software was used to compare the binding sites of the 749 M.tb protein structures plus 1,446 homology models (a total of 2,195 protein structures) with the 962 binding sites of 274 approved drugs, in an all-against-all manner. While the binding sites of the approved drugs were already defined by the bound ligand, the entire protein surface of each of the 2,195 M.tb protein structures was scanned in order to identify alternative binding sites. For each pairwise comparison, a P-value representing the significance of the binding site similarity was calculated.
Goal: Automatically generate reports from computer-generated data analysis records
• Reports must:
  • Be truthful to actual events
  • Enable inspection
  • Be human-understandable
  • Abstract details
• Ideally:
  • Become part of papers
  • Have persistent evidence
  • Be adapted to different audiences/expertise/purpose
Data Narratives
1. A record of events that describe a new result
  • A workflow and/or provenance of all the computations executed
2. Persistent entries for key entities involved
  • URIs/DOIs for data, software versions, workflow, …
3. Narrative account(s)
  • Human-consumable rendering(s) that include pointers to the detailed records and entries
  • Each account is generated for a different audience/purpose
  • A casual reader, a close colleague, someone inspecting how the work was done, someone reproducing the work
Data Narrative Accounts: An example
• Execution view
  • Inputs, parameters and main outputs
  • “Topic modeling was run on the Reuters R8 dataset (10.6084/m9.figshare.776887), and English Words dataset (10.6084/m9.figshare.776888), with iterations set to 100, stop word size set to 3, number of topics set to 10 and batch size set to 10. The results are at 10.6084/m9.figshare.776856”
• Data view
  • Just the data that influenced the results
  • “The topics at 10.6084/m9.figshare.776856 were found in the Reuters R8 dataset (10.6084/m9.figshare.776887) and English Words dataset (10.6084/m9.figshare.776888)”
• Method view
  • Main steps based on their functionality
  • “Topic training was run on the input dataset. The results are product of PlotTopics, a visualization step”
• Dependency view
  • How the steps depend on each other
  • “First, the input data is filtered by Stop Words, followed by Small Words, Format Dataset, and Train Topics. The final results are produced by Plot Topics”
• Implementation view
  • How the steps were implemented in the execution
  • “Train topics was implemented using Latent Dirichlet allocation”
• Software view
  • Details on the software used to implement the steps
  • “The train topics step was generated with Online LDA open source software, written in Java. Plot topics was generated with the Termite software.”
DANA: DAta NArratives
Components (figure): Experiment Records (input), Provenance Repository, Software registry, Data Narrative aggregator (resource requests and responses), Experiment-specific Knowledge Base, Query patterns, DANA Generator (gets query patterns and pattern results), Narrative accounts (output)
1. Identify which experiment records to describe
2. Generation of an experiment-specific knowledge base
3. Creation of the Data Narrative from templates
4. Produce narrative accounts
https://knowledgecaptureanddiscovery.github.io/DataNarratives/
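Step 3, creating narratives from templates, can be illustrated with plain string templates over an experiment record. This is a hedged sketch and not the DANA implementation: the record layout and the functions `execution_view` and `data_view` are hypothetical, and the wording only approximates the example accounts. The dataset names, DOIs and parameter values are the ones quoted in the example.

```python
# Hypothetical experiment record, filled with the values from the example.
record = {
    "method": "Topic modeling",
    "inputs": [
        {"name": "Reuters R8", "doi": "10.6084/m9.figshare.776887"},
        {"name": "English Words", "doi": "10.6084/m9.figshare.776888"},
    ],
    "parameters": {
        "iterations": 100,
        "stop word size": 3,
        "number of topics": 10,
        "batch size": 10,
    },
    "output_doi": "10.6084/m9.figshare.776856",
}


def describe_inputs(rec):
    """Render each input as 'the <name> dataset (<doi>)'."""
    return " and ".join(
        f"the {d['name']} dataset ({d['doi']})" for d in rec["inputs"]
    )


def execution_view(rec):
    """Inputs, parameters and main outputs."""
    params = ", ".join(f"{k} set to {v}" for k, v in rec["parameters"].items())
    return (
        f"{rec['method']} was run on {describe_inputs(rec)}, with {params}. "
        f"The results are at {rec['output_doi']}."
    )


def data_view(rec):
    """Just the data that influenced the results."""
    return (
        f"The results at {rec['output_doi']} were found in "
        f"{describe_inputs(rec)}."
    )


if __name__ == "__main__":
    print(execution_view(record))
    print(data_view(record))
```

Because both views read from the same record, each account stays truthful to the recorded events while exposing a different level of detail.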
Formative evaluation
• Survey with 6 target scenarios
• Each scenario:
  • Description of a situation where a user has to do a task
  • A workflow sketch of the analysis done
  • Six candidate narratives of that workflow sketch
• 12 responses from users
• Results
  • Each narrative is considered appropriate for describing some scenario
  • Different users chose different narratives for each scenario
Using Context for Hypothesis Testing
[Gil et al 2016]
Figure: a hypothesis (“Protein PRKCDBP is expressed in samples of patient P36”) is tested against data by workflows (Wf 0, Wf 1, Wf 2). Meta-workflows (comparison, hypothesisRevision) take the workflow results (simMetrics) and the hypothesis, and produce a hypothesis revision (revisedHyp): “PRKCDBP mutation is expressed in P36”.
Hypothesis Testing: My Contribution
[Garijo et al 2017]
The DISK Ontology: http://disk-project.org/ontology/disk/
Figure: two versions of a hypothesis, each described by a Statement, a Qualifier, an Evidence and a History graph
• Statement (HS1): Protein EGFR AssociatedWith Colon Cancer
• Statement (HS2): Protein EGFR AssociatedWith Colon Cancer SubtypeA
• revisionOf links the second version to the first
• Qualifier (HQ1, HQ2): hasConfidenceReport (C1), hasConfidenceLevel (L1, L2)
• Evidence (HE1, HE2): wasGeneratedBy Execution 1, Execution 2
• History (HG1, HG2)
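The idea of a hypothesis that carries its statement, confidence, evidence and revision history can be sketched as a small data structure. This is an illustration only, not the DISK ontology itself: the class layout, the `history` helper and the numeric confidence values are hypothetical, while the statements and execution names follow the figure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hypothesis:
    """A hypothesis version; structure is an assumption of this sketch."""
    id: str
    statement: tuple                  # (subject, relation, object)
    confidence: float                 # hypothetical numeric level
    evidence: list                    # executions that generated the evidence
    revision_of: Optional["Hypothesis"] = None


def history(hypothesis):
    """Follow revision_of links and return ids from oldest to newest."""
    chain = []
    current = hypothesis
    while current is not None:
        chain.append(current.id)
        current = current.revision_of
    return list(reversed(chain))


h1 = Hypothesis(
    id="H1",
    statement=("Protein EGFR", "AssociatedWith", "Colon Cancer"),
    confidence=0.6,                   # made-up value for illustration
    evidence=["Execution 1"],
)
h2 = Hypothesis(
    id="H2",
    statement=("Protein EGFR", "AssociatedWith", "Colon Cancer SubtypeA"),
    confidence=0.8,                   # made-up value for illustration
    evidence=["Execution 2"],
    revision_of=h1,
)

if __name__ == "__main__":
    print(" -> ".join(history(h2)))
```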
Using Context for Environmental Sciences Modeling
Work in progress
• Modeler wants to predict a situation
  • E.g., impact of drought in the Amazon
• Intelligent system assists:
  • Finding data of interest
  • Connecting environmental models: hydrology, economy, agronomy, etc.
  • Facilitating the execution of models
  • Visualizing results
My contribution:
• Extending our software ontology to capture requirements of environmental models
• Relating variables to inputs, units, time, etc.
Figure: given a Scenario, the Intelligent System draws on a Data Catalog and a Model Catalog and exchanges variables and predictions with a Climate Model, a Hydrology Model, an Economy model, … Variables shown: Albedo, Soil moisture, Soil quality, Precipitation, Commodity prices, Property rights, Market access, Crop/forest yields, Land use, Household type
Where are we headed?
Scientist-Driven Science -> Computer-Driven Science
Scientist -> Scientist + Automated Tools -> Scientist + Intelligent System -> Intelligent System + Scientist
• Can an Intelligent System co-author a paper? Can it be an author?
• Can it win a Nobel prize? [Kitano, ISWC 2016]
• What do we need to capture (in Software, Data, Methods, Provenance)?
1. Functionality and abstraction
2. Granularity
3. Importance
Next steps for context capture in computational experiments
• Capturing different levels of abstraction in experiments
• Using user expertise to curate captured context
  • What do users consider important?
• Improve explanation of details
  • How can we identify the core function of a software step?
• Represent the goal and objectives of a computational experiment
Summing up
• Context is needed to understand and reuse computational experiments
• Sharing context from computational experiments
  • Scientific workflows and their executions
  • Software functionality and metadata
• Getting value out of context
  • Reusability, exploration, explanation
  • Used to power intelligent systems!
• Next steps
  • Representing functionality and levels of abstraction
  • Interact with users to curate context
Special thanks
• Yolanda Gil
• Varun Ratnakar
• Oscar Corcho
• Pinar Alper
• Khalid Belhajjame
• Asuncion Gomez Perez
• Idafen Santana Perez
• Felisa Verdejo
• Francisco Garijo
References
• [Kinnings et al, PLOS 2010]: Kinnings SL, Xie L, Fung KH, Jackson RM, Xie L, Bourne PE (2010) The Mycobacterium tuberculosis Drugome and Its Polypharmacological Implications. PLoS Comput Biol 6(11): e1000976. https://doi.org/10.1371/journal.pcbi.1000976
• [Garijo et al PLOS]: Garijo D, Kinnings S, Xie L, Xie L, Zhang Y, Bourne PE, et al. (2013) Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome. PLoS ONE 8(11): e80278. https://doi.org/10.1371/journal.pone.0080278
• [Garijo et al 2014a]: Garijo, D.; Alper, P.; Belhajjame, K.; Corcho, O.; Gil, Y.; and Goble, C. Common motifs in scientific workflows: An empirical analysis. Future Generation Computer Systems, 36: 338--351. 2014.
• [Garijo et al 2014b]: Garijo, D.; Corcho, O.; Gil, Y.; Gutman, B. A.; Dinov, I. D.; Thompson, P.; and Toga, A. W. FragFlow: Automated fragment detection in scientific workflows. In e-Science (e-Science), 2014 IEEE 10th International Conference on, volume 1, pages 281--289, 2014. IEEE
• [Gil and Garijo 2016]: Gil, Y.; and Garijo, D. Towards Automating Data Narratives. In Proceedings of the 22nd International Conference on Intelligent User Interfaces, pages 565--576, 2017. ACM
• [Garijo et al 2017]: Garijo, D.; Gil, Y.; and Ratnakar, V. The DISK Hypothesis Ontology: Capturing Hypothesis Evolution for Automated Discovery. In Proceedings of the Workshop on Capturing Scientific Knowledge (SciKnow), held in conjunction with the ACM International Conference on Knowledge Capture (K-CAP), Austin, Texas, 2017.
• [Garijo et al 2017 FGCS]: Garijo, D.; Gil, Y.; and Corcho, O. Abstract, link, publish, exploit: An end to end framework for workflow sharing. Future Generation Computer Systems, 2017.
• [Gil et al 2015]: Gil, Y.; Ratnakar, V.; and Garijo, D. OntoSoft: Capturing scientific software metadata. In Proceedings of the 8th International Conference on Knowledge Capture, pages 32, 2015. ACM
• [Kitano, ISWC 2016]: Kitano, H. Artificial Intelligence to Win the Nobel Prize and Beyond: Creating the Engine for Scientific Discovery. Keynote. http://iswc2016.semanticweb.org/pages/program/keynote-kitano.html

On community-standards, data curation and scholarly communication - BITS, Ita...Susanna-Assunta Sansone
Ā 
Scott Edmunds ISMB talk on Big Data Publishing
Scott Edmunds ISMB talk on Big Data PublishingScott Edmunds ISMB talk on Big Data Publishing
Scott Edmunds ISMB talk on Big Data PublishingGigaScience, BGI Hong Kong
Ā 
Scientific Workflow Systems for accessible, reproducible research
Scientific Workflow Systems for accessible, reproducible researchScientific Workflow Systems for accessible, reproducible research
Scientific Workflow Systems for accessible, reproducible researchPeter van Heusden
Ā 

What's hot (20)

Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017
Ā 
Beyond the PDF 2, 2013
Beyond the PDF 2, 2013Beyond the PDF 2, 2013
Beyond the PDF 2, 2013
Ā 
UKON 2014
UKON 2014UKON 2014
UKON 2014
Ā 
Aspects of Reproducibility in Earth Science
Aspects of Reproducibility in Earth ScienceAspects of Reproducibility in Earth Science
Aspects of Reproducibility in Earth Science
Ā 
Reproducibility of model-based results: standards, infrastructure, and recogn...
Reproducibility of model-based results: standards, infrastructure, and recogn...Reproducibility of model-based results: standards, infrastructure, and recogn...
Reproducibility of model-based results: standards, infrastructure, and recogn...
Ā 
4A2B2C-2013
4A2B2C-20134A2B2C-2013
4A2B2C-2013
Ā 
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Being FAIR:  FAIR data and model management SSBSS 2017 Summer SchoolBeing FAIR:  FAIR data and model management SSBSS 2017 Summer School
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Ā 
The Research Object Initiative: Frameworks and Use Cases
The Research Object Initiative:Frameworks and Use CasesThe Research Object Initiative:Frameworks and Use Cases
The Research Object Initiative: Frameworks and Use Cases
Ā 
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...
Ā 
Machines are people too
Machines are people tooMachines are people too
Machines are people too
Ā 
MESUR: Making sense and use of usage data
MESUR: Making sense and use of usage dataMESUR: Making sense and use of usage data
MESUR: Making sense and use of usage data
Ā 
Reproducibility 1
Reproducibility 1Reproducibility 1
Reproducibility 1
Ā 
Peer Review and Science2.0
Peer Review and Science2.0Peer Review and Science2.0
Peer Review and Science2.0
Ā 
Data at the NIH: Some Early Thoughts
Data at the NIH: Some Early ThoughtsData at the NIH: Some Early Thoughts
Data at the NIH: Some Early Thoughts
Ā 
Some Early Thoughts
Some Early ThoughtsSome Early Thoughts
Some Early Thoughts
Ā 
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...
Ā 
2016 davis-plantbio
2016 davis-plantbio2016 davis-plantbio
2016 davis-plantbio
Ā 
On community-standards, data curation and scholarly communication - BITS, Ita...
On community-standards, data curation and scholarly communication - BITS, Ita...On community-standards, data curation and scholarly communication - BITS, Ita...
On community-standards, data curation and scholarly communication - BITS, Ita...
Ā 
Scott Edmunds ISMB talk on Big Data Publishing
Scott Edmunds ISMB talk on Big Data PublishingScott Edmunds ISMB talk on Big Data Publishing
Scott Edmunds ISMB talk on Big Data Publishing
Ā 
Scientific Workflow Systems for accessible, reproducible research
Scientific Workflow Systems for accessible, reproducible researchScientific Workflow Systems for accessible, reproducible research
Scientific Workflow Systems for accessible, reproducible research
Ā 

Similar to Capturing Context in Scientific Experiments: Towards Computer-Driven Science

Mtsr2015 goble-keynote
Mtsr2015 goble-keynoteMtsr2015 goble-keynote
Mtsr2015 goble-keynoteCarole Goble
Ā 
Keynote speech - Carole Goble - Jisc Digital Festival 2015
Keynote speech - Carole Goble - Jisc Digital Festival 2015Keynote speech - Carole Goble - Jisc Digital Festival 2015
Keynote speech - Carole Goble - Jisc Digital Festival 2015Jisc
Ā 
Standards and tools for model management in biomedical research
Standards and tools for model management in biomedical researchStandards and tools for model management in biomedical research
Standards and tools for model management in biomedical researchUniversity Medicine Greifswald
Ā 
Software Sustainability: Better Software Better Science
Software Sustainability: Better Software Better ScienceSoftware Sustainability: Better Software Better Science
Software Sustainability: Better Software Better ScienceCarole Goble
Ā 
Minimal viable data reuse
Minimal viable data reuseMinimal viable data reuse
Minimal viable data reusevoginip
Ā 
Pine education-platform
Pine education-platformPine education-platform
Pine education-platformJaclyn Williams
Ā 
Benefits and practice of open science
Benefits and practice of open scienceBenefits and practice of open science
Benefits and practice of open scienceSarah Jones
Ā 
Open science and its advocacy
Open science and its advocacyOpen science and its advocacy
Open science and its advocacySarah Jones
Ā 
Acting as Advocate? Seven steps for libraries in the data decade
Acting as Advocate? Seven steps for libraries in the data decadeActing as Advocate? Seven steps for libraries in the data decade
Acting as Advocate? Seven steps for libraries in the data decadeLizLyon
Ā 
The beauty of workflows and models
The beauty of workflows and modelsThe beauty of workflows and models
The beauty of workflows and modelsmyGrid team
Ā 
From Scientific Workflows to Research Objects: Publication and Abstraction of...
From Scientific Workflows to Research Objects: Publication and Abstraction of...From Scientific Workflows to Research Objects: Publication and Abstraction of...
From Scientific Workflows to Research Objects: Publication and Abstraction of...dgarijo
Ā 
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...Bertram LudƤscher
Ā 
Approach and outcome of the Biodiversity Virtual e-Laboratory (BioVeL) project
Approach and outcome of the Biodiversity Virtual e-Laboratory (BioVeL) projectApproach and outcome of the Biodiversity Virtual e-Laboratory (BioVeL) project
Approach and outcome of the Biodiversity Virtual e-Laboratory (BioVeL) projectAlex Hardisty
Ā 
From Scientific Workflows to Research Objects: Publication and Abstraction of...
From Scientific Workflows to Research Objects: Publication and Abstraction of...From Scientific Workflows to Research Objects: Publication and Abstraction of...
From Scientific Workflows to Research Objects: Publication and Abstraction of...dgarijo
Ā 
Research Objects for improved sharing and reproducibility
Research Objects for improved sharing and reproducibilityResearch Objects for improved sharing and reproducibility
Research Objects for improved sharing and reproducibilityOscar Corcho
Ā 
cBioPortal Webinar Slides (3/3)
cBioPortal Webinar Slides (3/3)cBioPortal Webinar Slides (3/3)
cBioPortal Webinar Slides (3/3)Pistoia Alliance
Ā 
Preserving the Inputs and Outputs of Scholarship
Preserving the Inputs and Outputs of ScholarshipPreserving the Inputs and Outputs of Scholarship
Preserving the Inputs and Outputs of Scholarshiptsbbbu
Ā 
Software Sustainability Institute
Software Sustainability InstituteSoftware Sustainability Institute
Software Sustainability InstituteNeil Chue Hong
Ā 
Learn to speak open
Learn to speak openLearn to speak open
Learn to speak openLilian Juma
Ā 
Conservation of Scientific Workflow Infrastructures by Using Semantics - 2012
Conservation of Scientific Workflow Infrastructures by Using Semantics - 2012Conservation of Scientific Workflow Infrastructures by Using Semantics - 2012
Conservation of Scientific Workflow Infrastructures by Using Semantics - 2012Idafen Santana PĆ©rez
Ā 

Similar to Capturing Context in Scientific Experiments: Towards Computer-Driven Science (20)

Mtsr2015 goble-keynote
Mtsr2015 goble-keynoteMtsr2015 goble-keynote
Mtsr2015 goble-keynote
Ā 
Keynote speech - Carole Goble - Jisc Digital Festival 2015
Keynote speech - Carole Goble - Jisc Digital Festival 2015Keynote speech - Carole Goble - Jisc Digital Festival 2015
Keynote speech - Carole Goble - Jisc Digital Festival 2015
Ā 
Standards and tools for model management in biomedical research
Standards and tools for model management in biomedical researchStandards and tools for model management in biomedical research
Standards and tools for model management in biomedical research
Ā 
Software Sustainability: Better Software Better Science
Software Sustainability: Better Software Better ScienceSoftware Sustainability: Better Software Better Science
Software Sustainability: Better Software Better Science
Ā 
Minimal viable data reuse
Minimal viable data reuseMinimal viable data reuse
Minimal viable data reuse
Ā 
Pine education-platform
Pine education-platformPine education-platform
Pine education-platform
Ā 
Benefits and practice of open science
Benefits and practice of open scienceBenefits and practice of open science
Benefits and practice of open science
Ā 
Open science and its advocacy
Open science and its advocacyOpen science and its advocacy
Open science and its advocacy
Ā 
Acting as Advocate? Seven steps for libraries in the data decade
Acting as Advocate? Seven steps for libraries in the data decadeActing as Advocate? Seven steps for libraries in the data decade
Acting as Advocate? Seven steps for libraries in the data decade
Ā 
The beauty of workflows and models
The beauty of workflows and modelsThe beauty of workflows and models
The beauty of workflows and models
Ā 
From Scientific Workflows to Research Objects: Publication and Abstraction of...
From Scientific Workflows to Research Objects: Publication and Abstraction of...From Scientific Workflows to Research Objects: Publication and Abstraction of...
From Scientific Workflows to Research Objects: Publication and Abstraction of...
Ā 
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...
Ā 
Approach and outcome of the Biodiversity Virtual e-Laboratory (BioVeL) project
Approach and outcome of the Biodiversity Virtual e-Laboratory (BioVeL) projectApproach and outcome of the Biodiversity Virtual e-Laboratory (BioVeL) project
Approach and outcome of the Biodiversity Virtual e-Laboratory (BioVeL) project
Ā 
From Scientific Workflows to Research Objects: Publication and Abstraction of...
From Scientific Workflows to Research Objects: Publication and Abstraction of...From Scientific Workflows to Research Objects: Publication and Abstraction of...
From Scientific Workflows to Research Objects: Publication and Abstraction of...
Ā 
Research Objects for improved sharing and reproducibility
Research Objects for improved sharing and reproducibilityResearch Objects for improved sharing and reproducibility
Research Objects for improved sharing and reproducibility
Ā 
cBioPortal Webinar Slides (3/3)
cBioPortal Webinar Slides (3/3)cBioPortal Webinar Slides (3/3)
cBioPortal Webinar Slides (3/3)
Ā 
Preserving the Inputs and Outputs of Scholarship
Preserving the Inputs and Outputs of ScholarshipPreserving the Inputs and Outputs of Scholarship
Preserving the Inputs and Outputs of Scholarship
Ā 
Software Sustainability Institute
Software Sustainability InstituteSoftware Sustainability Institute
Software Sustainability Institute
Ā 
Learn to speak open
Learn to speak openLearn to speak open
Learn to speak open
Ā 
Conservation of Scientific Workflow Infrastructures by Using Semantics - 2012
Conservation of Scientific Workflow Infrastructures by Using Semantics - 2012Conservation of Scientific Workflow Infrastructures by Using Semantics - 2012
Conservation of Scientific Workflow Infrastructures by Using Semantics - 2012
Ā 

More from dgarijo

FOOPS!: An Ontology Pitfall Scanner for the FAIR principles
FOOPS!: An Ontology Pitfall Scanner for the FAIR principlesFOOPS!: An Ontology Pitfall Scanner for the FAIR principles
FOOPS!: An Ontology Pitfall Scanner for the FAIR principlesdgarijo
Ā 
FAIR Workļ¬‚ows: A step closer to the Scientiļ¬c Paper of the Future
FAIR Workļ¬‚ows: A step closer to the Scientiļ¬c Paper of the FutureFAIR Workļ¬‚ows: A step closer to the Scientiļ¬c Paper of the Future
FAIR Workļ¬‚ows: A step closer to the Scientiļ¬c Paper of the Futuredgarijo
Ā 
Towards Reusable Research Software
Towards Reusable Research SoftwareTowards Reusable Research Software
Towards Reusable Research Softwaredgarijo
Ā 
SOMEF: a metadata extraction framework from software documentation
SOMEF: a metadata extraction framework from software documentationSOMEF: a metadata extraction framework from software documentation
SOMEF: a metadata extraction framework from software documentationdgarijo
Ā 
A Template-Based Approach for Annotating Long-Tailed Datasets
A Template-Based Approach for Annotating Long-Tailed DatasetsA Template-Based Approach for Annotating Long-Tailed Datasets
A Template-Based Approach for Annotating Long-Tailed Datasetsdgarijo
Ā 
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge GraphsOBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphsdgarijo
Ā 
Towards Knowledge Graphs of Reusable Research Software Metadata
Towards Knowledge Graphs of Reusable Research Software MetadataTowards Knowledge Graphs of Reusable Research Software Metadata
Towards Knowledge Graphs of Reusable Research Software Metadatadgarijo
Ā 
Scientific Software Registry Collaboration Workshop: From Software Metadata r...
Scientific Software Registry Collaboration Workshop: From Software Metadata r...Scientific Software Registry Collaboration Workshop: From Software Metadata r...
Scientific Software Registry Collaboration Workshop: From Software Metadata r...dgarijo
Ā 
WDPlus: Leveraging Wikidata to Link and Extend Tabular Data
WDPlus: Leveraging Wikidata to Link and Extend Tabular DataWDPlus: Leveraging Wikidata to Link and Extend Tabular Data
WDPlus: Leveraging Wikidata to Link and Extend Tabular Datadgarijo
Ā 
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...dgarijo
Ā 
Towards Human-Guided Machine Learning - IUI 2019
Towards Human-Guided Machine Learning - IUI 2019Towards Human-Guided Machine Learning - IUI 2019
Towards Human-Guided Machine Learning - IUI 2019dgarijo
Ā 
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...dgarijo
Ā 
WIDOCO: A Wizard for Documenting Ontologies
WIDOCO: A Wizard for Documenting OntologiesWIDOCO: A Wizard for Documenting Ontologies
WIDOCO: A Wizard for Documenting Ontologiesdgarijo
Ā 
Towards Automating Data Narratives
Towards Automating Data NarrativesTowards Automating Data Narratives
Towards Automating Data Narrativesdgarijo
Ā 
Automated Hypothesis Testing with Large Scale Scientific Workflows
Automated Hypothesis Testing with Large Scale Scientific WorkflowsAutomated Hypothesis Testing with Large Scale Scientific Workflows
Automated Hypothesis Testing with Large Scale Scientific Workflowsdgarijo
Ā 
OntoSoft: A Distributed Semantic Registry for Scientific Software
OntoSoft: A Distributed Semantic Registry for Scientific SoftwareOntoSoft: A Distributed Semantic Registry for Scientific Software
OntoSoft: A Distributed Semantic Registry for Scientific Softwaredgarijo
Ā 
OEG tools for supporting Ontology Engineering
OEG tools for supporting Ontology EngineeringOEG tools for supporting Ontology Engineering
OEG tools for supporting Ontology Engineeringdgarijo
Ā 
Software Metadata: Describing "dark software" in GeoSciences
Software Metadata: Describing "dark software" in GeoSciencesSoftware Metadata: Describing "dark software" in GeoSciences
Software Metadata: Describing "dark software" in GeoSciencesdgarijo
Ā 
Reproducibility Using Semantics: An Overview
Reproducibility Using Semantics: An OverviewReproducibility Using Semantics: An Overview
Reproducibility Using Semantics: An Overviewdgarijo
Ā 
PhD Thesis: Mining abstractions in scientific workflows
PhD Thesis: Mining abstractions in scientific workflowsPhD Thesis: Mining abstractions in scientific workflows
PhD Thesis: Mining abstractions in scientific workflowsdgarijo
Ā 

More from dgarijo (20)

FOOPS!: An Ontology Pitfall Scanner for the FAIR principles
FOOPS!: An Ontology Pitfall Scanner for the FAIR principlesFOOPS!: An Ontology Pitfall Scanner for the FAIR principles
FOOPS!: An Ontology Pitfall Scanner for the FAIR principles
Ā 
FAIR Workļ¬‚ows: A step closer to the Scientiļ¬c Paper of the Future
FAIR Workļ¬‚ows: A step closer to the Scientiļ¬c Paper of the FutureFAIR Workļ¬‚ows: A step closer to the Scientiļ¬c Paper of the Future
FAIR Workļ¬‚ows: A step closer to the Scientiļ¬c Paper of the Future
Ā 
Towards Reusable Research Software
Towards Reusable Research SoftwareTowards Reusable Research Software
Towards Reusable Research Software
Ā 
SOMEF: a metadata extraction framework from software documentation
SOMEF: a metadata extraction framework from software documentationSOMEF: a metadata extraction framework from software documentation
SOMEF: a metadata extraction framework from software documentation
Ā 
A Template-Based Approach for Annotating Long-Tailed Datasets
A Template-Based Approach for Annotating Long-Tailed DatasetsA Template-Based Approach for Annotating Long-Tailed Datasets
A Template-Based Approach for Annotating Long-Tailed Datasets
Ā 
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge GraphsOBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs
Ā 
Towards Knowledge Graphs of Reusable Research Software Metadata
Towards Knowledge Graphs of Reusable Research Software MetadataTowards Knowledge Graphs of Reusable Research Software Metadata
Towards Knowledge Graphs of Reusable Research Software Metadata
Ā 
Scientific Software Registry Collaboration Workshop: From Software Metadata r...
Scientific Software Registry Collaboration Workshop: From Software Metadata r...Scientific Software Registry Collaboration Workshop: From Software Metadata r...
Scientific Software Registry Collaboration Workshop: From Software Metadata r...
Ā 
WDPlus: Leveraging Wikidata to Link and Extend Tabular Data
WDPlus: Leveraging Wikidata to Link and Extend Tabular DataWDPlus: Leveraging Wikidata to Link and Extend Tabular Data
WDPlus: Leveraging Wikidata to Link and Extend Tabular Data
Ā 
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...
Ā 
Towards Human-Guided Machine Learning - IUI 2019
Towards Human-Guided Machine Learning - IUI 2019Towards Human-Guided Machine Learning - IUI 2019
Towards Human-Guided Machine Learning - IUI 2019
Ā 
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
Ā 
WIDOCO: A Wizard for Documenting Ontologies
WIDOCO: A Wizard for Documenting OntologiesWIDOCO: A Wizard for Documenting Ontologies
WIDOCO: A Wizard for Documenting Ontologies
Ā 
Towards Automating Data Narratives
Towards Automating Data NarrativesTowards Automating Data Narratives
Towards Automating Data Narratives
Ā 
Automated Hypothesis Testing with Large Scale Scientific Workflows
Automated Hypothesis Testing with Large Scale Scientific WorkflowsAutomated Hypothesis Testing with Large Scale Scientific Workflows
Automated Hypothesis Testing with Large Scale Scientific Workflows
Ā 
OntoSoft: A Distributed Semantic Registry for Scientific Software
OntoSoft: A Distributed Semantic Registry for Scientific SoftwareOntoSoft: A Distributed Semantic Registry for Scientific Software
OntoSoft: A Distributed Semantic Registry for Scientific Software
Ā 
OEG tools for supporting Ontology Engineering
OEG tools for supporting Ontology EngineeringOEG tools for supporting Ontology Engineering
OEG tools for supporting Ontology Engineering
Ā 
Software Metadata: Describing "dark software" in GeoSciences
Software Metadata: Describing "dark software" in GeoSciencesSoftware Metadata: Describing "dark software" in GeoSciences
Software Metadata: Describing "dark software" in GeoSciences
Ā 
Reproducibility Using Semantics: An Overview
Reproducibility Using Semantics: An OverviewReproducibility Using Semantics: An Overview
Reproducibility Using Semantics: An Overview
Ā 
PhD Thesis: Mining abstractions in scientific workflows
PhD Thesis: Mining abstractions in scientific workflowsPhD Thesis: Mining abstractions in scientific workflows
PhD Thesis: Mining abstractions in scientific workflows
Ā 

Recently uploaded

Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management SystemChristalin Nelson
Ā 
Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxMaryGraceBautista27
Ā 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptxiammrhaywood
Ā 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
Ā 
Visit to a blind student's schoolšŸ§‘ā€šŸ¦ÆšŸ§‘ā€šŸ¦Æ(community medicine)
Visit to a blind student's schoolšŸ§‘ā€šŸ¦ÆšŸ§‘ā€šŸ¦Æ(community medicine)Visit to a blind student's schoolšŸ§‘ā€šŸ¦ÆšŸ§‘ā€šŸ¦Æ(community medicine)
Visit to a blind student's schoolšŸ§‘ā€šŸ¦ÆšŸ§‘ā€šŸ¦Æ(community medicine)lakshayb543
Ā 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parentsnavabharathschool99
Ā 
Culture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptxCulture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptxPoojaSen20
Ā 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Jisc
Ā 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
Ā 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSJoshuaGantuangco2
Ā 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptxmary850239
Ā 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
Ā 
FILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipinoFILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipinojohnmickonozaleda
Ā 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
Ā 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfErwinPantujan2
Ā 
Hį»ŒC Tį»T TIįŗ¾NG ANH 11 THEO CHĘÆĘ NG TRƌNH GLOBAL SUCCESS ĐƁP ƁN CHI TIįŗ¾T - Cįŗ¢ NĂ...
Hį»ŒC Tį»T TIįŗ¾NG ANH 11 THEO CHĘÆĘ NG TRƌNH GLOBAL SUCCESS ĐƁP ƁN CHI TIįŗ¾T - Cįŗ¢ NĂ...Hį»ŒC Tį»T TIįŗ¾NG ANH 11 THEO CHĘÆĘ NG TRƌNH GLOBAL SUCCESS ĐƁP ƁN CHI TIįŗ¾T - Cįŗ¢ NĂ...
Hį»ŒC Tį»T TIįŗ¾NG ANH 11 THEO CHĘÆĘ NG TRƌNH GLOBAL SUCCESS ĐƁP ƁN CHI TIįŗ¾T - Cįŗ¢ NĂ...Nguyen Thanh Tu Collection
Ā 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A BeƱa
Ā 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfTechSoup
Ā 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxCarlos105
Ā 

Recently uploaded (20)

Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
Ā 

Capturing Context in Scientific Experiments: Towards Computer-Driven Science

  • 1. Capturing Context in Scientific Experiments: Towards Computer-Driven Science Daniel Garijo Information Sciences Institute and Department of Computer Science https://w3id.org/people/dgarijo @dgarijov dgarijo@isi.edu
  • 2. A prediction of the future… from the past: the Knowledge Navigator (Apple, 1987). Useful for: • Everyday tasks • Organize agenda • Calls • Look for information • Research features • Summarize related work • Reuse and comparison of work • Highlights • Do new data analyses. Source: https://www.businessinsider.com.au/apple-future-computer-knowledge-navigator-john-sculley-george-lucas-2017-10, https://www.youtube.com/watch?v=QRH8eimU_20
  • 3. Meeting expectations… • In terms of data: open datasets; open metadata portals • In terms of software: open source repositories; containers and virtual machines • In terms of publications: open journals; open methods/protocols
  • 4. What are we missing? • Methods in publications are not designed for intelligent systems • Objectives, hypotheses, methodology and conclusions are tailored for humans • The link between data, software and publications is not clear (if it exists at all) • Functionality and instructions for executing software require specific domain expertise • Publications are difficult to reuse and reproduce. Example (NYTimes.com, 5/29/15, "Retracted Scientific Studies: A Growing List"): The retraction by Science of a study of changing attitudes about gay marriage is the latest prominent withdrawal of research results from scientific literature. And it very likely won't be the last. A 2011 study in Nature found a 10-fold increase in retraction notices during the preceding decade. Many retractions barely register outside of the scientific field. But in some instances, the studies that were clawed back made major waves in societal discussions of the issues they dealt with. This list recounts some prominent retractions that have occurred since 1980. Vaccines and Autism: In 1998, The Lancet, a British medical journal, published a study by Dr. Andrew Wakefield that suggested that autism in children was caused by the combined vaccine for measles, mumps and rubella. In 2010, The Lancet retracted the study following a review of Dr. Wakefield's scientific methods and financial conflicts. Despite challenges to the study, Dr. Wakefield's research had a strong effect on many parents. Vaccination rates tumbled in Britain, and measles cases grew. American antivaccine groups also seized on the research. The United States had more cases of measles in the first month of 2015 than the number that is typically diagnosed in a full year.
  • 5. The Cost of Reproducibility [Garijo et al PLOS] (collaboration with UCSD) • It was necessary to fill in the gaps • 2 months of effort to reproduce a published method [Kinnings et al, PLOS 2010] • The authors' expertise was required. Method steps (figure): comparison of ligand binding sites; comparison of dissimilar protein structures; graph network generation; molecular docking
  • 6. Scientist-Driven Science: context of a computational experiment. Stages: Scientist; Scientist + Automated Tools; Scientist + Intelligent System. Scientists: • Keep their own records • Write their own software • Data cleaning • Reformatting • Analysis • Run the experiments • Manually analyze results and compare to state of the art. Automated Tools help: • Searching • Setting up execution • Visualizing • Sharing. Requirements: • Data/Dataset metadata • Software/Software metadata • Method description • User/domain expertise. Intelligent Systems help: • Comparing • Reusing/Repurposing • Testing new hypotheses • Explaining results. Requirements: • Functionality • Relations between data, software and method • Provenance
  • 7. Outline • Capturing and publishing context of computational experiments: from scientific workflows to Linked Data; capturing software functionality; representing software metadata • Using context to facilitate reusability and exploration of experiments: detecting commonalities among experiments; explaining computational results • Using context in Intelligent Systems: hypothesis testing; environmental sciences modeling • A vision for context capture in computer-driven science
  • 8. Background: Computational Experiments. Figure labels: lab book and digital log; laboratory protocol (recipe) and scientific workflow; experiment and in silico experiment
  • 9. Outline • Capturing and publishing context of computational experiments: from scientific workflows to Linked Data; capturing software functionality; representing software metadata • Using context to facilitate reusability and exploration of experiments: detecting commonalities among experiments; explaining computational results • Using context in Intelligent Systems: hypothesis testing; environmental sciences modeling • A vision for context capture in computer-driven science
  • 10. Workflow representation: structures interchanged in the workflow lifecycle (design, instantiation, execution). • Workflow Template (design): Dataset → Stemmer algorithm → Result → Term weighting algorithm → FinalResult • Workflow Instance (instantiation): File: Dataset123 → LovinsStemmer algorithm → Id:resultaa1 → IDF algorithm → Id:fresultaa2; File: Dataset124 → PorterStemmer algorithm → Id:resultaa1 → IDF algorithm → Id:fresultaa2 • Workflow Execution Trace (execution): File: Dataset123 → LovinsStemmer execution → Id:resultaa1 → IDF execution → Id:fresultaa2; File: Dataset124 → PorterStemmer execution → Id:resultaa1 → IDF execution → Id:fresultaa2 (the figure shows several execution traces per instance)
  • 11. Requirements for workflow representation [Garijo et al., 2017 FGCS]. Requirements: • Workflow template description • Workflow execution trace description • Workflow attribution • Workflow metadata • Link between templates and executions. Vocabularies: • Plan: P-Plan [Garijo et al 2012], http://purl.org/net/p-plan • Provenance: PROV (W3C) [Lebo et al 2013], http://www.w3.org/ns/prov# • Dublin Core, PROV (W3C)
  • 12. A Vocabulary for Workflow Representation: OPMW (http://www.opmw.org/ontology/). OPMW extends provenance standards and plan models. • P-Plan extension: opmw:WorkflowTemplate, opmw:WorkflowTemplateProcess and opmw:WorkflowTemplateArtifact extend p-plan:Plan, p-plan:Step and p-plan:Variable • PROV/OPM extension: opmw:WorkflowExecutionProcess, opmw:WorkflowExecutionArtifact and opmw:WorkflowExecutionAccount extend prov:Activity, prov:Entity, prov:Bundle, opmv:Process, opmv:Artifact and opmo:Account • Template example (template1): Input Dataset, Term Weighting, Topics, related through opmw:isStepOfTemplate, opmw:isVariableOfTemplate, p-plan:hasInputVar and p-plan:isOutputVarOf • Execution example (execution1): File: Dataset123, IDF (java), File: FResultaa2, related through prov:used, prov:wasGeneratedBy and opmo:account • Execution elements point to template elements through opmw:correspondsToTemplate, opmw:correspondsToTemplateProcess and opmw:correspondsToTemplateArtifact. Legend: class, instance, object property, instance of, subclass of
  • 13. Publishing scientific workflows as Linked Data. Why Linked Data? • Facilitates exploitation of workflow resources in a homogeneous manner. Methodology adapted from [Villazón-Terrazas et al 2011] and tested with the WINGS workflow system. Step 1, Specification: Base URI = http://www.opmw.org/; Ontology URI = http://www.opmw.org/ontology/; Assertion URI = http://www.opmw.org/export/resource/ClassName/instanceName. Examples: http://www.opmw.org/export/resource/WorkflowTemplate/ABSTRACTSUBWFDOCKING, http://www.opmw.org/export/resource/WorkflowExecutionAccount/ACCOUNT1348629350796
  • 14. Publishing scientific workflows as Linked Data. Step 2, Modeling: OPMW, P-Plan, OPM, DC, PROV
  • 15. Publishing scientific workflows as Linked Data. Step 3, Generation: the workflow system exports workflow templates and workflow executions (OPMW export) as OPMW RDF
  • 16. Publishing scientific workflows as Linked Data. Step 4, Publication: OPMW RDF, RDF upload interface, RDF triple store, permanent web-accessible file store, SPARQL endpoint
  • 17. Publishing scientific workflows as Linked Data. Step 5, Exploitation: curl, Linked Data browser, SPARQL endpoint, workflow explorer
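The URI scheme from the Specification step can be sketched in a few lines of Python. This is only an illustration, not the WINGS/OPMW exporter: the helper names `assertion_uri` and `execution_triples` are hypothetical, the triples are plain tuples with prefixed property names rather than real RDF, and using the OPMW class names as URI path segments for processes and artifacts is an assumption extrapolated from the two example URIs on slide 13.

```python
from urllib.parse import quote

BASE_URI = "http://www.opmw.org/"  # base URI stated on slide 13


def assertion_uri(class_name, instance_name):
    """Build <base>export/resource/ClassName/instanceName (pattern from slide 13)."""
    return f"{BASE_URI}export/resource/{quote(class_name)}/{quote(instance_name)}"


def execution_triples(account, process, used, generated):
    """Hypothetical helper: describe one executed step as (subject, property, object) tuples.

    Property names are the prefixed terms shown on slide 12; the class names
    used in the URIs are an assumption for this sketch.
    """
    acc = assertion_uri("WorkflowExecutionAccount", account)
    proc = assertion_uri("WorkflowExecutionProcess", process)
    inp = assertion_uri("WorkflowExecutionArtifact", used)
    out = assertion_uri("WorkflowExecutionArtifact", generated)
    return [
        (proc, "prov:used", inp),            # the step consumed the input
        (out, "prov:wasGeneratedBy", proc),  # the output came from the step
        (proc, "opmo:account", acc),         # everything belongs to one execution account
        (inp, "opmo:account", acc),
        (out, "opmo:account", acc),
    ]


if __name__ == "__main__":
    print(assertion_uri("WorkflowTemplate", "ABSTRACTSUBWFDOCKING"))
    for triple in execution_triples("ACCOUNT1", "IDF", "DATASET123", "FRESULTAA2"):
        print(triple)
```

A real export would serialize these statements with an RDF library and load them into the triple store described in the Publication step.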
  • 18. Outline • Capturing and publishing context of computational experiments: from scientific workflows to Linked Data; capturing software functionality; representing software metadata • Using context to facilitate reusability and exploration of experiments: detecting commonalities among experiments; explaining computational results • Using context in Intelligent Systems: hypothesis testing; machine learning analysis; environmental sciences modeling • A vision for context capture in computer-driven science
  • 19. Capturing software functionality [Garijo et al 2014a] (collaboration with U. of Manchester). Is it possible to generalize workflow steps based on their functionality in an experiment? • What kind of data manipulations are performed in a workflow? • E.g.: data retrieval, data preparation, data curation, data visualization, etc.
  • 20. Capturing software functionality [Garijo et al 2014a] (collaboration with U. of Manchester) • Analyzed the software steps of 260 workflows from 4 different workflow systems (89 + 125 + 26 + 20 = 260 workflows) • Created a catalog of workflow step functionalities (motifs) • Guidelines for annotating workflows • Catalog available at: http://purl.org/net/wf-motifs#
  • 21. Outline • Capturing and publishing context of computational experiments: from scientific workflows to Linked Data; capturing software functionality; representing software metadata • Using context to facilitate reusability and exploration of experiments: detecting commonalities among experiments; explaining computational results • Using context in Intelligent Systems: hypothesis testing; environmental sciences modeling • A vision for context capture in computer-driven science
  • 22. Capturing Software Metadata [Gil et al 2015] • Scientific workflows capture some software metadata • A large amount of software is not used in scientific workflows • Software in open repositories often has missing metadata: How to use it? What can I use it with? What are the dependencies? Is it still maintained? How can I contribute? … • Ontology for scientific software metadata • Described with scientists in mind: How can scientists contribute to populate it? What do scientists need in terms of software?
  • 23. Software Metadata: Categories. Used in the OntoSoft metadata registry: http://ontosoft.org/portals http://ontosoft.org/software
  • 24. Using the ontology in the OntoSoft software registry • Software entries from distributed repositories are readily accessible • Semantic search • Comparison matrix of software entries (PIHM, PIHMgis, DrEICH, TauDEM, WBMsed): software is contrasted by property • Metadata completion is highlighted
  • 25. Outline • Capturing and publishing context of computational experiments: from scientific workflows to Linked Data; capturing software functionality; representing software metadata • Using context to facilitate reusability and exploration of experiments: detecting commonalities among experiments; explaining computational results • Using context in Intelligent Systems: hypothesis testing; environmental sciences modeling • A vision for context capture in computer-driven science
  • 26. Detecting commonalities in computational experiments [Garijo et al 2014b]. Problems to address: • Workflows have many detailed steps and may be difficult to understand • The general method may not be apparent • How are different workflows related? • What steps do they have in common? Figure: common workflow fragments across Workflow 1, Workflow 2 and Workflow 3 (steps labeled A to H)
  • 27. A method for detecting reusable workflow fragments [Garijo et al 2014b], step 1 of 4: • Duplicated workflows are removed • Single-step workflows are removed. Figure: the workflow Dataset, Stemmer algorithm, Result, Term weighting algorithm, FinalResult, shown next to its steps alone (Stemmer algorithm, Term weighting algorithm)
  • 28. A method for detecting reusable workflow fragments [Garijo et al 2014b], step 2 of 4: popular graph mining techniques (FSM, frequent subgraph mining). • Inexact FSM: uses heuristics to calculate the similarity between two graphs; the solution might not be complete • Exact FSM: delivers all the possible fragments to be found in the dataset
  • 29. A method for detecting reusable workflow fragments [Garijo et al 2014b], step 3 of 4: remove redundant fragments
  • 30. A method for detecting reusable workflow fragments [Garijo et al 2014b], step 4 of 4: link fragments back to the workflows where they were found. http://purl.org/net/wf-fd
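The four steps above can be illustrated with a deliberately simplified Python sketch. It is not the FragFlow implementation: instead of frequent subgraph mining over workflow graphs, it treats each workflow as a linear sequence of step names and counts contiguous runs of steps, which is an assumption made only to keep the example short. The function names, the `min_support` parameter and the sample workflows are all hypothetical.

```python
from collections import defaultdict


def prepare(workflows):
    """Step 1: drop duplicated workflows and single-step workflows."""
    seen, kept = set(), {}
    for name, steps in workflows.items():
        key = tuple(steps)
        if len(key) < 2 or key in seen:
            continue
        seen.add(key)
        kept[name] = key
    return kept


def mine(workflows, min_support=2):
    """Step 2 (simplified stand-in for FSM): contiguous runs of two or more
    steps that occur in at least `min_support` workflows.

    The result maps each fragment to the set of workflows where it was
    found, which also covers step 4 (linking fragments back to workflows).
    """
    found = defaultdict(set)
    for name, steps in workflows.items():
        for i in range(len(steps)):
            for j in range(i + 2, len(steps) + 1):
                found[steps[i:j]].add(name)
    return {frag: wfs for frag, wfs in found.items() if len(wfs) >= min_support}


def contains(big, small):
    """True if `small` occurs as a contiguous run inside `big`."""
    n = len(small)
    return any(big[i:i + n] == small for i in range(len(big) - n + 1))


def prune(fragments):
    """Step 3: drop a fragment when a longer fragment contains it and
    occurs in exactly the same workflows."""
    return {
        frag: wfs
        for frag, wfs in fragments.items()
        if not any(
            len(other) > len(frag) and fragments[other] == wfs and contains(other, frag)
            for other in fragments
        )
    }


# Hypothetical corpus: wf4 duplicates wf1 and wf5 has a single step.
WORKFLOWS = {
    "wf1": ["A", "B", "C", "D"],
    "wf2": ["A", "B", "C", "G"],
    "wf3": ["A", "B", "F"],
    "wf4": ["A", "B", "C", "D"],
    "wf5": ["A"],
}

fragments = prune(mine(prepare(WORKFLOWS)))

if __name__ == "__main__":
    for frag, wfs in fragments.items():
        print(frag, sorted(wfs))
```

In this toy corpus the fragment B, C is discarded as redundant because A, B, C covers it in the same two workflows, while A, B is kept because it also occurs in a third workflow.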
  • 31. Workflow fragment assessment. Research question: Are our proposed workflow fragments useful? • A fragment is useful if it has been designed and (re)used by a user • Comparison between proposed fragments and user-designed fragments (groupings) and workflows
  • 32. Workflow fragment assessment. Metrics: precision and recall, computed over fragments (F), workflows (W) and groupings (G)
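The slide names the metrics but does not show how fragments are matched against groupings and workflows, so the following is a generic sketch of set-based precision and recall under the assumption of exact matching. The function name and the sample data are hypothetical and are not taken from the evaluation in [Garijo et al 2014b].

```python
def precision_recall(proposed, reference):
    """Set-based precision and recall with exact matching (an assumption).

    proposed:  fragments suggested by the mining method (F)
    reference: user-designed groupings and workflows (G and W)
    """
    proposed, reference = set(proposed), set(reference)
    matched = proposed & reference
    precision = len(matched) / len(proposed) if proposed else 0.0
    recall = len(matched) / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    proposed = {("A", "B"), ("A", "B", "C"), ("B", "F")}
    reference = {("A", "B"), ("C", "D")}
    print(precision_recall(proposed, reference))
```

A similarity-based match (as in the "equal or similar" results on the assessment slides) would replace the set intersection with a comparison function.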
  • 33. Workflow fragment assessment. Workflow corpora: • User Corpus 1 (WC1): designed mostly by a single user; 790 workflows (475 after data preparation) • User Corpus 2 (WC2): created by a user, with collaborations of others; 113 workflows (96 after data preparation) • Multi User Corpus 3 (WC3): workflows submitted by 62 users during the month of Jan 2014; 5859 workflows (357 after data preparation) • User Corpus 4 (WC4): designed mostly by a single user; 53 workflows (50 after data preparation)
  • 34. Workflow fragment assessment. Result assessment: • 30%-60% of proposed fragments are equal to user-defined groupings or workflows • 40%-80% of proposed fragments are equal or similar to user-defined groupings or workflows. Commonly occurring patterns are potentially useful for users designing workflows. What about the rest of the fragments? Are those useful?
  • 35. Workflow fragment assessment. User feedback: user survey. Q1: Would you consider the proposed fragment a valuable grouping? • I would not select it as a grouping (0) • I would use it as a grouping with major changes (i.e., adding/removing more than 30% of the steps) (1) • I would use it as a grouping with minor changes (i.e., adding/removing less than 30% of the steps) (2) • I would use it as a grouping as it is (3). Q2: What do you think about the complexity of the fragment? • The fragment is too simple (0) • The fragment is fine as it is (1) • The fragment has too many steps (2). Conclusion: not enough evidence to state that all proposed workflow fragments are useful
  • 36. Outline • Capturing and publishing context of computational experiments: from scientific workflows to Linked Data; capturing software functionality; representing software metadata • Using context to facilitate reusability and exploration of experiments: detecting commonalities among experiments; explaining computational results • Using context in Intelligent Systems: hypothesis testing; environmental sciences modeling • A vision for context capture in computer-driven science
  • 37. Using captured context to explain results [Gil and Garijo 2016]. Methods as currently described in papers are ambiguous, incomplete and given at inconsistent levels of detail. Method steps (figure): comparison of ligand binding sites; comparison of dissimilar protein structures; graph network generation; molecular docking. Example method text: "The SMAP software was used to compare the binding sites of the 749 M.tb protein structures plus 1,446 homology models (a total of 2,195 protein structures) with the 962 binding sites of 274 approved drugs, in an all-against-all manner. While the binding sites of the approved drugs were already defined by the bound ligand, the entire protein surface of each of the 2,195 M.tb protein structures was scanned in order to identify alternative binding sites. For each pairwise comparison, a P-value representing the significance of the binding site similarity was calculated."
  • 38. Using captured context to explain results [Gil and Garijo 2016]. Methods as currently described in papers are ambiguous, incomplete and given at inconsistent levels of detail. Goal: automatically generate reports from computer-generated data analysis records. • Reports must: be truthful to actual events; enable inspection; be human-understandable; abstract details • Ideally, reports: become part of papers; have persistent evidence; are adapted to different audiences/expertise/purposes
  • 39. Data Narratives 1. A record of events that describe a new result • A workflow and/or provenance of all the computations executed 2. Persistent entries for key entities involved • URIs/DOIs for data, software versions, workflow, … 3. Narrative account(s) • Human-consumable rendering(s) that include pointers to the detailed records and entries • Each account is generated for a different audience/purpose • A casual reader, a close colleague, someone inspecting how the work was done, someone reproducing the work
  • 40. Data Narrative Accounts: An example • Execution view (inputs, parameters and main outputs): "Topic modeling was run on the Reuters R8 dataset (10.6084/m9.figshare.776887), and English Words dataset (10.6084/m9.figshare.776888), with iterations set to 100, stop word size set to 3, number of topics set to 10 and batch size set to 10. The results are at 10.6084/m9.figshare.776856" • Data view (just the data that influenced the results): "The topics at 10.6084/m9.figshare.776856 were found in the Reuters R8 dataset (10.6084/m9.figshare.776887) and English Words dataset (10.6084/m9.figshare.776888)" • Method view (main steps based on their functionality): "Topic training was run on the input dataset. The results are product of PlotTopics, a visualization step"
  • 41. Data Narrative Accounts: An example • Dependency view (how the steps depend on each other): "First, the input data is filtered by Stop Words, followed by Small Words, Format Dataset, and Train Topics. The final results are produced by Plot Topics" • Implementation view (how the steps were implemented in the execution): "Train topics was implemented using Latent Dirichlet allocation" • Software view (details on the software used to implement the steps): "The train topics step was generated with Online LDA open source software, written in Java. Plot topics was generated with the Termite software."
  • 42. DANA: DAta NArratives (https://knowledgecaptureanddiscovery.github.io/DataNarratives/) 1. Identify which experiment records to describe 2. Generate an experiment-specific knowledge base 3. Create the data narrative from templates 4. Produce narrative accounts. Components (figure): experiment records, provenance repository, experiment-specific knowledge base, DANA generator, data narrative aggregator, query patterns, software registry, narrative accounts
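Step 3 (creating the narrative from templates) can be pictured with a small Python sketch that renders the execution view and the data view of slide 40 from a record. This is not DANA's code: the record layout, the field names and both template functions are hypothetical, and a real system would fill the record by querying the provenance repository rather than from a hard-coded dictionary. The DOIs and parameter values are the ones quoted on slide 40.

```python
def join_items(items):
    """Join phrases as 'a, b and c'."""
    items = list(items)
    if len(items) <= 1:
        return "".join(items)
    return ", ".join(items[:-1]) + " and " + items[-1]


def describe_inputs(record):
    """Render each input as 'the <name> dataset (<DOI>)'."""
    return join_items(f"the {name} dataset ({doi})" for name, doi in record["inputs"])


def execution_view(record):
    """Execution view: inputs, parameters and main outputs."""
    params = join_items(f"{name} set to {value}" for name, value in record["parameters"])
    return (
        f"{record['method']} was run on {describe_inputs(record)}, "
        f"with {params}. The results are at {record['output']}."
    )


def data_view(record):
    """Data view: only the data that influenced the results."""
    return f"The results at {record['output']} were derived from {describe_inputs(record)}."


# Hypothetical experiment record using the values quoted on slide 40.
RECORD = {
    "method": "Topic modeling",
    "inputs": [
        ("Reuters R8", "10.6084/m9.figshare.776887"),
        ("English Words", "10.6084/m9.figshare.776888"),
    ],
    "parameters": [
        ("iterations", 100),
        ("stop word size", 3),
        ("number of topics", 10),
        ("batch size", 10),
    ],
    "output": "10.6084/m9.figshare.776856",
}

if __name__ == "__main__":
    print(execution_view(RECORD))
    print(data_view(RECORD))
```

Keeping one template function per view makes it straightforward to add further views (method, dependency, implementation, software) over the same record.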
  • 43. Formative evaluation • Survey with 6 target scenarios • Each scenario: a description of a situation where a user has to do a task; a workflow sketch of the analysis done; six candidate narratives of that workflow sketch • 12 responses from users • Results: each narrative is considered appropriate for describing some scenario; different users chose different narratives for each scenario
  • 44. Outline • Capturing and publishing context of computational experiments: from scientific workflows to Linked Data; capturing software functionality; representing software metadata • Using context to facilitate reusability and exploration of experiments: detecting commonalities among experiments; explaining computational results • Using context in Intelligent Systems: hypothesis testing; environmental sciences modeling • A vision for context capture in computer-driven science
  • 45. Using Context for Hypothesis Testing [Gil et al 2016]. Figure elements: data; hypothesis ("Protein PRKCDBP is expressed in samples of patient P36"); workflows (Wf 0, Wf 1, Wf 2); meta-workflows (simMetrics, comparison, hypothesisRevision, taking a hypothesis to a revisedHyp); hypothesis revision ("PRKCDBP mutation is expressed in P36")
  • 46. Hypothesis Testing: My Contribution [Garijo et al 2017]. The DISK Ontology: http://disk-project.org/ontology/disk/. A hypothesis is described by its statement, qualifier, evidence and history. Figure example: statement HS1 (Protein EGFR, associatedWith, Colon Cancer) with qualifier HQ1 (hasConfidenceReport C1, hasConfidenceLevel L1) and wasGeneratedBy Execution 1; statement HS2 (Protein EGFR, associatedWith, Colon Cancer SubtypeA), a revisionOf HS1, with qualifier HQ2 (hasConfidenceReport C1, hasConfidenceLevel L2) and wasGeneratedBy Execution 2. The figure also shows nodes HG1, HG2, HE1 and HE2
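The history part of the example can be sketched as a chain of revisions. The snippet below is a hypothetical Python stand-in, not the DISK ontology itself (which is expressed in RDF): the class, its field names and the `history` helper are invented for illustration, the confidence labels L1 and L2 are the placeholders from the figure, and attaching each execution directly to a statement is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Hypothesis:
    statement: Tuple[str, str, str]   # (subject, relation, object)
    confidence_level: str             # placeholder label from the figure, e.g. "L1"
    generated_by: str                 # execution that produced the evidence
    revision_of: Optional["Hypothesis"] = None


def history(hypothesis):
    """Follow revision_of links from the newest hypothesis back to the original."""
    chain = []
    while hypothesis is not None:
        chain.append(hypothesis)
        hypothesis = hypothesis.revision_of
    return chain


hs1 = Hypothesis(("Protein EGFR", "associatedWith", "Colon Cancer"), "L1", "Execution 1")
hs2 = Hypothesis(
    ("Protein EGFR", "associatedWith", "Colon Cancer SubtypeA"),
    "L2",
    "Execution 2",
    revision_of=hs1,
)

if __name__ == "__main__":
    for h in history(hs2):
        print(h.statement, h.confidence_level, h.generated_by)
```

Walking the chain gives the provenance of a revised hypothesis: which execution produced each version and with what confidence.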
  • 47. Using Context for Environmental Sciences Modeling (work in progress) • A modeler wants to predict a situation, e.g., the impact of drought in the Amazon • An intelligent system assists by: finding data of interest; connecting environmental models (hydrology, economy, agronomy, etc.); facilitating the execution of models; visualizing results. My contribution: • Extending our software ontology to capture requirements of environmental models • Relating variables to inputs, units, time, etc. Figure: a scenario is given to the intelligent system, which draws on a data catalog and a model catalog (climate model, hydrology model, economy model, …); variables and predictions shown: albedo, soil moisture, soil quality, precipitation, commodity prices, property rights, market access, crop/forest yields, land use, household type
  • 48. Outline • Capturing and publishing context of computational experiments: from scientific workflows to Linked Data; capturing software functionality; representing software metadata • Using context to facilitate reusability and exploration of experiments: detecting commonalities among experiments; explaining computational results • Using context in Intelligent Systems: hypothesis testing; environmental sciences modeling • A vision for context capture in computer-driven science
  • 49. Where are we headed? From scientist-driven science to computer-driven science: Scientist; Scientist + Automated Tools; Scientist + Intelligent System; Intelligent System + Scientist. • Can an Intelligent System co-author a paper? Can it be an author? • Can it win a Nobel prize? [Kitano, ISWC 2016] • What do we need to capture (in software, data, methods, provenance)? 1. Functionality and abstraction 2. Granularity 3. Importance
  • 50. Next steps for context capture in computational experiments • Capturing different levels of abstraction in experiments • Using user expertise to curate captured context: What do users consider important? • Improving the explanation of details: How can we identify the core function of a software step? • Representing the goal and objectives of a computational experiment
  • 51. Summing up • Context is needed to understand and reuse computational experiments • Sharing context from computational experiments: scientific workflows and their executions; software functionality and metadata • Getting value out of context: reusability, exploration, explanation; used to power intelligent systems! • Next steps: representing functionality and levels of abstraction; interacting with users to curate context
  • 52. Special thanks • Yolanda Gil • Varun Ratnakar • Oscar Corcho • Pinar Alper • Khalid Belhajjame • Asuncion Gomez Perez • Idafen Santana Perez • Felisa Verdejo • Francisco Garijo
  • 53. References • [Kinnings et al, PLOS 2010]: Kinnings SL, Xie L, Fung KH, Jackson RM, Xie L, Bourne PE (2010) The Mycobacterium tuberculosis Drugome and Its Polypharmacological Implications. PLoS Comput Biol 6(11): e1000976. https://doi.org/10.1371/journal.pcbi.1000976 • [Garijo et al PLOS]: Garijo D, Kinnings S, Xie L, Xie L, Zhang Y, Bourne PE, et al. (2013) Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome. PLoS ONE 8(11): e80278. https://doi.org/10.1371/journal.pone.0080278 • [Garijo et al 2014a]: Garijo, D.; Alper, P.; Belhajjame, K.; Corcho, O.; Gil, Y.; and Goble, C. Common motifs in scientific workflows: An empirical analysis. Future Generation Computer Systems, 36: 338--351. 2014. • [Garijo et al 2014b]: Garijo, D.; Corcho, O.; Gil, Y.; Gutman, B. A.; Dinov, I. D.; Thompson, P.; and Toga, A. FragFlow: Automated fragment detection in scientific workflows. In e-Science (e-Science), 2014 IEEE 10th International Conference on, volume 1, pages 281--289, 2014. IEEE. • [Gil and Garijo 2016]: Gil, Y.; and Garijo, D. Towards Automating Data Narratives. In Proceedings of the 22nd International Conference on Intelligent User Interfaces, pages 565--576, 2017. ACM. • [Garijo et al 2017]: Garijo, D.; Gil, Y.; and Ratnakar, V. The DISK Hypothesis Ontology: Capturing Hypothesis Evolution for Automated Discovery. In Proceedings of the Workshop on Capturing Scientific Knowledge (SciKnow), held in conjunction with the ACM International Conference on Knowledge Capture (K-CAP), Austin, Texas, 2017. • [Garijo et al 2017 FGCS]: Garijo, D.; Gil, Y.; and Corcho, O. Abstract, link, publish, exploit: An end to end framework for workflow sharing. Future Generation Computer Systems, 2017. • [Gil et al 2015]: Gil, Y.; Ratnakar, V.; and Garijo, D. OntoSoft: Capturing scientific software metadata. In Proceedings of the 8th International Conference on Knowledge Capture, pages 32, 2015. ACM. • [Kitano ISWC 2016]: Kitano, H. Artificial Intelligence to Win the Nobel Prize and Beyond: Creating the Engine for Scientific Discovery. Keynote. http://iswc2016.semanticweb.org/pages/program/keynote-kitano.html
  • 54. Capturing Context in Scientific Experiments: Towards Computer-Driven Science: Daniel Garijo Information Sciences Institute and Department of Computer Science https://w3id.org/people/dgarijo @dgarijov dgarijo@isi.edu

Editor's Notes

  1. This slide details what we can do to fix the current situation
  2. Data driven, usually represented as Directed Acyclic Graphs (DAGs). State the benefits (briefly).
  3. Workflow template and instance: steps and their dependencies. Workflow execution trace: provenance of the results. Experiment metadata: specific methods, author contribution, etc.
  4. P-Plan is simple and extensible (to cater to cases that require more complex workflow operators). Say that P-Plan has been used for describing scientific processes in social sciences and lab protocols.
  5. State that the focus is workflow description.
  6. Explain that this is necessary to relate software together, and for capturing the role of software in an experiment.
  7. Overview of the steps here. Say clearly that
  8. Overview of the steps here. Say clearly that
  9. Overview of the steps here. Say clearly that
  10. Overview of the steps here. Say clearly that
  11. Motivation.
  12. Motivation.
  13. Functionality: relation between similar software, data and methods; goals of a method. Granularity: What level of detail is needed to communicate a finding? Importance: Which analyses are important? What are the most important steps?
  14. In this slide, I could mention potential collaboration opportunities, such as AMRs and work from Gully to represent tables from papers