Using Open Science to advance science - advancing open data
1. Using Open Science to advance science - advancing open data
Robert Oostenveld
Donders Institute, Radboud University, Nijmegen, NL
Karolinska Institutet, Stockholm, SE
r.oostenveld@donders.ru.nl
2. FieldTrip toolbox
Open Source MATLAB-based toolbox for MEG, EEG and iEEG analysis
Development started around 2004 with the “F.C. Donders Centre” (now the Donders Institute)
Estimated 3000 users, 1500 people on the discussion list, close to 1000 citations per year
8. Shifts in research funding
EU: train young researchers for jobs in society in European Training Networks.
EU/NL: Public-Private partnerships for better knowledge transfer and utilization.
“NWO is of the opinion that research results paid for by public funds should be freely accessible worldwide. This applies to both scientific publications and other forms of scientific output.”
10. The problem
many studies with low statistical power
“publish or perish” pressure results in reporting bias
The consequence
overreporting of false positives
overestimation of effect sizes
low reproducibility of results
12. 70 independent teams analyzed the same dataset, testing the same 9 hypotheses
no two teams chose identical workflows to analyse the data … resulted in sizeable variation in the results
analytical flexibility can have substantial effects on scientific conclusions
results emphasize the importance of validating and sharing complex analysis workflows
14. Scientific efficiency – not only money
Although hardware might be getting more affordable …
• Patients are not available in abundance
• Effect sizes of interest are getting smaller
• Larger samples needed to boost sensitivity
• Larger datasets needed for machine learning
15. Scientific efficiency – not only data
• Collaboration and networks (team science) needed to increase our shared knowledge and understanding
16. Open Science
Open educational resources
Open access publications
Open peer review
Open methodology
Open source
Open hardware
Open data
Inclusive and ethical
18. Open Data
Shared data allows for
Improved reproducibility
Small effects that require large group sizes
Data mining, discovery science and generating new hypotheses
Results in methodological opportunities
Improve algorithms
Estimate effect and group size
Make informed decisions on analysis pipeline
Prevent HARKing (hypothesizing after the results are known) and p-hacking
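To make "small effects require large group sizes" concrete, here is a minimal sketch of a sample size estimate. It assumes a two-sided comparison of two independent groups and uses the normal approximation n = 2 (z_alpha + z_power)^2 / d^2, where d is the standardized effect size. The function name and the default alpha and power values are illustrative choices, not something taken from the slides.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants per group for a two-sided, two-sample comparison.

    Uses the normal approximation; a t-test based calculation gives
    slightly larger numbers for small samples.
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = standard_normal.inv_cdf(power)          # quantile for the desired power
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


# Compare a large, a medium and a small standardized effect size.
for d in (0.8, 0.5, 0.2):
    print(f"d = {d}: about {n_per_group(d)} participants per group")
```

Halving the effect size roughly quadruples the required group size, which is one reason pooled and shared datasets help with small effects.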
19. Open Data
Findable
Make your data available on a repository with a persistent identifier (DOI, handle) and metadata
Accessible
Be explicit about data usage terms (agreement with downloader)
Interoperable
Make your data human and machine readable, e.g. BIDS
Reusable
Make sure you document enough details, e.g. in a “data descriptor” paper
The paper can be cited along with the data itself, which gives measurable impact
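As a sketch of what machine-readable metadata can look like, the snippet below writes a minimal BIDS-style dataset_description.json. The fields Name and BIDSVersion are required by BIDS, and License and DatasetDOI are optional fields. The dataset name, the DOI and the version string are placeholders invented for this example.

```python
import json
import tempfile
from pathlib import Path


def write_dataset_description(root, name, doi, license_name):
    """Write a minimal dataset_description.json in the dataset root."""
    description = {
        "Name": name,
        "BIDSVersion": "1.8.0",   # assumption: use the version you actually follow
        "License": license_name,  # usage terms, e.g. CC0 or a data use agreement
        "DatasetDOI": doi,        # persistent identifier issued by the repository
    }
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    target = root / "dataset_description.json"
    target.write_text(json.dumps(description, indent=2), encoding="utf-8")
    return target


# Usage with placeholder values, written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = write_dataset_description(
        tmp,
        name="Example auditory EEG dataset",  # hypothetical name
        doi="doi:10.0000/placeholder",        # hypothetical DOI
        license_name="CC0",
    )
    loaded = json.loads(path.read_text(encoding="utf-8"))
    print(json.dumps(loaded, indent=2))
```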
20. Open Data – challenges with our data
• Neuroimaging data is large
• Many files
• Many GB
• Complex organization (not a simple table)
• Neuroimaging data can be sensitive
• Data from human research participants (not “subjects”)
• Ethical framework – Declaration of Helsinki
• Legal framework – General Data Protection Regulation
22. What is BIDS?
BIDS is a way to organize your existing raw data
To improve consistent and complete documentation
To facilitate re-use by your future self and others
BIDS is not
A new file format
A search engine
A data sharing tool
23. BIDS for MRI, MEG, EEG, iEEG …
in future also PET, eye-tracker, genetics etc.
data/README
data/CHANGES
data/dataset_description.json
data/participants.tsv
data/sub-01/anat/…
data/sub-01/meg/…
data/sub-01/eeg/sub-01_task-auditory_eeg.edf
data/sub-01/eeg/sub-01_task-auditory_eeg.json
data/sub-01/eeg/sub-01_task-auditory_channels.tsv
data/sub-01/eeg/sub-01_task-auditory_events.tsv
data/sub-01/eeg/sub-01_electrodes.tsv
data/sub-01/eeg/sub-01_coordinates.json
Slide callouts: directory structure; actual EEG data (the *_eeg.edf file); metadata (the accompanying .json and .tsv files)
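The naming pattern in this listing can be sketched as a small helper that builds the expected EEG file names for one participant and task. The function name is hypothetical, and the file suffixes are copied from the listing above; check them against the current BIDS specification before relying on them.

```python
from pathlib import PurePosixPath


def bids_eeg_files(subject, task, extension=".edf"):
    """Return the EEG file paths for one participant, relative to the dataset root."""
    sub = f"sub-{subject}"
    folder = PurePosixPath(sub) / "eeg"
    prefix = f"{sub}_task-{task}"
    return [
        str(folder / f"{prefix}_eeg{extension}"),  # the actual EEG recording
        str(folder / f"{prefix}_eeg.json"),        # recording metadata
        str(folder / f"{prefix}_channels.tsv"),    # channel descriptions
        str(folder / f"{prefix}_events.tsv"),      # events during the task
        str(folder / f"{sub}_electrodes.tsv"),     # electrode positions
        str(folder / f"{sub}_coordinates.json"),   # name as shown on the slide
    ]


files = bids_eeg_files("01", "auditory")
for name in files:
    print(name)
```

Because the participant label and task are encoded in every file name, a file remains interpretable even when it is separated from its directory.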
24. Open Standard
For all toolboxes
For all researchers
Academic/Industrial
Open/Closed Source
Our current research data will outlive our current research tools.
Aim for >10 years.
26. Data from human participants
General Data Protection Regulation (GDPR)
Challenges:
Explicit and strict protection of personal data
Opportunities:
Less influence of national legislation differences
Learn from each other
Develop best practices
https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=uriserv:OJ.L_.2016.119.01.0001.01.ENG
27. Personal data
name
address
date of birth
phone number
license plate number
IP address
...
Crime Scene Investigation
http://www.abc.net.au/news/2017-09-19/csi/8960590
This is the information the police will first search for.
In case this cannot be found, CSI is called in.
28. Biometric data
facial details
dental record
fingerprint
genetics
cortical folding pattern
clinical data
gait/movement pattern
…
These are identifying if they are sufficiently unique and stable over time.
29. Personal Data is needed and should be managed
Required for administration
Contacting your participants
Paying your participants
Follow up incidental findings
Often not required to address the research question
Sometimes used as confound (e.g. age, but not date of birth)
Check whether the sample is representative (e.g. social status)
Possibly required to assess scientific integrity
GDPR – data minimization
Only collect what you need
Only use it for the intended purpose
Delete (contact) data that you do not need any more
30. Personal Data
Personal data
Name, address, date of birth
Special personal data (“bijzondere persoonsgegevens” in Dutch)
Race
Religion or beliefs
Health
Sexual activities
Political preference, membership of a union
Criminal record
Indirect personal data – identifies someone … when linked to another database
Fingerprint, DNA, facial details
Anatomical MRI
Specific pattern of data (e.g. answers on a questionnaire or interview)
https://autoriteitpersoonsgegevens.nl/nl/over-privacy/persoonsgegevens/wat-zijn-persoonsgegevens
31. Organize personal data for deletion
name          address        phone       date of birth  pseudonym  age  gender
John Doe      7 Willow road  918 247462  19-7-1984      sub-01     35   M
Fern Travers                                            sub-02
Griffin Mora                                            sub-03
Peter Dillon                                            sub-04
Kathy Kirk                                              sub-05
…                                                       …
Don’t put identifying details in the header of the binary files (e.g. DICOM)
Don’t put it in the file names (e.g. BrainVision *.vhdr/vmrk/eeg)
Delete as soon as requirements fulfilled (e.g. incidental findings procedure)
Don’t delete what needs to be retained (signed informed consent forms)
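One way to organize personal data for deletion is to keep it in a separate key file that links each pseudonym to the contact details, while the research table only contains the pseudonym and the variables needed for analysis. The sketch below uses the example row from the slide. The column grouping and function names are assumptions made for illustration.

```python
import csv
import io

# Example row taken from the slide; a real table would have one row per participant.
participants = [
    {"name": "John Doe", "address": "7 Willow road", "phone": "918 247462",
     "date_of_birth": "19-7-1984", "pseudonym": "sub-01", "age": "35", "gender": "M"},
]

PERSONAL_COLUMNS = ("pseudonym", "name", "address", "phone", "date_of_birth")
RESEARCH_COLUMNS = ("pseudonym", "age", "gender")


def split_records(rows):
    """Split rows into a private key file and a shareable research table."""
    key_rows = [{col: row[col] for col in PERSONAL_COLUMNS} for row in rows]
    research_rows = [{col: row[col] for col in RESEARCH_COLUMNS} for row in rows]
    return key_rows, research_rows


def to_tsv(rows, columns):
    """Serialize rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


key_rows, research_rows = split_records(participants)
research_tsv = to_tsv(research_rows, RESEARCH_COLUMNS)
print(research_tsv)
```

With this split, deleting the key file once the administrative requirements are fulfilled removes the direct identifiers without touching the research data.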
32. Gradient between personal and research data
Personal data (easy): keep private, don’t share, but delete
Indirect personal data (hard): ?
A lot of research data (easy): share as it is with others
33. Limit possible identification
Anonymous
Nobody is able to identify the participant
Pseudonymization
Use a code instead of the participant’s name
De-identification
Remove (indirectly) identifying features
Blur the indirect personal data
Deface anatomical MRI
Age at the time of acquisition instead of date of birth
Use age bins instead of years
Questionnaire outcomes rather than individual item scores
…
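Two of the blurring steps listed above, age at the time of acquisition instead of date of birth, and age bins instead of years, can be sketched as follows. The date of birth comes from the example on slide 31; the acquisition date and the bin width of 5 years are assumptions for this example.

```python
from datetime import date


def age_at(date_of_birth, acquisition_date):
    """Age in completed years at the time of acquisition."""
    years = acquisition_date.year - date_of_birth.year
    # Subtract one year if the birthday has not yet occurred in the acquisition year.
    if (acquisition_date.month, acquisition_date.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


def age_bin(age, width=5):
    """Replace an exact age by a bin label such as '35-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


date_of_birth = date(1984, 7, 19)  # example from the slides
acquisition = date(2019, 9, 1)     # hypothetical acquisition date

age = age_at(date_of_birth, acquisition)
print(age, age_bin(age))
```

Only the age or the age bin would go into the shared research data; the date of birth stays with the personal data.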
35. Personal and research data
Diagram labels: personal data, indirect personal data, a lot of research data
36. Personal and research data
Personal data: data minimization, pseudonymization
Indirect personal data: data minimization, de-identifying, blurring
A lot of research data
Share responsibly, with legal constraints on reuse
Keep safe and private (or delete)
37. Legal constraints
Contract between you as researcher
… and the funding agency
… and the ethics committee
… and the participants/patients
… and the publisher of the results
… and the recipient of the data upon sharing
38. Legal constraints – Data Use Agreement
CC0 - Public Domain
No copyright.
The person who associated a work with this deed has dedicated the work to the public domain by waiving all of his or her rights to the work worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.
https://creativecommons.org/publicdomain/zero/1.0/
Donders Institute - Data Use Agreement for identifiable human data
I will comply with all relevant rules and regulations imposed by my institution and my government ….
I will not attempt to establish the identity of or attempt to contact any of the included human subjects. I will not link this data to any other database in a way that could provide identifying information ….
I will not redistribute or share the data with others, including individuals in my research group, unless they have independently applied and been granted access to this data.
I will acknowledge the use of the data and data derived from the data when publicly presenting …
Failure to abide by these guidelines will result in termination of my privileges to access to these data.
https://data.donders.ru.nl/doc/dua/
https://open-brain-consent.readthedocs.io/
participant → you → recipient
40. Where to share?
Institutional repository
Donders https://data.donders.ru.nl
Radboud University http://data.ru.nl
In the UK: Oxford, Cambridge, Edinburgh
…
National repository
https://easy.dans.knaw.nl
https://dataverse.nl
https://data.4tu.nl
Domain specific repository
http://openneuro.org
General repository
Zenodo
Harvard dataverse
Commercial publishers
https://datadryad.org
https://figshare.com
41. Considerations for shared data
• For the ethics board
• Be explicit about sharing, e.g. https://open-brain-consent.readthedocs.io
• For our research participants and the GDPR
• Use pseudonyms
• Remove identifying features (names, dates, faces)
• For the researchers that want to share
• Allow uploading and reorganization of large datasets (1GB-1TB)
• Provide guidelines for structuring the data
• Provide methods to review the data, also for journal editors/reviewers
• Provide versioning of datasets
• For researchers that want to reuse the data
• Allow browsing the data
• Allow selective downloads to get a taste of it
• Allow bulk downloads
42. Summary
Open Data improves reproducibility and accelerates new research
BIDS helps to organize your data so that it is FAIR and easy to understand
Open Source community is building tools to create and reuse data
There is more to Open Science than open data: also education, open access publications, methodology and source code
Editor's Notes
https://jmopendata.cbs.nl/#/JM/nl/dataset/71009ned/barv?dl=23D6F
Open Access - cOAlition S and Plan S
Review and critical evaluation beyond publication – open methods, tools and data