2. RIDING THE WAVE
HOW EUROPE CAN GAIN FROM THE RISING TIDE OF SCIENTIFIC DATA
A VISION FOR 2030
3. Global collaboratories
They can engage in whole new
forms of scientific inquiry and treat
information at a scale we are only
beginning to see.
… and help us solve today’s
Grand Challenges, such as climate
change and energy supply.
5. Main drivers
Grand Challenges facing the world demand
international collaboration across disciplines
Research Democracy (zooniverse.com)
Physical and virtual research infrastructures are largely
data-driven
Sheer amount of data being generated
More value from reuse of data
E.g., Australia spends $3B per annum on data collection
More and more researchers are seeing the value of
sharing
Many countries developing open research data policies
6. Copenhagen ICRI Meeting
Alan Blatecky (NSF)
“Let’s just get on with it”
European Commission, NSF/NIST, and Australia agreed to
encourage collaboration via different funding schemes.
Various meetings in 2012 plus very regular video calls
resulted in deciding on the name “Research Data
Alliance”
Ross Wilkinson, Fran Berman, and John Wood formed
initial council
Huge bottom-up activity – community-driven
7. Evolving structure
Definition of Interest and Working Groups
Focus on concrete deliverables accepted and used by the
community
Each region contributed to the secretariat (US/EU/Aus)
Need for secretary-general to bring it all together became
obvious (Australia is funding the first year)
Not-for-profit legal entity developed (funded by EC)
Close cooperation between funders (and with RDA Council)
to bring in other regions/countries of the world (RDA-C)
RDA structure developed including Council, TAB and OAB
8. Big Bang!
First Plenary and formal start of RDA
Gothenburg March 2013
Registration filled within a few days
Community less interested in processes and
organisation – rather wanted more time for BOF, IG and
WG side meetings
Council adopted methodologies for populating TAB
(initially 6 nominated by Council – remainder elected.
After one year the 6 step down and a further election will
take place).
Process for selecting Council agreed with RDA-C
9. Lessons
Bottom-up activity – huge momentum, needs support
Members’ enthusiasm reflects need for Infrastructure
Groups not used to focusing on concrete deliverables
that are adopted by the community
Need to encourage more user communities to become
involved to influence the agenda
Need for developing active data specialists in all regions
12. The RDA Community today:
Over 1000 members from 55 countries
Chart labels: Asia 3%, Africa 2%, Asia-Pacific 4%, South America 1%
13. RDA Plenaries: Venue for community building and
WG / IG progress
RDA Plenary 1 / Launch
March 2013 in Gothenburg, Sweden
240 participants
3 WG, 9 IG
RDA Plenary 2
September 2013 in Washington, DC
380 participants
6 WG, 17 IG, 5 BOF
Data Citation Summit co-located in RDA “neutral space”
First Organizational Assembly meet-up
Fran Berman
14. RDA Organizational Structure
RDA Membership
RDA Council
Responsible for overarching mission, vision, impact of RDA
Technical Advisory Board
Responsible for technical roadmap and interactions
Secretary-General and Secretariat
Responsible for administration and operations
Organizational Advisory Board and Organizational Assembly
Responsible for organizational and strategic advice
Working Groups
Responsible for impactful, outcome-oriented efforts
Interest Groups
Responsible for defining and refining common issues
RDA Colloquium (Research Funders)
Operational and community sponsorship
15. Organizational Evolution over the last year
RDA Membership
RDA Council
7 out of 9 Council members now appointed, all appointed by Plenary 3
Technical Advisory Board
11 out of 12 TAB members
now chosen, all 12 chosen
by Plenary 3
Secretary-General and
Secretariat
New Secretary General to
be in place by Plenary 3
Organizational Advisory
Board and Organizational
Assembly
25 organizations interested
in Membership; 7
organizations interested in
Affiliate status
Working Groups
3 WGs at Plenary 1, 6 WGs at Plenary 2, potentially 12+ WGs at Plenary 3
Interest Groups
9 IGs at Plenary 1, 17 IGs + 5 BOFs at Plenary 2, 29+ IGs + ?? Birds-of-a-Feather at Plenary 3
RDA Colloquium
Operational and community sponsorship
16. RDA Organizational Partners
Member Applicants
• Institute for Quantitative Social Science at Harvard
• Barcelona Supercomputing Center
• Intersect Australia Limited
• European Data Infrastructure (EUDAT)
• Microsoft
• International Association of STM Publishers
• Oracle
• New Zealand eScience Infrastructure
• STFC - Science & Technology Facilities Council
• Washington University Libraries
• Corporation for National Research Initiatives (CNRI)
• Purdue University Libraries
• Terrestrial Ecosystems Research Network
• Research Data Canada
• University of Michigan Libraries
• eResearch Services and Scholarly Application
Development Division of Information Services
Interested Affiliates
• American University Library
• Committee on Data for Science and Technology
(CODATA)
Other interested Organizations
• Connecting Research and Researchers (ORCID)
• Australian Antarctic Data Centre
• DataCite
• Australian National Data Service
• International Oceanographic Data and Information
Exchange (IODE)
• CERN
• CJSD Consulting
• Columbia University Libraries/Information Services
• CSC - IT Center for Science Ltd.
• Digital Curation Centre
• IBM
• Scholarly Publishing and Academic Resources Coalition
(SPARC)
• World Data System (WDS)
17. RDA Community-Driven Groups
Working Groups
Data Type Registries
Persistent Identifier Types
Data Foundations and Terminology
Metadata Standards
Practical Policy
Data Categories and Codes
WG Case statements being prepared: Citing Dynamic Data, Publishing Data Workflows, Publishing Data Services, Data Bibliometrics, Cost Recovery Models for Repositories, Data Descriptions Registry Interoperability, DSA-WDS Partnership Working Group on Certification
Interest Groups
Agricultural Data Interoperability
Certification of Trusted Repositories (joint with ICSU-WDS)
Data Citation
Metadata
Marine Data Harmonization
Community Capability Model
Engagement
Preservation e-Infrastructure
Legal Interoperability (joint with CODATA)
Defining Urban Data Exchange for Science
Structural Biology
Big Data Analytics
Data Brokering
Publishing Data (joint with WDS)
Toxicogenomics Interoperability
Research Data Provenance
Materials Data Management
Global Registry of Trusted Data Repositories and Services
Digital Practices in History and Ethnography
Biodiversity Data Integration
Long tail of Research Data
Development of cloud computing capacity and education in developing world
Service Management IG (pending)
Domain Repositories Interest Group (pending)
Federated Identity Management (pending)
Persistent Identifier Interest Group – PID-IG (pending)
Birds-of-a-Feather (met at Plenary 2)
Linked Data
Chemical Safety Data
Education and Skills Development in Data Intensive Science
Libraries and Research Data
Cloud Computing and Data Analysis Training for the Developing World
Blue = new between Plenary 1 and Plenary 2
Green = new since Plenary 2
18. Community-Driven RDA Groups by Focus
Domain Science - focused
Toxicogenomics Interoperability IG
Structural Biology IG
Biodiversity Data Integration IG
Agricultural Data Interoperability IG
Digital History and Ethnography IG
Defining Urban Data Exchange for Science IG
Marine Data Harmonization IG
Materials Data Management IG
Reference and Sharing focused
Data Citation IG
Data Categories and Codes WG
Legal Interoperability IG
Community Needs focused
Community Capability Model IG
Engagement IG
Clouds in Developing Countries IG
Data Stewardship focused
Preservation e-infrastructure
Long-tail of Research Data IG
Research Data Provenance IG
Certification of Digital Repositories IG
Publishing Data IG
Global Registry of Trusted Data Repositories and Services IG
Base Infrastructure - focused
Metadata IG
Data Foundations and Terminology WG
Big Data Analytics IG
Metadata Standards WG
Data Brokering IG
Practical Policy WG
PID Information Types WG
Data Type Registries WG
Domain Repositories IG
19. Coming in 2014
RDA Plenary 3
March 26-28, 2014 in
Dublin, Ireland
Hosted by Australia and
Ireland
Theme: “The Data Sharing
community - Playing Your
Part”
RDA Plenary 4
September 2014 in The
Netherlands
Being planned now …
21. Groups that Met at the 2nd RDA Plenary
Birds-of-a-Feather
Linked Data
Chemical Safety Data
Education and Skills Development in Data Intensive Science
Libraries and Research Data
Cloud Computing and Data Analysis Training for the Developing World
Working Groups
Data Type Registries
Metadata Standards
Practical Policy
Persistent Identifier Types
Data Foundations and Terminology
Language Codes
Interest Groups
Agricultural Data
Big Data Analytics
Data Brokering
Certification of Trusted Repositories (joint with ICSU-WDS)
Long tail of Research Data
Marine Data Harmonization
Community Capability Model
Data Publishing (joint with WDS)
Toxicogenomics Interoperability
Research Data Provenance
Data Citation
Metadata
Economic Models and Infrastructure for Federated Materials Data Management
Engagement
Preservation e-Infrastructure
Legal Interoperability (joint with CODATA)
Global Registry of Trusted Data Repositories and Services
Digital Practices in History and Ethnography
Data Citation Harmonization Summit: DataCite, force11, CODATA/ICST, ESIP, DCC, etc.
23. First RDA Infrastructure Deliverables in 2014 (1)
Data Type Registries WG
Defining a system of data type registries
Defining a formal model for describing types and
building a working model of a registry.
To be adopted by CNRI, International DOI
Foundation, and used by the Deep Carbon
Observatory and others
(working in conjunction with PID group)
Scheduled to complete Summer, 2014
Persistent Identifier Information Types
Defining a minimal set of types that must be associated with a PID (e.g. checksum, author).
Specifying an API for interaction with PID types
Adopted and used by Data Conservancy and DKRZ
(working in conjunction with DTR group)
Scheduled to complete Summer, 2014
Metadata Standards
Creating use cases and prototype
directory of current metadata
standards from starting point of DCC
directory and stakeholder
contributions.
To be hosted and used by JISC,
DataOne and others
Scheduled to complete Fall, 2014
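The Persistent Identifier Information Types deliverable on this slide calls for a minimal set of typed attributes attached to each PID (e.g. checksum, author) and an API for working with them. As a rough illustration only, the Python sketch below models such a record in memory. The class name `PidRecord`, the `REQUIRED_TYPES` tuple, the example identifier, and the choice of SHA-256 are all assumptions made for this sketch; none of them is taken from the working group's specification.

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical minimal set of information types; the real set is defined by the WG.
REQUIRED_TYPES = ("checksum", "author")


@dataclass
class PidRecord:
    """A PID together with its typed attributes (illustrative only)."""
    pid: str
    attributes: dict = field(default_factory=dict)

    def missing_types(self):
        """Return the required types this record does not yet carry."""
        return [t for t in REQUIRED_TYPES if t not in self.attributes]

    def verify(self, data: bytes) -> bool:
        """Check data against the stored checksum (SHA-256 is an assumption)."""
        return hashlib.sha256(data).hexdigest() == self.attributes.get("checksum")


payload = b"example dataset bytes"
record = PidRecord(
    pid="example/0001",  # made-up identifier, not a real PID
    attributes={
        "checksum": hashlib.sha256(payload).hexdigest(),
        "author": "A. Researcher",
    },
)
```

A consumer could call `record.missing_types()` before accepting a record and `record.verify(data)` after retrieving the object, which is the kind of interaction a typed-PID API would make uniform across repositories.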
24. First RDA Infrastructure Deliverables in 2014 (2)
Language Codes
Operationalization of ISO language categories for repositories
Adopted and used by the Language Archive, PARADISEC
Proposal of data categories associated with the CMDI schema as ISO standards.
Scheduled to complete Fall, 2014
Data Foundations and Terminology
Defining a common vocabulary for data terms based on existing models.
Creating formal definitions in a structured vocabulary tool which also provides an open registry for data terms.
Tested and adopted by EUDAT, DKRZ, Deep Carbon Observatory, CLARIN, EPOS, and others
(active input from all RDA WGs)
Scheduled to complete Summer, 2014
Practical Code policies (rules)
Survey of policies in production use across data management centers.
Test bed of machine-actionable policies (iRODS, Dataverse, dCache) at RENCI, DataNet Federation Consortium, CESNET, Odum Institute.
Deployment of 5 policy sets (integrity, access control, replication, provenance / event tracking, publication) on test beds.
Publication of standard policies for use as starter kits.
Scheduled to complete Summer, 2014
25. RDA Language Codes Working Group
Delivering data interoperability to Linguistics, Musicology, etc.
How do different disciplines exchange data about human languages?
Leverages ISO Standards, but meets the need of researchers for fine-grained language distinction
Enables data discovery across disciplines
Approach to enable a “rough consensus” to be rapidly achieved
Uses a metadata approach compatible with building blocks of other RDA working groups
Delivers a practical approach to language interoperability in 18 months
Brings together expertise across disciplines and across standards
26. RDA Data Type Registries
Delivering an interoperability building block that enables machines to share data from all disciplines
Disciplines will generally have their own ways of organising their data: observations, time series, a set of time series describing a complex phenomenon, and so forth
If data in geophysics is needed by hydrologists, they need not only access but also usability, so the form of the data needs to be machine-understandable
WG will create a Data Type Registry methodology, data model, and prototype
Enables data citation
Supports Deep Carbon Observatory Data Management
No single solution for all, but practical solutions that get used
Engaged with other Building Blocks
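The data type registry idea on this slide can be pictured as a lookup from a type identifier to a machine-readable description, so that a consumer from another discipline can find out how to interpret a dataset. The following is a toy in-memory sketch under stated assumptions: `DataType`, `DataTypeRegistry`, the field layout, and the identifier `example/time-series` are hypothetical and do not reflect the working group's actual data model or prototype.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataType:
    """Machine-readable description of a data type (hypothetical fields)."""
    type_id: str
    name: str
    fields: tuple  # (field name, unit) pairs


class DataTypeRegistry:
    """Toy in-memory registry mapping type identifiers to descriptions."""

    def __init__(self):
        self._types = {}

    def register(self, data_type: DataType) -> None:
        """Add a type; refuse to overwrite an existing identifier."""
        if data_type.type_id in self._types:
            raise ValueError(f"type already registered: {data_type.type_id}")
        self._types[data_type.type_id] = data_type

    def resolve(self, type_id: str) -> DataType:
        """Look up a type by identifier; raises KeyError if unknown."""
        return self._types[type_id]


registry = DataTypeRegistry()
registry.register(DataType(
    type_id="example/time-series",  # made-up identifier
    name="time series",
    fields=(("timestamp", "s"), ("value", "m")),
))

# A consumer (say, a hydrologist's tool) resolves the type before reading data.
described = registry.resolve("example/time-series")
field_names = [name for name, _unit in described.fields]
```

A real registry would be a shared network service with persistent identifiers for the types themselves; the dictionary here only stands in for that lookup step.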
27. RDA Agricultural Data Interoperability
Many initiatives to make Agricultural data more available
GFAR (Global Forum on Agricultural Research) with FAO (Food and Agriculture Organization of the UN)
CGIAR (Consultative Group on International Agricultural Research)
CIARD movement to open up access to agricultural knowledge worldwide.
RDA Interest Group formed to seek short, sharp initiatives that can make a quick difference
First initiative: Wheat Data
Interoperability: provide a common framework for describing, representing, linking and publishing Wheat data with respect to open standards.
Interest group discussion of Agricultural Data policies
Possible Germplasm Data Working Group
Group working together to spin off activities that deliver implementation and adoption in 18 months
28. RDA Path to Impact
Variable, but
Need to work quickly
Need to be concrete
All required to demonstrate adoption,
and community support
29. Shared Research Data Infrastructure
All countries need data infrastructure to tackle the big
problems
They need international collaboration across disciplines
Data volume, variety and velocity are increasing
More value from reuse of data
So countries need to share the cost of research data
infrastructure
Need to future-proof investments
RDA is a good way of lowering the cost and increasing
the interoperability of research data infrastructure