2. Session outline
• Managing active data
  ◦ Storage options
• Long-term retention of data
  ◦ Selection criteria
  ◦ Data repositories
• Finding and citing data
  ◦ Data registries and metadata
• Presentation based on: Sarah Jones, Graham Pryor and Angus Whyte, How to Develop Research Data Management Services - a guide for HEIs (DCC, 2013):
• http://www.dcc.ac.uk/resources/how-guides/how-develop-rdm-services
• Some slides reused from RDMRose training materials:
• http://rdmrose.group.shef.ac.uk/
4. Managing active data: key tasks
• Researchers:
  ◦ Have a duty to ensure that research data is stored securely and backed up on a regular basis
  ◦ Have choices (e.g. network drives, laptops, external storage devices, online / cloud-based storage)
  ◦ Need to take data security seriously
  ◦ This should be considered as part of the data management planning process
• Institutions:
  ◦ Need to constantly review data holdings and RDM practices in order to evaluate whether current storage infrastructures are sufficient
  ◦ May need to make a case for investing in the provision of additional data storage capability
  ◦ Need procedures for the allocation and management of storage
  ◦ Need to be flexible, taking account of a diverse range of research contexts and data storage requirements
5. Research data storage
• Trend for some HEIs to enhance the capacity of research data storage facilities
  ◦ Extending capacity of existing filestores (e.g. Bath)
  ◦ Exploring secure cloud storage
  ◦ Utilising High Performance Computing facilities
• Managing storage
  ◦ University of Bristol (data.bris): registered researchers (data stewards) are allocated 5TB storage to manage, e.g. deciding how long data should be kept, who has access, etc.
  ◦ http://data.blogs.ilrt.org
6. Options for managing active data
• Cloud storage options
  ◦ There may be benefits in terms of costs and expertise
  ◦ There may also be risks (e.g. loss of control, jurisdictional issues)
  ◦ Janet Brokerage: promoting the use of cloud and off-site data centre facilities
• Academic Dropbox-like services
  ◦ Dropbox is often used for sharing and syncing data between machines, but institutions are keen to retain control
• Systems developed in-house
  ◦ Typically developed with a disciplinary focus, e.g. BRISSkit (biomedicine)
8. Selecting data for retention
• RCUK, Common Principles on Data Policy (2011):
  ◦ “Data with acknowledged long-term value should be preserved and remain accessible and usable for future research”
  ◦ http://www.rcuk.ac.uk/research/Pages/DataPolicy.aspx
• Institutions will need to establish clear criteria to guide decisions on what should be kept
  ◦ It will not be possible to retain everything
  ◦ Carefully considered selection processes are essential to help prioritise the data that has long-term value
• Institutional selection processes will need to take account of:
  ◦ Data that institutions are legally obliged to retain (or destroy), e.g. for contractual or regulatory reasons
  ◦ Different disciplinary practices (e.g. some disciplines will have mature data sharing infrastructures and will already deposit data with third-party services)
  ◦ Researcher sensitivities about losing control of data (deposit agreements)
9. Developing guidance on selection
• Establishing guidelines, processes and good practice for data selection and deposit can be one of the more challenging aspects of an RDM service
  ◦ There is a need for buy-in from researchers
  ◦ There is a need for clarity on what kinds of data are within the remit of an institutional RDM service
  ◦ There may be a need to apply different levels of curation, e.g. depending on the perceived value of the data accepted
10. DCC selection categories
• DCC How to Select and Appraise Research Data for Curation (Whyte and Wilson, 2010) proposes seven main criteria:
  ◦ Relevance to mission
  ◦ Scientific or historic value
  ◦ Uniqueness
  ◦ Potential for redistribution
  ◦ Non-replicability
  ◦ Economic case
  ◦ Full documentation
• http://www.dcc.ac.uk/resources/how-guides/appraise-select-data
12. Data repositories
• Focusing on how data will be preserved and made available for others
  ◦ Main options:
    • Developing an institutional data repository
      ◦ Building, where possible, on existing systems, e.g. IR, CRIS, etc.
      ◦ Essex Research Data demo: http://researchdata.essex.ac.uk/
    • Liaising with external research data repositories (or data centres)
      ◦ Often subject based; some UK data centres are supported by funding bodies
    • Providing researchers with information on external services
14. RCUK Common Principles
• RCUK, Common Principles on Data Policy (2011):
• “To enable research data to be discoverable and effectively re-used by others, sufficient metadata should be recorded and made openly available to enable other researchers to understand the research and re-use potential of the data. Published results should always include information on how to access the supporting data”
• Also EPSRC Principle 6
• http://www.rcuk.ac.uk/research/Pages/DataPolicy.aspx
15. EPSRC Expectation V
• “Research organisations will ensure that appropriately structured metadata describing the research data they hold is published (normally within 12 months of the data being generated) and made freely accessible on the internet; in each case the metadata must be sufficient to allow others to understand what research data exists, why, when and how it was generated, and how to access it. Where the research data referred to in the metadata is a digital object it is expected that the metadata will include use of a robust digital object identifier (For example as available through the DataCite organisation - http://datacite.org).”
• http://www.epsrc.ac.uk/about/standards/researchdata/Pages/expectations.aspx
May-13
Learning material produced by RDMRose
http://www.sheffield.ac.uk/is/research/projects/rdmrose
16. Some questions to consider
• What metadata is required to adequately record datasets? What is “sufficient metadata” for discovery and re-use?
• Does any of this metadata already exist?
  ◦ If so, where might it be found?
  ◦ If not, how can the appropriate metadata be generated or captured?
• Will there be a need to share this metadata, e.g. with third-party discovery services? National data services?
  ◦ If so, what standards exist to support metadata sharing?
17. Examples: UKOLN Scoping Study
• Scientific Data Application Profile Scoping Study (UKOLN, 2009)
  ◦ Building on work undertaken on the Scholarly Works Application Profile (SWAP)
  ◦ Analysed the metadata used by UK data centres and repositories, selected domain models (e.g. DDI, CCLRC Metadata Model, CIDOC CRM)
  ◦ Concluded that:
    • Simple Dublin Core (e.g. as mandated by OAI-PMH) would be insufficient
    • There was sufficient convergence between the different schemas to suggest that a generic metadata profile could be constructed
    • A generic metadata profile would benefit interdisciplinary research and institution-based services (e.g. IRs)
  ◦ http://www.ukoln.ac.uk/projects/sdapss/
18. Examples: DataCite metadata (1)
• DataCite:
  ◦ Organisation aiming to facilitate easier access to (and citation of) research data, e.g. through the use of persistent identifiers (DOIs)
  ◦ DataCite Metadata Schema (currently v. 2.2, 2011) defines core metadata properties
  ◦ Broadly based on Dublin Core concepts
  ◦ http://schema.datacite.org
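To make "broadly based on Dublin Core concepts" concrete, the sketch below maps the five mandatory DataCite properties (Identifier, Creator, Title, Publisher, PublicationYear) onto simple Dublin Core elements and serialises them in the oai_dc container used by OAI-PMH. The crosswalk table and the flat dictionary record are assumptions made for illustration, not an official DataCite mapping, and `to_simple_dc` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for simple Dublin Core and the OAI-PMH oai_dc container.
DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"

# Illustrative crosswalk (an assumption, not an official mapping):
# DataCite mandatory property -> simple Dublin Core element.
CROSSWALK = {
    "Identifier": "identifier",
    "Creator": "creator",
    "Title": "title",
    "Publisher": "publisher",
    "PublicationYear": "date",
}


def to_simple_dc(record):
    """Serialise the mapped properties of a flat record as an oai_dc XML string."""
    ET.register_namespace("dc", DC_NS)
    ET.register_namespace("oai_dc", OAI_DC_NS)
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for prop, element in CROSSWALK.items():
        if prop in record:
            child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
            child.text = str(record[prop])
    # Properties without a crosswalk entry are silently dropped.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_simple_dc({
        "Identifier": "10.1594/UoP.MS.298",
        "Creator": "University of Poppleton",
        "Title": "Precipitation measurements 1905-2010",
        "Publisher": "The University of Poppleton",
        "PublicationYear": "2011",
    }))
```

Anything outside the crosswalk is lost in the flat output, which is one way to see why a scoping study might find simple Dublin Core insufficient for describing research data.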
19. Examples: DataCite metadata (2)
• Mandatory Properties:
  ◦ Identifier
  ◦ Creator
  ◦ Title
  ◦ Publisher
  ◦ PublicationYear
• Administrative Metadata
  ◦ LastMetadataUpdate
  ◦ MetadataVersionNumber
• Optional Properties:
  ◦ Subject
  ◦ Contributor
  ◦ Date
  ◦ Language
  ◦ ResourceType
  ◦ AlternateIdentifier
  ◦ RelatedIdentifier
  ◦ Size
  ◦ Format
  ◦ Version
  ◦ Rights
  ◦ Description
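As a rough illustration of how the mandatory list might be used in practice, the sketch below reports which mandatory properties are absent from a dataset record before it is submitted for a DOI. Representing the record as a flat Python dictionary is an assumption (the actual schema is expressed in XML), `missing_mandatory` is a hypothetical helper, and the sample values are illustrative.

```python
# Mandatory properties as listed above (DataCite Metadata Schema v2.2).
MANDATORY = ("Identifier", "Creator", "Title", "Publisher", "PublicationYear")


def missing_mandatory(record):
    """Return the mandatory property names that are absent or blank in record."""
    missing = []
    for name in MANDATORY:
        value = record.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# Illustrative record (values are assumptions); Publisher is deliberately omitted.
record = {
    "Identifier": "10.1594/UoP.MS.298",
    "Creator": "University of Poppleton",
    "Title": "Precipitation measurements 1905-2010 taken at Western Bank weather station",
    "PublicationYear": "2011",
    "Subject": "Meteorology",  # optional property, ignored by the check
}

if __name__ == "__main__":
    print(missing_mandatory(record))  # prints ['Publisher']
```

A check like this only confirms presence; it says nothing about whether the values are accurate or sufficient for re-use.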
20. Examples: University of Oxford
• The DaMaRO project at the University of Oxford is developing a metadata schema for its DataFinder (Rumsey, 2012).
• A three-tier metadata approach:
  ◦ Mandatory minimal metadata to enable basic discovery, such as Creator, Title, Publisher, Date, Location, Access terms & conditions
  ◦ Mandatory contextual metadata (mostly administrative and partly based on EPSRC expectations), such as Funding Agency, Grant Number, Last access request date, Project Information, Data Generation Process, Why the data was generated, Date (range) of data collection, Reasons for embargoes
  ◦ Optional metadata (including discipline-specific metadata) to enable reuse, such as machine settings and the experimental conditions under which the data were gathered
21. Examples: University of Essex
• RDE Metadata Profile for EPrints
  ◦ Based on DataCite, INSPIRE, DDI 2.1 and DataShare
  ◦ Mixture of generic schema and standards specific to social science data
  ◦ http://data-archive.ac.uk/media/375386/rde_eprints_metadataprofile.pdf
• There seems to be convergence on a layered approach
22. Some practical questions (1)
• Technical choices for institutions:
  ◦ Developing new institutional services, e.g. the approach taken by ANDS: http://www.ands.org.au/guides/metadata-stores-solutions.html
    • Defining metadata stores by their coverage, the granularity of data that they describe, and the specialisation of their descriptions (e.g. collection-level, object-level, local, institutional, national and discipline-specific)
  ◦ Building upon existing infrastructures, e.g.:
    • Institutional Repositories
    • CRIS (e.g. Pure, Symplectic, Converis)
23. Some practical questions (2)
• Research Information Management interaction?
  ◦ There is interest in what RIM standards like CERIF can offer RDM (e.g. potentially richer metadata structures for linking research outputs with organisational groupings and funding streams, some level of buy-in from funding bodies), but implementation
  ◦ CERIF for Datasets (C4D): http://cerif4datasets.wordpress.com
• We need to think about how metadata can be shared with:
  ◦ Discipline-based repositories and data centres
  ◦ Emerging national (and international) discovery infrastructures
    • Australian National Data Service
      ◦ Uses RIF-CS schema (based on ISO 2146:2010) as a data interchange format
    • Jisc and DCC are currently exploring the options for collating metadata about research data at national level
25. Data Citation
• Issues include (Ball & Duke, 2011a and b):
  ◦ At what granularity should data be made citeable?
  ◦ How to credit each contributor in a dataset that is assembled from very many contributions?
  ◦ Where in a research paper should a data citation be given (e.g. a paper describing a dataset versus subsequent papers using it)?
  ◦ What to do with frequently updated data?
26. DataCite
• DataCite (http://www.datacite.org) is a not-for-profit organisation that aims to promote and support the sharing of research data
• They are developing an infrastructure that supports methods of data citation, discovery, and access
• They are currently leveraging the DOI (Digital Object Identifier) infrastructure, which is also used for research articles
• They can provide DOIs for datasets
• DataCite DOIs have to resolve to a public landing page with information about the dataset and a direct link to it
27. DataCite
• Basic form:
• Creator (PublicationYear): Title. Publisher. Identifier
• Version and ResourceType are optional extra elements
• For citation purposes, DataCite recommends that DOI names are displayed as linkable, permanent URLs
• More info in DataCite (2011)
• University of Poppleton (2011): Precipitation measurements 1905-2010 taken at Western Bank weather station. Meteorological service, The University of Poppleton. http://dx.doi.org/10.1594/UoP.MS.298
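The basic form above can be sketched as a small formatting function. This is a minimal illustration, not DataCite tooling: `doi_url` and `format_citation` are hypothetical names, the resolver prefix is the one shown in the example citation, and the placement of the optional Version and ResourceType elements (after Title and after Publisher respectively) is an assumption that should be checked against DataCite (2011).

```python
def doi_url(doi):
    """Display a DOI name as a linkable URL, using the resolver from the example above."""
    return "http://dx.doi.org/" + doi


def format_citation(creator, year, title, publisher, doi,
                    version=None, resource_type=None):
    """Build 'Creator (PublicationYear): Title. Publisher. Identifier'.

    Optional elements are inserted only when supplied; their position
    is an assumption (see the lead-in text).
    """
    parts = [f"{creator} ({year}): {title}."]
    if version:
        parts.append(f"{version}.")
    parts.append(f"{publisher}.")
    if resource_type:
        parts.append(f"{resource_type}.")
    parts.append(doi_url(doi))
    return " ".join(parts)


if __name__ == "__main__":
    print(format_citation(
        "University of Poppleton",
        2011,
        "Precipitation measurements 1905-2010 taken at Western Bank weather station",
        "Meteorological service, The University of Poppleton",
        "10.1594/UoP.MS.298",
    ))
```

Called with the values from the example, this reproduces the University of Poppleton citation shown above.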
28. References
• Ball, A. (2009). Scientific Data Application Profile Scoping Study Report. Bath: UKOLN, University of Bath. Retrieved from http://www.ukoln.ac.uk/projects/sdapss/
• Ball, A., & Duke, M. (2011a). Data Citation and Linking. DCC Briefing Papers. Edinburgh: Digital Curation Centre. Retrieved from http://www.dcc.ac.uk/resources/briefing-papers/introduction-curation/data-citation-and-linking
• Ball, A., & Duke, M. (2011b). How to Cite Datasets and Link to Publications. DCC How-To Guides. Edinburgh: Digital Curation Centre. Retrieved from http://www.dcc.ac.uk/resources/how-guides/cite-datasets
• DataCite (2011). DataCite Metadata Schema for the Publication and Citation of Research Data. Version 2.2. London: DataCite. Retrieved from http://schema.datacite.org/meta/kernel-2.2/doc/DataCite-MetadataKernel_v2.2.pdf. doi:10.5438/0005
• Rumsey, S. (2012). Just enough metadata: Metadata for research datasets in institutional data repositories [PowerPoint presentation]. Oxford: The University of Oxford. Retrieved from http://damaro.oucs.ox.ac.uk/docs/Just%20enough%20metadata%20v3-1.pdf