This document discusses persistent identifiers for datasets and describes the EZID service for assigning identifiers. It begins with an overview of data citation and identifiers, explaining what identifiers are and providing an example. It then describes how EZID can be used to create and manage persistent identifiers and associated metadata over time. The document concludes by discussing considerations for identifier selection and the role of identifiers in the life cycle of a dataset from creation through publication and archiving.
Data Citation and Identifiers: DOIs, ARKs, and EZID
1. Dataset Citation and Identifiers
DOIs, ARKs, and EZID
Joan Starr
California Digital Library
April, 2012
@joan_starr
2. Dataset Citation & Identifiers
Data Citation
Identifiers 101
Dataset Identification with EZID
Choosing an Identifier
Life Cycle Data Management
Looking ahead
By Brain farts (Joschua) http://www.flickr.com/photos/brainfarts/97676505/
3. Partnership between CDL | 10 UC campuses | Peer institutions
Provide solutions, services, resources for digital assets
Pool & distribute diverse experience, expertise, & resources
4. Data Citation
By barryegan (Vitor Leite) http://www.flickr.com/photos/vixon/116447718/
11. What is an identifier?
What you see: alphanumeric string (never changes)
Associated with: location of object (such as a URL)
Optional: who, what, when, etc (i.e. metadata)
By Joelk75: http://www.flickr.com/photos/75001512@N00/2728233597/
12. Identifier example
string: doi:10.9999/FK40K2GTV
html version: http://dx.doi.org/10.9999/FK40K2GTV
location: http://www.bologna.edu/biology/xfg/123.xls
metadata
creator: Dr. Felix Kottor
title: Data for chromosomal study of catfish (Ictalurus punctatus)
publisher: University of Bologna
date: 8/31/2011
13. Identifier example
string: doi:10.9999/FK40K2GTV
html version: http://dx.doi.org/10.9999/FK40K2GTV
location: http://www.state.edu/ecology/783sdr/123.xls
metadata
creator: Dr. Felix Kottor
title: Data for chromosomal study of catfish (Ictalurus punctatus)
publisher: Dryad Data Repository
date: 10/01/2011
14. Identifiers 201
By Christi Nielsen http://www.flickr.com/photos/christinielsen/476326980/
17. EZID: long-term identifiers made easy
Take control of the management and distribution of your research, share and get credit for it, and build your reputation through its collection and documentation
Primary Functions
1. Create persistent identifiers
2. Manage identifiers over time
3. Manage associated metadata over time
24. DOIs and ARKs
• both can work like regular hyperlinks.
• both can refer to a subset or portion of a resource.
• both become persistent when the target URL is maintained.
http://content.cdlib.org/ark:/13030/tf0v19n605/,
courtesy of UC Davis Special Collections
25. DOIs vs ARKs
• Case sensitive
• Special feature supports granularity
• Informative
• Less costly
26. DOIs vs ARKs: suffix pass-through
• string: ark:/99999/Big4/*
• location: http://x.y.z/foo/Big4/db/*
27. DOIs vs ARKs: suffix pass-through
• string: ark:/99999/Big4/table/cell/45-8.txt
• location: http://x.y.z/foo/Big4/db/table/cell/45-8.txt
28. DOIs vs ARKs
• Established brand in publishing
• Indexed by major A&I citation databases
• Cannot be deleted
• More costly
30. The Life of Data
By jfcherry http://www.flickr.com/photos/67272961@N03/6123892769/
31. A life cycle approach
CDL Curation and Publishing Services
http://www.cdlib.org
Create, edit, share, and save data management plans
Open source add-in for Microsoft Excel as a data collection tool
Create and manage persistent identifiers
Curation repository: store, manage, and share research data
Open access scholarly publishing services: papers, journals, books, seminars & more
Data Publication: an infrastructure to publish and get credit for sharing research data
32. Identifiers and the data life cycle
Track your results
Organize your data
Get more citations
Meet funder requirements
34. 1. New User Interface.
By Leonard John Matthews http://www.flickr.com/photos/mythoto/3964995003/
37. 2. Growing user community
http://www.cdlib.org/services/uc3/ezid/clients.html
Thanks to Scott Edmunds, GigaScience Journal for input
40. For more information
EZID
EZID application: http://n2t.net/ezid/
EZID website: http://www.cdlib.org/services/uc3/ezid/
EZID on Twitter: @ezidCDL
Joan Starr: uc3@ucop.edu @joan_starr
Librarians THINK ABOUT THE METADATA THAT THE CITATION REPRESENTS. TO US, IT LOOKS LIKE… DESCRIPTION. And we want it to support DISCOVERY AND PRESERVATION. Our interests coincide with the researchers’. RE-USE, just like ACCESS, demands: to know that the data exist, to know where to get the data, and to be able to get the data in a form that is easily integrated into local workflows. And both PRESERVATION and DATA MANAGEMENT demand that the object be easy to maintain. The funders’ requirements are for data management, and the library’s charge is to preserve our institutions’ scholarly assets.
From ICPSR (Inter-University Consortium for Political and Social Research), http://www.icpsr.umich.edu/icpsrweb/ICPSR/curation/citations.jsp:
Title
Author
Date
Version
Persistent identifier (such as the Digital Object Identifier, Uniform Resource Name (URN), or Handle System)
From ESIP (Earth Science Information Partners): http://wiki.esipfed.org/index.php/Interagency_Data_Stewardship/Citations/provider_guidelines
On Mon, Nov 7, 2011 at 12:19 PM, <Rebecca.Lawrence@f1000.com> wrote: Dear all, We (Faculty of 1000 and GigaScience/BGI) are currently writing an open letter to Nature/Science about the fact that data DOIs need to be included in the proper reference list of a paper so that for one, they can be picked up by Thomson Reuters and counted properly as data citations, as there have been some instances recently of publishers refusing to include these formally in the ref list. In fact there has been quite a lot of discussion recently in various venues about how important this is and we would like to obviously get as many publishers as possible to agree to sign up to this.
So here is what this means. Here is an example of a data set deposited with one of our clients, Dryad. Dryad is an international repository of data underlying peer-reviewed articles in the basic and applied biosciences.
So just what are these Identifiers?
DOIs are one kind of persistent identifier. But what is an identifier? An identifier is an alphanumeric string assigned to an object, and if that assignment is managed with some metadata and the object is made available over time, the identifier becomes a VERY reliable way of keeping track of that object.
(this is not an actual DOI, nor an actual study)
And here’s that same DOI some time later. THE STRING NEVER CHANGES. This means it can be cited, tracked, and associated with all kinds of metadata.
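The two identifier example slides can be modeled as one record whose location and metadata are updated while the string stays fixed. This is only an illustrative sketch: the `IdentifierRecord` class and its `update` method are hypothetical names, not part of EZID, and the values are the made-up ones from the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class IdentifierRecord:
    """Hypothetical model of an identifier: a fixed string plus changeable details."""
    identifier: str                      # the string that gets cited; update() never touches it
    target: str                          # current location of the object (a URL)
    metadata: Dict[str, str] = field(default_factory=dict)

    def update(self, target: Optional[str] = None, **metadata: str) -> None:
        """Change the location and/or metadata, leaving the identifier string alone."""
        if target is not None:
            self.target = target
        self.metadata.update(metadata)


# Slide 12: the dataset as first registered (example values, not a real DOI).
record = IdentifierRecord(
    identifier="doi:10.9999/FK40K2GTV",
    target="http://www.bologna.edu/biology/xfg/123.xls",
    metadata={
        "creator": "Dr. Felix Kottor",
        "publisher": "University of Bologna",
        "date": "8/31/2011",
    },
)

# Slide 13: the data moves and is republished; only target and metadata change.
record.update(
    target="http://www.state.edu/ecology/783sdr/123.xls",
    publisher="Dryad Data Repository",
    date="10/01/2011",
)
```

Anyone who cited `doi:10.9999/FK40K2GTV` before the move still reaches the data afterwards, because the citation holds the string rather than the location.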
We’re going to look at that same DOI so we can talk about its structure. Remember: this is a STRING associated with a TARGET URL. DOI structure is based on the Handle system of identifiers, because you can think of DOIs as a special implementation of the Handle system. So, here is the segment called the PREFIX. All DOI prefixes begin with ‘10’, followed by a “dot” and more numbers. The prefix is a unique number assigned to the specific registrant of DOIs. CDL has its own prefix, for example. NCAR has one too. The prefix is the common element in every DOI the registrant makes. The second part is the suffix, the part after the slash. This part has to be unique for every DOI created with the prefix.
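The prefix and suffix rule can be checked mechanically. The following is a minimal sketch, assuming the simplified rules stated here (prefix starts with "10", then a dot and digits; suffix is everything after the first slash). The function names `split_doi` and `resolver_url` are hypothetical, and the resolver address is the one shown on the identifier example slide.

```python
from typing import Tuple


def split_doi(doi: str) -> Tuple[str, str]:
    """Split a DOI string into (prefix, suffix), e.g. ("10.9999", "FK40K2GTV")."""
    name = doi.strip()
    if name.lower().startswith("doi:"):
        name = name[len("doi:"):]          # drop the "doi:" scheme label

    # The prefix ends at the first slash; the suffix may itself contain slashes.
    prefix, slash, suffix = name.partition("/")
    if not slash or not suffix:
        raise ValueError(f"no suffix after the slash in {doi!r}")

    # Simplified check: "10", a dot, then one or more groups of digits.
    parts = prefix.split(".")
    if len(parts) < 2 or parts[0] != "10" or not all(p.isdigit() for p in parts[1:]):
        raise ValueError(f"prefix {prefix!r} does not look like '10.<digits>'")

    return prefix, suffix


def resolver_url(doi: str) -> str:
    """Build the clickable form of a DOI, as on the identifier example slide."""
    prefix, suffix = split_doi(doi)
    return f"http://dx.doi.org/{prefix}/{suffix}"


prefix, suffix = split_doi("doi:10.9999/FK40K2GTV")
link = resolver_url("doi:10.9999/FK40K2GTV")
```

Here `prefix` identifies the registrant and `suffix` identifies the object within that registrant's DOIs.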
EZID is CDL’s application for offering DataCite DOIs, ARKs as well as other identifiers. Soon we’ll support URNs, for example.
How can we be in the business of issuing DataCite DOIs? California Digital Library was one of the founding members. DataCite was indeed formed in 2009 by 10 libraries and research centers with a mission: “Helping you find, access, and reuse data.” The number has now grown to 15. In addition there are 3 associate members, including the Korea Institute of Science and Technology Information and BGI, so there is a presence in Asia. DataCite’s primary methodology for achieving this mission: issuing DOIs (Digital Object Identifiers) for datasets.
If you go to the Home Page, you can use the UI to test EZID. CLICK for HELP TAB.
On the Help screen, you have the choice of creating a test ARK or DOI.
EZID creates the identifier and sends you to the MANAGE tab where you have the opportunity to enter a target URL and other metadata.
When you hover over a field, it opens up for editing as you can see here. This is where you would go if you wanted to maintain the metadata or the target URL.
ARKs come from the library and museum world and have been adopted by some large cultural organizations around the world. FLEXIBLE: can identify objects of any type: digital, physical, living, and intangible. CASE SENSITIVE: MORE OPTIONS (CD, Cd, cD, cd are all distinct). ARKs have a feature called suffix pass-through (remember suffixes?). It means you can register the root of a file structure and get pointers to the rest of the file structure for free. I’ll show you an example in a minute. ARKs CAN GIVE YOU EXTRA INFORMATION with something called “inflections,” or different endings, ? and ?? (in test now). If the registrant has supplied the information, an ARK should return metadata for ? and a commitment to persistence for ??.
Register a single ARK mapped to the top level of your database organization, and you get a kind of wild card. If you have 10,000 nameable parts of your database, you only need to register that one top-level item. The ARK server will PASS THROUGH any suffix you later add (but don’t register) to your location server.
Here is an example. It’s a powerful way to handle a very large number of items. We have this in testing mode right now and hope to bring it into production later this year.
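To make the pass-through idea concrete, here is a small sketch of how a resolver might behave, using the example values from the two suffix pass-through slides. It is an illustration under simplifying assumptions, not EZID's implementation: the `REGISTERED` table and the `resolve` function are hypothetical, and only "/" is treated as a boundary between the registered root and the suffix.

```python
from typing import Dict, Optional

# Only the root ARK is registered (example values from the slides).
REGISTERED: Dict[str, str] = {
    "ark:/99999/Big4": "http://x.y.z/foo/Big4/db",
}


def resolve(ark: str, registry: Dict[str, str] = REGISTERED) -> Optional[str]:
    """Return the target URL for an ARK, passing any unregistered suffix through."""
    # Exact match: the ARK itself was registered.
    if ark in registry:
        return registry[ark]

    # Otherwise look for the longest registered root that the ARK extends,
    # and append whatever follows the root to that root's target URL.
    for root in sorted(registry, key=len, reverse=True):
        if ark.startswith(root + "/"):
            passed_through = ark[len(root):]   # e.g. "/table/cell/45-8.txt"
            return registry[root] + passed_through

    return None  # nothing registered for this ARK


location = resolve("ark:/99999/Big4/table/cell/45-8.txt")
```

With this approach the 10,000 nameable parts of a database need one registration, because every extension of the root resolves by rule rather than by lookup.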
The gold standard. DOIs are for keeps. DOIs should be assigned to objects that are under good long-term management, and where the intention is to make the object persistently available. DOIs must be registered exclusively with metadata that is available to public view. Can DOIs and ARKs work together? These two identifier schemes can work well together, and EZID offers them both, along with policy support consistent across both schemes.
Researchers know all too well about the life of a dataset. You start out in a laptop (or a tablet) travelling around, or under a desk. Maybe then you get emailed across the country or around the world. Years can go by as you get updated and altered. Eventually, maybe you have a day in the sun: your researcher decides to write up the results and cite you. Then, perhaps, it’s back to a server in the dark. Or, you move from server to server. Will you be forgotten?
That’s why we at California Digital Library have taken a life cycle approach. CDL has developed an array of tools and services ranging from the first stage of developing a data management plan through to formal publication. We encourage researchers to assign an ID early in the process: to provide a credible data management plan for funders; to make the later stages easier; and to manage situations where changes might occur during the course of the research, for example when a researcher changes institutions or a research team changes the location of their data.
One unified destination for all EZID INFORMATION. (Screenshot of home page.) NOTE: This screen is subject to change between now and release date!
b) Another big change you’ll see immediately is the new Manage IDs screen, where you can view all the identifiers you have created, as well as the last 10 you’ve been working on. In the next few months, we will also be introducing enhanced support for the DataCite metadata scheme, as well as other features that make reporting on institution accounts easier.
For data citation to work, it is necessary for data providers and publishers to adopt the practice. Government data centers, university-hosted research institutes, research libraries offering data management services, publishers beginning to support the data behind scholarly work.
And, we are banding together with the other two US DataCite full members (Purdue University and the Office of Scientific and Technical Information, OSTI) to create the DataCite US Alliance. We’ll be growing the number of US Affiliate members and, with them, we’ll have a larger voice to support researchers and organizations here in the US. We’ve noticed that there are patterns of research and data management practice here distinct from those in Europe, where DataCite is headquartered. Let me know if you are interested.
This is the key to tracking identifiers and building those statistics: the second piece of making data citation work. The final key is community usage: scholars must cite data and the community must use the metrics. DataCite metadata in harvestable form (OAI-PMH). Ex Libris is harvesting now. Discussions underway with Thomson Reuters and Elsevier.