Curating for Value in Different Data Stewardship Paradigms
Mark A. Parsons
International Digital Curation Conference
San Francisco, California
26 January 2014

Unless otherwise noted, the slides in this presentation are licensed by Mark A. Parsons under a Creative Commons Attribution-Share Alike 3.0 License
The role of “curation”
• What was my role at the National Snow and Ice Data Center?
• data manager
• data scientist
• information scientist
• informaticist
• data steward
• data curator
• project manager
• operations manager
• data “janitor”

• “In between” work
Curation
Oxford English Dictionary:
cuˈration, n.
Etymology:  Middle English, < Old French curacion , < Latin cūrātiōn-em ,
noun of action < cūrāre to cure v.
 1. The action of curing; healing, cure. (c1374—1677)
DCC: Digital curation involves maintaining, preserving, and adding value to
digital research data throughout its lifecycle.
NAS: the active management and enhancement of digital information for
current and future use.
Parsons (for today’s purposes):
• to add generative value
The generative value of data
• Generative value per Jonathan Zittrain (2008) as
interpreted and extended to data by John Wilbanks:
“the capacity to produce unanticipated change through
unfiltered contributions from broad and varied
audiences.” —J. Zittrain
• Data become more generative by being more adaptable,
more easily mastered, more accessible, and more
connected and influential.
• Not net present value but net potential value.
[Figure: the four attributes of generative value: accessibility, adaptability, ease of mastery, and leverage. Slide courtesy John Wilbanks, 2013.]
[Figure: slide courtesy John Wilbanks, 2013.]
[Figure: the same four attributes, labeled "EASY TO USE" and "NO OPEN LICENSE". Slide courtesy John Wilbanks, 2013.]
[Figure: the same four attributes, labeled "NO OPEN LICENSE" and "DOWNLOAD AVAILABLE". Slide courtesy John Wilbanks, 2013.]
The power of metaphor
Language gets its power because it is defined relative to
frames, prototypes, metaphor, narratives, images and
emotions. Part of its power comes from unconscious
aspects: we are not consciously aware of all that it evokes
in us, but it is there, hidden, always at work. If we hear the
same language over and over, we will think more and more
in terms of the frames and metaphors activated by that
language.
	 	 —George Lakoff (2008)
as quoted in Parsons & Fox. 2013. Is data publication the right metaphor? Data Science
Journal 12.
Four perspectives on roles, i.e. practice
• A data publication perspective
Lawrence B, et al. 2011. Citation and peer review of data: Moving towards
formal data publication. International Journal of Digital Curation 6 (2).
• An ecological infrastructuring perspective
Baker KS and GC Bowker. 2007. Information ecology: Open system
environment for data, memories, and knowing. Journal of Intelligent
Information Systems 29 (1): 127-144.
• A production software perspective
Schopf JM. 2012. Treating data like software: A case for production quality
data. Proceedings of the Joint Conference on Digital Libraries 11-14 June
2012, Washington DC.
• A hybrid, ad hoc perspective
My experience at NSIDC and how the center defined and redefined curation
roles over the years in response to evolving user needs and project mandates.
Controls in the process
• Who controls what, and when?
• “gatekeepers” and “controllers”
• “assessors” and “analysers”
• “testing” by machine and human.
• “reviewers”
• Drivers at NSIDC
• reduce user enquiries; so as users have evolved, the controls and their
associated roles have changed.
• funding requirements
• different types of data streams (e.g. NRT satellite data vs. indigenous
knowledge).
Controls and value
• Controls usually increase the generative value of the data.
• Focus is often on mastery.
• Controls can delay the release of data.
• For example, QC and documentation to avoid “misuse”
• Issues: When is this an excuse to hoard? When should the QC occur?
Shouldn’t it be continuous? Who can certify quality? The provider, user,
curator, peer (who’s that)?
• Controls can reduce the adaptability and connection of the data through
controlled access mechanisms or data models.
• There can be tensions between the different aspects of generativity.
• Flexibility in controls can help prioritize and handle growing demands and
sometimes conflicting requirements.
Some suggested practice
1. “Get out of the way!” —David DeRoure
2. Assess how control steps both augment and inhibit attributes of
generative value.
3. Consciously consider different perspectives and different terminologies
when designing data workflows.
• Ask “how” questions as well as “what” questions of users and providers.
4. Iterate controls in steps and optimize across multiple dimensions of value.
5. Allow multiple players (users, curators, providers, stakeholders) to
annotate, adapt, and update data.
6. Define and explain different “levels of service”.
7. Items 4 to 6 require well-documented and referenced versioning.
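
Practices 2 and 4 above could be sketched roughly as follows. This is only an illustration, not a method from the talk: the control names, the numeric scores, and the helper functions `assess` and `net_effect` are all hypothetical. The four attributes are the ones named on the earlier slides, and mapping "connection" to leverage is an assumption.

```python
from dataclasses import dataclass

# The four attributes of generative value named on the earlier slides.
ATTRIBUTES = ("accessibility", "adaptability", "ease_of_mastery", "leverage")


@dataclass
class Control:
    """A hypothetical curation control and its assumed effect on each attribute.

    Effects are made-up scores: positive augments, negative inhibits.
    """
    name: str
    effects: dict


def assess(control):
    """Split a control's effects into what it augments and what it inhibits."""
    augments = [a for a in ATTRIBUTES if control.effects.get(a, 0) > 0]
    inhibits = [a for a in ATTRIBUTES if control.effects.get(a, 0) < 0]
    return augments, inhibits


def net_effect(controls):
    """Sum the assumed effects of several controls, per attribute."""
    return {a: sum(c.effects.get(a, 0) for c in controls) for a in ATTRIBUTES}


# Illustrative values only: QC helps mastery but delays release;
# controlled access reduces adaptability and connection (leverage).
qc_review = Control("QC and documentation before release",
                    {"ease_of_mastery": 2, "accessibility": -1})
access_gate = Control("controlled access mechanism",
                      {"adaptability": -1, "leverage": -1})

print(assess(qc_review))
print(net_effect([qc_review, access_gate]))
```

Listing the augmented and inhibited attributes side by side is one way to make the tensions between aspects of generativity visible before deciding which controls to iterate on.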
Practice guiding theory
• How can we define and measure “objective” value of data (and associated
information including code) as opposed to “subjective” quality of data?
• How do different metaphors and models of data stewardship/publication/
production/presentation affect attitudes and actions of different
stakeholders in the data lifecycle?
Thank you!
rd-alliance.org
enquiries@rd-alliance.org or parsom3@rpi.edu
Twitter: @resdatall
upcoming Plenaries:
	 26-28 March in Dublin
	 22-24 Sept. in Amsterdam
