Topic Exploration with the
HTRC Data Capsule for Non-Consumptive Research
Joint Conference on Digital Libraries 2015 | Knoxville, TN | 06.21.15
Robert H. McDonald | Jiaan Zeng - Data To Insight Center
Jaimie Murdock – InPho Project
Indiana University
Tweet us - @HathiTrust #HTRC
HATHI TRUST RESEARCH CENTER
Tweet us - @InPhoproject
#HTRC @HathiTrust
Tutorial Agenda
• 9:00-9:15 - An overview of the HTRC (Robert
McDonald)
• 9:15-9:30 - HTRC Data Capsule Intro (Jiaan Zeng)
• 9:30-9:45 - Intro to Topic Models and the InPho
Explorer (Jaimie Murdock)
• 9:45-10:30 - Hands-On Parts 1&2
• 10:30-10:45 - Break
• 10:45-11:30 - Hands-On Parts 3&4
• 11:30-11:45 - Advanced Notebooks (Jaimie Murdock)
• 11:45-12:00 - HTRC Advanced Collaborative Support
(Robert McDonald)
HTRC@Events
• HTRC UnCamp 2015 – March
30-31, 2015 Ann Arbor, MI
• Stephen Downie Keynote at
JCDL 2015
• Digital Humanities 2015 – June
29-July 3, 2015 Sydney, Australia
• Linguistic Society of America (LSA) Biennial
Linguistic Institute, July 13, 2015 Chicago, IL
• HILT 2015 – July 28-29, 2015
Indianapolis, IN
Many thanks …
HTRC IU Team
• Beth Plale (PI)
• Robert H. McDonald
• Miao Chen
• Guangchen Ruan
• Zong Peng
• Milinda Pathirage
• Samitha Liyanage
• Jiaan Zeng
• Leena Unnikrishnan
• Nicholae Cline
HTRC UIUC Team
• J. Stephen Downie (PI)
• Beth Namachchivaya
• Megan Senseney
• Sayan Bhattacharyya
• Loretta Auvil
• Boris Capitanu
• Harriet Green
• Eleanor Dickson
Outline
• What is the HTRC?
• Non-Consumptive Research Paradigm
• Current Architecture
• Future Architecture
• Advanced Collaborative Support (RFP)
HathiTrust Digital Library
• HathiTrust is a partnership of 90+
academic & research institutions,
offering a collection of millions
of digitized titles.
• http://hathitrust.org
– IU is a founding member of the
HathiTrust along with University
of Michigan, University of
California, and the University of
Virginia
HathiTrust Research Center
Mission
• Public research arm of HathiTrust
• Goal: enable researchers world-wide to
accomplish tera-scale text data-mining and
analysis
– Develop cutting-edge software tools for processing,
analyzing text
– Develop cyberinfrastructure to enable HPC access to
the HathiTrust Digital Library
• Established: July, 2011
• Collaborative center: Indiana University &
University of Illinois
HTRC Timeline
• Phase I: development 01 Jul 2011 – 31 Mar 2013
– HTRC software and services release v1.0
https://github.com/htrc
• Phase II: outreach, 01 Apr 2013 – 30 June 2014
– 2nd HTRC UnCamp Sep ’13
• Phase III: operations, 01 July 2014 – present (2014-2018)
HTRC Current Users (ca 2014)
Projected Use 2019
• Digital Humanities (60)
• Education (60)
• Informatics (60)
• Observers (20)
194 existing user accounts
Lots of user accounts; good
starting point.
Improve:
• Increase amount of real work
being accomplished as
measured by usage on HTRC’s
compute resources Quarry and
Big Red II at IU
• Develop educational uses
• Develop informatics uses
• Decrease number of observers
to 10%
→ Project 200 users at any one time,
of which 90% are doing relevant
education/scholarship
HTRC Current Users (ca Now)
Non-Consumptive Research Paradigm
• No action or set of actions on the part of users,
either acting alone or in cooperation with
other users over the duration of one or multiple
sessions, can result in sufficient information
being gathered from a collection of copyrighted
works to reassemble pages from the collection.
• The definition disallows collusion between users
and accumulation of material over time.
It differentiates a human researcher from a proxy,
which is not a user. Users are human beings.
[Figure: HTRC complexity-hiding interface. Labels: Request; All the complexity; Tabular info; Statistical plots; Spatial plots]
HTRC Version 2.0
HTRC Goals
• Provide a persistent and sustainable structure to
enable original and cutting edge research.
– Leverage data storage and computational infrastructure at Indiana &
Illinois
– Stimulate community development of new functionality and tools
– Use tools to enable discoveries that would not be possible without
the HTRC
• Enable scholars to fully utilize content of
HathiTrust Library while preventing intellectual
property misuse within U.S. copyright law.
– Provision secure computational and data environment for scholars to
perform research using HathiTrust Digital Library.
HTRC Organization
2014-18
• HTRC Executive Mgmt
• Administrative Support
• Core Development
• Advanced Research
• Advanced Collaborative Support
• Scholarly Commons
HTRC Data Capsule
HTRC Data Capsule@IU Team
• Beth Plale (PI)
• Jiaan Zeng
• Guangchen Ruan
HTRC Data Capsule@Michigan Team
• Atul Prakash (PI)
• Alexander Crowell
Jiaan Zeng, Guangchen Ruan, Alexander Crowell, Atul Prakash, and
Beth Plale. 2014. Cloud computing data capsules for non-consumptive
use of texts. In Proceedings of the 5th ACM Workshop on Scientific
Cloud Computing (ScienceCloud '14). ACM, New York, NY, USA, 9-16.
DOI=10.1145/2608029.2608031
http://doi.acm.org/10.1145/2608029.2608031
Special Thanks to
• Samitha Liyanage
• Milinda Pathirage
• Zong Peng
• Earlence Fernandes
• Ajit Aluri
HTRC Data Capsule Workflow
Data Capsule Screenshots
Maintenance Mode
Secure Mode
HTRC Advanced Collaborative Support
• ACS will be offered on a rolling basis over the next
four years (2014-18)
• 1st RFP Call Deadline was Jan 8, 2015, 5:00 pm
Eastern
– RFP - http://www.hathitrust.org/htrc/acs-rfp
• For more info on the Advanced Collaborative
Support please contact:
htrc.acs.awards@gmail.com
Scholarly Commons
User Support Service
• Develop training materials
• Educational workshops
• Tool and workset creation
• Collaborate with librarians and DH
centers at HT institutions
• Assist researchers in HTRC text data
mining research projects
• Led out of University of Illinois
Library; smaller group at IU
• Resourced at 2.7 FTE.
Administrative Support
• Senior Library Personnel (4 supervisors at .05 FTE)
• Senior Project Coordinator (.25 FTE)
• Executive Assistant (.5 FTE)
Core Development
• Sr. Software Architect (1.0 FTE)
• Research Programmer (.5 FTE)
• Library Research Programmer (.5 FTE)
• IU Systems Administrator (.25 FTE)
• User Interface Specialist (2 years at 1.0 FTE)
• Informatics Developers (2 developers for 2 years at .15 FTE)
Advanced Research
• CS PhD Students
• LIS PhD Students
• UI Systems Administrator (.5 FTE)
Advanced Collaborative Support (coordinated by M. Chen)
• Research Programmer (.5 FTE)
• Computational Research Liaison (.5 FTE)
• Asst Dir Outreach & Education (M. Chen) (1 year at .25 FTE)
Scholarly Commons
• Dig Humanities Specialist (1.0 FTE)
• CLIR Postdoctoral Research Associate (2 years at 1.0 FTE)
• Digital Research Librarian support (.2 FTE)
• Scholars Commons Support (.5 FTE)
• LIS MS Students (.25 FTE) (.11 FTE)
Key:
• Area
• Proposed for funding by HathiTrust
HTRC Future Work
• Copyrighted content in progress
• Advanced Collaborative Support
– The award model
– Award content is HTRC ACS staff time
– Collaborate with scholars on addressing their research needs related
to HTRC
– E.g. prototyping, running text analysis
– Advocate open source; encourage extending the work to a grant
submission
• Scholars Commons
– Interaction with scholars to help them use HTRC tools and services
– An interface for interacting with HTRC users through the Scholars
Commons
– Series of workshops at IU and other places
– Weekly consulting time
– Every Wed 2:30 – 4:30pm, IU library, Scholars Commons 157R
– Contact: Miao Chen, Nicholae Cline
• For details http://www.hathitrust.org/htrc/faq
• General contact info
– J. Stephen Downie, Co-Director HTRC,
jdownie@Illinois.edu
– Beth Plale, Co-Director HTRC, plale@indiana.edu
• Requests for capability, interest
– Robert McDonald, rhmcdona@indiana.edu
Important URLs
• HTRC Portal
– http://sharc.hathitrust.org
• Data Capsule Tutorial
– http://shoutkey.com/gin
• VNC Installation Directions
– http://shoutkey.com/peat

More Related Content

What's hot

BL Labs at Bloomsbury Digital Humanities Group
BL Labs at Bloomsbury Digital Humanities Group BL Labs at Bloomsbury Digital Humanities Group
BL Labs at Bloomsbury Digital Humanities Group labsbl
 
New approaches for data acquisition at europeana iiif, sitemaps and schema.o...
New approaches for data acquisition at europeana  iiif, sitemaps and schema.o...New approaches for data acquisition at europeana  iiif, sitemaps and schema.o...
New approaches for data acquisition at europeana iiif, sitemaps and schema.o...Nuno Freire
 
Intro to IIIF and IIIF @NLW
Intro to IIIF and IIIF @NLWIntro to IIIF and IIIF @NLW
Intro to IIIF and IIIF @NLWGlen Robson
 
Humanities Research with the Web of Data
Humanities Research with the Web of DataHumanities Research with the Web of Data
Humanities Research with the Web of DataMathieu d'Aquin
 
IIIF Pre-conference - Usability testing conducted on the UV and Mirador
IIIF Pre-conference - Usability testing conducted on the UV and MiradorIIIF Pre-conference - Usability testing conducted on the UV and Mirador
IIIF Pre-conference - Usability testing conducted on the UV and MiradorJulien A. Raemy
 
From Structured Data to Linked Open Governmental Data
From Structured Data to Linked Open Governmental DataFrom Structured Data to Linked Open Governmental Data
From Structured Data to Linked Open Governmental DataDongpo Deng
 
BL Labs Competition 2016
BL Labs Competition 2016BL Labs Competition 2016
BL Labs Competition 2016labsbl
 
Iiif to go iiif vatican (7 minutes)
Iiif to go   iiif vatican (7 minutes)Iiif to go   iiif vatican (7 minutes)
Iiif to go iiif vatican (7 minutes)Rachel Di Cresce
 
re3data.org – a Registry of Research Data Repositories
re3data.org – a Registry of Research Data Repositoriesre3data.org – a Registry of Research Data Repositories
re3data.org – a Registry of Research Data RepositoriesHeinz Pampel
 
Wehc - Linked Data for Economic-Social historians
Wehc - Linked Data for Economic-Social historiansWehc - Linked Data for Economic-Social historians
Wehc - Linked Data for Economic-Social historiansBram van den Hout
 
FAIR Signposting: A KISS Approach to a Burning Issue
FAIR Signposting: A KISS Approach to a Burning IssueFAIR Signposting: A KISS Approach to a Burning Issue
FAIR Signposting: A KISS Approach to a Burning IssueHerbert Van de Sompel
 
Cosi Usage Data
Cosi   Usage DataCosi   Usage Data
Cosi Usage Datadaveyp
 
Elephant in the Room: Scaling Storage for the HathiTrust Research Center
Elephant in the Room: Scaling Storage for the HathiTrust Research CenterElephant in the Room: Scaling Storage for the HathiTrust Research Center
Elephant in the Room: Scaling Storage for the HathiTrust Research CenterRobert H. McDonald
 
MPhil Lecture on Data Vis for Analysis
MPhil Lecture on Data Vis for AnalysisMPhil Lecture on Data Vis for Analysis
MPhil Lecture on Data Vis for AnalysisShawn Day
 
WORLDMAP: A SPATIAL INFRASTRUCTURE TO SUPPORT TEACHING AND RESEARCH (BROWN BA...
WORLDMAP: A SPATIAL INFRASTRUCTURE TO SUPPORT TEACHING AND RESEARCH (BROWN BA...WORLDMAP: A SPATIAL INFRASTRUCTURE TO SUPPORT TEACHING AND RESEARCH (BROWN BA...
WORLDMAP: A SPATIAL INFRASTRUCTURE TO SUPPORT TEACHING AND RESEARCH (BROWN BA...Micah Altman
 

What's hot (20)

BL Labs at Bloomsbury Digital Humanities Group
BL Labs at Bloomsbury Digital Humanities Group BL Labs at Bloomsbury Digital Humanities Group
BL Labs at Bloomsbury Digital Humanities Group
 
2014_WWW_BTOR
2014_WWW_BTOR2014_WWW_BTOR
2014_WWW_BTOR
 
New approaches for data acquisition at europeana iiif, sitemaps and schema.o...
New approaches for data acquisition at europeana  iiif, sitemaps and schema.o...New approaches for data acquisition at europeana  iiif, sitemaps and schema.o...
New approaches for data acquisition at europeana iiif, sitemaps and schema.o...
 
Intro to IIIF and IIIF @NLW
Intro to IIIF and IIIF @NLWIntro to IIIF and IIIF @NLW
Intro to IIIF and IIIF @NLW
 
Humanities Research with the Web of Data
Humanities Research with the Web of DataHumanities Research with the Web of Data
Humanities Research with the Web of Data
 
IIIF Pre-conference - Usability testing conducted on the UV and Mirador
IIIF Pre-conference - Usability testing conducted on the UV and MiradorIIIF Pre-conference - Usability testing conducted on the UV and Mirador
IIIF Pre-conference - Usability testing conducted on the UV and Mirador
 
Scholze imcw 2014-11-25
Scholze imcw 2014-11-25Scholze imcw 2014-11-25
Scholze imcw 2014-11-25
 
From Structured Data to Linked Open Governmental Data
From Structured Data to Linked Open Governmental DataFrom Structured Data to Linked Open Governmental Data
From Structured Data to Linked Open Governmental Data
 
Scholze goportis 4-11-14
Scholze goportis 4-11-14Scholze goportis 4-11-14
Scholze goportis 4-11-14
 
BL Labs Competition 2016
BL Labs Competition 2016BL Labs Competition 2016
BL Labs Competition 2016
 
Open data and linked data
Open data and linked dataOpen data and linked data
Open data and linked data
 
Iiif to go iiif vatican (7 minutes)
Iiif to go   iiif vatican (7 minutes)Iiif to go   iiif vatican (7 minutes)
Iiif to go iiif vatican (7 minutes)
 
re3data.org – a Registry of Research Data Repositories
re3data.org – a Registry of Research Data Repositoriesre3data.org – a Registry of Research Data Repositories
re3data.org – a Registry of Research Data Repositories
 
Elab 16 5-13-re3data-scholze-final
Elab 16 5-13-re3data-scholze-finalElab 16 5-13-re3data-scholze-final
Elab 16 5-13-re3data-scholze-final
 
Wehc - Linked Data for Economic-Social historians
Wehc - Linked Data for Economic-Social historiansWehc - Linked Data for Economic-Social historians
Wehc - Linked Data for Economic-Social historians
 
FAIR Signposting: A KISS Approach to a Burning Issue
FAIR Signposting: A KISS Approach to a Burning IssueFAIR Signposting: A KISS Approach to a Burning Issue
FAIR Signposting: A KISS Approach to a Burning Issue
 
Cosi Usage Data
Cosi   Usage DataCosi   Usage Data
Cosi Usage Data
 
Elephant in the Room: Scaling Storage for the HathiTrust Research Center
Elephant in the Room: Scaling Storage for the HathiTrust Research CenterElephant in the Room: Scaling Storage for the HathiTrust Research Center
Elephant in the Room: Scaling Storage for the HathiTrust Research Center
 
MPhil Lecture on Data Vis for Analysis
MPhil Lecture on Data Vis for AnalysisMPhil Lecture on Data Vis for Analysis
MPhil Lecture on Data Vis for Analysis
 
WORLDMAP: A SPATIAL INFRASTRUCTURE TO SUPPORT TEACHING AND RESEARCH (BROWN BA...
WORLDMAP: A SPATIAL INFRASTRUCTURE TO SUPPORT TEACHING AND RESEARCH (BROWN BA...WORLDMAP: A SPATIAL INFRASTRUCTURE TO SUPPORT TEACHING AND RESEARCH (BROWN BA...
WORLDMAP: A SPATIAL INFRASTRUCTURE TO SUPPORT TEACHING AND RESEARCH (BROWN BA...
 

Similar to JCDL 2015 Tutorial Opening Slides

HathiTrust Research Center Data Capsule Overview 09.10.14
HathiTrust Research Center Data Capsule Overview 09.10.14HathiTrust Research Center Data Capsule Overview 09.10.14
HathiTrust Research Center Data Capsule Overview 09.10.14Robert H. McDonald
 
The HathiTrust Research Center: Big Data Analytics in a Secure Data Framework
The HathiTrust Research Center: Big Data Analytics in a Secure Data FrameworkThe HathiTrust Research Center: Big Data Analytics in a Secure Data Framework
The HathiTrust Research Center: Big Data Analytics in a Secure Data FrameworkRobert H. McDonald
 
The HathiTrust Research Center (HTRC): An Overview and Demo
The HathiTrust Research Center (HTRC): An Overview and DemoThe HathiTrust Research Center (HTRC): An Overview and Demo
The HathiTrust Research Center (HTRC): An Overview and DemoRobert H. McDonald
 
Building a Public Research Center for the HathiTrust Digital Library
Building a Public Research Center for the HathiTrust Digital LibraryBuilding a Public Research Center for the HathiTrust Digital Library
Building a Public Research Center for the HathiTrust Digital LibraryRobert H. McDonald
 
Curation Service Models - Michael Witt - RDAP12
Curation Service Models - Michael Witt - RDAP12Curation Service Models - Michael Witt - RDAP12
Curation Service Models - Michael Witt - RDAP12ASIS&T
 
The HathiTrust Research Center: An Overview of Advanced Computational Services
The HathiTrust Research Center: An Overview of Advanced Computational ServicesThe HathiTrust Research Center: An Overview of Advanced Computational Services
The HathiTrust Research Center: An Overview of Advanced Computational ServicesRobert H. McDonald
 
RDAP 15: Research Data Integration in the Purdue Libraries
RDAP 15: Research Data Integration in the Purdue LibrariesRDAP 15: Research Data Integration in the Purdue Libraries
RDAP 15: Research Data Integration in the Purdue LibrariesASIS&T
 
Use of ICT in educational research
Use of ICT in educational researchUse of ICT in educational research
Use of ICT in educational researchRamakanta Mohalik
 
Research into Practice case study 2: Library linked data implementations an...
	Research into Practice case study 2:  Library linked data implementations an...	Research into Practice case study 2:  Library linked data implementations an...
Research into Practice case study 2: Library linked data implementations an...Hazel Hall
 
Workshop 4: Open Science & Open Data for Librarians/Ina Smith
Workshop 4: Open Science & Open Data for Librarians/Ina SmithWorkshop 4: Open Science & Open Data for Librarians/Ina Smith
Workshop 4: Open Science & Open Data for Librarians/Ina SmithAfrican Open Science Platform
 
IFLA ARL Webinar Series: Research Ethics in an Open Research Environment
IFLA ARL Webinar Series: Research Ethics in an Open Research EnvironmentIFLA ARL Webinar Series: Research Ethics in an Open Research Environment
IFLA ARL Webinar Series: Research Ethics in an Open Research EnvironmentIFLAAcademicandResea
 
Data Science: History repeated? – The heritage of the Free and Open Source GI...
Data Science: History repeated? – The heritage of the Free and Open Source GI...Data Science: History repeated? – The heritage of the Free and Open Source GI...
Data Science: History repeated? – The heritage of the Free and Open Source GI...Peter Löwe
 
Why Data Science Matters - 2014 WDS Data Stewardship Award Lecture
Why Data Science Matters - 2014 WDS Data Stewardship Award LectureWhy Data Science Matters - 2014 WDS Data Stewardship Award Lecture
Why Data Science Matters - 2014 WDS Data Stewardship Award LectureXiaogang (Marshall) Ma
 
Introduction to UC San Diego’s Integrated Digital Infrastructure
Introduction to UC San Diego’s Integrated Digital InfrastructureIntroduction to UC San Diego’s Integrated Digital Infrastructure
Introduction to UC San Diego’s Integrated Digital InfrastructureLarry Smarr
 
Big Data Curricula at the UW eScience Institute, JSM 2013
Big Data Curricula at the UW eScience Institute, JSM 2013Big Data Curricula at the UW eScience Institute, JSM 2013
Big Data Curricula at the UW eScience Institute, JSM 2013University of Washington
 
Chaos&Order: Using visualization as a means to
 explore large heritage collec...
Chaos&Order: Using visualization as a means to
 explore large heritage collec...Chaos&Order: Using visualization as a means to
 explore large heritage collec...
Chaos&Order: Using visualization as a means to
 explore large heritage collec...TimelessFuture
 
OpenMinted: It's Uses and Benefits for the Social Sciences
OpenMinted: It's Uses and Benefits for the Social SciencesOpenMinted: It's Uses and Benefits for the Social Sciences
OpenMinted: It's Uses and Benefits for the Social Sciencesopenminted_eu
 
Data Strategy and Services at the British Library: Data, Software and PIDs
Data Strategy and Services at the British Library: Data, Software and PIDsData Strategy and Services at the British Library: Data, Software and PIDs
Data Strategy and Services at the British Library: Data, Software and PIDsSarah Anna Stewart
 

Similar to JCDL 2015 Tutorial Opening Slides (20)

HathiTrust Research Center Data Capsule Overview 09.10.14
HathiTrust Research Center Data Capsule Overview 09.10.14HathiTrust Research Center Data Capsule Overview 09.10.14
HathiTrust Research Center Data Capsule Overview 09.10.14
 
The HathiTrust Research Center: Big Data Analytics in a Secure Data Framework
The HathiTrust Research Center: Big Data Analytics in a Secure Data FrameworkThe HathiTrust Research Center: Big Data Analytics in a Secure Data Framework
The HathiTrust Research Center: Big Data Analytics in a Secure Data Framework
 
The HathiTrust Research Center (HTRC): An Overview and Demo
The HathiTrust Research Center (HTRC): An Overview and DemoThe HathiTrust Research Center (HTRC): An Overview and Demo
The HathiTrust Research Center (HTRC): An Overview and Demo
 
Building a Public Research Center for the HathiTrust Digital Library
Building a Public Research Center for the HathiTrust Digital LibraryBuilding a Public Research Center for the HathiTrust Digital Library
Building a Public Research Center for the HathiTrust Digital Library
 
Curation Service Models - Michael Witt - RDAP12
Curation Service Models - Michael Witt - RDAP12Curation Service Models - Michael Witt - RDAP12
Curation Service Models - Michael Witt - RDAP12
 
The HathiTrust Research Center: An Overview of Advanced Computational Services
The HathiTrust Research Center: An Overview of Advanced Computational ServicesThe HathiTrust Research Center: An Overview of Advanced Computational Services
The HathiTrust Research Center: An Overview of Advanced Computational Services
 
RDAP 15: Research Data Integration in the Purdue Libraries
RDAP 15: Research Data Integration in the Purdue LibrariesRDAP 15: Research Data Integration in the Purdue Libraries
RDAP 15: Research Data Integration in the Purdue Libraries
 
Use of ICT in educational research
Use of ICT in educational researchUse of ICT in educational research
Use of ICT in educational research
 
Research into Practice case study 2: Library linked data implementations an...
	Research into Practice case study 2:  Library linked data implementations an...	Research into Practice case study 2:  Library linked data implementations an...
Research into Practice case study 2: Library linked data implementations an...
 
Workshop 4: Open Science & Open Data for Librarians/Ina Smith
Workshop 4: Open Science & Open Data for Librarians/Ina SmithWorkshop 4: Open Science & Open Data for Librarians/Ina Smith
Workshop 4: Open Science & Open Data for Librarians/Ina Smith
 
IFLA ARL Webinar Series: Research Ethics in an Open Research Environment
IFLA ARL Webinar Series: Research Ethics in an Open Research EnvironmentIFLA ARL Webinar Series: Research Ethics in an Open Research Environment
IFLA ARL Webinar Series: Research Ethics in an Open Research Environment
 
Data Science: History repeated? – The heritage of the Free and Open Source GI...
Data Science: History repeated? – The heritage of the Free and Open Source GI...Data Science: History repeated? – The heritage of the Free and Open Source GI...
Data Science: History repeated? – The heritage of the Free and Open Source GI...
 
Why Data Science Matters - 2014 WDS Data Stewardship Award Lecture
Why Data Science Matters - 2014 WDS Data Stewardship Award LectureWhy Data Science Matters - 2014 WDS Data Stewardship Award Lecture
Why Data Science Matters - 2014 WDS Data Stewardship Award Lecture
 
Data Science and Urban Science @ UW
Data Science and Urban Science @ UWData Science and Urban Science @ UW
Data Science and Urban Science @ UW
 
Introduction to UC San Diego’s Integrated Digital Infrastructure
Introduction to UC San Diego’s Integrated Digital InfrastructureIntroduction to UC San Diego’s Integrated Digital Infrastructure
Introduction to UC San Diego’s Integrated Digital Infrastructure
 
Big Data Curricula at the UW eScience Institute, JSM 2013
Big Data Curricula at the UW eScience Institute, JSM 2013Big Data Curricula at the UW eScience Institute, JSM 2013
Big Data Curricula at the UW eScience Institute, JSM 2013
 
Chaos&Order: Using visualization as a means to
 explore large heritage collec...
Chaos&Order: Using visualization as a means to
 explore large heritage collec...Chaos&Order: Using visualization as a means to
 explore large heritage collec...
Chaos&Order: Using visualization as a means to
 explore large heritage collec...
 
Referentie Architectuur Onderzoeksdata en Onderzoeksdata diensten catalogus
Referentie Architectuur Onderzoeksdata en Onderzoeksdata diensten catalogusReferentie Architectuur Onderzoeksdata en Onderzoeksdata diensten catalogus
Referentie Architectuur Onderzoeksdata en Onderzoeksdata diensten catalogus
 
OpenMinted: It's Uses and Benefits for the Social Sciences
OpenMinted: It's Uses and Benefits for the Social SciencesOpenMinted: It's Uses and Benefits for the Social Sciences
OpenMinted: It's Uses and Benefits for the Social Sciences
 
Data Strategy and Services at the British Library: Data, Software and PIDs
Data Strategy and Services at the British Library: Data, Software and PIDsData Strategy and Services at the British Library: Data, Software and PIDs
Data Strategy and Services at the British Library: Data, Software and PIDs
 

More from Robert H. McDonald

ER&L The Role of Choice in the Future of Discovery Evaluations Panel
ER&L The Role of Choice in the Future of Discovery Evaluations PanelER&L The Role of Choice in the Future of Discovery Evaluations Panel
ER&L The Role of Choice in the Future of Discovery Evaluations PanelRobert H. McDonald
 
TLT Discussion on "Saving My Stuff" - 06.05.15
TLT Discussion on "Saving My Stuff" - 06.05.15TLT Discussion on "Saving My Stuff" - 06.05.15
TLT Discussion on "Saving My Stuff" - 06.05.15Robert H. McDonald
 
Creating Sustainable Communities in Open Data Resources: The eagle-i and VIVO...
Creating Sustainable Communities in Open Data Resources: The eagle-i and VIVO...Creating Sustainable Communities in Open Data Resources: The eagle-i and VIVO...
Creating Sustainable Communities in Open Data Resources: The eagle-i and VIVO...Robert H. McDonald
 
ER&L 2015 Closing Keynote Slides
ER&L 2015 Closing Keynote SlidesER&L 2015 Closing Keynote Slides
ER&L 2015 Closing Keynote SlidesRobert H. McDonald
 
Owning the Discovery Experience for Your Patrons
Owning the Discovery Experience for Your PatronsOwning the Discovery Experience for Your Patrons
Owning the Discovery Experience for Your PatronsRobert H. McDonald
 
Kuali OLE: Enabling Choices for Libraries
Kuali OLE: Enabling Choices for LibrariesKuali OLE: Enabling Choices for Libraries
Kuali OLE: Enabling Choices for LibrariesRobert H. McDonald
 
Charleston Seminar Being Earnest with our Collections - Legacy to Cloud
Charleston Seminar Being Earnest with our Collections - Legacy to CloudCharleston Seminar Being Earnest with our Collections - Legacy to Cloud
Charleston Seminar Being Earnest with our Collections - Legacy to CloudRobert H. McDonald
 
SEAD Datanet and Sustainability Science
SEAD Datanet and Sustainability Science SEAD Datanet and Sustainability Science
SEAD Datanet and Sustainability Science Robert H. McDonald
 
New Perspectives for Business Intelligence: Library and Research Technologies...
New Perspectives for Business Intelligence: Library and Research Technologies...New Perspectives for Business Intelligence: Library and Research Technologies...
New Perspectives for Business Intelligence: Library and Research Technologies...Robert H. McDonald
 
Kuali OLE: Deep Library Collaboration and the Release of a Community-Sourced ...
Kuali OLE: Deep Library Collaboration and the Release of a Community-Sourced ...Kuali OLE: Deep Library Collaboration and the Release of a Community-Sourced ...
Kuali OLE: Deep Library Collaboration and the Release of a Community-Sourced ...Robert H. McDonald
 
GOKb & KB+: An International Partnership to leverage Open Access and Communit...
GOKb & KB+: An International Partnership to leverage Open Access and Communit...GOKb & KB+: An International Partnership to leverage Open Access and Communit...
GOKb & KB+: An International Partnership to leverage Open Access and Communit...Robert H. McDonald
 
HathiTrust Research Center: The Fast Version
HathiTrust Research Center: The Fast VersionHathiTrust Research Center: The Fast Version
HathiTrust Research Center: The Fast VersionRobert H. McDonald
 
Building a Data Discovery Network for Sustainability Science
Building a Data Discovery Network for Sustainability ScienceBuilding a Data Discovery Network for Sustainability Science
Building a Data Discovery Network for Sustainability ScienceRobert H. McDonald
 
Panel Session: VIVO and the data culture of universities-VIVO@IU
Panel Session: VIVO and the data culture of universities-VIVO@IUPanel Session: VIVO and the data culture of universities-VIVO@IU
Panel Session: VIVO and the data culture of universities-VIVO@IURobert H. McDonald
 
THe HathiTrust Research Center: Digital Humanities at Scale
THe HathiTrust Research Center: Digital Humanities at ScaleTHe HathiTrust Research Center: Digital Humanities at Scale
THe HathiTrust Research Center: Digital Humanities at ScaleRobert H. McDonald
 
Repository Federation: Towards Data Interoperability
Repository Federation: Towards Data InteroperabilityRepository Federation: Towards Data Interoperability
Repository Federation: Towards Data InteroperabilityRobert H. McDonald
 
LLAMA SAAS Session on Telecommuting 6.25.12
 LLAMA SAAS Session on Telecommuting 6.25.12 LLAMA SAAS Session on Telecommuting 6.25.12
LLAMA SAAS Session on Telecommuting 6.25.12Robert H. McDonald
 

More from Robert H. McDonald (20)

ER&L The Role of Choice in the Future of Discovery Evaluations Panel
ER&L The Role of Choice in the Future of Discovery Evaluations PanelER&L The Role of Choice in the Future of Discovery Evaluations Panel
ER&L The Role of Choice in the Future of Discovery Evaluations Panel
 
TLT Discussion on "Saving My Stuff" - 06.05.15
JCDL 2015 Tutorial Opening Slides

  • 1. Topic Exploration with the HTRC Data Capsule for Non-Consumptive Joint Conference on Digital Libraries 2015 | Knoxville, TN| 06.21.15 Robert H. McDonald | Jiaan Zeng - Data To Insight Center Jaimie Murdock – InPho Project Indiana University Tweet us - @HathiTrust #HTRC HATHI TRUST RESEARCH CENTER Tweet us - @InPhoproject
  • 2. #HTRC @HathiTrust Tutorial Agenda • 9:00-9:15 - An overview of the HTRC (Robert McDonald) • 9:15-9:30 - HTRC Data Capsule Intro (Jiaan Zeng) • 9:30-9:45 - Intro to Topic Models and the InPho Explorer (Jaimie Murdock) • 9:45-10:30 - Hands-On Parts 1&2 • 10:30-10:45 - Break • 10:45-11:30 - Hands-On Parts 3&4 • 11:30-11:45 – Advanced Notebooks (Jaimie Murdock) • 11:45-12:00 – HTRC Advanced Collaborative Support (Robert McDonald)
  • 3. HTRC@Events • HTRC UnCamp 2015 – March 30-31, 2015 Ann Arbor, MI • Stephen Downie Keynote at JCDL 2015 • Digital Humanities 2015 – June 29-July 3, 2015 Sydney Australia • (LSA)'s Biennial Linguistic Institute, July 13, 2015 Chicago, IL • HILT 2015 – July 28-29, 2015 Indianapolis, IN HATHI TRUST RESEARCH CENTER
  • 4. Many thanks … HTRC IU Team • Beth Plale (PI) • Robert H. McDonald • Miao Chen • Guangchen Ruan • Zong Peng • Milinda Pathirage • Samitha Liyanage • Jiaan Zeng • Zong Peng • Leena Unnikrishnan • Nicholae Cline HTRC UIUC Team • J. Stephen Downie (PI) • Beth Namachchivaya • Megan Senseney • Sayan Bhattacharyya • Loretta Auvil • Boris Capitanu • Harriet Green • Eleanor Dickson
  • 5. #HTRC @HathiTrust Outline • What is the HTRC? • Non-Consumptive Research Paradigm • Current Architecture • Future Architecture • Advanced Collaborative Support (RFP)
  • 6. #HTRC @HathiTrust HathiTrust Digital Library • HathiTrust is a partnership of 90+ academic & research institutions, offering a collection of millions of digitized titles. • http://hathitrust.org – IU is a founding member of the HathiTrust along with University of Michigan, University of California, and the University of Virginia
  • 7. #HTRC @HathiTrust HathiTrust Research Center Mission • Public research arm of HathiTrust • Goal: enable researchers world-wide to accomplish tera-scale text data-mining and analysis – Develop cutting-edge software tools for processing, analyzing text – Develop cyberinfrastructure to enable HPC access to the HathiTrust Digital Library • Established: July, 2011 • Collaborative center: Indiana University & University of Illinois
  • 8. #HTRC @HathiTrust HTRC Timeline • Phase I: development 01 Jul 2011 – 31 Mar 2013 – HTRC software and services release v1.0 https://github.com/htrc • Phase II: outreach, 01 Apr 2013 – 30 June 2014 – 2nd HTRC UnCamp Sep ’13 • Phase III: operations, 01 July 2014 – present (2014-2018)
  • 9. HTRC Current Users (ca 2014) Projected Use 2019 Digital Humanities (60) Education (60) Informatics (60) Observers (20) 194 existing user accounts Lots of user accounts; good starting point. Improve: • Increase amount of real work being accomplished as measured by usage on HTRC’s compute resources Quarry and Big Red II at IU • Develop educational uses • Develop informatics uses • Decrease number of observers to 10% → Project 200 users at any one time of which 90% are doing relevant education/scholarship
  • 10. HTRC Current Users (ca Now)
  • 11. #HTRC @HathiTrust Non-Consumptive Research Paradigm • No action or set of actions on part of users, either acting alone or in cooperation with other users over duration of one or multiple sessions can result in sufficient information gathered from collection of copyrighted works to reassemble pages from collection. • Definition disallows collusion between users, or accumulation of material over time. Differentiates human researcher from proxy which is not a user. Users are human beings.
  • 12. HTRC Complexity hiding interface All the complexity Tabular info Statistical plots Spatial plots Request
  • 14. HTRC Goals • Provide a persistent and sustainable structure to enable original and cutting edge research. – Leverage data storage and computational infrastructure at Indiana & Illinois – Stimulate community development of new functionality and tools – Use tools to enable discoveries that would not be possible without the HTRC • Enable scholars to fully utilize content of HathiTrust Library while preventing intellectual property misuse within U.S. copyright law. – Provision secure computational and data environment for scholars to perform research using HathiTrust Digital Library.
  • 16. HTRC Data Capsule HTRC Data Capsule@IU Team • Beth Plale (PI) • Jiaan Zeng • Guangchen Ruan HTRC Data Capsule@Michigan Team • Atul Prakash (PI) • Alexander Crowell Jiaan Zeng, Guangchen Ruan, Alexander Crowell, Atul Prakash, and Beth Plale. 2014. Cloud computing data capsules for non-consumptive use of texts. In Proceedings of the 5th ACM workshop on Scientific cloud computing (ScienceCloud '14). ACM, New York, NY, USA, 9-16. DOI=10.1145/2608029.2608031 http://doi.acm.org/10.1145/2608029.2608031 Special Thanks to • Samitha Liyanage • Milinda Pathirage • Zong Peng • Earlence Fernandes • Ajit Aluri @hathitrust
  • 17. HTRC Data Capsule Workflow
  • 19. #HTRC @HathiTrust HTRC Advanced Collaborative Support • ACS will be offered on a rolling basis over next four years 2014-18 • 1st RFP Call Deadline was Jan 8, 2015 5:00pm eastern – RFP - http://www.hathitrust.org/htrc/acs-rfp • For more info on the Advanced Collaborative Support please contact: htrc.acs.awards@gmail.com
  • 20. #HTRC @HathiTrust Scholarly Commons User Support Service • Develop training materials • Educational workshops • Tool and workset creation • Collaborate with librarians and DH centers at HT institutions • Assist researchers in HTRC text data mining research projects • Led out of University of Illinois Library; smaller group at IU • Resourced at 2.7 FTE. Administrative Support Senior Library Personnel (4 supervisors at .05 FTE) Senior Project Coordinator (.25 FTE) Executive Assistant (.5 FTE) Core Development Sr. Software Architect (1.0 FTE) Research Programmer (.5 FTE) Library Research Programmer (.5 FTE) IU Systems Administrator (.25 FTE) User Interface Specialist (2 years at 1.0 FTE) Informatics Developers (2 developers for 2 years at .15 FTE) Advanced Research CS PhD Students LIS PhD Students UI Systems Administrator (.5 FTE) Advanced Collaborative Support (coordinated by M. Chen) Research Programmer (.5 FTE) Computational Research Liaison (.5 FTE) Asst Dir Outreach & Education (M. Chen) (1 year at .25 FTE) Scholarly Commons Dig Humanities Specialist (1.0 FTE) CLIR Postdoctoral Research Associate (2 years at 1.0 FTE) Digital Research Librarian support (.2 FTE) Scholars Commons Support (.5 FTE) LIS MS Students (.25 FTE) (.11 FTE) Key: Area Proposed for funding by HathiTrust
  • 21. #HTRC @HathiTrust HTRC Future Work • Copyrighted content in progress • Advanced Collaborative Support – The award model – Award content is HTRC ACS staff time – Collaborate with scholars on addressing their research needs related to HTRC – E.g. prototyping, running text analysis – Advocate open source; encourage extending the work to a grant submission • Scholars Commons – Interaction with scholars to help using HTRC tools and services – An interface to interact with HTRC users via the channel of scholars commons – Series of workshops at IU and other places – Weekly consulting time – Every Wed 2:30 – 4:30pm, IU library, Scholars Commons 157R – Contact: Miao Chen, Nicholae Cline
  • 22. #HTRC @HathiTrust • For details http://www.hathitrust.org/htrc/faq • General contact info – J. Stephen Downie, Co-Director HTRC, jdownie@Illinois.edu – Beth Plale, Co-Director HTRC, plale@indiana.edu • Requests for capability, interest – Robert McDonald, rhmcdona@indiana.edu
  • 23. #HTRC @HathiTrust Important URLs • HTRC Portal – http://sharc.hathitrust.org • Data Capsule Tutorial – http://shoutkey.com/gin • VNC Installation Directions – http://shoutkey.com/peat

Editor's Notes

  1. HTRC hides the complexity of analytics. In this sense, it is like Google search, a simple interface that hides the complexity of searching billions of pages. The kinds of things returned from HTRC interaction are spatial relationships of words (and, of course, their frequencies), statistical plots, or tabular information.
  2. Shifting the complexity-hiding interface to the right, we open up the cloud to see what’s inside. HTRC at its simplest has 1) algorithms, drawn from SEASR and from other analysis tool suites including Mahout and MapReduce; 2) the HT corpus (and subsets of the corpus that users either hold personally as part of a workset or that are publicly available); and 3) other data sets that are used. HTRC brokers the bringing together of these pieces so that computation can take place on a resource like Big Red II (or XSEDE). Note that there is an arrow from the compute engine to the complexity-hiding interface. This is because researcher interaction with the texts isn’t an automated workflow; it requires levels of interaction with the computation as it is running.
  3. Jiaan Zeng, Guangchen Ruan, Alexander Crowell, Atul Prakash, and Beth Plale. 2014. Cloud computing data capsules for non-consumptive use of texts. In Proceedings of the 5th ACM workshop on Scientific cloud computing (ScienceCloud '14). ACM, New York, NY, USA, 9-16. DOI=10.1145/2608029.2608031 http://doi.acm.org/10.1145/2608029.2608031
  4. The Scholarly Commons User Support service gives HT institutions exclusive access to training and learning materials that help them establish programs that integrate HTRC tools and services into their scholarly commons programs in libraries and digital humanities centers. The SC will be physically located in the University of Illinois Library’s Scholarly Commons. Several Library staff and faculty will support this service. Key among these is the Digital Humanities Research Specialist, who will assist with the development of training and outreach initiatives in support of researchers working with the HathiTrust Research Center and of HathiTrust Digital Library affiliates who seek to start their own HTRC research services. This will involve planning, implementation, and continuous development of training materials, educational workshops, potential tools, and outreach activities in support of the usage of HTRC tools and datasets. The HTRC Digital Humanities Specialist will focus on development of HTRC research services at HathiTrust member institutions, and will collaborate with public services and data services librarians at those institutions on developing support services for digital humanities research with the HTRC corpus. The specialist will work closely with the English and Digital Humanities Librarian at the University of Illinois Library to develop research data services for the humanities, with particular emphasis on the HTRC corpus and tools. Additional professionals are focused on related aspects of HTRC work, including a CLIR Postdoc researching user requirements for HTRC tools, a Technical Specialist, and other technical support.
These professionals contribute to the work of the Scholarly Commons and to the HT community by helping to articulate the relationship between new technologies and humanities scholarship to the community of humanists, by advising teaching faculty on the usage of digitized textual corpora, and by providing technical support for the use of analytical tools. The scope and responsibilities will evolve in accordance with priorities established by the Library and the HathiTrust community. The specialist will spend up to 20 percent of their time on the support of research work with the HTRC. Examples of currently supported digital humanities projects involving the HTRC corpus include: a text mining project of eighteenth-century novels for changes in dialect; a textual analysis of nineteenth-century women's serial novels for thematic patterns; a comparative literature textual analysis project; and topic modeling of twentieth-century texts for depictions of African-American women.
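
As a rough illustration of the derived, tabular output that note 1 describes (word frequencies rather than readable pages), the following minimal Python sketch counts words in a short sample string and returns only the counts. The function name `word_frequencies` and the sample text are hypothetical and chosen for illustration; this is not HTRC's code or API, only a sketch of the general idea.

```python
import re
from collections import Counter


def word_frequencies(text, top_n=5):
    """Return the top_n (word, count) pairs for a text.

    Hypothetical helper for illustration only: the caller gets
    aggregate counts back, not the underlying text.
    """
    # Lowercase the text and pull out runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(top_n)


# Made-up sample text standing in for a page of a volume.
sample = "the whale the sea the ship and the whale"
table = word_frequencies(sample, top_n=3)

# Print the result as a small two-column table.
for word, count in table:
    print(f"{word}\t{count}")
```

A table like this could then feed the statistical plots the note mentions, while the original wording and word order are not part of what is returned.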