JCDL 2015 Tutorial Opening Slides
1. Topic Exploration with the
HTRC Data Capsule for Non-Consumptive
Joint Conference on Digital Libraries 2015 | Knoxville, TN | 06.21.15
Robert H. McDonald | Jiaan Zeng - Data To Insight Center
Jaimie Murdock – InPho Project
Indiana University
Tweet us - @HathiTrust #HTRC
HATHI TRUST RESEARCH CENTER
Tweet us - @InPhoproject
2. #HTRC @HathiTrust
Tutorial Agenda
• 9:00-9:15 - An overview of the HTRC (Robert
McDonald)
• 9:15-9:30 - HTRC Data Capsule Intro (Jiaan Zeng)
• 9:30-9:45 - Intro to Topic Models and the InPho
Explorer (Jaimie Murdock)
• 9:45-10:30 - Hands-On Parts 1&2
• 10:30-10:45 - Break
• 10:45-11:30 - Hands-On Parts 3&4
• 11:30-11:45 – Advanced Notebooks (Jaimie Murdock)
• 11:45-12:00 – HTRC Advanced Collaborative Support
(Robert McDonald)
3. HTRC@Events
• HTRC UnCamp 2015 – March
30-31, 2015 Ann Arbor, MI
• Stephen Downie Keynote at
JCDL 2015
• Digital Humanities 2015 – June
29-July 3, 2015 Sydney Australia
• (LSA)'s Biennial Linguistic
Institute, July 13, 2015 Chicago,
IL
• HILT 2015 – July 28-29, 2015
Indianapolis, IN
4. Many thanks …
HTRC IU Team
• Beth Plale (PI)
• Robert H. McDonald
• Miao Chen
• Guangchen Ruan
• Zong Peng
• Milinda Pathirage
• Samitha Liyanage
• Jiaan Zeng
• Leena Unnikrishnan
• Nicholae Cline
HTRC UIUC Team
• J. Stephen Downie (PI)
• Beth Namachchivaya
• Megan Senseney
• Sayan Bhattacharyya
• Loretta Auvil
• Boris Capitanu
• Harriet Green
• Eleanor Dickson
5. #HTRC @HathiTrust
Outline
• What is the HTRC?
• Non-Consumptive Research Paradigm
• Current Architecture
• Future Architecture
• Advanced Collaborative Support (RFP)
6. #HTRC @HathiTrust
HathiTrust Digital Library
• HathiTrust is a partnership of 90+
academic & research institutions,
offering a collection of millions
of digitized titles.
• http://hathitrust.org
– IU is a founding member of the
HathiTrust along with University
of Michigan, University of
California, and the University of
Virginia
7. #HTRC @HathiTrust
HathiTrust Research Center
Mission
• Public research arm of HathiTrust
• Goal: enable researchers world-wide to
accomplish tera-scale text data-mining and
analysis
– Develop cutting-edge software tools for processing,
analyzing text
– Develop cyberinfrastructure to enable HPC access to
the HathiTrust Digital Library
• Established: July, 2011
• Collaborative center: Indiana University &
University of Illinois
9. HTRC Current Users (ca 2014)
Projected Use 2019
Digital Humanities (60)
Education (60)
Informatics (60)
Observers (20)
194 existing user accounts
Lots of user accounts; good
starting point.
Improve:
• Increase amount of real work
being accomplished as
measured by usage on HTRC’s
compute resources Quarry and
Big Red II at IU
• Develop educational uses
• Develop informatics uses
• Decrease number of observers
to 10%
Project 200 users at any one time
of which 90% are doing relevant
education/scholarship
11. #HTRC @HathiTrust
Non-Consumptive Research Paradigm
• No action or set of actions on part of users,
either acting alone or in cooperation with
other users over duration of one or multiple
sessions can result in sufficient information
gathered from collection of copyrighted works
to reassemble pages from collection.
• Definition disallows collusion between users,
or accumulation of material over time.
Differentiates human researcher from proxy
which is not a user. Users are human beings.
14. HTRC Goals
• Provide a persistent and sustainable structure to
enable original and cutting edge research.
– Leverage data storage and computational infrastructure at Indiana &
Illinois
– Stimulate community development of new functionality and tools
– Use tools to enable discoveries that would not be possible without
the HTRC
• Enable scholars to fully utilize content of
HathiTrust Library while preventing intellectual
property misuse within U.S. copyright law.
– Provision secure computational and data environment for scholars to
perform research using HathiTrust Digital Library.
16. HTRC Data Capsule
HTRC Data Capsule@IU Team
• Beth Plale (PI)
• Jiaan Zeng
• Guangchen Ruan
HTRC Data Capsule@Michigan Team
• Atul Prakash (PI)
• Alexander Crowell
Jiaan Zeng, Guangchen Ruan, Alexander Crowell, Atul Prakash, and
Beth Plale. 2014. Cloud computing data capsules for non-consumptive
use of texts. In Proceedings of the 5th ACM workshop
on Scientific cloud computing (ScienceCloud '14). ACM, New York,
NY, USA, 9-16. DOI=10.1145/2608029.2608031
http://doi.acm.org/10.1145/2608029.2608031
Special Thanks to
• Samitha Liyanage
• Milinda Pathirage
• Zong Peng
• Earlence Fernandes
• Ajit Aluri
@hathitrust
19. #HTRC @HathiTrust
HTRC Advanced Collaborative Support
• ACS will be offered on a rolling basis over next
four years 2014-18
• 1st RFP Call Deadline was Jan 8, 2015 5:00pm
eastern
– RFP - http://www.hathitrust.org/htrc/acs-rfp
• For more info on the Advanced Collaborative
Support please contact:
htrc.acs.awards@gmail.com
20. #HTRC @HathiTrust
Scholarly Commons
User Support Service
• Develop training materials
• Educational workshops
• Tool and workset creation
• Collaborate with librarians and DH
centers at HT institutions
• Assist researchers in HTRC text data
mining research projects
• Led out of University of Illinois
Library; smaller group at IU
• Resourced at 2.7 FTE.
Administrative Support
Senior Library Personnel (4 supervisors at .05 FTE)
Senior Project Coordinator (.25 FTE)
Executive Assistant (.5 FTE)
Core Development
Sr. Software Architect (1.0 FTE)
Research Programmer (.5 FTE)
Library Research Programmer (.5 FTE)
IU Systems Administrator (.25 FTE)
User Interface Specialist (2 years at 1.0 FTE)
Informatics Developers (2 developers for 2 years at .15 FTE)
Advanced Research
CS PhD Students
LIS PhD Students
UI Systems Administrator (.5 FTE)
Advanced Collaborative Support (coordinated by M. Chen)
Research Programmer (.5 FTE)
Computational Research Liaison (.5 FTE)
Asst Dir Outreach & Education (M. Chen) (1 year at .25 FTE)
Scholarly Commons
Dig Humanities Specialist (1.0 FTE)
CLIR Postdoctoral Research Associate (2 years at 1.0 FTE)
Digital Research Librarian support (.2 FTE)
Scholars Commons Support (.5 FTE)
LIS MS Students (.25 FTE) (.11 FTE)
Key:
Area
Proposed for funding by HathiTrust
21. #HTRC @HathiTrust
HTRC Future Work
• Copyrighted content in progress
• Advanced Collaborative Support
– The award model
– Award content is HTRC ACS staff time
– Collaborate with scholars on addressing their research needs related
to HTRC
– E.g. prototyping, running text analysis
– Advocate open source; encourage extending the work to a grant
submission
• Scholars Commons
– Interaction with scholars to help using HTRC tools and services
– An interface to interact with HTRC users via the channel of scholars
commons
– Series of workshops at IU and other places
– Weekly consulting time
– Every Wed 2:30 – 4:30pm, IU library, Scholars Commons 157R
– Contact: Miao Chen, Nicholae Cline
22. #HTRC @HathiTrust
• For details http://www.hathitrust.org/htrc/faq
• General contact info
– J. Stephen Downie, Co-Director HTRC,
jdownie@Illinois.edu
– Beth Plale, Co-Director HTRC, plale@indiana.edu
• Requests for capability, interest
– Robert McDonald, rhmcdona@indiana.edu
HTRC hides the complexity of analytics. In this sense it is like Google search, a simple interface that hides the complexity of searching billions of pages. The kinds of things returned from an HTRC interaction are spatial relationships of words (and their frequencies), statistical plots, and tabular information.
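As a rough illustration of the tabular, frequency-style results mentioned above, here is a minimal Python sketch. It is not HTRC code: the function names `word_frequencies` and `frequency_table`, the tokenizing rule, and the sample pages are all assumptions made for this example. The point is only that the researcher receives derived counts rather than the page text itself.

```python
from collections import Counter
import re


def word_frequencies(pages):
    """Count lowercase word occurrences across a list of page strings.

    Hypothetical helper for illustration; the regex tokenizer is an assumption.
    """
    counts = Counter()
    for page in pages:
        counts.update(re.findall(r"[a-z']+", page.lower()))
    return counts


def frequency_table(counts, top_n=5):
    """Return the top_n (word, count) rows, ties broken alphabetically."""
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Made-up sample text standing in for pages of a workset.
sample_pages = [
    "The whale surfaced and the crew watched the whale.",
    "The crew rowed toward the ship.",
]

table = frequency_table(word_frequencies(sample_pages), top_n=3)
for word, count in table:
    print(f"{word}\t{count}")
```

Only the aggregated table leaves the function; the original sentences cannot be reassembled from it, which is the spirit of the result types listed above.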
Shifting the complexity-hiding interface to the right, we open up the cloud to see what’s inside. HTRC at its simplest has 1) algorithms, drawn from SEASR and from other analysis tool suites including Mahout and MapReduce; 2) the HT corpus (and subsets of the corpus that users either hold personally as part of a workset or that are publicly available); and 3) other data sets that are used. HTRC brokers the bringing together of these pieces so that computation can take place on a resource like Big Red II (or XSEDE). Note that there is an arrow from the compute engine to the complexity-hiding interface. This is because researcher interaction with the texts isn’t an automated workflow; it requires levels of interaction with the computation as it is running.
The Scholarly Commons User Support service gives HT institutions exclusive access to training and learning materials that help them establish programs that integrate HTRC tools and services into their scholarly commons programs in libraries and digital humanities centers.
The SC will be physically located in the University of Illinois Library’s Scholarly Commons. Several Library staff and faculty will support this service. Key among these is the Digital Humanities Research Specialist, who will assist with the development of training and outreach initiatives in support of researchers working with the HathiTrust Research Center and of HathiTrust digital library affiliates who seek to start their own HTRC research services. This will involve planning, implementation, and continuous development of training materials, educational workshops, potential tools, and outreach activities in support of the usage of HTRC tools and datasets. The HTRC Digital Humanities Specialist will focus on development of HTRC research services at HathiTrust member institutions, and will collaborate with public services and data services librarians at those institutions on developing support services for digital humanities research with the HTRC corpus. The specialist will work closely with the English and Digital Humanities Librarian at the University of Illinois Library to develop research data services for the humanities, with particular emphasis on the HTRC corpus and tools.
Additional professionals are focused on related aspects of HTRC work, including a CLIR Postdoc researching user requirements for HTRC tools, a Technical Specialist and other technical support. These professionals contribute to the work of the Scholarly Commons and to the HT community in helping to articulate the relationship between new technologies and humanities scholarship to the community of humanists; and in advising teaching faculty on the usage of digitized textual corpora and providing technical support for use of analytical tools. The scope and responsibilities will evolve in accordance with priorities established by the Library and HathiTrust community.
The specialist will spend up to 20 percent of their time on the support of research work with the HTRC.
Examples of currently supported digital humanities projects involving the HTRC corpus include:
A text mining project of eighteenth-century novels for changes in dialect;
A textual analysis of nineteenth-century women's serial novels for thematic patterns;
A comparative literature textual analysis project;
Topic modeling of twentieth-century texts for depictions of African-American women.