Select e-Infrastructure Components for Sustainability

A Method to Select
e-Infrastructure Components
to Sustain
Daniel S. Katz
dkatz@nsf.gov & d.katz@ieee.org
Program Director, Division of
Advanced Cyberinfrastructure,
National Science Foundation
&
David Proctor
djproctor@gmail.com
International Consortium of
Research Staff Associations

Reusable Infrastructure
• Systems and components created by one or
more people and intended to be used by others
• Over the last century, research infrastructure
(from microscopes to telescopes, and from
sequencers to colliders) has become essential
for many types of research
• Over the past few decades, most interfaces to
research infrastructure, and often the
infrastructure itself, have become digital

e-Infrastructure Defined
• e-Infrastructure, or cyberinfrastructure has
been defined by Craig Stewart as consisting of
“computing systems, data storage systems,
advanced instruments and data repositories,
visualization environments, and people, all
linked together by software and high
performance networks to improve research
productivity and enable breakthroughs not
otherwise possible”
• Can discuss both e-Infrastructure itself and
e-Infrastructure elements

e-Infrastructure Element Contexts
• Technical
– Architecture: How does it fit into the overall
infrastructure? How does it interact with other
infrastructure elements?
• Social
– Developers: Who has developed the element?
– Users: Who uses the element?
– Purpose: What is the intended use of the element?
• Political
– Funders: Who funds the development and
maintenance?
– Scope: Is the element national? International?
– Impact: How is the element valued by researchers
and funders

e-Infrastructure Element
• What resources are needed to create it,
and how can those resources be
assembled and applied?
• What resources are needed to sustain it,

e-Infrastructure Element
• What resources are needed to create it,
• What resources are needed to sustain it,
• Focus on 2nd half of questions here, since
amount and type of needed resources
vary widely by element

Infrastructure Challenges (1)
• Science
– Larger teams, more disciplines, more countries
• Data
– Size, complexity, rates all increasing rapidly
– Need for interoperability (systems and policies)
• Systems
– More cores, more architectures (GPUs), more memory
hierarchy
– Changing balances (latency versus bandwidth)
– Changing limits (power, funds)
– System architecture and business models changing
(clouds)
– Network capacity growing; increased networks ->
increased security

Infrastructure Challenges (2)
• Software
– Multiphysics algorithms, frameworks
– Programing models and abstractions for science,
data, and hardware
– Validation and verification, reproducibility
– Resilience & fault tolerance
• People
– Education and training
– Career paths
– Credit and attribution

e-Infrastructure Space
Temporal DurationSpatialExtent

Temporal Duration
• Decisions that are made
about creating and
sustaining infrastructure
elements need to include
awareness of the expected
lifetime of the element.
• Rough estimates:
– Computer systems: 5 years
– Networks & instruments: 10
years
– Software: 20 years
– People: 40 years
– Data (& publications):
forever (or until they are no
longer useful, whichever
comes first)

Spatial Extent
• In academia, could range:
– Lab
– Department
– College or school
– University
– University system or
regional alliance
– Nation
– Beyond
• General research institutions,
e.g. national labs, might have
similar type units with
different names, e.g. ‘Division’
rather than ‘Department’

Purpose
• Ranges from:
– Used for one problem
• Maybe not infrastructure
– Used for variety of problems
in a discipline (e.g., climate
data from Arctic ice cores)
– Used for variety of problems
across many disciplines
(e.g., molecular dynamics
software)
– Used across all disciplines
(e.g., network, HPC system)
• Linked to temporal duration
– Lifetime of software element
may be 20 years, if the
element isn’t useful, its
lifetime will be shortened

Scale
• Number of users (and
cost) should be larger
the farther the
element is from the
origin in any direction
• Generically called
‘scale’
• Scale is a metric of
the space, though not
orthogonal to any of
the three axes

Defining Sustainability
• [Environment Change]: Response
• [Dependent Infrastructure] When infrastructure on which the
element relies change, will element continue to provide the
same functionality?
• [Collaborative Infrastructure] As both collaborative elements
and the element changes, can the element still be
combined with other elements to meet user needs?
• [New Users] Are functionality and usability of the element
clearly explained to new users? Do users have a way to ask
questions and learn about the element?
• [Existing Users] Does the element provide the functionality
that current users want? As future needs develop, is the
element modular and adaptable so that it can meet them?
• [Science] As new science and theory develop, will the
element incorporate and implement them?

WSSPE2 Sustainability Discussion
Enablers
********** healthy and vibrant communities; vibrant community to champion
software
********** designing for growth and extension - open development
******* culture in community for reuse
**** portability
**** culture in developer community to support transition between
developers
*** interdisciplinary people: science + IT experience
** planning for end of life
** make smart choices about dependencies
* thinking of software as product lines - long term vs short term view
not all communities need new software
converting use into resources

WSSPE2 Sustainability Discussion
Barriers
******* lack of incentives, including promotion and tenure process; promotion
and tenure process in academic is incompatible with sustainability
***** absent or poor documentation
***** funding to ensure sustainability is difficult to obtain
**** developers are not computer scientists; don't have software
engineering practices (in particular, those needed to scale-up projects
to support and be developed by a large sustainable community)
*** overreliance on one or two people - bus test
** rate of change of underlying technologies
** lack of business models for sustainability
* lack of training for how to build sustainability into the system
maintenance needed for software is not visible, appears to ``just
happen’’
licensing issues
staff turnover - lack of continuity

Sustainability Models
• Open source
• Closed partnership
• For profit
• Dual licensing
• Open source and paid support
• Foundation or government

Governance
• Governance tells the community how the project makes
decisions and how they can be involved.
• Community = users and/or developers and/or advocates
...
• Important particularly when open source is involved, to a
lesser extent in the other models
• Examples of open source governance models
– Benevolent dictatorship, as in Linux kernel
– Meritocracy, as in Apache Foundation projects
• Can be considered top-down and bottom-up governance,
respectively
• Note: orthogonal to top-down (cathedral) and bottom-up
(bazaar) development

Which models work where?
• Research is needed to understand and correlate
success or failure of models with different
portions of infrastructure element space
• Some sample questions:
– Do the cathedral or bazaar governance model
correlate with successful projects along any or all
axes?
• For example, perhaps one works better at small scale,
and the other works better at large scale
– Do particular resource assembly and application
models cluster along any or all axes?
• Temporal duration: government funding at large values
of; mix of models at middle values; closed partnerships
at small values?

Conclusions
• Aim: encourage thought about how
e-Infrastructure elements and e-Infrastructure
itself should be considered
• In terms of how different elements may have
commonalities and differences across types of
element, user communities, etc.
• Want to begin a discussion about these issues
• Potential questions:
– Are the axes meaningful?
– What factors correlate with them?
– Are there any clusters?
• We are eager to receive feedback

Reference
Katz, D.S. and Proctor, D.
“A Framework for Discussing e-
Research Infrastructure Sustainability”
Journal of Open Research Software
2(1):e13, 2014.
DOI: 10.5334/jors.av

Select e-Infrastructure Components for Sustainability

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Select e-Infrastructure Components for Sustainability

Similar to Select e-Infrastructure Components for Sustainability (20)

More from Daniel S. Katz

More from Daniel S. Katz (20)

Recently uploaded

Recently uploaded (20)

Select e-Infrastructure Components for Sustainability

Editor's Notes