We dive deeper into the Globus automation platform and describe how common instrument-based scenarios can be streamlined. We examine the components of a Globus flow that takes data from the point of capture on an instrument through to distribution and publication of the resulting products to collaborators.
This material was presented at the Research Computing and Data Management Workshop, hosted by Rensselaer Polytechnic Institute on February 27-28, 2024.
2. Instrument data management needs
[Figure: instruments such as Cryo EM, lightsheet microscopes, sequencers, ALS/APS beamlines, …. feed data to a local system for download and to remote analysis and visualization.]
• Reliable, near-real-time data access
• Self-service access control and management
• Grant data access to collaborators
• Compute on data across storage classes
• Do it all at SCALE
[Figure: a local policy store governs per-cohort directories, e.g. /cohort045, /cohort096, /cohort127.]
3. What is needed for such automation…?
[Figure: an automated pipeline from data capture through image analysis and QA check, a threshold decision, analysis, visualization, metadata extraction, and publication to a search index.]
• Bridge across different facility resources
• Network as an instrument
• Use a variety of resources
• Human input
• Credentials for automation
[Decision point in the pipeline: proceed or discard the sample?]
4. XPCS: X-ray Photon Correlation Spectroscopy
[Figure: the XPCS pipeline spans the APS and the Argonne Leadership Computing Facility (ALCF): (1) imaging, (2) acquisition on a lab server, (3) XPCS-Eigen analysis, (4) plotting of results, (5) publication via the ALCF Data Portal, (6) science!]
● Automated flows stage data to ALCF for on-demand analysis and publication
● Metadata and plots are dynamically extracted and published into a search catalog
● Scientists can select datasets and initiate flows to perform batch analysis tasks
Suresh Narayanan, Nicholas Schwarz
[Figure label: data staged on ALCF's Eagle storage.]
5. Globus Flows: End-to-end automation for XPCS
From data capture to data publication, the flow chains Globus services:
• Transfer: transfer the IMM and HDF5 files from the instrument
• Compute: run Corr
• Compute: plot results
• Compute: gather metadata
• Transfer: move results to the repository
• Share: set access controls
• Search: ingest into the search index
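The steps above can be sketched as a Globus Flows definition, an Amazon-States-Language-style JSON document. The sketch below is abbreviated to three of the steps; all UUID references and paths are placeholder inputs, and the action URLs follow the Globus-operated action providers (verify against docs.globus.org before use).

```python
# Minimal sketch of a Flows definition for the XPCS pipeline above.
# Inputs (collection IDs, paths, function IDs) arrive in the run input
# document and are referenced with ".$" JSONPath parameters.
flow_definition = {
    "StartAt": "TransferRawData",
    "States": {
        "TransferRawData": {
            "Type": "Action",
            "ActionUrl": "https://actions.globus.org/transfer/transfer",
            "Parameters": {
                "source_endpoint_id.$": "$.input.instrument_collection",
                "destination_endpoint_id.$": "$.input.alcf_collection",
                "transfer_items": [
                    {"source_path.$": "$.input.raw_path",
                     "destination_path.$": "$.input.staging_path"}
                ],
            },
            "ResultPath": "$.TransferResult",
            "Next": "RunCorr",
        },
        "RunCorr": {
            "Type": "Action",
            "ActionUrl": "https://compute.actions.globus.org",
            "Parameters": {
                "endpoint.$": "$.input.compute_endpoint",
                "function.$": "$.input.corr_function_id",
                "kwargs.$": "$.input.corr_kwargs",
            },
            "ResultPath": "$.CorrResult",
            "Next": "SetAccessControls",
        },
        "SetAccessControls": {
            "Type": "Action",
            "ActionUrl": "https://actions.globus.org/transfer/set_permission",
            "Parameters": {
                "endpoint_id.$": "$.input.results_collection",
                "path.$": "$.input.results_path",
                "principal.$": "$.input.collaborator_identity",
                "principal_type": "identity",
                "permissions": "r",
                "operation": "CREATE",
            },
            "End": True,
        },
    },
}
```

A real deployment would register this definition once (e.g. via the Flows API or web app) and then start runs against it with per-experiment input documents.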
7. XPCS: Integrating experiment and compute facility
Two modes: experiment-time processing of data, and reprocessing of data.
Argonne: Ian Foster, Mike Papka, Tom Uram, Christine Simpson, Bill Allcock, Benoit Cote, Ryan Chard
APS: Suresh Narayanan, Miaoqi Chu, Hannah Parraga, Nicholas Schwarz, Laurent Chapon
UChicago: Rachana Ananthakrishnan, Kyle Chard, Nickolaus Saint, Ben Blaiszik
8. One-time configuration per beamline
[Figure: at the APS, the data management (APS DM) software runs under an APS beamline service account, and a compute function (Python code) is registered for analysis.]
Automating for experiment-time processing
• Create a Globus application credential for the software at the instrument facility
• Register the compute function(s) needed for analysis
• Configure the flow so that the service account can run it
• Create a guest collection on Globus Connect Personal (Windows machine), with read permission for the service account
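The second bullet, registering a compute function, can be sketched as follows. The function body is a stand-in, not the real analysis entry point; the registration call (commented out) uses the `globus_compute_sdk.Client.register_function` API and would run once at setup time under the service account.

```python
# Illustrative compute function for the "register the compute function(s)"
# step. A real beamline would register its actual analysis entry point.
def run_corr(raw_path, output_dir):
    """Placeholder XPCS correlation step: reports where results would go."""
    # A real implementation would invoke the analysis code (e.g. XPCS-Eigen)
    # on raw_path and write its outputs under output_dir.
    return {"input": raw_path, "results": f"{output_dir}/corr_results.hdf"}

# One-time registration (requires globus-compute-sdk and authentication;
# shown commented out so this sketch stays self-contained):
# from globus_compute_sdk import Client
# function_id = Client().register_function(run_corr)
```

The returned `function_id` is what the flow's Compute step references when launching analysis tasks.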
10. One-time configuration per beamline (ALCF)
Automating for experiment-time processing
Authorized APS admins with an ALCF account are allowed to manage the endpoint and the analysis code.
• Create a local account at the compute facility to allow automated processing
• Install a Globus Compute endpoint in the local account, using the Globus service account
• Set appropriate local account policy to manage the compute endpoint deployment
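The endpoint installation in the second bullet typically looks like the following shell session. It is run as the dedicated local account at the compute facility; the endpoint name "xpcs-analysis" is an arbitrary example.

```shell
# Run as the dedicated local account at the compute facility.
# Install the Globus Compute endpoint software into the account.
pip install globus-compute-endpoint

# Create and configure an endpoint (the name "xpcs-analysis" is an example;
# this step prompts for authentication as the Globus service account).
globus-compute-endpoint configure xpcs-analysis

# Start the endpoint; it reports the endpoint ID that flows will target.
globus-compute-endpoint start xpcs-analysis
```

The configure step writes a config file under the account's home directory, which is where facility-specific scheduler and provisioning settings would go.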
11. Automating for experiment-time processing
[Figure: automated workflow during experiments, building on the one-time configuration per beamline: data acquisition at the APS passes the beamline and experiment ID to APS DM, which uses the APS beamline service account to run the registered compute function at ALCF. Authorized APS admins with an ALCF account manage the endpoint and the analysis code.]
12. Automating for experiment-time processing
[Figure: the complete deployment: data acquisition under the beamline account at the APS; APS DM invoking the flow with the beamline and experiment ID via the service account; the registered compute function executing on a Globus Compute endpoint in a local account environment at ALCF, with analysis on Polaris and data on Eagle storage.]
13. XPCS – Reprocessing of data
• Flow triggered by the user via a portal
• A separate application credential is used to run the flow
• Data shared with researcher(s) using Globus
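The last bullet, sharing results with researchers, maps to creating an access rule on a guest collection. Below is a sketch of the access-rule document; the identity ID and path are placeholders, and in practice the document would be submitted with `globus_sdk.TransferClient.add_endpoint_acl_rule` (shown commented out to keep the sketch self-contained).

```python
# Access-rule document granting a collaborator read access to a results
# directory on a guest collection. All IDs and paths are placeholders.
rule_data = {
    "DATA_TYPE": "access",
    "principal_type": "identity",  # grant to a single Globus identity
    "principal": "00000000-0000-0000-0000-000000000000",  # collaborator ID
    "path": "/results/run-042/",   # directory being shared
    "permissions": "r",            # read-only
}

# Submission (requires globus-sdk and an authenticated TransferClient):
# transfer_client.add_endpoint_acl_rule(guest_collection_id, rule_data)
```

Because the rule lives on a guest collection, the beamline retains self-service control: rules can be added or revoked without involving storage administrators.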
17. CityCOVID
• Integrated COVID-19 pandemic monitoring, modeling, and analysis capability
• CityCOVID is a city-scale agent-based model
• Steps:
  – Scrape daily Chicago reports
  – Perform simulations at the Argonne Leadership Computing Facility
  – Postprocess data at the Laboratory Computing Resource Center
Jonathan Ozik, Nick Collier, and Charles Macal
19. Materials Data Facility
> 40 TB of data, > 320 published authors, > 400 datasets
• Accept data from many locations with flexible interfaces
• Index dataset contents in science-aware ways
• Dispatch data to the community
• Using Automate to simplify building composable flows of services
20. MDF Data Publication Automation
The flow, orchestrated by Globus Automate, chains these services (as captured from the diagram):
• Auth: get credentials
• Transfer: transfer the dataset
• XTract: extract metadata
• Share: set permissions
• Transfer: move metadata
• Ingest: bulk ingest into the index
• Transfer: transfer the dataset (additional transfers)
• Identifier: mint a DOI
• Web form: metadata entry
• Notify: notify the curator
• Web form: curation
• Notify: notify the user
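The bulk-ingest step pushes extracted metadata into a Globus Search index. A sketch of a single-entry ingest document follows; the subject URL, visibility values, and content fields are illustrative, not real MDF records, and the document would be submitted via the Search ingest API (e.g. `globus_sdk.SearchClient.ingest`).

```python
# Illustrative Globus Search ingest document ("GMetaEntry") for one dataset.
# "subject" identifies the record; "visible_to" controls search visibility;
# "content" holds the searchable metadata extracted from the dataset.
ingest_doc = {
    "ingest_type": "GMetaEntry",
    "ingest_data": {
        "subject": "https://example.org/datasets/sample-dataset",
        "visible_to": ["public"],
        "content": {
            "title": "Sample dataset",
            "authors": ["A. Researcher"],
            "size_tb": 0.04,
        },
    },
}

# Submission (requires globus-sdk and an authenticated SearchClient):
# search_client.ingest(index_id, ingest_doc)
```

Keeping `visible_to` in the entry is what lets MDF mix public and embargoed records in the same index.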
21. Support resources
• Globus documentation: docs.globus.org
• YouTube channel: youtube.com/GlobusOnline
• Helpdesk: support@globus.org
• Mailing Lists: globus.org/mailing-lists
• Customer engagement team (office hours)
• Professional services team (advisory, custom work)