Pablo Gomez - Solving Large-scale Challenges with ESA Datalabs

ESA UNCLASSIFIED - For ESA Official Use Only
Solving Large-scale Data Challenges with ESA Datalabs
Pablo Gómez
Data Science Section, SCI-SAS
24/11/2023
ESA ESAC

Background – Data Science Section
• Part of the Data Science and Archives Division
• Focused on science data exploitation
• Works with different missions and on interdisciplinary projects

“Big” Data – Where we are and what is coming
Euclid First Images

Gaia Data Release 3

Importance of archival data – Hubble Space Telescope
HST publications by type (categories: General Observer, Archival, Partly Archival, Not assigned)
https://archive.stsci.edu/hst/bibliography/pubstat.html
de Marchi & Merín, presented at EAS 2023

ESAC Science Data Center

ESA Datalabs – datalabs.esa.int (in beta mode)

ESA Datalabs main functionality
• System/Core
• Discovery
• Pipelines

Example: JWST Data Analysis Tools Notebooks

A Platform Designed to Boost Science Collaboration

Web-Based & Desktop-Based Datalabs

A Platform Designed to Boost Research Productivity
• Service layers: SaaS, PaaS, IaaS
• Development levels: System Development, IT Development, Science Development
You can start HERE!

A Platform Designed to Boost Access to Data
[Diagram labels: SCI, … …, ESA]

Leveraging ESA’s Digital Ecosystem of Platforms
datalabs.esa.int, gssc.esa.int

Data Discovery Portal / Volume Catalogue

Computing & Data Colocation – Data Volume Catalogue

Datalab & Volume Integration

Pipelines: Integrated Development Environment

Pipelines: Integrated Development Environment – Common Workflow Language (CWL)
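Pipelines on the platform are described in the Common Workflow Language. As a rough illustration of what such a description contains, the sketch below builds a single CWL v1.2 `CommandLineTool` as a Python dictionary and serialises it to JSON (CWL documents may be written in YAML or JSON). The script `make_cutouts.py`, its `--size` option and the input/output names are hypothetical placeholders, and whether ESA Datalabs accepts such a file unchanged is an assumption.

```python
import json

# Hypothetical pipeline step: "make_cutouts.py", its --size option and the
# input/output names are illustrative placeholders, not part of ESA Datalabs.
cutout_tool = {
    "cwlVersion": "v1.2",
    "class": "CommandLineTool",
    "baseCommand": ["python", "make_cutouts.py"],
    "inputs": {
        # Positional argument: the FITS image to cut up.
        "image": {"type": "File", "inputBinding": {"position": 1}},
        # Optional flag with a default value: cutout side length in pixels.
        "size": {"type": "int", "default": 64, "inputBinding": {"prefix": "--size"}},
    },
    "outputs": {
        # Collect whatever the script writes into ./cutouts.
        "cutouts": {"type": "Directory", "outputBinding": {"glob": "cutouts"}},
    },
}

# CWL accepts JSON as well as YAML, so this text can be saved as a .cwl file.
cwl_text = json.dumps(cutout_tool, indent=2)

if __name__ == "__main__":
    print(cwl_text)
```

Saved as, e.g., `make_cutouts.cwl`, the step could be run locally with the CWL reference runner (`cwltool make_cutouts.cwl --image frame.fits`), provided `make_cutouts.py` is available in the tool's runtime environment, for example via a container image.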

Upcoming in 0.10.0 – Datalabs Marketplace (similar to an app store)

Recent Events
• Euclid Consortium meeting, June 2023
  • 200+ new users
  • Stress test
  • Lots of feedback: ideas for new use cases; UI improvements
• Focus on user experience
• With ESA missions
• Experimental onboarding of external projects

JWST @ ESA Datalabs: baseline JWST area
Software in the JWST area @ ESA Datalabs:
• JWST calibration pipeline
• Astroquery (including the ESA JWST module)
• pyESASky
• Jdaviz
• astropy
• matplotlib
• …
Access to JWST NFS volume:
• JWST calibration files
• Example notebooks for eJWST
• Example notebooks from STScI

The ESA Space Science Exploitation Platform
High-level messages
Increase Space Science Operations Efficiency
• SCI data made easily available for researchers to work on
• Reusable for fast implementation of scientific processing pipelines
• Reusable for fast implementation of scientific analysis and visualisation tools
Enable Collaboration and Open Science
• Share complex processing tools and data with your team
• Share your contributions with the community in SCI’s AppStore

Catalogue of interacting galaxies in HST archives
One example use case of ESA Datalabs
Harnessing the Hubble Space Telescope Archives: A Catalogue of 21,926 Interacting Galaxies
O’Ryan et al. 2023, arXiv:2303.00366
➢ Direct access to data (opening a large FITS file takes a few seconds; 100k cutouts are created in a matter of minutes)
➢ 92 million cutouts produced (2.5 TB)
➢ Zoobot fine-tuned on a sample of mergers from CANDELS & COSMOS
➢ Prediction of interacting galaxies in the HST archives: 21,926 interacting galaxies found with high confidence (p > 0.95)
➢ Other gems: strong lenses, protoplanetary disks
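The deck does not show how the cutouts were produced. A minimal sketch of the idea, assuming the image is already loaded as a 2D array (a nested list stands in for it here), extracts a fixed-size cutout around each source position, padding with NaN at the edges, and then keeps only predictions above the p > 0.95 threshold quoted on the slide. In practice one would more likely read the image with `astropy.io.fits` and cut it with `astropy.nddata.Cutout2D` (with `mode="partial"`); the function names and source IDs below are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Image = List[List[float]]  # stand-in for a 2D FITS image array


def make_cutout(image: Image, x: int, y: int, size: int) -> Image:
    """Return a size x size cutout around pixel (x, y).

    For odd sizes, (x, y) is the exact centre. Pixels that fall outside
    the image are filled with NaN, so every cutout has the same shape.
    """
    half = size // 2
    ny, nx = len(image), len(image[0])
    cutout = []
    for j in range(y - half, y - half + size):
        row = []
        for i in range(x - half, x - half + size):
            inside = 0 <= j < ny and 0 <= i < nx
            row.append(image[j][i] if inside else math.nan)
        cutout.append(row)
    return cutout


def make_cutouts(image: Image, positions: Sequence[Tuple[int, int]], size: int) -> List[Image]:
    """One cutout per (x, y) source position."""
    return [make_cutout(image, x, y, size) for x, y in positions]


def select_confident(probs: Dict[str, float], threshold: float = 0.95) -> List[str]:
    """Source IDs whose predicted probability is strictly above the threshold."""
    return sorted(src for src, p in probs.items() if p > threshold)
```

For scale: 2.5 TB spread over 92 million cutouts is roughly 27 kB per cutout on average, taking 1 TB = 10^12 bytes.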

ESA Datalabs for Euclid pilot studies
• Detecting Solar System Objects
• Preserving Low-Surface Brightness
• Detecting Transients
• Cosmology Likelihood for Observables in Euclid

Perspective – A typical ML project

1 - Setup
• Tools & Frameworks
• Local folders etc.
• Getting the data
2 - Data Prep
• I/O
• Data Cleaning
• Data Labeling
3 - Models
• Training
• Inference
• Clustering
• …
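The data-prep and model stages of such a project can be illustrated end to end on toy data. The sketch below, in plain Python with no ML framework, drops rows with missing values (data cleaning), standardises the features, and groups the rows with Lloyd's k-means (clustering). It is only a minimal illustration under these assumptions: real projects would typically use libraries such as NumPy or scikit-learn, and k-means is just one possible choice for the clustering step.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def clean_rows(rows: Sequence[Sequence[float]]) -> List[Point]:
    """Data cleaning: drop any row containing NaN or infinite values."""
    return [tuple(r) for r in rows if all(math.isfinite(v) for v in r)]


def standardize(points: List[Point]) -> List[Point]:
    """Rescale every feature to zero mean and unit (population) variance."""
    n, d = len(points), len(points[0])
    means = [sum(p[k] for p in points) / n for k in range(d)]
    # Constant features keep a scale of 1.0 to avoid division by zero.
    stds = [
        math.sqrt(sum((p[k] - means[k]) ** 2 for p in points) / n) or 1.0
        for k in range(d)
    ]
    return [tuple((p[k] - means[k]) / stds[k] for k in range(d)) for p in points]


def nearest(p: Point, centroids: List[List[float]]) -> int:
    """Index of the centroid closest to p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))


def kmeans(points: List[Point], k: int, iters: int = 20) -> List[int]:
    """Lloyd's k-means; starts from the first k points so runs are reproducible."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave empty clusters where they are
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

A typical call chain is `kmeans(standardize(clean_rows(rows)), k=2)`. Starting from the first k points keeps the example deterministic; k-means++ initialisation is the more usual choice in practice.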

36
1 - Setup
Tools &
Frameworks
Local folders etc.
Getting the data
2 - Data Prep
I/O
Data Cleaning
Data Labeling
3 - Models
Training
Inference
Clustering
…

Perspective – What we can build

Datalabs – Quo vadis?
Anomaly Detection
• Finding interesting things
• Dealing with the flood
Etsebeth et al. 2023
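To make "finding interesting things" in a flood of data concrete, a common unsupervised baseline scores each object by how far it sits from its nearest neighbours in some feature space and hands only the top-ranked candidates to a human. The sketch below implements that k-nearest-neighbour distance score in plain Python; it is a generic technique, not necessarily the method of the work cited on the slide, and the feature vectors are assumed to come from elsewhere (e.g. image embeddings).

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def knn_anomaly_scores(points: Sequence[Point], k: int = 2) -> List[float]:
    """Score each point by the distance to its k-th nearest neighbour.

    Points in sparse regions of feature space get large scores. This
    brute-force version is O(n^2); large archives would need a spatial index.
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def top_candidates(points: Sequence[Point], k: int = 2, n: int = 10) -> List[int]:
    """Indices of the n highest-scoring points, most anomalous first."""
    scores = knn_anomaly_scores(points, k)
    return sorted(range(len(points)), key=lambda i: scores[i], reverse=True)[:n]
```

Only the short list returned by `top_candidates` would need visual inspection, which is what makes this kind of ranking useful when the archive is far too large to browse.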

Anomaly Detection
Learning with Few Labels
1 - Get a few labels
2 - Train a semi-supervised model
3 - Different downstream tasks:
• Roughly sort unlabeled data
• Find other instances
• Incremental improvements
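The slide leaves the model unspecified. As one simple instance of learning from few labels, the sketch below uses self-training with a nearest-centroid classifier: a handful of labelled examples define class centroids, unlabelled points that are clearly closer to one centroid than to any other are adopted as pseudo-labels, and the process repeats. A ranking helper covers sorting unlabelled data and finding further instances of a class. The feature vectors, class names and the `margin` threshold are hypothetical; a real project would use learned features, e.g. from a pretrained network.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def centroid(points: Sequence[Point]) -> List[float]:
    """Mean feature vector of a group of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def self_train(
    labeled: Dict[str, List[Point]],
    unlabeled: List[Point],
    rounds: int = 3,
    margin: float = 0.5,
) -> Tuple[Dict[str, List[Point]], List[Point]]:
    """Grow a small labelled set by adopting confident pseudo-labels."""
    labeled = {name: list(pts) for name, pts in labeled.items()}  # copy, don't mutate input
    pool = list(unlabeled)
    for _ in range(rounds):
        cents = {name: centroid(pts) for name, pts in labeled.items()}
        remaining = []
        for p in pool:
            ranked = sorted((math.dist(p, c), name) for name, c in cents.items())
            # Confident if the nearest class beats the runner-up by at least `margin`.
            if len(ranked) > 1 and ranked[1][0] - ranked[0][0] >= margin:
                labeled[ranked[0][1]].append(p)
            else:
                remaining.append(p)
        pool = remaining
    return labeled, pool


def rank_by_similarity(points: Sequence[Point], reference: Sequence[float]) -> List[Point]:
    """Order points from most to least similar to `reference` (e.g. a class centroid)."""
    return sorted(points, key=lambda p: math.dist(p, reference))
```

Points left in the returned pool are the ambiguous cases; labelling a few of those by hand and rerunning is one way to get incremental improvements.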

Standardized ML Data Preprocessing
