© Hatfield Consultants. All Rights Reserved.
STAC, ZARR, COG, K8S and Data
Cubes: The brave new world of satellite
EO analytics in the cloud
Jason Suwala
Nov 2019
Who Am I?
› UVic Engineering Grad
› Partner at Hatfield Consultants
› Director of Environmental Information Systems
› Lots of different hats
› Digital development
› Knowledge management
› Environmental data management
› System of Systems
› CGDI
› European and Canadian Space Agency Projects
› “Bringing Science and People Together”
Why are we here?
Nov 5, 2019: “Canada must become a leader in
using space data to improve our society”
– CSA President Sylvain Laporte
Why are we here?
NASA EOSDIS Data Growth
ESA’s Data Growth
[Figure: ESA EO Data Archive volume in petabytes (axis 0 to 110), 2000 to 2026, broken down by Sentinel missions operated by ESA, Earth Explorer missions, Heritage missions, and Third Party & Contributing Missions. Ref: European Space Agency, 2018]
Timeseries Analysis over Large
Areas?
Total number of archived Landsat images acquired for Canada,
by year and sensor. (Wulder 2018)
Digital Ecosystem to Monitor the Planet
› “Digital Twins”
› Towards real-time
acquisition and analysis
Traditional Approaches Obsolete
› The traditional download approach is obsolete
Innovation Solutions Canada
Working with the Public Health Agency of Canada to address this problem
www.GEOAnalytics.ca
“Advancing Canadian Satellite Earth Observation Analytics”
Brief Primer on Cloud Native
Geospatial
Cloud Native Geospatial
› Simply moving a server to be hosted in
the cloud is not “cloud native”
› Cloud native:
› Horizontally scalable on commodity
hardware
› Always available
› Always current
› Virtualized resource sharing
+ Geospatial:
› Optimized file formats (COG/ZARR)
› Web-crawlable (STAC)
Data goes together with compute
› Bring your algorithm to the data, not
the other way around
› Always co-locate your compute with
the data
› Above all else, minimize data
downloading
› Infrastructure options: HPC or Cloud
File Formats
› “how you store your data can have an enormous effect
on performance.”
› Dr. Philip Austin, UBC
December Mosaic of the Bahamas, Image ©2017 Planet Labs,
Inc.
Raster File Formats: COG
› COG = “Cloud Optimized GeoTIFF”
› https://www.cogeo.org/
› “A Cloud Optimized GeoTIFF (COG) is a
regular GeoTIFF file, aimed at being
hosted on a HTTP file server, with an
internal organization that enables more
efficient workflows on the cloud. It does
this by leveraging the ability of clients
issuing HTTP GET range requests to ask
for just the parts of a file they need
instead of downloading the whole file.”
› COG-aware software can stream just the
portion of data that it needs
› Supported by GDAL, RasterIO + Others
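The range-request idea above can be sketched in Python using only the standard library. The offsets and the in-memory fake_file are hypothetical stand-ins: a real COG reader such as GDAL works out which bytes to ask for from the TIFF header and tile index, and a real server answers over HTTP.

```python
def range_header(offset: int, length: int) -> dict:
    """Build the HTTP Range header asking for `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive at both ends, hence the -1.
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


def serve_range(blob: bytes, header: dict) -> bytes:
    """Simulate how a server answers a single-range request against `blob`."""
    spec = header["Range"].split("=", 1)[1]
    first, last = (int(part) for part in spec.split("-"))
    return blob[first:last + 1]


# Hypothetical 1 MiB "file"; the client fetches a single 16 KiB tile from it.
fake_file = bytes(range(256)) * 4096
header = range_header(offset=65536, length=16384)
tile = serve_range(fake_file, header)
print(header["Range"], len(tile), "of", len(fake_file), "bytes transferred")
```

Against a real HTTP server, the same header could be attached to a request with urllib.request.Request(url, headers=header); whether the server honours it depends on its range support.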
COG versus GeoTiff
› Vincent Sarago
COG versus GeoTiff
› Storage size: 1.5 GB vs 69 MB
COG versus JPEG2000
                   JPEG2000       COG
Size               25 TB          50 TB
Storage            $575/month     $1150/month
Data access        $440           $20
Processing time    $76.81         $25.60
Total cost         $1091.81       $1195.60
› If you only care about storage cost, JPEG2000 is the best option;
but if someone has to pay to access or process the data, COG is
the better option
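The arithmetic behind the comparison can be checked with a few lines of Python. The figures are copied from the table; the split into "storage" and "access + processing" is only an illustrative way of separating what the data host pays from what a data user pays.

```python
# Dollar figures copied from the COG versus JPEG2000 table.
costs = {
    "JPEG2000": {"storage": 575.00, "access": 440.00, "processing": 76.81},
    "COG": {"storage": 1150.00, "access": 20.00, "processing": 25.60},
}

# Total of all three rows, and the part that depends on using the data.
totals = {name: c["storage"] + c["access"] + c["processing"] for name, c in costs.items()}
usage = {name: c["access"] + c["processing"] for name, c in costs.items()}

for name in costs:
    print(f"{name}: total ${totals[name]:.2f}, access + processing ${usage[name]:.2f}")
```

The totals reproduce the last row of the table, while the access and processing share is where COG is much cheaper.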
Raster File Formats: NetCDF + HDF
Problems
› The most common multidimensional data formats are NetCDF and
HDF
› Supercomputer simulations (like a large climate model) produce a
few petabytes of HDF files.
› Planned NASA satellite missions will produce hundreds of
petabytes a year of HDF files.
› The layout of HDF files makes them difficult to query efficiently on
cloud storage systems
› “slowdown is significant because the HDF library makes many small
4kB reads in order to gather the metadata necessary to pull out a chunk
of data. Each of those tiny reads made sense when the data was local,
but now that we’re sending out a web request each time. This means
that users can sit for minutes just to open a file.”
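A back-of-envelope sketch of why many small reads hurt once each one becomes a web request. The read count and latencies below are hypothetical values chosen only to show the scaling; they are not measurements from the quoted source.

```python
def open_time_seconds(n_reads: int, latency_s: float) -> float:
    """Waiting time when every small read is issued one after another."""
    return n_reads * latency_s


# Hypothetical numbers: the same 1000 metadata reads, local versus remote.
local = open_time_seconds(n_reads=1000, latency_s=0.0001)  # assumed local read latency
remote = open_time_seconds(n_reads=1000, latency_s=0.1)    # assumed HTTP round trip
print(f"local: {local:.1f} s, remote: {remote:.1f} s")
```

The read pattern is unchanged; only the per-read latency differs, which is why formats that gather metadata in few requests behave better on object storage.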
NetCDF+HDF: store byte layout map?
› NASA proposes to use an OPeNDAP server to proxy NetCDF + HDF
files stored on S3
› The OPeNDAP server stores a map (“Byte layout map” in
illustration) of how the S3 bucket is organized, so it knows which
bytes to retrieve from the file stored in the S3 bucket based on what
the client’s application is requesting.
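A minimal sketch of the byte layout map idea, assuming the map is a simple lookup from a variable and chunk index to an offset and length. The variable names, offsets and structure are hypothetical and do not describe how OPeNDAP actually stores its map.

```python
# Hypothetical layout map: (variable, chunk index) -> (offset, length) in the S3 object.
layout_map = {
    ("temperature", 0): (4096, 1_000_000),
    ("temperature", 1): (1_004_096, 1_000_000),
    ("pressure", 0): (2_004_096, 500_000),
}


def range_for(variable: str, chunk: int) -> str:
    """Return the HTTP Range value covering one chunk of one variable."""
    offset, length = layout_map[(variable, chunk)]
    # Byte ranges are inclusive, so the last byte is offset + length - 1.
    return f"bytes={offset}-{offset + length - 1}"


print(range_for("temperature", 1))
```

With such a map the server can answer a client request with one targeted read instead of walking the file's internal metadata.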
Replace NetCDF with ZARR
› In tests run by CNES, Zarr was more than ten times faster than
NetCDF for reading data (link)
› Makes large datasets easily accessible to distributed computing
› In Zarr datasets, the arrays are divided into chunks and
compressed.
› These individual chunks can be stored as files on a filesystem or as
objects in a cloud storage bucket.
› The metadata are stored in lightweight .json files.
› Zarr works well on both local filesystems and cloud-based object
stores.
› Existing NetCDF and HDF datasets can easily be converted to Zarr
via xarray’s Zarr functions.
› 12 June 2019: Zarr support is coming to the standard netCDF
library. (link)
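The chunk-and-compress layout described above can be illustrated with the standard library alone. This is a toy model of the idea, not the Zarr format: the key names, the meta.json file and the use of zlib are assumptions made for the sketch, and a dict stands in for a directory or bucket.

```python
import json
import zlib


def store_chunked(values, chunk_size, store):
    """Split a 1-D list of small ints (0-255) into chunks, compress each, store by key."""
    for start in range(0, len(values), chunk_size):
        key = str(start // chunk_size)
        store[key] = zlib.compress(bytes(values[start:start + chunk_size]))
    # Lightweight JSON metadata describing the array and its chunking.
    store["meta.json"] = json.dumps({"shape": [len(values)], "chunks": [chunk_size]})


def read_chunk(store, index):
    """Fetch and decompress a single chunk without touching the others."""
    return list(zlib.decompress(store[str(index)]))


store = {}  # stands in for a filesystem directory or an object storage bucket
store_chunked(list(range(100)), chunk_size=25, store=store)
print(sorted(store))
print(read_chunk(store, 2)[:3])
```

Because each chunk is an independent object, separate workers can read different chunks in parallel. For real data, xarray exposes Dataset.to_zarr and open_zarr for writing and reading Zarr stores.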
Operating Systems: Linux wins
› Auro: Windows costs ~2.75x more/hour than Linux
› GCE: Windows costs ~2x more/hour than Linux
› 2017: All Top500 ranked supercomputers run Linux
Data Storage: Object Storage
› S3 = “Simple Storage Service”
› Not just on Amazon: Implemented by OpenStack Swift, MinIO,
Azure, Google Cloud, etc.
› “provides object storage through a web service interface”
› Organized using Buckets and keys
› Geographically replicated for redundancy
› Supported by GDAL, RasterIO, GeoServer
› On Linux S3 can be mounted as a user-mode file system (S3FS)
› Windows file-system access possible through rclone mount
› Auro: CAD$0.05/GB/month. AWS: USD$0.025/GB/month
(CAD$0.033)
Data Storage: Object Storage
› GDAL support through network-based virtual file systems
› /vsicurl/ (http/https/ftp files: random access)
› /vsicurl_streaming/ (http/https/ftp files: streaming)
› /vsis3/ (AWS S3 files: random reading)
› /vsis3_streaming/ (AWS S3 files: streaming)
› /vsigs/ (Google Cloud Storage files: random reading)
› /vsigs_streaming/ (Google Cloud Storage files: streaming)
› /vsiaz/ (Microsoft Azure Blob files: random reading)
› /vsiaz_streaming/ (Microsoft Azure Blob files: streaming)
› /vsioss/ (Alibaba Cloud OSS files: random reading)
› /vsioss_streaming/ (Alibaba Cloud OSS files: streaming)
› /vsiswift/ (OpenStack Swift Object Storage: random reading)
› /vsiswift_streaming/ (OpenStack Swift Object Storage: streaming)
› Streaming drivers allow on-the-fly sequential reading without prior download of
the entire file
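A small sketch of how the prefixes listed above combine with a bucket and key. The helper vsi_path and the URL schemes it accepts (gs://, az://, oss://, swift://) are hypothetical conveniences for illustration; only the /vsi.../ prefixes come from the list.

```python
# Random-access prefixes taken from the list above.
VSI_PREFIX = {
    "s3": "/vsis3/",
    "gs": "/vsigs/",
    "az": "/vsiaz/",
    "oss": "/vsioss/",
    "swift": "/vsiswift/",
}


def vsi_path(url: str) -> str:
    """Translate a URL such as s3://bucket/key into a GDAL virtual file system path."""
    scheme, _, rest = url.partition("://")
    if scheme in ("http", "https", "ftp"):
        return "/vsicurl/" + url  # /vsicurl/ is followed by the full URL
    if scheme in VSI_PREFIX:
        return VSI_PREFIX[scheme] + rest  # bucket/key follows the prefix
    raise ValueError(f"unsupported scheme: {scheme!r}")


print(vsi_path("s3://my-bucket/scenes/scene.tif"))
print(vsi_path("https://example.com/scenes/scene.tif"))
```

Where the GDAL Python bindings are installed, the resulting string can be passed to gdal.Open in place of a local file name; credentials and region settings are configured separately.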
MetaData + Searching
› OGC Existing Standards: CSW and OpenSearch
› Considerable work to implement and consume
› XML based, not JSON
› Not easily crawled by search engines
› Not RESTful
› Hard to consume
› Ideal for geospatial experts, but no one else
Source: Michael Smith’s/Harris Geospatial Dec 2018 presentation to the OGC - link
MetaData + Searching: STAC
MetaData + Searching: STAC
› STAC aims to define a simple universal API for geospatial data
discovery
› The core of STAC is very general and simple
› STAC appeals to non-geospatial specialists
› All metadata specific to a modality or domain is defined as an
extension. Current STAC extensions:
› Datacube
› EO
› Point cloud
› SAR
› DOI
› Working to align STAC with OGC’s “Web Feature Services version
3” (WFS v3) specification
› NASA is indexing all of its AWS data using STAC
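To give a feel for how simple the core is, here is a sketch of a STAC-style Item as plain JSON-compatible Python, with a naive search over bounding box and time. The id, coordinates and asset URL are hypothetical, the stac_version value is an assumption (the specification was still pre-1.0 when these slides were written), and the search function is an illustration rather than the STAC API.

```python
import json
from datetime import datetime, timezone

item = {
    "type": "Feature",                       # a STAC Item is a GeoJSON Feature
    "stac_version": "1.0.0",                 # assumed version string
    "id": "example-scene-001",               # hypothetical identifier
    "bbox": [-115.0, 50.0, -113.0, 52.0],    # west, south, east, north
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[-115.0, 50.0], [-113.0, 50.0], [-113.0, 52.0],
                         [-115.0, 52.0], [-115.0, 50.0]]],
    },
    "properties": {"datetime": "2019-11-05T18:00:00Z"},
    "links": [],
    "assets": {"visual": {"href": "https://example.com/scene.tif"}},
}


def bbox_overlaps(a, b):
    """True when two [west, south, east, north] boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(items, bbox, start, end):
    """Return ids of items that overlap bbox and fall inside [start, end]."""
    hits = []
    for it in items:
        when = datetime.fromisoformat(it["properties"]["datetime"].replace("Z", "+00:00"))
        if bbox_overlaps(it["bbox"], bbox) and start <= when <= end:
            hits.append(it["id"])
    return hits


november = (datetime(2019, 11, 1, tzinfo=timezone.utc), datetime(2019, 11, 30, tzinfo=timezone.utc))
print(search([item], [-114.5, 50.5, -114.0, 51.0], *november))
print(len(json.dumps(item)), "bytes of JSON")
```

Because the record is ordinary JSON with links and assets, it can be served as a static file and crawled like any other web resource.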
Kubernetes
› Kubernetes (K8s) is an open-source system for automating
deployment, scaling, and management of containerized
applications.
› Execution is done in parallel, on many worker nodes
› Can horizontally scale dynamically to use new compute nodes
based on metrics (such as CPU usage, HTTP requests, etc.)
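Metric-based scaling can be sketched with the rule documented for the Kubernetes Horizontal Pod Autoscaler, which adjusts the number of pod replicas; adding worker nodes is handled separately by a cluster autoscaler. The replica counts and CPU figures in the example are hypothetical.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float) -> int:
    """desired = ceil(current_replicas * current_metric / target_metric)."""
    return math.ceil(current_replicas * (current_metric / target_metric))


# Hypothetical workload: 4 replicas averaging 90% CPU against a 60% target.
print(desired_replicas(4, current_metric=90, target_metric=60))
# The same rule scales down when load drops to 30%.
print(desired_replicas(4, current_metric=30, target_metric=60))
```

The real autoscaler adds tolerances, stabilization windows and minimum and maximum replica limits on top of this ratio.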
Kubernetes uses Docker Containers
Kubernetes is a cluster manager
Data Cubes
› A data cube is an “n-dimensional array”
› Latitude
› Longitude
› Time
› Data variables
› Requires Analysis Ready Data (ARD)
› Each pixel is stored as a calibrated and corrected measurement
› Allows time-series analysis
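A tiny data cube can be sketched with nested Python lists indexed as cube[time][row][column]. The values are hypothetical reflectances for one data variable; because every layer is on the same grid and already calibrated, a pixel's time series is a simple slice.

```python
# cube[t][y][x]: 3 time steps over a 2 x 2 pixel grid (hypothetical values).
cube = [
    [[0.10, 0.20], [0.30, 0.40]],
    [[0.12, 0.22], [0.32, 0.42]],
    [[0.14, 0.24], [0.34, 0.44]],
]


def pixel_series(cube, y, x):
    """Values of one pixel through time."""
    return [layer[y][x] for layer in cube]


def temporal_mean(cube):
    """Per-pixel mean over the time dimension."""
    n = len(cube)
    rows, cols = len(cube[0]), len(cube[0][0])
    return [[sum(cube[t][y][x] for t in range(n)) / n for x in range(cols)]
            for y in range(rows)]


print(pixel_series(cube, y=1, x=0))
print(temporal_mean(cube))
```

Array libraries provide the same operations on labelled latitude, longitude and time dimensions, and at much larger scale.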
Data Cubes
› Non-trivial to create and work with
› Example implementations:
› Xarray
› Open Data Cube
› Xcube
› Rasdaman
› Apache Spark + GeoTrellis
Conclusion
Why are we here?
Nov 5, 2019: “Canada must become a leader in
using space data to improve our society”
– CSA President Sylvain Laporte
Conclusion
› Bring your algorithm to the data, not the other way
around
› Let’s embrace change, together
› Ensure we don’t forget marginalized and data-poor
communities
› Canada was a leader in GIS; now we follow
our peers: Europe, Australia and the US
› Let’s talk about opportunities to work together to move
Canadian EO analytic capabilities forward in this new
era.
www.GEOAnalytics.ca
Thank You!
jsuwala@hatfieldgroup.com

Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 

STAC, ZARR, COG, K8S and Data Cubes: The brave new world of satellite EO analytics in the cloud - GeoAlberta 2019

  • 1. © Hatfield Consultants. All Rights Reserved. STAC, ZARR, COG, K8S and Data Cubes: The brave new world of satellite EO analytics in the cloud. Jason Suwala, Nov 2019
  • 2. Who Am I?
      › UVic Engineering Grad
      › Partner at Hatfield Consultants
      › Director of Environmental Information Systems
      › Lots of different hats
      › Digital development
      › Knowledge management
      › Environmental data management
      › System of Systems
      › CGDI
      › European and Canadian Space Agency Projects
      › “Bringing Science and People Together”
  • 3. Why are we here? Nov 5, 2019: “Canada must become a leader in using space data to improve our society” – CSA President Sylvain Laporte
  • 4. Why are we here?
  • 5. NASA EOSDIS Data Growth
  • 6. ESA’s Data Growth [chart: ESA EO Data Archive volume in petabytes (0 to 110) by year (2000 to 2026), split into Sentinel missions operated by ESA, Earth Explorer missions, Heritage missions, and Third Party & Contributing Missions. Ref: European Space Agency, 2018]
  • 7. Timeseries Analysis over Large Areas? [figure: Total number of archived Landsat images acquired for Canada, by year and sensor. (Wulder 2018)]
  • 8. Digital Ecosystem to Monitor the Planet
      › “Digital Twins”
      › Towards real-time acquisition and analysis
  • 9. Traditional Approaches Obsolete
      › The traditional download approach is obsolete
  • 10. Innovation Solutions Canada: Working with the Public Health Agency of Canada to address this problem
  • 11. www.GEOAnalytics.ca “Advancing Canadian Satellite Earth Observation Analytics”
  • 12. Brief Primer on Cloud Native Geospatial
  • 13. Cloud Native Geospatial
      › Simply moving a server to be hosted in the cloud is not “cloud native”
      › Cloud native:
          › Horizontally scalable on commodity hardware
          › Always available
          › Always current
          › Virtualized resource sharing
      › + Geospatial:
          › Optimized file formats (COG/ZARR)
          › Web-crawlable (STAC)
  • 14. Data goes together with compute
      › Bring your algorithm to the data, not the other way around
      › Always co-locate your compute with the data
      › Above all else, minimize data downloading
      › Infrastructure options: HPC or Cloud
  • 15. File Formats
      › “how you store your data can have an enormous effect on performance.” (Dr. Philip Austin, UBC)
      › [image: December Mosaic of the Bahamas, Image ©2017 Planet Labs, Inc.]
  • 16. Raster File Formats: COG
      › COG = “Cloud Optimized GeoTIFF”
      › https://www.cogeo.org/
      › “A Cloud Optimized GeoTIFF (COG) is a regular GeoTIFF file, aimed at being hosted on a HTTP file server, with an internal organization that enables more efficient workflows on the cloud. It does this by leveraging the ability of clients issuing HTTP GET range requests to ask for just the parts of a file they need instead of downloading the whole file.”
      › COG-aware software can stream just the portion of data that it needs
      › Supported by GDAL, Rasterio and others
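The range-request mechanism quoted on the COG slide can be sketched with the Python standard library alone. This is a minimal illustration, not how GDAL or Rasterio are implemented: the helper names `range_header` and `read_range` are hypothetical, and in a real COG reader the offset and length of each tile would come from the TIFF header rather than being supplied by hand.

```python
import urllib.request


def range_header(offset: int, length: int) -> str:
    """Build an HTTP Range header value covering `length` bytes from `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive at both ends.
    return f"bytes={offset}-{offset + length - 1}"


def read_range(url: str, offset: int, length: int, timeout: float = 30.0) -> bytes:
    """Fetch only the requested bytes of a remote file (hypothetical helper)."""
    request = urllib.request.Request(
        url, headers={"Range": range_header(offset, length)}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # 206 Partial Content signals that the server honoured the range.
        if response.status != 206:
            raise RuntimeError("server ignored the Range header")
        return response.read()


# The first 16 KiB of a file, where a TIFF reader would look for the header.
print(range_header(0, 16384))
```

A client that only needs one tile issues one such request per tile instead of downloading the whole file, which is the behaviour the slide attributes to COG-aware software.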
  • 17. COG versus GeoTIFF [figure credit: Vincent Sarago]
  • 18. COG versus GeoTIFF
      › storage size: 1.5 Gb vs 69 Mb
  • 19. COG versus JPEG2000

                           JPEG2000       COG
      Size                 25TB           50TB
      Storage              $575/month     $1150/month
      Data access          $440           $20
      Processing Time      $76.81         $25.60
      Cost                 $1091.81       $1195.60

      › If you just care about storage cost, JPEG2000 is your best option, but if someone will have to pay to access/process the data, COG is a better option
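The arithmetic behind the COG versus JPEG2000 comparison can be checked in a few lines. The figures below are copied from the slide and taken at face value. Treating data access and processing together as a separate "usage" cost is an assumption made here to illustrate the slide's conclusion.

```python
# Figures copied from the slide's table (dollars).
COSTS = {
    "JPEG2000": {"storage": 575.00, "data_access": 440.00, "processing": 76.81},
    "COG": {"storage": 1150.00, "data_access": 20.00, "processing": 25.60},
}


def total(fmt: str) -> float:
    """Sum of all three cost lines, matching the table's 'Cost' row."""
    return round(sum(COSTS[fmt].values()), 2)


def usage_cost(fmt: str) -> float:
    """Access plus processing only, i.e. everything except storage."""
    entry = COSTS[fmt]
    return round(entry["data_access"] + entry["processing"], 2)


for fmt in COSTS:
    print(f"{fmt}: total ${total(fmt):.2f}, usage only ${usage_cost(fmt):.2f}")
```

On these numbers JPEG2000 is cheaper overall, but the access and processing share is roughly eleven times larger than for COG, which is the trade-off the slide's closing bullet describes.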
  • 20. Raster File Formats: NetCDF + HDF Problems
      › The most common multidimensional data formats are NetCDF and HDF
      › Supercomputer simulations (like a large climate model) produce a few petabytes of HDF files.
      › Planned NASA satellite missions will produce hundreds of petabytes a year of HDF files.
      › The layout of HDF files makes them difficult to query efficiently on cloud storage systems
      › “slowdown is significant because the HDF library makes many small 4kB reads in order to gather the metadata necessary to pull out a chunk of data. Each of those tiny reads made sense when the data was local, but now that we’re sending out a web request each time. This means that users can sit for minutes just to open a file.”
  • 21. NetCDF+HDF: store byte layout map?
      › NASA proposes to use an OPeNDAP server to proxy NetCDF + HDF files stored on S3
      › The OPeNDAP server stores a map (“Byte layout map” in illustration) of how the S3 bucket is organized, so it knows which bytes to retrieve from the file stored in the S3 bucket based on what the client’s application is requesting.
  • 22. Replace NetCDF with ZARR
      › On tests run by CNES, Zarr is more than ten times faster for reading data than NetCDF (link)
      › Makes large datasets easily accessible to distributed computing
      › In Zarr datasets, the arrays are divided into chunks and compressed.
      › These individual chunks can be stored as files on a filesystem or as objects in a cloud storage bucket.
      › The metadata are stored in lightweight .json files.
      › Zarr works well on both local filesystems and cloud-based object stores.
      › Existing NetCDF and HDF datasets can easily be converted to Zarr via xarray’s zarr functions.
      › 12 June 2019: Zarr support is coming to the standard netCDF library. (link)
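The chunking idea on the Zarr slide can be illustrated without the Zarr library. The sketch below works out which chunk objects a reader would have to fetch for a given window of a 2-D array. The function name `chunk_keys` is hypothetical, and joining grid indices with "." follows the default key layout of the Zarr version 2 storage spec as I understand it.

```python
from itertools import product
from typing import List, Sequence, Tuple


def chunk_keys(
    region: Sequence[Tuple[int, int]], chunk_shape: Sequence[int]
) -> List[str]:
    """Return the chunk keys overlapping `region`.

    `region` holds one (start, stop) pair per dimension, stop exclusive.
    """
    index_ranges = []
    for (start, stop), size in zip(region, chunk_shape):
        first = start // size          # chunk holding the first element
        last = (stop - 1) // size      # chunk holding the last element
        index_ranges.append(range(first, last + 1))
    return [".".join(str(i) for i in idx) for idx in product(*index_ranges)]


# Rows 0..99 and columns 250..349 of an array stored in 100 x 100 chunks.
print(chunk_keys([(0, 100), (250, 350)], (100, 100)))
```

Each key corresponds to one file or one object in a bucket, so a small window translates into a handful of independent reads that can be issued in parallel.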
  • 23. Operating Systems: Linux wins
      › Auro: Windows costs ~2.75x more/hour than Linux
      › GCE: Windows costs ~2x more/hour than Linux
      › 2017: All Top500 ranked supercomputers run Linux
  • 24. Data Storage: Object Storage
      › S3 = “Simple Storage Service”
      › Not just on Amazon: Implemented by OpenStack Swift, MinIO, Azure, Google Cloud, etc.
      › “provides object storage through a web service interface”
      › Organized using buckets and keys
      › Geographically replicated for redundancy
      › Supported by GDAL, Rasterio, GeoServer
      › On Linux, S3 can be mounted as a user-mode file system (S3FS)
      › Windows file-system access possible through rclone mount
      › Auro: CAD$0.05/GB/month. AWS: USD$0.025/GB/month (CAD$0.033)
  • 25. Data Storage: Object Storage
      › GDAL support through network-based virtual file systems
          › /vsicurl/ (http/https/ftp files: random access)
          › /vsicurl_streaming/ (http/https/ftp files: streaming)
          › /vsis3/ (AWS S3 files: random reading)
          › /vsis3_streaming/ (AWS S3 files: streaming)
          › /vsigs/ (Google Cloud Storage files: random reading)
          › /vsigs_streaming/ (Google Cloud Storage files: streaming)
          › /vsiaz/ (Microsoft Azure Blob files: random reading)
          › /vsiaz_streaming/ (Microsoft Azure Blob files: streaming)
          › /vsioss/ (Alibaba Cloud OSS files: random reading)
          › /vsioss_streaming/ (Alibaba Cloud OSS files: streaming)
          › /vsiswift/ (OpenStack Swift Object Storage: random reading)
          › /vsiswift_streaming/ (OpenStack Swift Object Storage: streaming)
      › Streaming drivers allow on-the-fly sequential reading without prior download of the entire file
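To show how the virtual file system prefixes listed above are used, here is a small sketch that turns an ordinary URL into the path string GDAL expects. The helper `to_vsi_path` is hypothetical, accepting `s3://` and `gs://` style URLs as input is this sketch's own convention, and the bucket and host names are placeholders. Only three of the handlers from the slide are covered.

```python
from urllib.parse import urlsplit

# URL scheme -> GDAL virtual file system handler named on the slide.
_HANDLERS = {"s3": "vsis3", "gs": "vsigs"}
_CURL_SCHEMES = {"http", "https", "ftp"}


def to_vsi_path(url: str, streaming: bool = False) -> str:
    """Map a URL to a GDAL /vsi.../ path (random access unless streaming)."""
    suffix = "_streaming" if streaming else ""
    parts = urlsplit(url)
    if parts.scheme in _CURL_SCHEMES:
        # /vsicurl/ takes the full URL, scheme included.
        return f"/vsicurl{suffix}/{url}"
    if parts.scheme in _HANDLERS:
        # Bucket name, then the object key.
        return f"/{_HANDLERS[parts.scheme]}{suffix}/{parts.netloc}{parts.path}"
    raise ValueError(f"no handler known for scheme {parts.scheme!r}")


print(to_vsi_path("s3://example-bucket/scenes/scene.tif"))
print(to_vsi_path("https://example.com/scene.tif"))
```

The resulting string can be passed wherever GDAL accepts a file name, for example as the argument to `gdalinfo`.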
  • 26. MetaData + Searching
      › OGC Existing Standards: CSW and OpenSearch
      › Considerable work to implement and consume
      › XML based, not JSON
      › Not easily crawled by search engines
      › Not RESTful
      › Hard to consume
      › Ideal for geospatial experts, but no one else
      › Source: Michael Smith’s/Harris Geospatial Dec 2018 presentation to the OGC - link
  • 27. MetaData + Searching: STAC
  • 28. MetaData + Searching: STAC
      › STAC aims to define a simple universal API for geospatial data discovery
      › The core of STAC is very general and simple
      › STAC appeals to non-geospatial specialists
      › All metadata specific to a modality or domain is defined as an extension. Current STAC extensions:
          › Datacube
          › EO
          › Point cloud
          › SAR
          › DOI
      › Working to align STAC with OGC’s “Web Feature Services version 3” (WFS v3) specification
      › NASA is indexing all of its AWS data using STAC
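To make the "general and simple" core of STAC concrete, the sketch below builds one STAC Item as plain JSON-style data and filters a list of items by bounding box and time. Everything specific is an assumption for illustration: the item id, coordinates, asset URL and the `search` helper are made up, and `stac_version` is set to "1.0.0", which is later than the version current when this talk was given.

```python
from datetime import datetime, timezone

# A STAC Item is a GeoJSON Feature with a few extra fields.
ITEM = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-scene-20191105",  # hypothetical id
    "bbox": [-114.5, 50.8, -113.8, 51.3],
    "geometry": {
        "type": "Polygon",
        "coordinates": [[
            [-114.5, 50.8], [-113.8, 50.8], [-113.8, 51.3],
            [-114.5, 51.3], [-114.5, 50.8],
        ]],
    },
    "properties": {"datetime": "2019-11-05T18:30:00Z"},
    "assets": {
        "visual": {
            "href": "https://example.com/scenes/example-scene.tif",  # placeholder
            "type": "image/tiff; application=geotiff; profile=cloud-optimized",
        }
    },
    "links": [],
}


def bbox_intersects(a, b):
    """True if two [west, south, east, north] boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(items, bbox, start, end):
    """Return ids of items inside the box and the time window."""
    hits = []
    for item in items:
        stamp = item["properties"]["datetime"].replace("Z", "+00:00")
        when = datetime.fromisoformat(stamp)
        if bbox_intersects(item["bbox"], bbox) and start <= when <= end:
            hits.append(item["id"])
    return hits


window = (datetime(2019, 11, 1, tzinfo=timezone.utc),
          datetime(2019, 11, 30, tzinfo=timezone.utc))
print(search([ITEM], [-115.0, 50.0, -113.0, 52.0], *window))
```

Because the item is ordinary JSON with links and asset URLs, a crawler or a non-specialist client can consume it without an XML toolchain, which is the contrast the slides draw with CSW and OpenSearch.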
  • 29. Kubernetes
      › Kubernetes (K8s) is an open-source system for automating deployment, scaling, and management of containerized applications.
      › Execution is done in parallel, on many worker nodes
      › Can horizontally scale dynamically to use new compute nodes based on metrics (such as CPU usage, HTTP requests, etc.)
  • 30. Kubernetes uses Docker Containers
  • 31. Kubernetes is a cluster manager
  • 32. Data Cubes
      › A data cube is an “n-dimensional array”
          › Latitude
          › Longitude
          › Time
          › Data variables
      › Requires Analysis Ready Data (ARD)
          › Each pixel is stored as a calibrated and corrected measurement
      › Allows time-series analysis
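A toy example may help show why the n-dimensional layout on the data cube slide makes time-series analysis straightforward. The sketch uses nested Python lists indexed as `cube[time][row][col]`. The values are invented, rows and columns stand in for latitude and longitude, and a real cube would use one of the implementations named in the talk rather than lists.

```python
# Three time steps of a 2 x 2 pixel grid; values are made-up reflectances.
TIMES = ["2019-06-01", "2019-07-01", "2019-08-01"]
CUBE = [
    [[0.21, 0.25], [0.30, 0.28]],
    [[0.35, 0.40], [0.45, 0.38]],
    [[0.50, 0.55], [0.60, 0.48]],
]


def pixel_time_series(cube, row, col):
    """All values of one pixel through time (a slice along the time axis)."""
    return [time_slice[row][col] for time_slice in cube]


def temporal_mean(cube):
    """Per-pixel mean over the time axis, returned as a 2-D grid."""
    n_times = len(cube)
    n_rows, n_cols = len(cube[0]), len(cube[0][0])
    return [
        [sum(cube[t][r][c] for t in range(n_times)) / n_times
         for c in range(n_cols)]
        for r in range(n_rows)
    ]


for date, value in zip(TIMES, pixel_time_series(CUBE, 0, 1)):
    print(date, value)
```

The per-pixel comparison across dates only makes sense if every value is on the same calibrated scale, which is why the slide lists Analysis Ready Data as a requirement.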
  • 33. Data Cubes
      › Non-trivial to create and work with
      › Example implementations:
          › Xarray
          › Open Data Cube
          › Xcube
          › Rasdaman
          › Apache Spark + GeoTrellis
  • 34. Conclusion
  • 35. Why are we here? Nov 5, 2019: “Canada must become a leader in using space data to improve our society” – CSA President Sylvain Laporte
  • 36. Conclusion
      › Bring your algorithm to the data, not the other way around
      › Let’s embrace change, together
      › Ensure we don’t forget marginalized and data-poor communities
      › Canada was a leader in GIS; now we are a follower of our peers: Europe, Australia and the US
      › Let’s talk about opportunities to work together to move Canadian EO analytic capabilities forward in this new era.
  • 37. www.GEOAnalytics.ca
  • 38. Thank You! jsuwala@hatfieldgroup.com

Editor's Notes

  1. https://earthdata.nasa.gov/cmr-and-esdc-in-cloud https://earthdata.nasa.gov/eosdis/cloud-evolution
  2. Landsat-8: ≈22,500 Landsat-8 OLI images per year, or more than 60 per day over Canada. With > 430 Landsat-7 per-day, ≈1200 Landsat 8/7 images over Canada/day https://medium.com/@mikewulder/landsat-data-record-for-canada-an-update-38b176f49a4f
  3. https://medium.com/pangeo/step-by-step-guide-to-building-a-big-data-portal-e262af1c2977 https://medium.com/planet-stories/cng-part-5-cloud-native-geospatial-architecture-defined-193d5ffdd681
  4. https://medium.com/pangeo/step-by-step-guide-to-building-a-big-data-portal-e262af1c2977
  5. Quote: https://clouds.eos.ubc.ca/~phil/courses/parallel_python/02_xarray_zarr.html#Some-challenges-with-netcdf Image: https://medium.com/planet-stories/cng-part-7-a-vision-for-the-cloud-native-geospatial-ecosystem-7a55ae782690
  6. https://medium.com/planet-stories/cloud-native-geospatial-part-2-the-cloud-optimized-geotiff-6b3f15c696ed https://www.eclipse.org/community/eclipse_newsletter/2018/december/geotrellis.php
  7. https://medium.com/@_VincentS_/do-you-really-want-people-using-your-data-ec94cd94dc3f
  8. https://medium.com/@_VincentS_/do-you-really-want-people-using-your-data-ec94cd94dc3f
  9. https://medium.com/@_VincentS_/do-you-really-want-people-using-your-data-ec94cd94dc3f
  10. http://matthewrocklin.com/blog/work/2018/02/06/hdf-in-the-cloud NASA statement: https://earthdata.nasa.gov/cmr-and-esdc-in-cloud
  11. https://earthdata.nasa.gov/eosdis-data-in-the-cloud-user-requirements
  12. https://pangeo.io/data.html
  13. https://medium.com/descarteslabs-team/thunder-from-the-cloud-40-000-cores-running-in-concert-on-aws-bf1610679978
  14. Rclone mount: https://rclone.org/commands/rclone_mount/
  15. https://gdal.org/user/virtual_file_systems.html#network-based-file-systems
  16. https://drive.google.com/file/d/1Xf6Ix6pnMVpUAFh-bVw0EmkHls_WPUS9/view
  17. https://drive.google.com/file/d/1Xf6Ix6pnMVpUAFh-bVw0EmkHls_WPUS9/view
  18. https://drive.google.com/file/d/1Xf6Ix6pnMVpUAFh-bVw0EmkHls_WPUS9/view
  19. https://drive.google.com/file/d/1Xf6Ix6pnMVpUAFh-bVw0EmkHls_WPUS9/view
  20. https://towardsdatascience.com/why-you-should-care-about-docker-9622725a5cb8
  21. https://towardsdatascience.com/machine-learning-with-big-data-86bcb39f2f0b
  22. https://drive.google.com/file/d/1Xf6Ix6pnMVpUAFh-bVw0EmkHls_WPUS9/view
  23. https://drive.google.com/file/d/1Xf6Ix6pnMVpUAFh-bVw0EmkHls_WPUS9/view