Mission-Critical, Real-Time Fault-Detection for NASA's Deep Space Network using Apache Kafka (Rishi Verma, NASA Jet Propulsion Laboratory) Kafka Summit SF 2019
NASA's Deep Space Network (DSN) operates spacecraft communication links for NASA deep-space missions, including the Curiosity Rover, the twin Voyager spacecraft, Galileo, New Horizons, etc., and has done so reliably for over fifty years. The DSN Complex Event Processing (DCEP) software assembly is a new software system being deployed worldwide into NASA's DSN Deep Space Communication Complexes (DSCCs), including facilities in Spain, Australia, and the United States. The system brings next-generation "Big Data" and "Fast Data" infrastructure tools, including Apache Kafka, into the DSN for correlating real-time network data with other critical data assets, including predicted antenna pointing parameters and extensive logging of physical hardware in the DSN. The ultimate use case is to ingest, filter, store, and visualize all of the DSN's monitor and control data and to actively ensure the successful DSN tracking, ranging, and communication integrity of dozens of concurrent deep-space missions. The system is also intended to support future autonomy applications, including automated anomaly detection in real-time network monitor streams and automated reconfiguration of antenna-related assets as needed by future, increasingly autonomous spacecraft. This talk will focus on the software system behind DCEP and introduce novel approaches to increasing NASA spacecraft link-control operators' awareness of anomalies that may and do occur during spacecraft tracking activities. This talk will also offer lessons learned and provide a glimpse into one of the most unusual, "out-of-this-world" applications of Apache Kafka.
1. Mission-Critical, Real-Time Fault-Detection for NASA’s Deep Space Network using Apache Kafka
Rishi Verma
Jet Propulsion Laboratory, California Institute of Technology
Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise, does not constitute or imply its endorsement by the United States Government or the Jet Propulsion Laboratory, California Institute of Technology.
2. Goldstone Complex Madrid Complex Canberra Complex Network Control (JPL)
“NASA’s Deep Space Network (DSN) was established in December 1963 to provide a communications infrastructure for deep space missions”
3. Goldstone Complex Madrid Complex Canberra Complex Network Control (JPL)
“The DSN is responsible for Tracking, Telemetry, and Command of NASA spacecraft using the DSN antennas and partner tracking sites located around the world”
4. Goldstone Complex Madrid Complex Canberra Complex Network Control (JPL)
“The DSN also supports many international spacecraft as well as scientific investigations through radio astronomy, radio science, and radar activities”
6. A Diverse Set of Supported Missions
Not all supported missions depicted
7. “Uplinking”
- Send data to spacecraft
- Point accurately in the sky
- Detect equipment failures
- Schedule tracks efficiently
“Downlinking”
- Receive data from spacecraft
- Ignore noise
- Adapt to weather
- Alert upon interference
- Deal with signal dispersion
[Diagram labels: Spacecraft, DSN Antenna, Cosmic Background Noise, Hot Body Noise]
Deep Space Communications 101
Science
- Radar activities
- Radio Science
- Radio Astronomy
13. Use Case: Providing Better Context for Ops
[Plot: Signal Power to Noise Ratio versus time (Time 0 to Time 4)]
Predicted Loss of Signal (models)
Actual Loss of Signal (logs & equip. status)
38 sec discrepancy
Historically: 2.6 sec discrepancy
* Numbers are not necessarily reflective of actual conditions
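The comparison on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the timestamps, and the tolerance factor are hypothetical, and the 38-second and 2.6-second figures are the slide's illustrative numbers, not operational values.

```python
from datetime import datetime, timedelta

# Illustrative baseline taken from the slide ("Historically: 2.6 sec discrepancy").
HISTORICAL_DISCREPANCY = timedelta(seconds=2.6)


def los_discrepancy(predicted_los: datetime, actual_los: datetime) -> timedelta:
    """Absolute gap between predicted and observed loss of signal."""
    return abs(actual_los - predicted_los)


def is_unusual(discrepancy: timedelta,
               baseline: timedelta = HISTORICAL_DISCREPANCY,
               tolerance: float = 3.0) -> bool:
    """Flag a gap larger than `tolerance` times the baseline (hypothetical rule)."""
    return discrepancy > baseline * tolerance


# Hypothetical timestamps chosen to reproduce the slide's 38-second gap.
predicted = datetime(2019, 10, 1, 12, 0, 0)   # would come from predicted models
actual = datetime(2019, 10, 1, 12, 0, 38)     # would come from logs and equipment status

gap = los_discrepancy(predicted, actual)
print(gap.total_seconds(), is_unusual(gap))
```

Showing an operator the current gap next to the historical one is what gives the number context: a 38-second gap only stands out once it is compared against the usual 2.6 seconds.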
14. 34 Meter Beam Waveguide, 70 Meter Antenna
Use Case: Correcting for Data Heterogeneity
15. Other Use Cases
• Finding Trends over Very Long Time Periods
• Matching Incidents to Known Discrepancies
• Historically-informed Visualizations
• Reducing Costs for Building Real-Time Fault-Detection Apps
17. “Complex Events”
Real-time Hardware / Software Logs
Real-time Equipment Status
Predicted Models
GUIs for Ops Staff
DSN Automation
Input for more Complex Events
Expert Domain Knowledge / ML-trained Classifiers
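One way to read this slide is that a complex event is produced when a rule (from expert knowledge or a trained classifier) is applied to a combination of logs, equipment status, and predictions. The sketch below shows a single hand-written rule of that kind. All class names, field names, and the rule itself are hypothetical and are not taken from DCEP.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrackSnapshot:
    """Hypothetical merged view of the three input types named on the slide."""
    antenna: str
    log_message: str         # real-time hardware / software log line
    receiver_in_lock: bool   # real-time equipment status
    signal_expected: bool    # predicted model: should a signal be present now?


@dataclass
class ComplexEvent:
    name: str
    antenna: str
    detail: str


def unexpected_loss_of_signal(s: TrackSnapshot) -> Optional[ComplexEvent]:
    """Example expert rule: a signal is predicted but the receiver is out of lock."""
    if s.signal_expected and not s.receiver_in_lock:
        return ComplexEvent("UNEXPECTED_LOSS_OF_SIGNAL", s.antenna, s.log_message)
    return None


snapshot = TrackSnapshot(
    antenna="antenna-1",  # placeholder label, not a real DSN station ID
    log_message="receiver lost carrier lock",
    receiver_in_lock=False,
    signal_expected=True,
)
event = unexpected_loss_of_signal(snapshot)
print(event)
```

An event like this could be shown in an operator GUI, drive automation, or be fed back in as input to further rules, matching the three outputs listed on the slide.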
18. Four Deployments of DCEP
• Madrid Deep Space Communications Complex (MDSCC)
• Network Operations & Control Center (NOCC) at JPL
• Canberra Deep Space Communications Complex (CDSCC)
• Goldstone Deep Space Communications Complex (GDSCC)
Real-time Data Syncing Between Data Centers
19. DCEP at a Deep Space Communications Complex (DSCC)
Realtime Equip. Status
Realtime Logs
Predicted Models
Expert Knowledge
ML Models
Local Data / Services
DSN Complex Event Processor (DCEP)
Compute Cluster
Results, e.g. fault alerts, trends, automation, etc.
Link Ops, Network Ops, Mission Ops
20. Real-time (Equip. Status, Logs)
DSN Proprietary
Level 0 Topics
• Scope: Raw data
• Structure: 3 topics, 9 partitions
• Data rate: ~2 Mbits / sec
Level 1 Topics
• Scope: Processed data
• Structure: 3 topics, 9 partitions
• Data rate: variable
KStreams
Data
Raw Data
Indices
• Scope: All DCEP data
• Size: ~10 TB
Non real-time (models, files)
Model & File Data
KStreams
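The figures on this slide allow a rough back-of-envelope estimate of raw data volume. The sketch below assumes decimal units, a constant ~2 Mbit/s raw rate, and that the ~10 TB figure can be compared directly against raw data alone. None of these assumptions is stated on the slide, so the result is an order-of-magnitude indication only.

```python
# Figures quoted on the slide; decimal units are an assumption.
RAW_RATE_BITS_PER_SEC = 2_000_000   # "~2 Mbits / sec" of raw data
INDEX_SIZE_BYTES = 10 * 10**12      # "~10 TB" across all DCEP data

SECONDS_PER_DAY = 24 * 60 * 60

bytes_per_sec = RAW_RATE_BITS_PER_SEC / 8
bytes_per_day = bytes_per_sec * SECONDS_PER_DAY
gb_per_day = bytes_per_day / 10**9

# Assumption: only raw data at a constant rate is counted. The indices also
# hold processed and model data, so the real figure would be smaller.
days_to_reach_index_size = INDEX_SIZE_BYTES / bytes_per_day

print(f"{gb_per_day:.1f} GB/day, about {days_to_reach_index_size:.0f} days to reach 10 TB")
```

Under these assumptions the raw stream amounts to roughly 21.6 GB per day, so about 10 TB corresponds to a little over a year of raw data.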
21. DCEP Users
• “Track Visualizer”
• Anomaly Identification and Diagnosis AssistaNt (AIDAN)
• Direct Data Access for DSN Staff
24. Direct Data Access for DSN Staff
• Discrepancy Report Corroboration for Investigators
• Custom Historical Queries for DSN Programmers
• Custom Real-Time Event Generation for DSN Programmers
25. Thank You.
Acknowledgements / Contributions:
Slide Material Contribution
• Dr. Les Deutsch
• Michael Levesque
Team
• Hong Chhay
• Saman Saeedi
• Hannah Gooden
• Natalie Gallegos
• Richard Kim
• Shan Malhotra
• Jay Wyatt
Imagery
• https://deepspace.jpl.nasa.gov
• https://photojournal.jpl.nasa.gov/beta
• Preston Dyches
• Melody Ho
Video
• Hong Chhay (Track Visualizer)
• Gary Doran (Anomaly Identification and Diagnosis AssistaNt)