Real-time data beats slow data. That is true for almost every use case. Nevertheless, enterprise architects still build new infrastructures on the Lambda architecture, which includes separate batch and real-time layers.
This video explores why a single real-time pipeline, called the Kappa architecture, is the better fit for many enterprise architectures. Real-world examples from companies such as Disney, Shopify, Uber, and Twitter illustrate the benefits of Kappa and also show how batch processing still fits into the picture without the need for a Lambda architecture.
The main focus of the discussion is on Apache Kafka (and its ecosystem) as the de facto standard for event streaming to process data in motion (the key concept of Kappa), but the video also compares various technologies and vendors such as Confluent, Cloudera, IBM Red Hat, Apache Flink, Apache Pulsar, AWS Kinesis, Amazon MSK, Azure Event Hubs, Google Pub/Sub, and more.
Video recording of this presentation:
https://youtu.be/j7D29eyysDw
Further reading:
https://www.kai-waehner.de/blog/2021/09/23/real-time-kappa-architecture-mainstream-replacing-batch-lambda/
https://www.kai-waehner.de/blog/2021/04/20/comparison-open-source-apache-kafka-vs-confluent-cloudera-red-hat-amazon-msk-cloud/
https://www.kai-waehner.de/blog/2021/05/09/kafka-api-de-facto-standard-event-streaming-like-amazon-s3-object-storage/
Kappa vs Lambda Architectures and Technology Comparison
1. Kappa vs. Lambda Architecture
Use Cases, Trade-offs, Technologies, Comparison
Kai Waehner
Field CTO
kai.waehner@confluent.io
linkedin.com/in/kaiwaehner
@KaiWaehner
confluent.io
kai-waehner.de
2. An Event Streaming Platform
The Underpinning of Data in Motion
Producers: Microservices, DBs, SaaS apps, Mobile
Consumers: Customer 360, Real-time fraud detection, Data warehouse
Streams of real-time events: Database change, Microservices events, SaaS data, Customer experiences
Connectors and stream processing apps on both the producer and the consumer side
kai-waehner.de | @KaiWaehner | Kappa vs. Lambda Architecture
3. Example Architecture for Data in Motion
Real-time decision making for claim processing and fraud detection
Stream processing: ksqlDB, KStreams
Connectors: Oracle CDC connector (Oracle DB), Salesforce CDC connector (Salesforce), Source / Sink connector
Applications: Dashboard, Fraud Detection App
4. Domain-Driven Design for your Integration Layer
Event Streaming Platform: Kafka Cluster, Kafka Connect, Schema Registry
Domains: CRM Domain, Legacy Domain, Payment Domain
Diagram labels: CRM Integration, Legacy Integration, Custom Application, ESB Connector, Java / Python / ksqlDB / etc.
→ Independent and loosely coupled, but scalable, highly available and reliable!
5. Lambda Architecture
Option 1: Unified serving layer
Data Source
Real-Time Layer (Data Processing in Motion): ms
Batch Layer (Data Processing at Rest): min/hr
Serving Layer
Real-Time App (Data Processing in Motion)
Batch App (Data Processing at Rest)
6. Lambda Architecture
Option 2: Separate serving layers
Data Source
Real-Time Layer (Data Processing in Motion): ms, Speed View
Batch Layer (Data Processing at Rest): min/hr, Batch View
Queries: Real-time Query, Batch Query, Mixed Query
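The "Mixed Query" in the Lambda variant with separate serving layers has to combine a stale-but-complete batch view with a fresh speed view. The following is a minimal sketch of that merge, not taken from the slides; the event data, the `BATCH_CUTOFF` value, and all function names are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical event stream: (timestamp, key) pairs, e.g. page views per user.
events = [(1, "alice"), (2, "bob"), (3, "alice"), (4, "alice"), (5, "bob")]

# Assumption: the last batch run covered all events with timestamp <= 3.
BATCH_CUTOFF = 3


def batch_view(events, cutoff):
    """Batch layer: recomputed from data at rest, complete but stale."""
    return Counter(key for ts, key in events if ts <= cutoff)


def speed_view(events, cutoff):
    """Real-time layer: covers only events the batch run has not seen yet."""
    return Counter(key for ts, key in events if ts > cutoff)


def mixed_query(key, batch, speed):
    """Mixed query: merge both views to answer a single question."""
    return batch[key] + speed[key]


batch = batch_view(events, BATCH_CUTOFF)
speed = speed_view(events, BATCH_CUTOFF)
print(mixed_query("alice", batch, speed))  # 2 from the batch view + 1 from the speed view
```

Note that the counting logic exists twice, once per layer. Keeping two such implementations consistent is one of the typical concerns raised against the Lambda architecture.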
7. Concerns with the Lambda Architecture
8. Kappa Architecture
One pipeline for real-time and batch consumers
Data Source
Real-Time Layer (Data Processing in Motion)
Real-Time App (Data Processing in Motion): ms
Batch App (Data Processing at Rest): min/hr
Storage (appears three times in the diagram)
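The idea of one pipeline serving both real-time and batch consumers can be sketched with a single append-only log on which each consumer tracks its own read position. This is a simplified in-memory stand-in for a Kafka topic partition, not the Kafka client API; the `Log` and `Consumer` classes are hypothetical.

```python
class Log:
    """Append-only log, a simplified stand-in for one Kafka topic partition."""

    def __init__(self):
        self.records = []

    def append(self, value):
        self.records.append(value)
        return len(self.records) - 1  # offset of the new record

    def read(self, offset):
        """Return all records from the given offset to the end of the log."""
        return self.records[offset:]


class Consumer:
    """Tracks its own offset, so consumers never interfere with each other."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self):
        records = self.log.read(self.offset)
        self.offset += len(records)
        return records


log = Log()
realtime = Consumer(log)
batch = Consumer(log)

seen_realtime = []
for value in [10, 20, 30, 40]:
    log.append(value)
    seen_realtime.extend(realtime.poll())  # handles each event as it arrives

seen_batch = batch.poll()  # reads the same events later, in one go

print(seen_realtime == seen_batch)
```

Both consumers see the same events in the same order; they differ only in when they read. That is the sense in which batch becomes a consumption pattern rather than a separate layer.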
9. Kappa is NOT a free lunch
10. Kappa Concerns Solved
• Data availability / retention
→ Compacted Topics, Tiered Storage
• Data consistency and fault-tolerance
→ Exactly-once semantics, Multi-Region Clusters, Cluster Linking
• Handling late-arriving data
→ State management in the streaming application, proper data sinks, replay with guaranteed ordering and timestamps
• Data reprocessing and backfill
→ Dynamic clusters, stateful applications (Kafka Streams, ksqlDB, external stream processing framework like Apache Flink)
• Data integration
→ Kafka Connect for sources and sinks, clients for any language, REST Proxy (real-time but also batch and RPC)
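To make the "Compacted Topics" item more concrete, here is a minimal sketch of what log compaction keeps: the newest record per key, with a `None` value acting as a delete marker (tombstone). It is a simplified model written for this transcript, not Kafka's implementation; real compaction runs in the background on older log segments and retains tombstones for a configurable time before dropping them. The `compact` function and the sample records are hypothetical.

```python
def compact(records):
    """Keep only the newest record per key, in log order.

    `records` is a list of (key, value) pairs in offset order.
    A value of None is treated as a tombstone that removes the key.
    """
    latest = {}
    for offset, (key, value) in enumerate(records):
        latest[key] = (offset, value)  # later offsets overwrite earlier ones

    survivors = sorted(
        (offset, key, value)
        for key, (offset, value) in latest.items()
        if value is not None
    )
    return [(key, value) for _, key, value in survivors]


# Hypothetical sensor readings keyed by car id.
records = [
    ("car-1", 70),
    ("car-2", 65),
    ("car-1", 72),    # supersedes the first car-1 record
    ("car-3", 80),
    ("car-2", None),  # tombstone: car-2 is deleted
]

compacted = compact(records)
print(compacted)
```

A new consumer reading the compacted log still ends up with the current state for every live key, which is why compaction helps with long-term retention without unbounded growth.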
12. Kappa @ Shopify
Kappa Building Blocks
• The Log (Kafka)
  • Durability with Topic Compaction and Tiered Storage
  • Consistency via Exactly-Once Semantics (EOS)
  • Data Integration via Kafka Connect
  • Elasticity via dynamic Kafka clusters
• Streaming Framework (Kafka Streams / Flink)
  • Reliability and scalability
  • Fault tolerance
  • State management
• Sinks
  • Update/Upsert for simplified design: RDBMS, NoSQL, Compacted Kafka Topics
  • Append-only: Regular Kafka Topics, Time Series
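One plausible reading of why update/upsert sinks simplify the design is that they make replays harmless: writing the same events again leaves the sink unchanged, while an append-only sink accumulates duplicates that must be handled downstream. The sketch below illustrates that difference under this assumption; the order events and both in-memory sinks are hypothetical.

```python
# Hypothetical order events as (key, status) pairs in log order.
events = [("order-1", "CREATED"), ("order-2", "CREATED"), ("order-1", "PAID")]

upsert_sink = {}   # stands in for an RDBMS / NoSQL table or a compacted topic
append_sink = []   # stands in for a regular topic or a time-series store


def write_all(events):
    """Write every event to both sinks."""
    for key, status in events:
        upsert_sink[key] = status          # last write per key wins
        append_sink.append((key, status))  # every write is kept


write_all(events)
state_after_first_run = dict(upsert_sink)

write_all(events)  # simulate a replay / reprocessing of the same events

print(upsert_sink == state_after_first_run)  # the upsert sink is unchanged
print(len(append_sink))                      # the append-only sink has grown
```

Append-only sinks remain the right choice when the full history matters, for example for time series; the trade-off is that consumers of that sink need their own deduplication or versioning strategy.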
15. Benefits of the Kappa Architecture
The Kappa architecture leverages a single source of truth with a focus on simplicity in the enterprise architecture
• Improve streaming to handle all the cases
• One codebase that is always in sync
• One set of infrastructure and technology
• The heart of the infrastructure is real-time, scalable, and reliable
• Improved data quality with guaranteed ordering and no mismatches
• No need to re-architect for new use cases, just connect new consumers (real-time, near real-time, batch, RPC)
17. Use Cases for Reprocessing Historical Events
Give me all events from time A to time B
• New consumer application
• Error-handling
• Compliance / regulatory processing
• Query and analyze existing events
• Schema changes in analytics platform
• Model training
Diagram labels: Real-time Producer, Time, Real-time Consumer, Consumer of Historical Data
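"Give me all events from time A to time B" boils down to translating timestamps into log positions and then reading the range in order. The sketch below shows the idea on an in-memory list with binary search. It is an illustration only, assuming records are stored in timestamp order; the sample log and the `replay` function are hypothetical and do not use a Kafka client.

```python
from bisect import bisect_left, bisect_right

# Hypothetical log of (timestamp, value) records, ordered by timestamp.
log = [(100, "a"), (105, "b"), (110, "c"), (120, "d"), (130, "e")]
timestamps = [ts for ts, _ in log]


def replay(start_ts, end_ts):
    """Return all values with start_ts <= timestamp <= end_ts, in log order."""
    start = bisect_left(timestamps, start_ts)  # first offset with ts >= start_ts
    end = bisect_right(timestamps, end_ts)     # first offset with ts > end_ts
    return [value for _, value in log[start:end]]


print(replay(105, 120))
```

The real-time consumer keeps reading at the head of the log while a consumer of historical data runs such a range read independently, which covers use cases like error handling, compliance processing, or model training on past events.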
19. Confluent Tiered Storage for Kafka
20. Honeycomb - Observability
• Kafka is the “beating heart” of Honeycomb, powering the 99.99% ingest availability SLO
• Ingest telemetry data
• Buffer big data before processing in the “retriever” columnar storage database
• True decoupling to innovate more quickly by shipping to each service
• Guard against the risk of a bug in retriever corrupting customer data
• Confluent Tiered Storage frees engineering from being storage-bound
• Has grown 10x in two years while TCO for Kafka has only gone up 20%
• Replayability from Tiered Storage after outage for error handling
https://www.honeycomb.io/blog/scaling-kafka-observability-pipelines/
21. Kappa Architecture for Streaming Analytics with Kafka and TensorFlow
Legend: Kafka Ecosystem, TensorFlow, Other Components
Components: Car Sensors, MQTT Proxy, Kafka Cluster, Kafka Connect, Tiered Storage, Kafka Streams Application, ksqlDB, TensorFlow, MongoDB, Storage, Dashboards, Search, Analytics, Mobile App, BI Tool
Data labels: All Data, Critical Data, Potential Detect
Steps: Ingest Data, Preprocess Data, Train Analytic Model, Deploy Analytic Model, Consume Data
22. Streaming Ingestion and Model Training with TensorFlow IO
Direct streaming ingestion for model training with TensorFlow I/O + Kafka Plugin (no additional data storage like S3 or HDFS required!)
https://github.com/tensorflow/io
Diagram labels: Producer, Distributed Commit Log, Time, Model A, Model B, Model X (at a later time)
23. Model Deployment with Apache Kafka, ksqlDB and TensorFlow
User Defined Function (UDF)
"CREATE STREAM AnomalyDetection AS
SELECT sensor_id, detectAnomaly(sensor_values)
FROM car_engine;"
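The ksqlDB statement applies a user-defined function to every event of the `car_engine` stream and emits one result per event. As a language-neutral illustration of that per-event pattern, here is a minimal Python sketch. It is not ksqlDB UDF code (ksqlDB UDFs are written in Java), and the threshold rule in `detect_anomaly` is a hypothetical stand-in for the embedded TensorFlow model; the sample events are invented for the example.

```python
from statistics import mean


def detect_anomaly(sensor_values, threshold=100.0):
    """Hypothetical stand-in for the model behind detectAnomaly().

    Flags an event when the mean of its sensor values exceeds a threshold.
    A real deployment would call a trained model here instead.
    """
    return mean(sensor_values) > threshold


def anomaly_detection(car_engine):
    """Apply the function to each event and yield (sensor_id, result)."""
    for event in car_engine:
        yield event["sensor_id"], detect_anomaly(event["sensor_values"])


# Hypothetical events on the car_engine stream.
car_engine = [
    {"sensor_id": "s1", "sensor_values": [90.0, 95.0, 92.0]},
    {"sensor_id": "s2", "sensor_values": [120.0, 130.0, 110.0]},
]

results = list(anomaly_detection(car_engine))
print(results)
```

Because the function is stateless and evaluated per event, scoring scales with the stream processing layer itself rather than requiring a separate model server.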
24. Alternatives for Data in Motion
Image labels: Car Engine, Car, Self-driving Car
25. The Event Streaming Landscape – Cloud-native? Complete? Everywhere?
Apache Kafka Products and Cloud Services, “Compatible” Offerings, and other Streaming Technologies
Technology categories: Native Kafka, Kafka Protocol (not fully compliant), Non Kafka
Operations models: Self Managed (Everywhere), Partially Managed, Fully Managed
Annotations on individual offerings: (Cloud only), (Everywhere), (Kafka mapper not part of cloud offering)
Groups: Platforms, Tools