Apache Kafka is the de facto standard for data streaming, i.e., for processing data in motion. With its adoption growing across all industries, I get the same valid question every week: When NOT to use Apache Kafka? What limitations does the event streaming platform have? When does Kafka simply not provide the needed capabilities? How do you qualify Kafka out when it is not the right tool for the job?
This session explores the DOs and DON'Ts. Separate sections explain when to use Kafka, when NOT to use Kafka, and when to MAYBE use Kafka.
Whether you are thinking about open source Apache Kafka, a cloud service like Confluent Cloud, or another technology using the Kafka protocol like Redpanda or Pulsar, this slide deck is worth checking out.
A detailed article about this topic:
https://www.kai-waehner.de/blog/2022/01/04/when-not-to-use-apache-kafka/
When NOT to use Apache Kafka?
1. When NOT to use Apache Kafka?
kai-waehner.de | @KaiWaehner | Field CTO @ Confluent
2. Data Streaming with Apache Kafka
DWH
APP
STREAM PROCESSING
CONNECTORS
ksqlDB
KStreams
APP
Streaming ETL
Data Processing
Real-time Analytics
Stateless and Stateful
Business Applications
Fully-managed Pipelines
Connectivity to Data Infrastructure, SaaS, AI/ML
Data Governance
Connectivity
Filtering and Routing
Change Data Capture
Built-in Scale and Fault Tolerance
Oracle DB
ORACLE CDC SOURCE PREMIUM CONNECTOR
Real-time Data Sharing across Hybrid and Multi-Cloud
Storage
Backpressure Handling
Slow Consumers
Replayability
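The storage bullets (backpressure handling, slow consumers, replayability) all follow from one idea: consumers pull from a durable log at their own pace and track their own offset. Here is a minimal in-memory sketch of that idea. The `Log` and `Consumer` classes are hypothetical stand-ins for a topic partition and a consumer, not the Kafka client API.

```python
from dataclasses import dataclass, field


@dataclass
class Log:
    """Append-only list standing in for a single topic partition."""
    records: list = field(default_factory=list)

    def append(self, value):
        self.records.append(value)
        return len(self.records) - 1  # offset of the record just written

    def read(self, offset, max_records):
        return self.records[offset:offset + max_records]


@dataclass
class Consumer:
    """Tracks its own position; the log never pushes data to it."""
    log: Log
    offset: int = 0

    def poll(self, max_records):
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)
        return batch

    def seek(self, offset):
        self.offset = offset


log = Log()
for i in range(6):
    log.append(f"event-{i}")

fast = Consumer(log)
slow = Consumer(log)

fast_batch = fast.poll(max_records=100)  # reads everything available
slow_batch = slow.poll(max_records=2)    # pulls only what it can handle

slow.seek(0)                             # replay from the beginning
replayed = slow.poll(max_records=100)
```

The slow consumer does not block the fast one, and rewinding the offset is all it takes to reprocess historical events.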
kai-waehner.de | @KaiWaehner | When NOT to use Apache Kafka?
3. Markets
DaaS
Digital replatforming / Legacy Modernization
Customer 360
Faster transactional processing / analysis incl. Machine Learning / AI
Microservices Architecture
Online Fraud Detection
Online Security (syslog, log aggregation, Splunk replacement)
Middleware replacement
Website / Core Operations / Payments (Central Nervous System)
Real-time app updates
Customer Experience
Core Business Platform
Operational Efficiency (Agility)
Migrate to Cloud
Fraud Detection
Regulatory
Increase Revenue (make money)
Decrease Costs (save money)
Mitigate Risk (protect money)
Business Value
1° business use case
Strategic Driver
2° business use case
Data Eng. / Infrastructure use case
Use Cases for Data Streaming by Business Value
4. When NOT to use Apache Kafka?
5. Kafka is a Database BUT NOT for Complex Analytics
Durable
Fault-tolerant
Tiered Storage
Compacted Topics
Exactly-once Semantics
RocksDB on Client Side
ksqlDB
Interactive Queries
“You Name It”
Connect
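Compacted topics are a big part of why Kafka can act as a database for key-based lookups. The sketch below illustrates the semantics in plain Python: the latest value per key survives, and a `None` value acts as a tombstone. The `compact` function and the sample keys are made up for illustration. Real brokers compact in the background, leave the active segment alone, and retain tombstones for a configurable time.

```python
def compact(log):
    """Return the surviving (offset, key, value) records after compaction.

    `log` is a list of (key, value) pairs in offset order.
    A value of None is treated as a tombstone that deletes the key.
    """
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)  # later records overwrite earlier ones
    survivors = [
        (offset, key, value)
        for key, (offset, value) in latest.items()
        if value is not None
    ]
    return sorted(survivors)  # keep the original offset order


log = [
    ("user-1", "Berlin"),
    ("user-2", "Paris"),
    ("user-1", "Munich"),  # supersedes offset 0
    ("user-2", None),      # tombstone: deletes user-2
    ("user-3", "Rome"),
]

compacted = compact(log)
```

This gives you the current state per key, which is exactly what a lookup needs. It does not give you joins, ad hoc aggregations, or secondary indexes, which is where an analytics database comes in.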
6. Kafka is NOT a Proxy for Millions of Clients
“Last Mile” Integration is usually a Proxy (like HTTP or MQTT)
7. Kafka is NOT an API Management Platform
Orders
Customers
Payments
Stock
API (HTTP/REST)
Data Streaming
Data Integration
Real-Time Apps
API Gateway
API Lifecycle
Data Sharing
Monetization
REST Proxy
Stream Exchange
8. Kafka is NOT the right tool for processing large messages *
* BUT works well for some use cases, e.g.:
- Splitting large legacy CSV files
- Externalizing large payloads on-the-fly
- Image processing at the edge
- Uploading large files into the DWH
Claim Check Enterprise Integration Pattern:
- Pre-Processing and Data Correlation, e.g. enrich with other metadata (ksqlDB)
- Store big files in data lake (e.g. AWS S3)
- Consume and correlate metadata (Kafka Streams)
- Automated Orchestration (Kafka Clients)
- Real time analytics and other business applications (Kafka Clients + other tools)
- Send metadata including link to video in object store (Kafka Producer)
- Download big files from data lake
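The Claim Check pattern can be sketched in a few lines. This is a minimal illustration, assuming an in-memory dict in place of the object store and a plain list in place of the Kafka topic. The function names and the `payload_ref` field are hypothetical, not part of any Kafka or S3 API.

```python
import json
import uuid

object_store = {}  # stands in for a data lake bucket (e.g. AWS S3)
topic = []         # stands in for a Kafka topic


def produce_with_claim_check(payload: bytes, metadata: dict) -> str:
    """Store the big payload externally and send only metadata plus a link."""
    key = f"videos/{uuid.uuid4()}"
    object_store[key] = payload
    event = dict(metadata, payload_ref=key, payload_size=len(payload))
    topic.append(json.dumps(event).encode("utf-8"))
    return key


def consume_with_claim_check(message: bytes):
    """Read the small event, then download the payload only if it is needed."""
    event = json.loads(message)
    payload = object_store[event["payload_ref"]]
    return event, payload


video = b"\x00" * (5 * 1024 * 1024)  # 5 MiB dummy file
produce_with_claim_check(video, {"camera": "dock-7", "codec": "h264"})

event, payload = consume_with_claim_check(topic[0])
```

The message on the topic stays at a few hundred bytes regardless of the file size, well below the broker's default message size limit of roughly 1 MB. Consumers that only correlate metadata never have to download the file at all.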
9. Kafka is NOT an IoT Platform *
Siemens S7
Kafka Connect
Storage
Kafka Streams / ksqlDB
Stateless + Stateful
REST Proxy
HTTP(S)
SCADA
DCS
ERP
MES
Cloud
Factory
* BUT Kafka is a fundamental part of most IoT projects, e.g.:
- Scalable real-time data hub for IoT data AND IT data
- Edge and hybrid cloud
- Direct integration with IoT protocols
- Integration via 3rd party with IoT protocols
Analytics
Database
Data Lake
CRM
Kafka Connect
Cluster Linking
10. Kafka is NOT for hard real-time requirements
OT - Connected Vehicle
(Car, Train, Drone)
OT - Manufacturing
(Field Bus, PLC, Machine, Robot)
IT – Enterprise Software
(Data Center, Cloud, Car IT)
Central Data Center / Public Cloud
Vehicle Data
Robot Data
All Data
C / C++ / Rust
C / C++ / Rust
Java / Python / Go
[#] Hard Real Time = Deterministic network with zero spikes + zero latency
[#] Soft Real Time + Near Real Time + Batch
Cluster Linking Cluster Linking