2. EXAMPLE: MATECO
Mateco Business ‘Streams’
• Microsoft Dynamics 365
• CRM + ERP
• Invoicing
• Q.Rent
• Q.Planning
• Q.Service
• Q.Trade
• Integrations
Each ‘Stream’ has its own IT squad and its own set of applications.
3. WHY DO WE NEED MESSAGES/EVENTS?
IT Architecture with multiple teams
Make the teams as independent as possible
No central database
No distributed transactions
No blocking REST calls
But: Keep data consistent across teams’ services
‘Eventual Consistency’
Solution (sketched below)
Each service publishes an event whenever it updates its data
Other services subscribe to those events
When an event is received, the subscribing service updates its own copy of the data
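A minimal sketch of the publishing side, assuming Spring Kafka (also used by the demo in section 9); the CustomerService class, the customer-events topic, and the JSON payload are illustrative, not from the slides:

```java
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

@Service
public class CustomerService {

    private final KafkaTemplate<String, String> kafkaTemplate;

    public CustomerService(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    public void updateCustomer(String customerId, String newName) {
        // 1. Update this service's own data (omitted here).
        // 2. Publish an event so other services can update their copies.
        kafkaTemplate.send("customer-events", customerId,
                "{\"type\":\"CustomerUpdated\",\"id\":\"" + customerId
                        + "\",\"name\":\"" + newName + "\"}");
    }
}
```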
4. EVENT STREAMING
Information is packaged as ‘events’, each carrying enough information for any consumer to handle it
The event is published to a central event store
The sender does not need to know which consumers will process it
Allows:
Replicate information across independent services, in near-real-time
Create new applications without disrupting existing ones (see the event sketch below)
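As an illustration of ‘enough information for any consumer’: a hypothetical event type that carries the full relevant state, so consumers never have to call back to the producer. All field names are assumptions:

```java
// A self-contained event: full state, not just an ID or a diff.
public record OrderShippedEvent(
        String orderId,          // which entity changed
        String customerId,       // context consumers may need
        String shippingAddress,  // complete state at the time of the event
        java.time.Instant occurredAt) {
}
```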
5. JMS TOPICS AND QUEUES VS KAFKA TOPICS
JMS
Topic: publish-subscribe
All subscribers receive all messages
Messages are only delivered to subscribers that are active at the time
Queue: send-receive
Messages are queued until a consumer consumes them
Allows horizontal scaling
Needs a separate queue for each receiver group
Kafka Topic
Stores all messages; retention can optionally be limited
Stores offsets on the server for each consumer group
Multiple consumers (or consumer groups) can each receive all messages
Consumers can restart from the beginning (sketched below)
New consumers can be added at any time
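A sketch of that replay behaviour with the plain Java consumer; the topic and group names are illustrative. A fresh consumer group with auto.offset.reset=earliest starts from the oldest retained message:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ReplayConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "replay-demo");
        // A new consumer group begins at the oldest retained message.
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("customer-events"));
            while (true) { // sketch: poll forever, no shutdown handling
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d key=%s value=%s%n",
                            record.partition(), record.offset(), record.key(), record.value());
                }
            }
        }
    }
}
```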
7. KAFKA ARCHITECTURE: BROKERS
Broker: Server for Partitions (see the sketch below)
• Partition leader (‘master’)
• Partition replicas
ZooKeeper
• Used for coordination between brokers
• Optional since Kafka 2.8 (KRaft mode)
• Older Kafka versions used ZooKeeper for connection management and to store consumer offsets
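To see which broker leads each partition, a sketch using the Kafka AdminClient (assumes Kafka clients 3.x for allTopicNames(); the topic name is illustrative):

```java
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;

public class DescribeTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            TopicDescription description = admin
                    .describeTopics(List.of("customer-events"))
                    .allTopicNames().get()
                    .get("customer-events");
            // Each partition has one leader broker and zero or more replicas.
            description.partitions().forEach(p ->
                    System.out.printf("partition=%d leader=%s replicas=%s%n",
                            p.partition(), p.leader(), p.replicas()));
        }
    }
}
```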
8. MESSAGE GROUPS VS PARTITIONS
JMS
Message Group ID is an optional header
Guarantees that messages with the same Message Group ID are processed by the same thread
KAFKA
Each message can have a key
A topic is divided into partitions; each message is put into a partition based on a hash of its key (see the producer sketch below)
Messages without a key go to a random partition
Subscribers each get a fixed set of partitions assigned
Partitions are also used for horizontal scaling:
Partitions are distributed across servers
Consumers only need to connect to the brokers that own their partitions
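A sketch of keyed producing with the plain Java client; the default partitioner hashes the key, so both records for customer-42 land in the same partition (topic and key names are made up):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class KeyedProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Same key => same partition => per-customer ordering.
            producer.send(new ProducerRecord<>("customer-events", "customer-42", "created"));
            producer.send(new ProducerRecord<>("customer-events", "customer-42", "updated"));
            // No key => the partitioner spreads records across partitions.
            producer.send(new ProducerRecord<>("customer-events", null, "audit ping"));
        }
    }
}
```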
9. KAFKA DEMO
Run Kafka on Kubernetes
Using Helm charts, e.g. https://artifacthub.io/packages/helm/bitnami/kafka
Alternatives: Confluent Cloud, Amazon MSK, …
Demo application using Spring Boot: https://www.baeldung.com/java-kafka-streams-vs-kafka-consumer
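The consuming side of such a demo could look like this minimal Spring Kafka listener (topic and group id are illustrative assumptions, not taken from the linked article):

```java
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
public class CustomerEventListener {

    // Spring Kafka subscribes this method to the topic and calls it
    // for every received record.
    @KafkaListener(topics = "customer-events", groupId = "crm-sync")
    public void onCustomerEvent(String event) {
        // Update this service's own copy of the data.
        System.out.println("Received: " + event);
    }
}
```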
10. KAFKA TOPIC CLEANUP POLICY
Kafka topics are stored in append-only segments; cleanup still happens, but per segment, not per message
Time-based retention
Size-based retention
Unlimited retention
Each topic has a cleanup policy that decides what happens when the retention expires:
Delete: delete the oldest segments
Compact: delete messages with duplicate keys, keeping only the latest value for each key
How to delete a key: write a record with a null value for it, a.k.a. a ‘tombstone’ (sketched below)
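A sketch of both ideas: creating a compacted topic with the AdminClient, then deleting a key with a tombstone. Topic names and settings are illustrative:

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.config.TopicConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class CompactionDemo {
    public static void main(String[] args) throws Exception {
        Properties adminProps = new Properties();
        adminProps.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(adminProps)) {
            // Compacted topic: keeps only the latest value per key.
            NewTopic topic = new NewTopic("customer-snapshots", 3, (short) 1)
                    .configs(Map.of(TopicConfig.CLEANUP_POLICY_CONFIG,
                            TopicConfig.CLEANUP_POLICY_COMPACT));
            admin.createTopics(List.of(topic)).all().get();
        }

        Properties producerProps = new Properties();
        producerProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        producerProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        producerProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            producer.send(new ProducerRecord<>("customer-snapshots", "customer-42", "{\"name\":\"Acme\"}"));
            // Tombstone: a null value marks the key for deletion on compaction.
            producer.send(new ProducerRecord<>("customer-snapshots", "customer-42", null));
        }
    }
}
```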
11. KEY/VALUE SERIALIZATION
Message keys and values are binary, but can be serialized/deserialized by the client
Avro is a popular encoding format, more compact than JSON
You can generate Java classes from Avro schemas (see the configuration sketch below)
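One way to wire this up, assuming Confluent's Avro serializer and a Schema Registry; the kafka-avro-serializer dependency, the registry URL, and all names here are assumptions:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class AvroProducerConfig {
    public static Properties avroProps() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        // Serializes Avro-generated classes and registers their schemas
        // with the Schema Registry (class name as string to avoid a
        // compile-time dependency in this sketch).
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://localhost:8081");
        return props;
    }
}
```

A producer created with these properties can then send instances of a class generated from an .avsc schema (e.g. by the Avro Maven or Gradle plugin) directly as message values.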
13. KAFKA STREAMS
Library for Event Processing
Write your event processing logic as a series of processors
Map, aggregate, reduce events
Write to tables, join with other tables
Kafka Streams makes it fault-tolerant and highly scalable (sketched below):
Load-balances the processors via Kafka partitioning
Redistributes the load between processors via intermediate topics
Tables can be replicated (global tables) or partitioned (local tables)
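A minimal topology sketch: count events per key into a table and write the changelog to another topic. The application id and topic names are illustrative:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

public class EventCountApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "event-count-demo");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> events = builder.stream("customer-events");
        // Aggregate into a table: one running count per key.
        KTable<String, Long> counts = events.groupByKey().count();
        counts.toStream().to("customer-event-counts",
                Produced.with(Serdes.String(), Serdes.Long()));

        // Kafka Streams assigns partitions across app instances and
        // restores local state automatically after a failure.
        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```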