101* ways to configure Kafka
- badly
What FINN.no has learned from running Kafka
Audun Fauchald Strand
Lead Developer Infrastructure
@audunstrand
bio: gof, mq, ejb,
wli, bpel eda, soa
esb, ddd, k8s
Henning Spjelkavik
Architect
@spjelkavik
bio: Skiinfo (Vail Resorts),
FINN.no
enjoys reading jstacks
agenda
Introduction to kafka
kafka @ finn.no
101* mistakes
questions
“From a certain point onward
there is no longer any turning
back. That is the point that must
be reached.”
― Franz Kafka, The Trial
introduction to
kafka
why use kafka
#notAnESB
what is a log
terminology
components
giant leap
“A First Sign of the Beginning of
Understanding is the Wish to
Die.”
― Franz Kafka
https://commons.wikimedia.org/wiki/File:Kafka.jpg
Why use Kafka?
“Apache Kafka is publish-subscribe messaging
rethought as a distributed commit log.”
Fast
Scalable
Durable
Distributed by design
Sweet spot: High volume, low latency
Quora:
“Use Kafka if you have a fire hose of events (100k+/sec)
you need delivered in partitioned order 'at least once' with
a mix of online and batch consumers, you want to be able
to re-read messages”
“Use Rabbit if you have messages (20k+/sec) that need to
be routed in complex ways to consumers, you want per-
message delivery guarantees, you don't care about ordered
delivery”
What is a (data) log / journal?
A log is perhaps the simplest possible storage abstraction.
It is an append-only, totally-ordered sequence of records ordered by time.
Records are appended to the end of the log, and reads proceed left-to-right.
Each entry is assigned a unique sequential log entry number.
The ordering of records defines a notion of "time" since entries to the left are
defined to be older than entries to the right.
This is a data log, not an application log (i.e. not log4j)
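The log abstraction described on this slide can be sketched in a few lines. This is a minimal in-memory illustration, not how Kafka stores data; the class name `Log` and its methods are made up for the example.

```python
from typing import Any, List, Tuple


class Log:
    """Append-only sequence of records; a record's list index is its entry number."""

    def __init__(self) -> None:
        self._records: List[Any] = []

    def append(self, record: Any) -> int:
        """Add a record at the end and return its sequential entry number."""
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, entry_number: int) -> List[Tuple[int, Any]]:
        """Read left-to-right from entry_number, so older entries come first."""
        return list(enumerate(self._records[entry_number:], start=entry_number))


log = Log()
first = log.append("ad created")    # entry 0
second = log.append("ad updated")   # entry 1
entries = log.read_from(0)
```

There is no update or delete operation: the only way to change the log is to append, which is what makes the entry number usable as a notion of time.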
Changelog 101: Tables and Events are Dual
Duality: a log of changes and a table.
Accounting
log: credit and debit (events per key)
table: all current balances (i.e. state per key)
In a sense the log is the more fundamental data structure: in addition to creating the
original table you can also transform it to create all kinds of derived tables.
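The accounting example can be made concrete with a small sketch. It assumes a simplified model where a credit adds to a balance and a debit subtracts from it; the event layout and the function names are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (account, kind, amount); kind is "credit" or "debit"
Event = Tuple[str, str, int]


def current_balances(events: Iterable[Event]) -> Dict[str, int]:
    """Replay the log from the start to rebuild the table of balances."""
    table: Dict[str, int] = defaultdict(int)
    for account, kind, amount in events:
        if kind == "credit":
            table[account] += amount
        elif kind == "debit":
            table[account] -= amount
        else:
            raise ValueError(f"unknown event kind: {kind}")
    return dict(table)


def event_counts(events: Iterable[Event]) -> Dict[str, int]:
    """A different table derived from the same log: number of events per key."""
    counts: Dict[str, int] = defaultdict(int)
    for account, _kind, _amount in events:
        counts[account] += 1
    return dict(counts)


changelog = [("alice", "credit", 100), ("bob", "credit", 50), ("alice", "debit", 30)]
balances = current_balances(changelog)
counts = event_counts(changelog)
```

Both tables come from the same changelog, which is the sense in which the log is the more fundamental structure: the table cannot be turned back into the log, but the log can produce any number of tables.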
producers write to brokers
consumers read from brokers
everything is distributed
data is stored in topics
topics are split into partitions
which are replicated
[Diagram: a kafka cluster in the middle, with several producers writing to it and several consumers reading from it]
terminology
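[Diagram: a producer writes to topic “ad.new” and a consumer in group.1 reads from it. Partitions P0, P1 and P2 of the topic are shown as replicas (R1, R2, R3) spread over Broker1, Broker2 and Broker3, with zookeeper next to the brokers.]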
components
[Diagram: topic “ad.new” drawn as a log of numbered messages (1:data, 2:..., up to 7:...), from old messages to newer messages. producer a1 writes to it; consumer.group.1 and consumer.group.2, each identified by its group.id, read from it.]
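How two consumer groups share one log can be sketched as follows. This is a single-partition, in-memory stand-in written for this example, not a Kafka client; the `Topic` class and its `poll` signature are assumptions, and a real cluster tracks a position per group and partition.

```python
from typing import Dict, List


class Topic:
    """Single-partition, in-memory stand-in for a topic (not a Kafka client)."""

    def __init__(self) -> None:
        self.messages: List[str] = []
        self.positions: Dict[str, int] = {}  # group.id -> next message to read

    def produce(self, message: str) -> None:
        self.messages.append(message)

    def poll(self, group_id: str, max_messages: int = 10) -> List[str]:
        """Return the next batch for this group and advance only its position."""
        start = self.positions.get(group_id, 0)
        batch = self.messages[start:start + max_messages]
        self.positions[group_id] = start + len(batch)
        return batch


topic = Topic()
for message in ["a1", "a2", "a3"]:
    topic.produce(message)

fast = topic.poll("consumer.group.1")                  # reads everything so far
slow = topic.poll("consumer.group.2", max_messages=1)  # reads one message
```

Reading does not remove anything from the log, so a slow group never holds back a fast one; each group only moves its own position.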
Giant leap?
In fact, persistent replicated messaging is such a giant leap in messaging architecture that it may be worthwhile to point out a few side effects:
a. Per-message acknowledgments have disappeared
b. ordered delivery
c. The problem of mismatched consumer speed has disappeared. A slow consumer can peacefully co-exist with a fast consumer now
d. Need for difficult messaging semantics like delayed delivery, re-delivery etc. has disappeared. Now it is all up to the consumer to read whatever message whenever - onus has shifted from broker to consumer
e. The holy grail of message delivery guarantee: at-least-once is the new reality - both Kafka and Azure Event Hub provides this guarantee. You still have to make your consumers and downstream systems idempotent so that recovering from a failure and processing the same message twice does not upset it too much, but hey - that has always been the case
http://blogs.msdn.com/b/opensourcemsft/archive/2015/08/08/choose-between-azure-event-hub-and-kafka-_2d00_-what-you-need-to-know.aspx
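The last point, idempotent consumers under at-least-once delivery, can be sketched like this. It assumes every message carries a unique id and that the set of processed ids fits in memory; both are assumptions for the example, and the class name is hypothetical.

```python
from typing import Dict, List, Set, Tuple

# (message id, account, amount)
Message = Tuple[str, str, int]


class IdempotentConsumer:
    """Applies each message id at most once, so redelivery does not double-count."""

    def __init__(self) -> None:
        self.seen: Set[str] = set()
        self.totals: Dict[str, int] = {}

    def handle(self, message: Message) -> bool:
        """Return True if the message was applied, False if it was a duplicate."""
        message_id, account, amount = message
        if message_id in self.seen:
            return False
        self.totals[account] = self.totals.get(account, 0) + amount
        self.seen.add(message_id)
        return True


# "m1" arrives twice, as it may after a consumer restart
deliveries: List[Message] = [("m1", "alice", 10), ("m2", "alice", 5), ("m1", "alice", 10)]
consumer = IdempotentConsumer()
applied = [consumer.handle(message) for message in deliveries]
```

A production consumer would need to keep the seen ids (or an equivalent version check) in the same durable store as the totals, otherwise a crash between the two writes reintroduces the problem.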
Confluent platform
Top 5
1. no consideration of data on the
inside vs outside
2. schema not externally defined
3. same config for every
client/topic
4. 128 partitions as default config
5. running on 8 overloaded nodes
advertisement
FINN.no
https://finn.no/apply-here
https://twitter.com/@FINN_tech
http://tech.finn.no
https://github.com/finn.no
60 million page views a day
250-300 microservices
130 developers
1479 deployed changes into
production last week
6 minutes from commit to deploy
(median)
#javazone @spjelkavik @audunstrand
Schibsted Media Group
6800 people in 30 countries
FINN.no is a part of
6900
employees
30 countries
200 million
users
jobs.schibsted.com
kafka @ finn.no
kafka
@finn.no
architecture
use cases
tools
in the beginning ...
Architecture governance board decided to use RabbitMQ as message queue.
Kafka was installed for a proof of concept, after developers spotted it in January 2013.
2013 - POC
“High” volume
Stream of classified ads
Ad matching
Ad indexed
[Diagram: dc 1 and dc 2; 8 nodes (mod01 to mod08), each running zk and kafka]
Version 0.8.1
4 partitions
common client
java library
thrift
2014 - Adoption and
complaining
low volume/ high
reliability
Ad Insert
Product Orchestration
Payment
Build Pipeline
click streams
[Diagram: dc 1 and dc 2; 8 nodes (mod01 to mod08), each running zk and kafka]
Version 0.8.1
4 partitions
experimenting
with
configuration
common java
library
tooling
alerting
2015 - Migration and
consolidation
“reliable messaging”
asynchronous
communication
between services
store and forward
zipkin
slack notifications
Version 0.8.2
5-20 partitions
multiple configurations
[Diagram: dc 1 and dc 2; 5 brokers (broker01 to broker05), each running zk and kafka]
tooling
Grafana dashboard visualizing jmx stats
via Prometheus
Kafka-manager from Yahoo
Kafka-cat cli
2016 - Confluent
[Diagram: broker01 to broker05 running kafka, three zk nodes (zk01, zk02, zk03), plus Schema Registry and Rest Proxy]
platform
schema registry
data replication
kafka connect
kafka streams
101* mistakes
“God gives the
nuts, but he
does not crack
them.”
― Franz Kafka
Pattern
Language
why is it a mistake
what is the consequence
what is the correct solution
what has finn.no done
Top 5
1. no consideration of data on the
inside vs outside
2. schema not externally defined
3. same config for every
client/topic
4. 128 partitions as default config
5. running on 8 overloaded nodes
mistake:
no consideration of data on
the inside vs outside
https://flic.kr/p/6MjhUR
why is it a mistake
everything published on Kafka (0.8.2) is visible to any client that can access the cluster
what is the consequence
direct reads across services/domains are quite normal in legacy and/or enterprise
systems
coupling makes it hard to make changes
unknown and unwanted coupling has a cost
Kafka had no security per topic - you must add that yourself
what is the correct solution
Consider what is data on the inside, versus data on the outside
Convention for what is private data and what is public data
If you want to change your internal representation often, map it before publishing it
publicly (Anti corruption layer)
what has finn.no done
Decided on a naming convention (e.g. Public.xyzzy) for public topics
Communicates the intention (contract)
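A naming convention like this is easy to check mechanically, for example in a shared client library or a review lint. The sketch below assumes the prefix `Public.` from the slide's example; the helper names and the service names in the usage are hypothetical, and nothing here is enforced by Kafka itself.

```python
PUBLIC_PREFIX = "Public."  # taken from the Public.xyzzy example on the slide


def is_public(topic: str) -> bool:
    """A topic is public when it carries the prefix and has a name after it."""
    return topic.startswith(PUBLIC_PREFIX) and len(topic) > len(PUBLIC_PREFIX)


def may_read(topic: str, reader_service: str, owner_service: str) -> bool:
    """Public topics are a contract for everyone; other topics are for the owner only."""
    return is_public(topic) or reader_service == owner_service


allowed = may_read("Public.xyzzy", reader_service="search", owner_service="ad-insert")
blocked = may_read("ad.internal", reader_service="search", owner_service="ad-insert")
```

The check only communicates intent, in line with the slide: any client that can reach the cluster can still read a private topic.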
mistake:
schema not externally
defined
why is it a mistake
data and code need separate versioning strategies
version should be part of the data
defining schema in a java library makes it more difficult to access data from non-
jvm languages
very little discoverability of data, people chose other means to get their data
difficult to create tools
what is the consequence
development speed outside jvm has been slow
change of data needs coordinated deployment
no process for data versioning, like backwards compatibility checks
difficult to create tooling that needs to know data format, like data lake
and database sinks
what is the correct solution
confluent.io platform has a separate schema registry
apache avro
multiple compatibility settings and evolution strategies
connect
more automatic tooling
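What a backward compatibility check looks for can be sketched without any registry. This is a deliberately simplified model: schemas are plain dicts made up for the example, any type change is rejected (Avro itself allows some type promotions), and the function is not the schema registry's API.

```python
from typing import Any, Dict, List

# field name -> {"type": ..., and optionally "default": ...}
Schema = Dict[str, Dict[str, Any]]


def backward_incompatibilities(old: Schema, new: Schema) -> List[str]:
    """Problems that stop a reader using `new` from decoding data written with `old`."""
    problems: List[str] = []
    for name, spec in new.items():
        if name not in old:
            # Old data lacks this field, so the reader needs a default to fill in.
            if "default" not in spec:
                problems.append(f"new field '{name}' has no default")
        elif old[name]["type"] != spec["type"]:
            problems.append(f"field '{name}' changed type")
    return problems


v1: Schema = {"id": {"type": "long"}, "title": {"type": "string"}}
v2_ok: Schema = {**v1, "price": {"type": "long", "default": 0}}
v2_bad: Schema = {**v1, "price": {"type": "long"}}

ok = backward_incompatibilities(v1, v2_ok)
bad = backward_incompatibilities(v1, v2_bad)
```

Running a check like this before a schema is published is what removes the need for coordinated deployments: consumers can upgrade to the new schema while old data is still in the topic.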
what has finn.no done
still using java library, with schemas in builders
confluent platform 2.0 is planned for the next step, not (just) kafka 0.9
mistake:
running mixed load with a
single, default configuration
https://flic.kr/p/qbarDR
why is it a mistake
Historically - One Big Database with Expensive License
Database world - OLTP and OLAP
Changed with Open Source software and Cloud
Tried to simplify the developer's day with a single config
Kafka supports both very high throughput and high reliability
what is the consequence
Trade off between throughput and degree of reliability
With a single configuration - the last commit wins
Either high throughput, and risk of loss - or potentially too slow
what is the correct solution
Understand your use cases and their needs!
Use proper per-topic configuration
Consider splitting / isolation
what has finn.no done
Defaults that are quite reliable
Exposing configuration variables in the client
Ask the questions:
at least once delivery
ordering - if you partition, what must have strict ordering
99% delivery - is that good enough?
what level of throughput is needed
Configuration
Configuration for production
Partitions
Replicas (default.replication.factor)
Minimum ISR (min.insync.replicas)
Wait for acknowledge when producing messages (request.required.acks, block.on.buffer.full)
Retries
Leader election
Configuration for consumer
Number of threads
Gwen Shapira recommends...
acks = all
block.on.buffer.full = true
retries = MAX_INT
max.in.flight.requests.per.connection = 1
Producer.close()
replication-factor >= 3
min.insync.replicas = 2
unclean.leader.election.enable = false
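The recommendations on this slide can be turned into a small checklist that a team could run against its own settings. The function below is a sketch written for this purpose, not part of any Kafka client; it takes the settings as plain dicts, passes the replication factor separately, checks only that retries is non-zero, and treats missing settings as unsafe instead of guessing version-specific defaults.

```python
from typing import Any, Dict, List


def reliability_warnings(producer: Dict[str, Any],
                         topic: Dict[str, Any],
                         replication_factor: int) -> List[str]:
    """List the settings that deviate from the recommendations on the slide."""
    warnings: List[str] = []
    if str(producer.get("acks")) != "all":
        warnings.append("acks should be all")
    if int(producer.get("retries", 0)) == 0:
        warnings.append("retries should be greater than 0")
    if int(producer.get("max.in.flight.requests.per.connection", 0)) != 1:
        warnings.append("max.in.flight.requests.per.connection should be 1")
    if replication_factor < 3:
        warnings.append("replication factor should be at least 3")
    if int(topic.get("min.insync.replicas", 0)) < 2:
        warnings.append("min.insync.replicas should be at least 2")
    if str(topic.get("unclean.leader.election.enable")).lower() != "false":
        warnings.append("unclean.leader.election.enable should be false")
    return warnings


reliable_producer = {"acks": "all", "retries": 2**31 - 1,
                     "max.in.flight.requests.per.connection": 1}
reliable_topic = {"min.insync.replicas": 2, "unclean.leader.election.enable": "false"}

clean = reliability_warnings(reliable_producer, reliable_topic, replication_factor=3)
risky = reliability_warnings({"acks": "1"}, {}, replication_factor=1)
```

These settings favour reliability over throughput, so they fit the low volume, high reliability topics and not necessarily a click stream.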
mistake:
default configuration of 128 partitions
for each topic
https://flic.kr/p/6KxPgZ
why is it a mistake
partitions are Kafka's way of scaling consumers; 128 partitions can handle 128
consumer processes
in 0.8, clusters could not reduce the number of partitions without deleting data
highest number of consumers today is 20
what is the consequence
our 0.8 cluster was configured with 128 partitions as default, for all topics.
many partitions and many topics create many data points that must be coordinated
zookeeper must coordinate all this
rebalance must balance all clients on all partitions
zookeeper and kafka went down (May 2015)
(500 topics * 128 partitions)
what is the correct solution
small number of partitions as default
increase number of partitions for selected topics
understand your use case
reduce length of transactions on consumer side
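partitions per topic: max(t/p, t/c), where t is the target throughput, p the producer throughput per partition and c the consumer throughput per partition
Max partitions on a broker: 100 x brokers x replication factor => 1500 in our case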
http://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/
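The two rules of thumb on this slide are simple arithmetic. In the sketch below the throughput numbers are invented, and the combination of 5 brokers with replication factor 3 is an assumption chosen because it reproduces the 1500 on the slide; the function names are made up for the example.

```python
import math


def partitions_needed(target: float, produce_per_partition: float,
                      consume_per_partition: float) -> int:
    """max(t/p, t/c), rounded up to a whole number of partitions."""
    return math.ceil(max(target / produce_per_partition,
                         target / consume_per_partition))


def partition_limit(brokers: int, replication_factor: int, factor: int = 100) -> int:
    """The slide's rule of thumb: 100 x brokers x replication factor."""
    return factor * brokers * replication_factor


# Hypothetical topic: 100 MB/s target, 20 MB/s per partition when producing,
# 10 MB/s per partition when consuming -> the consumer side decides.
needed = partitions_needed(target=100, produce_per_partition=20, consume_per_partition=10)

limit = partition_limit(brokers=5, replication_factor=3)  # assumed cluster shape
old_default_total = 500 * 128  # 500 topics with 128 partitions each, as on the earlier slide
```

Comparing `old_default_total` with `limit` shows how far the 128-partition default had pushed the cluster past the rule of thumb.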
what has finn.no done
5 partitions as default
4 topics have more than 5 partitions, topics with lots of traffic
mistake:
deploying a proof of concept
hack in production; i.e.
why we had 8 zk nodes
https://flic.kr/p/6eoSgT
why is it a mistake
Kafka was set up by Ops for a proof of concept - not for hardened production use
By coincidence we had 8 nodes for kafka, the same 8 nodes for zookeeper
Zookeeper is dependent on a majority quorum, low latency between nodes
The 8 nodes were NOT dedicated - in fact - they were overloaded already
what is the consequence
Zookeeper recommends 3 nodes for normal usage, 5 for high, and any more is
questionable
More nodes lead to a longer time to reach consensus and more communication
If we get a split between data centers, there will be 4 in each
You should not run Zk between data centers, due to latency and outage
possibilities
what is the correct solution
Have an odd number of Zookeeper nodes - preferably 3, at most 5
Don’t cross data centers
Check the documentation before deploying serious production load
Don’t run a sensitive service (Zookeeper) on a server with 50 jvm-based services,
300% over committed on RAM
Watch GC times
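The case for an odd number of nodes follows from majority quorum arithmetic, which can be worked through in a few lines. The helper names are made up for this sketch; the only rule used is that a quorum is more than half of the ensemble.

```python
def quorum(nodes: int) -> int:
    """Smallest number of nodes that forms a majority."""
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """How many nodes can be lost while a majority remains."""
    return nodes - quorum(nodes)


def has_quorum_after_split(side_a: int, side_b: int) -> bool:
    """True if either side of a network split still holds a majority."""
    needed = quorum(side_a + side_b)
    return side_a >= needed or side_b >= needed


eight = tolerated_failures(8)                 # same as with 7 nodes
seven = tolerated_failures(7)
dc_split = has_quorum_after_split(4, 4)       # 8 nodes split evenly over two data centers
```

An eighth node adds coordination traffic without tolerating any more failures than seven, and an even split between two data centers leaves neither side with the 5 nodes a quorum of 8 requires.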
what has finn.no done
[Diagram: dc 1 and dc 2; 5 brokers (broker01 to broker05), each running zk and kafka]
Version 0.8.2
5-20 partitions
multiple configurations
“They say
ignorance is
bliss.... they're
wrong ”
― Franz Kafka
“It's only because of
their stupidity that
they're able to be so
sure of themselves.”
― Franz Kafka, The
Trial
Audun Fauchald Strand
@audunstrand
Henning Spjelkavik
@spjelkavik
Q?
https://www.finn.no/apply-here
https://tech.finn.no
https://twitter.com/@FINN_tech
https://github.com/finn.no
http://www.schibsted.com/en/Career/
Runner up
Using pre-1.0 software
Have control of topic creation
Kafka is storage - treat it like one also ops-wise
Client side rebalancing
Committing on all consumer threads, believing that you only committed on one
References / Further reading
Designing Data-Intensive Applications, Martin Kleppmann
Data on the inside - data on the outside, Pat Helland
The Confluent Blog, http://confluent.io/
Kafka - The definitive guide
This presentation, in English: http://www.confluent.io/blog/the-top-sessions-from-kafka-summit-2016

More Related Content

What's hot

Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17Gwen (Chen) Shapira
 
Kafka Reliability - When it absolutely, positively has to be there
Kafka Reliability - When it absolutely, positively has to be thereKafka Reliability - When it absolutely, positively has to be there
Kafka Reliability - When it absolutely, positively has to be thereGwen (Chen) Shapira
 
Papers we love realtime at facebook
Papers we love   realtime at facebookPapers we love   realtime at facebook
Papers we love realtime at facebookGwen (Chen) Shapira
 
When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...
When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...
When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...confluent
 
Kafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 People
Kafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 PeopleKafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 People
Kafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 Peopleconfluent
 
Stream Me Up, Scotty: Transitioning to the Cloud Using a Streaming Data Platform
Stream Me Up, Scotty: Transitioning to the Cloud Using a Streaming Data PlatformStream Me Up, Scotty: Transitioning to the Cloud Using a Streaming Data Platform
Stream Me Up, Scotty: Transitioning to the Cloud Using a Streaming Data Platformconfluent
 
Kafka Summit SF 2017 - One Data Center is Not Enough: Scaling Apache Kafka Ac...
Kafka Summit SF 2017 - One Data Center is Not Enough: Scaling Apache Kafka Ac...Kafka Summit SF 2017 - One Data Center is Not Enough: Scaling Apache Kafka Ac...
Kafka Summit SF 2017 - One Data Center is Not Enough: Scaling Apache Kafka Ac...confluent
 
Decoupling Decisions with Apache Kafka
Decoupling Decisions with Apache KafkaDecoupling Decisions with Apache Kafka
Decoupling Decisions with Apache KafkaGrant Henke
 
Building Stream Infrastructure across Multiple Data Centers with Apache Kafka
Building Stream Infrastructure across Multiple Data Centers with Apache KafkaBuilding Stream Infrastructure across Multiple Data Centers with Apache Kafka
Building Stream Infrastructure across Multiple Data Centers with Apache KafkaGuozhang Wang
 
Help, my Kafka is broken! (Emma Humber, IBM) Kafka Summit SF 2019
Help, my Kafka is broken! (Emma Humber, IBM) Kafka Summit SF 2019Help, my Kafka is broken! (Emma Humber, IBM) Kafka Summit SF 2019
Help, my Kafka is broken! (Emma Humber, IBM) Kafka Summit SF 2019confluent
 
Developing with the Go client for Apache Kafka
Developing with the Go client for Apache KafkaDeveloping with the Go client for Apache Kafka
Developing with the Go client for Apache KafkaJoe Stein
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache KafkaShiao-An Yuan
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache KafkaJeff Holoman
 
Consumer offset management in Kafka
Consumer offset management in KafkaConsumer offset management in Kafka
Consumer offset management in KafkaJoel Koshy
 
Kafka at Peak Performance
Kafka at Peak PerformanceKafka at Peak Performance
Kafka at Peak PerformanceTodd Palino
 
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...confluent
 
Real-time Data Ingestion from Kafka to ClickHouse with Deterministic Re-tries...
Real-time Data Ingestion from Kafka to ClickHouse with Deterministic Re-tries...Real-time Data Ingestion from Kafka to ClickHouse with Deterministic Re-tries...
Real-time Data Ingestion from Kafka to ClickHouse with Deterministic Re-tries...HostedbyConfluent
 
Kafka Summit NYC 2017 - Deep Dive Into Apache Kafka
Kafka Summit NYC 2017 - Deep Dive Into Apache KafkaKafka Summit NYC 2017 - Deep Dive Into Apache Kafka
Kafka Summit NYC 2017 - Deep Dive Into Apache Kafkaconfluent
 
An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache KafkaAmir Sedighi
 

What's hot (20)

Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
 
Kafka Reliability - When it absolutely, positively has to be there
Kafka Reliability - When it absolutely, positively has to be thereKafka Reliability - When it absolutely, positively has to be there
Kafka Reliability - When it absolutely, positively has to be there
 
Papers we love realtime at facebook
Papers we love   realtime at facebookPapers we love   realtime at facebook
Papers we love realtime at facebook
 
When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...
When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...
When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...
 
Kafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 People
Kafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 PeopleKafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 People
Kafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 People
 
Stream Me Up, Scotty: Transitioning to the Cloud Using a Streaming Data Platform
Stream Me Up, Scotty: Transitioning to the Cloud Using a Streaming Data PlatformStream Me Up, Scotty: Transitioning to the Cloud Using a Streaming Data Platform
Stream Me Up, Scotty: Transitioning to the Cloud Using a Streaming Data Platform
 
Kafka Summit SF 2017 - One Data Center is Not Enough: Scaling Apache Kafka Ac...
Kafka Summit SF 2017 - One Data Center is Not Enough: Scaling Apache Kafka Ac...Kafka Summit SF 2017 - One Data Center is Not Enough: Scaling Apache Kafka Ac...
Kafka Summit SF 2017 - One Data Center is Not Enough: Scaling Apache Kafka Ac...
 
Decoupling Decisions with Apache Kafka
Decoupling Decisions with Apache KafkaDecoupling Decisions with Apache Kafka
Decoupling Decisions with Apache Kafka
 
Building Stream Infrastructure across Multiple Data Centers with Apache Kafka
Building Stream Infrastructure across Multiple Data Centers with Apache KafkaBuilding Stream Infrastructure across Multiple Data Centers with Apache Kafka
Building Stream Infrastructure across Multiple Data Centers with Apache Kafka
 
Help, my Kafka is broken! (Emma Humber, IBM) Kafka Summit SF 2019
Help, my Kafka is broken! (Emma Humber, IBM) Kafka Summit SF 2019Help, my Kafka is broken! (Emma Humber, IBM) Kafka Summit SF 2019
Help, my Kafka is broken! (Emma Humber, IBM) Kafka Summit SF 2019
 
Developing with the Go client for Apache Kafka
Developing with the Go client for Apache KafkaDeveloping with the Go client for Apache Kafka
Developing with the Go client for Apache Kafka
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Consumer offset management in Kafka
Consumer offset management in KafkaConsumer offset management in Kafka
Consumer offset management in Kafka
 
Kafka at Peak Performance
Kafka at Peak PerformanceKafka at Peak Performance
Kafka at Peak Performance
 
Kafka for DBAs
Kafka for DBAsKafka for DBAs
Kafka for DBAs
 
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
 
Real-time Data Ingestion from Kafka to ClickHouse with Deterministic Re-tries...
Real-time Data Ingestion from Kafka to ClickHouse with Deterministic Re-tries...Real-time Data Ingestion from Kafka to ClickHouse with Deterministic Re-tries...
Real-time Data Ingestion from Kafka to ClickHouse with Deterministic Re-tries...
 
Kafka Summit NYC 2017 - Deep Dive Into Apache Kafka
Kafka Summit NYC 2017 - Deep Dive Into Apache KafkaKafka Summit NYC 2017 - Deep Dive Into Apache Kafka
Kafka Summit NYC 2017 - Deep Dive Into Apache Kafka
 
An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache Kafka
 

Similar to 101 ways to configure kafka - badly

Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store
Azure + DataStax Enterprise (DSE) Powers Office365 Per User StoreAzure + DataStax Enterprise (DSE) Powers Office365 Per User Store
Azure + DataStax Enterprise (DSE) Powers Office365 Per User StoreDataStax Academy
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Guido Schmutz
 
Slim Baltagi – Flink vs. Spark
Slim Baltagi – Flink vs. SparkSlim Baltagi – Flink vs. Spark
Slim Baltagi – Flink vs. SparkFlink Forward
 
Jack Gudenkauf sparkug_20151207_7
Jack Gudenkauf sparkug_20151207_7Jack Gudenkauf sparkug_20151207_7
Jack Gudenkauf sparkug_20151207_7Jack Gudenkauf
 
A noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­tica
A noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­ticaA noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­tica
A noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­ticaData Con LA
 
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & PartitioningApache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & PartitioningGuido Schmutz
 
Apache Kafka's Common Pitfalls & Intricacies: A Customer Support Perspective
Apache Kafka's Common Pitfalls & Intricacies: A Customer Support PerspectiveApache Kafka's Common Pitfalls & Intricacies: A Customer Support Perspective
Apache Kafka's Common Pitfalls & Intricacies: A Customer Support PerspectiveHostedbyConfluent
 
JConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and FlinkJConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and FlinkTimothy Spann
 
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...Athens Big Data
 
Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Anton Nazaruk
 
Connect K of SMACK:pykafka, kafka-python or?
Connect K of SMACK:pykafka, kafka-python or?Connect K of SMACK:pykafka, kafka-python or?
Connect K of SMACK:pykafka, kafka-python or?Micron Technology
 
Building Event-Driven Systems with Apache Kafka
Building Event-Driven Systems with Apache KafkaBuilding Event-Driven Systems with Apache Kafka
Building Event-Driven Systems with Apache KafkaBrian Ritchie
 
Extending DevOps to Big Data Applications with Kubernetes
Extending DevOps to Big Data Applications with KubernetesExtending DevOps to Big Data Applications with Kubernetes
Extending DevOps to Big Data Applications with KubernetesNicola Ferraro
 
Testing Delphix: easy data virtualization
Testing Delphix: easy data virtualizationTesting Delphix: easy data virtualization
Testing Delphix: easy data virtualizationFranck Pachot
 
London In-Memory Computing Meetup - A Change-Data-Capture use-case: designing...
London In-Memory Computing Meetup - A Change-Data-Capture use-case: designing...London In-Memory Computing Meetup - A Change-Data-Capture use-case: designing...
London In-Memory Computing Meetup - A Change-Data-Capture use-case: designing...Nicolas Fränkel
 
Making clouds: turning opennebula into a product
Making clouds: turning opennebula into a productMaking clouds: turning opennebula into a product
Making clouds: turning opennebula into a productCarlo Daffara
 
Making Clouds: Turning OpenNebula into a Product
Making Clouds: Turning OpenNebula into a ProductMaking Clouds: Turning OpenNebula into a Product
Making Clouds: Turning OpenNebula into a ProductNETWAYS
 
OpenNebulaConf 2013 - Making Clouds: Turning OpenNebula into a Product by Car...
OpenNebulaConf 2013 - Making Clouds: Turning OpenNebula into a Product by Car...OpenNebulaConf 2013 - Making Clouds: Turning OpenNebula into a Product by Car...
OpenNebulaConf 2013 - Making Clouds: Turning OpenNebula into a Product by Car...OpenNebula Project
 
Liveperson DLD 2015
Liveperson DLD 2015 Liveperson DLD 2015
Liveperson DLD 2015 LivePerson
 

Similar to 101 ways to configure kafka - badly (20)

Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store
Azure + DataStax Enterprise (DSE) Powers Office365 Per User StoreAzure + DataStax Enterprise (DSE) Powers Office365 Per User Store
Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
 
Slim Baltagi – Flink vs. Spark
Slim Baltagi – Flink vs. SparkSlim Baltagi – Flink vs. Spark
Slim Baltagi – Flink vs. Spark
 
Flink vs. Spark
Flink vs. SparkFlink vs. Spark
Flink vs. Spark
 
Jack Gudenkauf sparkug_20151207_7
Jack Gudenkauf sparkug_20151207_7Jack Gudenkauf sparkug_20151207_7
Jack Gudenkauf sparkug_20151207_7
 
A noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­tica
A noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­ticaA noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­tica
A noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­tica
 
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & PartitioningApache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
 
Apache Kafka's Common Pitfalls & Intricacies: A Customer Support Perspective
Apache Kafka's Common Pitfalls & Intricacies: A Customer Support PerspectiveApache Kafka's Common Pitfalls & Intricacies: A Customer Support Perspective
Apache Kafka's Common Pitfalls & Intricacies: A Customer Support Perspective
 
JConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and FlinkJConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and Flink
 
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
101 ways to configure kafka - badly

  • 1. 101* ways to configure Kafka - badly. What FINN.no has learned from running Kafka. Audun Fauchald Strand, Lead Developer Infrastructure, @audunstrand; bio: gof, mq, ejb, wli, bpel, eda, soa, esb, ddd, k8s. Henning Spjelkavik, Architect, @spjelkavik; bio: Skiinfo (Vail Resorts), FINN.no; enjoys reading jstacks.
  • 2. agenda: Introduction to kafka; kafka @ finn.no; 101* mistakes; questions. “From a certain point onward there is no longer any turning back. That is the point that must be reached.” ― Franz Kafka, The Trial
  • 4. why use kafka; #notAnESB; what is a log; terminology; components; giant leap. “A First Sign of the Beginning of Understanding is the Wish to Die.” ― Franz Kafka https://commons.wikimedia.org/wiki/File:Kafka.jpg
  • 5. Why use Kafka? “Apache Kafka is publish-subscribe messaging rethought as a distributed commit log.” Fast; Scalable; Durable; Distributed by design. Sweet spot: High volume, low latency. Quora: “Use Kafka if you have a fire hose of events (100k+/sec) you need delivered in partitioned order 'at least once' with a mix of online and batch consumers, you want to be able to re-read messages” “Use Rabbit if you have messages (20k+/sec) that need to be routed in complex ways to consumers, you want per-message delivery guarantees, you don't care about ordered delivery”
  • 6. What is a (data) log / journal? A log is perhaps the simplest possible storage abstraction. It is an append-only, totally-ordered sequence of records ordered by time. Records are appended to the end of the log, and reads proceed left-to-right. Each entry is assigned a unique sequential log entry number. The ordering of records defines a notion of "time" since entries to the left are defined to be older than entries to the right. This is a data log, not an application log (i.e., not log4j).
  • 7. Changelog 101: Tables and Events are Dual. Duality: a log of changes and a table. Accounting: log = credit and debit (events per key); table = all current balances (i.e., state per key). In a sense the log is the more fundamental data structure: in addition to creating the original table you can also transform it to create all kinds of derived tables.
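The log/table duality from slides 6 and 7 can be sketched in a few lines. This is a minimal illustration only, assuming an in-memory list as the log; the names `append` and `balances` and the account keys are hypothetical and not from the talk.

```python
from collections import defaultdict

# Hypothetical append-only log of accounting events: (key, signed amount).
log = []

def append(key, amount):
    """Append a record and return its sequential log entry number."""
    log.append((key, amount))
    return len(log) - 1

def balances(records):
    """Fold the log left-to-right into a table of current balances per key."""
    table = defaultdict(int)
    for key, amount in records:
        table[key] += amount
    return dict(table)

append("alice", 100)   # credit
append("bob", 50)      # credit
append("alice", -30)   # debit

current = balances(log)            # the table derived from the whole log
as_of_entry_1 = balances(log[:2])  # an older table, rebuilt from a log prefix
```

Because the log keeps every event, any earlier state of the table can be rebuilt from a prefix, which is why the slide calls the log the more fundamental structure.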
  • 8. terminology: producers write to brokers; consumers read from brokers; everything is distributed; data is stored in topics; topics are split into partitions, which are replicated. [Diagram: producers and consumers around a kafka cluster]
  • 10. [Diagram: a log with entries 1 to 7, ordered from old messages to newer messages, with producer a1, consumer.group.1 and consumer.group.2]
  • 11. Giant leap? In fact, persistent replicated messaging is such a giant leap in messaging architecture that it may be worthwhile to point out a few side effects: a. Per-message acknowledgments have disappeared b. ordered delivery c. The problem of mismatched consumer speed has disappeared. A slow consumer can peacefully co-exist with a fast consumer now d. Need for difficult messaging semantics like delayed delivery, re-delivery etc. has disappeared. Now it is all up to the consumer to read whatever message whenever - onus has shifted from broker to consumer e. The holy grail of message delivery guarantee: at-least-once is the new reality - both Kafka and Azure Event Hub provide this guarantee. You still have to make your consumers and downstream systems idempotent so that recovering from a failure and processing the same message twice does not upset it too much, but hey - that has always been the case http://blogs.msdn.com/b/opensourcemsft/archive/2015/08/08/choose-between-azure-event-hub-and-kafka-_2d00_-what-you-need-to-know.aspx
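Point e above asks for idempotent consumers under at-least-once delivery. One possible approach, sketched here under assumptions, is to remember the highest offset already applied per partition and skip redeliveries. The function names and the in-memory state are hypothetical; a real consumer would have to persist the state and the offset together.

```python
def make_consumer():
    """Return a handler and its state; state tracks the last applied offset per partition."""
    state = {"total": 0, "last_offset": {}}

    def handle(partition, offset, amount):
        # A redelivered record has an offset at or below the last one applied: skip it.
        if offset <= state["last_offset"].get(partition, -1):
            return False
        state["total"] += amount
        state["last_offset"][partition] = offset
        return True

    return handle, state

handle, state = make_consumer()
# Offset 1 is delivered twice, as can happen after a consumer restart.
deliveries = [(0, 0, 10), (0, 1, 5), (0, 1, 5), (0, 2, 1)]
applied = [handle(*d) for d in deliveries]
```

The duplicate is ignored, so processing the same message twice leaves the total unchanged.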
  • 13. Top 5 1. no consideration of data on the inside vs outside 2. schema not externally defined 3. same config for every client/topic 4. 128 partitions as default config 5. running on 8 overloaded nodes
  • 15. FINN.no (https://finn.no/apply-here, https://twitter.com/@FINN_tech, http://tech.finn.no, https://github.com/finn.no): 60 million page views a day; 250-300 microservices; 130 developers; 1479 deployed changes into production last week; 6 minutes from commit to deploy (median)
  • 16. #javazone @spjelkavik @audunstrand Schibsted Media Group 6800 people in 30 countries FINN.no is a part of 6900 employees 30 countries 200 million users jobs.schibsted.com
  • 19. in the beginning ... The architecture governance board decided to use RabbitMQ as message queue. Kafka was installed for a proof of concept after developers spotted it in January 2013.
  • 20. 2013 - POC. “High” volume; Stream of classified ads; Ad matching; Ad indexed. [Diagram: mod01 to mod08, each running zk and kafka, across dc 1 and dc 2] Version 0.8.1; 4 partitions; common client java library; thrift
  • 21. 2014 - Adoption and complaining. low volume / high reliability; Ad Insert; Product Orchestration; Payment; Build Pipeline; click streams. [Diagram: mod01 to mod08, each running zk and kafka, across dc 1 and dc 2] Version 0.8.1; 4 partitions; experimenting with configuration; common java library
  • 23. 2015 - Migration and consolidation. “reliable messaging”; asynchronous communication between services; store and forward; zipkin; slack notifications. [Diagram: broker01 to broker05, each running zk and kafka, across dc 1 and dc 2] Version 0.8.2; 5-20 partitions; multiple configurations
  • 24. tooling: Grafana dashboard visualizing jmx stats via Prometheus; Kafka-manager from Yahoo; Kafka-cat cli
  • 25. 2016 - Confluent platform: schema registry; data replication; kafka connect; kafka streams. [Diagram: broker01 to broker05 running kafka; zk01 to zk03 running zk; Schema Registry; Rest Proxy]
  • 26. 101* mistakes “God gives the nuts, but he does not crack them.” ― Franz Kafka
  • 27. Pattern Language why is it a mistake what is the consequence what is the correct solution what has finn.no done
  • 28. Top 5 1. no consideration of data on the inside vs outside 2. schema not externally defined 3. same config for every client/topic 4. 128 partitions as default config 5. running on 8 overloaded nodes
  • 29. mistake: no consideration of data on the inside vs outside https://flic.kr/p/6MjhUR
  • 30. why is it a mistake: everything published on Kafka (0.8.2) is visible to any client that can access it
  • 31. what is the consequence: direct reads across services/domains are quite normal in legacy and/or enterprise systems; coupling makes it hard to make changes; unknown and unwanted coupling has a cost; Kafka had no security per topic - you must add that yourself
  • 32. what is the correct solution: Consider what is data on the inside, versus data on the outside. Use a convention for what is private data and what is public data. If you want to change your internal representation often, map it before publishing it publicly (anti-corruption layer).
  • 33. what has finn.no done: Decided on a naming convention (e.g. Public.xyzzy) for public topics; it communicates the intention (contract)
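A naming convention like the `Public.` prefix can also be checked mechanically, for example in a shared client library. The sketch below is an assumption about how such a check might look; the slides only say the convention communicates intent, and the function names, topic names and team names here are hypothetical.

```python
PUBLIC_PREFIX = "Public."  # prefix taken from the slide; all other topics count as private

def is_public(topic):
    """True when the topic name follows the public naming convention."""
    return topic.startswith(PUBLIC_PREFIX)

def may_subscribe(topic, topic_owner, consumer_team):
    """A team may always read its own topics; other teams may only read public ones."""
    return consumer_team == topic_owner or is_public(topic)

# Hypothetical topics owned by a hypothetical "ads" team.
checks = [
    may_subscribe("Public.ad-created", "ads", "search"),   # public: allowed
    may_subscribe("ad-internal-state", "ads", "search"),   # private: refused
    may_subscribe("ad-internal-state", "ads", "ads"),      # own topic: allowed
]
```

A check like this does not replace per-topic security; it only makes accidental coupling to private data harder.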
  • 35. why is it a mistake: data and code need separate versioning strategies; version should be part of the data; defining schema in a java library makes it more difficult to access data from non-JVM languages; very little discoverability of data, so people choose other means to get their data; difficult to create tools
  • 36. what is the consequence: development speed outside the JVM has been slow; change of data needs coordinated deployment; no process for data versioning, like backwards compatibility checks; difficult to create tooling that needs to know the data format, like data lake and database sinks
  • 37. what is the correct solution: confluent.io platform has a separate schema registry; apache avro; multiple compatibility settings and evolution strategies; connect; more automatic tooling
  • 38. what has finn.no done: still using java library, with schemas in builders; confluent platform 2.0 is planned for the next step, not (just) kafka 0.9
  • 39. mistake: running mixed load with a single, default configuration https://flic.kr/p/qbarDR
  • 40. why is it a mistake: Historically - One Big Database with Expensive License; Database world - OLTP and OLAP; Changed with Open Source software and Cloud; Tried to simplify the developer's day with a single config; Kafka supports both very high throughput and high reliability
  • 41. what is the consequence: Trade-off between throughput and degree of reliability; With a single configuration - the last commit wins; Either high throughput, and risk of loss - or potentially too slow
  • 42. what is the correct solution: Understand your use cases and their needs! Use proper per-topic configuration. Consider splitting / isolation.
  • 43. what has finn.no done: Defaults that are quite reliable; Exposing configuration variables in the client; Ask the questions: at least once delivery? ordering - if you partition, what must have strict ordering? 99% delivery - is that good enough? what level of throughput is needed?
  • 44. Configuration. Configuration for production: Partitions; Replicas (default.replication.factor); Minimum ISR (min.insync.replicas); Wait for acknowledge when producing messages (request.required.acks, block.on.buffer.full); Retries; Leader election. Configuration for consumer: Number of threads.
  • 45. Gwen Shapira recommends... acks = all; block.on.buffer.full = true; retries = MAX_INT; max.in.flight.requests.per.connection = 1; Producer.close(); replication-factor >= 3; min.insync.replicas = 2; unclean.leader.election.enable = false
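The idea of separate configurations per use case, with reliable defaults, could be sketched as plain settings dictionaries. This is an illustration under assumptions: the reliable values follow the slide above, while the `FAST_PRODUCER` profile, the topic name and the helper functions are hypothetical, and nothing here is passed to a real Kafka client.

```python
MAX_INT = 2**31 - 1  # Java's Integer.MAX_VALUE, the slide's MAX_INT

# Producer settings from the slide, for topics where losing a message is not acceptable.
RELIABLE_PRODUCER = {
    "acks": "all",
    "retries": MAX_INT,
    "max.in.flight.requests.per.connection": 1,
}

# Hypothetical counterpart for high-volume topics that tolerate some loss.
FAST_PRODUCER = {
    "acks": "1",
    "retries": 0,
    "max.in.flight.requests.per.connection": 5,
}

# Hypothetical mapping from topic to profile; unknown topics get the reliable default.
TOPIC_PROFILES = {"Public.clickstream": FAST_PRODUCER}

def producer_settings(topic):
    """Return a copy of the producer settings to use for a topic."""
    return dict(TOPIC_PROFILES.get(topic, RELIABLE_PRODUCER))

def accepts_writes(replication_factor, min_insync_replicas, brokers_down):
    """With acks=all, writes succeed while enough replicas remain in sync.

    Assumes every broker that is down hosted a replica of the partition.
    """
    return replication_factor - brokers_down >= min_insync_replicas
```

With replication factor 3 and `min.insync.replicas = 2`, one broker can be lost without refusing writes; losing two makes the partition reject `acks = all` writes rather than risk data loss.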
  • 46. mistake: default configuration of 128 partitions for each topic https://flic.kr/p/6KxPgZ
  • 47. why is it a mistake: partitions are Kafka's way of scaling consumers, 128 partitions can handle 128 consumer processes; in 0.8, clusters could not reduce the number of partitions without deleting data; highest number of consumers today is 20
  • 48. what is the consequence: our 0.8 cluster was configured with 128 partitions as default, for all topics; many partitions and many topics create many datapoints that must be coordinated; zookeeper must coordinate all this; rebalance must balance all clients on all partitions; zookeeper and kafka went down (May 2015) (500 topics * 128 partitions)
  • 49. what is the correct solution: small number of partitions as default; increase number of partitions for selected topics; understand your use case; reduce length of transactions on consumer side; partitions per topic: max(t/p, t/c), where t is the target throughput and p and c are the producer and consumer throughput of a single partition; max partitions on a broker: 100 x brokers x replication factor => 1500 in our case http://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/
  • 50. what has finn.no done: 5 partitions as default; 4 topics have more than 5 partitions (topics with lots of traffic)
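The two sizing rules from slide 49 can be worked through as arithmetic. The sketch below assumes 5 brokers and a replication factor of 3, which reproduces the slide's figure of 1500; the throughput numbers in the usage lines are made up for illustration, and the function names are hypothetical.

```python
import math

def partitions_needed(target, per_partition_produce, per_partition_consume):
    """max(t/p, t/c): enough partitions for both the producer and the consumer side."""
    return max(
        math.ceil(target / per_partition_produce),
        math.ceil(target / per_partition_consume),
    )

def partition_budget(brokers, replication_factor, factor=100):
    """Rule of thumb from the slide: 100 x brokers x replication factor."""
    return factor * brokers * replication_factor

# Made-up throughput in MB/s: target 100, 10 per partition produced, 20 consumed.
needed = partitions_needed(100, 10, 20)

budget = partition_budget(brokers=5, replication_factor=3)  # assumed cluster shape
old_default_total = 500 * 128  # topics x default partitions before the outage
```

Comparing `old_default_total` with `budget` shows how far the 128-partition default overshot the rule of thumb.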
  • 51. mistake: deploy a proof of concept hack - in production; i.e. why we had 8 zk nodes https://flic.kr/p/6eoSgT
  • 52. why is it a mistake: Kafka was set up by Ops for a proof of concept - not for hardened production use; By coincidence we had 8 nodes for kafka, and the same 8 nodes for zookeeper; Zookeeper is dependent on a majority quorum and low latency between nodes; The 8 nodes were NOT dedicated - in fact - they were overloaded already
  • 53. what is the consequence: Zookeeper recommends 3 nodes for normal usage, 5 for high, and any more is questionable; More nodes lead to longer time for finding consensus, more communication; If we get a split between data centers, there will be 4 in each; You should not run Zk between data centers, due to latency and outage possibilities
  • 54. what is the correct solution: Have an odd number of Zookeeper nodes - preferably 3, at most 5; Don’t cross data centers; Check the documentation before deploying serious production load; Don’t run a sensitive service (Zookeeper) on a server with 50 jvm-based services, 300% over committed on RAM; Watch GC times
  • 55. what has finn.no done: [Diagram: broker01 to broker05, each running zk and kafka, across dc 1 and dc 2] Version 0.8.2; 5-20 partitions; multiple configurations
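The majority-quorum arithmetic behind the "odd number of nodes" advice is short enough to spell out. This sketch only models the counting, not ZooKeeper itself; the function names are hypothetical.

```python
def quorum(nodes):
    """Smallest strict majority of an ensemble."""
    return nodes // 2 + 1

def tolerated_failures(nodes):
    """How many nodes can fail while a majority remains."""
    return nodes - quorum(nodes)

def survives_split(side_a, side_b):
    """True if either side of a network split still holds a majority."""
    return max(side_a, side_b) >= quorum(side_a + side_b)

eight = (quorum(8), tolerated_failures(8))  # the original 8-node setup
seven = (quorum(7), tolerated_failures(7))
even_split = survives_split(4, 4)           # 4 nodes in each data center
```

An 8-node ensemble tolerates no more failures than a 7-node one, and a 4/4 split between data centers leaves neither side with a majority.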
  • 56. “They say ignorance is bliss.... they're wrong” ― Franz Kafka
  • 58. “It's only because of their stupidity that they're able to be so sure of themselves.” ― Franz Kafka, The Trial Audun Fauchald Strand @audunstrand Henning Spjelkavik @spjelkavik Q? https://www.finn.no/apply-here https://tech.finn.no https://twitter.com/@FINN_tech https://github.com/finn.no http://www.schibsted.com/en/Career/
  • 59. Runner up: Using pre-1.0 software; Have control of topic creation; Kafka is storage - treat it like one also ops-wise; Client side rebalancing; Committing on all consumer threads, believing that you only committed on one
  • 60. References / Further reading: Designing Data-Intensive Applications, Martin Kleppmann; Data on the inside - data on the outside, Pat Helland; The Confluent Blog, http://confluent.io/; Kafka - The definitive guide; This presentation, in English: http://www.confluent.io/blog/the-top-sessions-from-kafka-summit-2016 www.finn.no/apply-here jobs.schibsted.com/ tech.finn.no twitter.com/@FINN_tech github.com/finn.no