2021/11/14
Hojin Shim / Site Reliability Engineer
ELK Stack - Improving Log Processing Speed
Requests averaged about 1 million per minute, and logs started to lag behind.
Various Logging Pipelines


Architecture Patterns
Logging Patterns
Well-known patterns
• Remote logging
• File logging & cron backup
• Logging pipeline without stream
• Logging pipeline with stream
Logging Patterns
Remote Logging
App → logging over network → somewhere (DB, storage, etc.)
Ex) Logback / log4j in Java
• Low risk of losing records
• High risk of lag / throughput problems
Logging Patterns
File Logging & Cron Backup
App → Disk volume → Cron (PutObject) → S3
• High risk of losing records
• It depends on deployment patterns
• Difficult to analyse
• It's simple
Logging Patterns
Logging Pipeline Patterns (w/o stream)
App → Disk volume → Forwarder (pre-processor) → Forwarder (post-processor) → Search Engine
• Risky under high throughput
• Risk of losing records
Logging Patterns
Logging Pipeline Patterns (w/ stream)
App → Disk volume → Forwarder (pre-processor) → Stream → Forwarder (post-processor) → Search Engine
• Low risk under high throughput
• Low risk of losing records
• High cost
Logging Patterns
ELK Stack (Elastic Stack)
App → Disk volume → Filebeat → MSK (Kafka) → Logstash → Elasticsearch & Kibana
• Low risk under high throughput & low risk of losing records
• High cost ($$$)
• Requires deep & wide technical knowledge
Logging Lag
[Figure: as requests increase, logging increases; many App → Disk → Filebeat hosts feed MSK (Kafka) → Logstash → Elasticsearch, and the indexed logs fall behind "Now": Lag!!]
What is the problem?
So many things could be the cause:
• Filebeat I/O problem
• Kafka performance problem
• Logstash slow ingestion / processing problem
• Elasticsearch performance problem
• Etc.
Measurement
What to measure?
• Filebeat: basic system metrics, etc.
• MSK (Kafka): basic system metrics, burst balance, bandwidth throttling, lag per topic, etc.
• Logstash: basic system metrics, number of events processed, etc.
• Elasticsearch: basic system metrics, indexing rate / latency, etc.
Measurement
How to measure? (Based on my experience)
• Filebeat: Telegraf, InfluxDB, Grafana
• MSK (Kafka): CloudWatch, Burrow / Prometheus, Elasticsearch, Grafana
• Logstash: Telegraf, Elasticsearch, Grafana
• Elasticsearch: CloudWatch, Grafana
Focus of this session: consumer lag monitoring (MSK) and Logstash processing rate monitoring.
Measurement
Consumer Lag
Consumer lag (figure source: https://www.lightbend.com/blog/monitor-kafka-consumer-group-latency-with-kafka-lag-exporter)
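Consumer lag is, per partition, the distance between the newest offset written to the partition (the log-end offset) and the offset the consumer group has committed; monitoring tools track this for every partition of every group. A minimal sketch of that arithmetic, using hypothetical offsets and assuming that a partition with no committed offset counts from 0:

```python
from typing import Dict


def consumer_lag(log_end_offsets: Dict[int, int],
                 committed_offsets: Dict[int, int]) -> Dict[int, int]:
    """Per-partition lag = log-end offset - committed offset of the group."""
    lag: Dict[int, int] = {}
    for partition, end_offset in log_end_offsets.items():
        # Assumption: no committed offset yet means the group starts from 0.
        committed = committed_offsets.get(partition, 0)
        # Offsets are sampled at slightly different times, so clamp at 0.
        lag[partition] = max(end_offset - committed, 0)
    return lag


def total_lag(lag_by_partition: Dict[int, int]) -> int:
    """Topic-level lag is the sum over partitions."""
    return sum(lag_by_partition.values())


if __name__ == "__main__":
    # Hypothetical offsets for one 3-partition topic.
    end = {0: 1_500, 1: 1_200, 2: 900}
    committed = {0: 1_450, 1: 700, 2: 900}
    per_partition = consumer_lag(end, committed)
    print(per_partition)             # {0: 50, 1: 500, 2: 0}
    print(total_lag(per_partition))  # 550
```

Here partition 1 carries most of the lag, which is the kind of imbalance that per-partition graphs make visible and a single topic-level number hides.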
Measurement
Consumer-lag measurement
• Kubernetes-friendly way
  • Open Monitoring with Prometheus
• Always-available way (demo in this session)
  • Burrow / Telegraf
Measurement
Burrow / Telegraf
• Burrow
  • Open source, developed by LinkedIn
  • Apache Kafka monitoring tool
  • Exposes information over an HTTP endpoint
• Telegraf
  • Open source, developed by InfluxData
  • General-purpose metrics collection agent
  • Plugin system
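Burrow's HTTP endpoint is what Telegraf polls. A minimal sketch of reading one group's lag directly, assuming Burrow's v3 endpoint `/v3/kafka/<cluster>/consumer/<group>/lag` and the response fields `status`, `totallag`, `partitions` and `current_lag` as we understand them (check them against your Burrow version); the endpoint, cluster and group names below are placeholders:

```python
import json
from typing import Any, Dict, List, Tuple
from urllib.request import urlopen


def fetch_group_lag(burrow_url: str, cluster: str, group: str) -> Dict[str, Any]:
    """Call Burrow's (assumed) v3 lag endpoint for one consumer group."""
    url = f"{burrow_url.rstrip('/')}/v3/kafka/{cluster}/consumer/{group}/lag"
    with urlopen(url, timeout=5) as resp:
        return json.load(resp)


def summarize_lag(payload: Dict[str, Any]) -> Tuple[str, int, List[Tuple[str, int, int]]]:
    """Return (group status, total lag, [(topic, partition, lag), ...]),
    with the most-lagging partitions first."""
    if payload.get("error"):
        raise RuntimeError(payload.get("message", "Burrow returned an error"))
    status = payload["status"]
    partitions = [
        (p["topic"], p["partition"], p.get("current_lag", 0))
        for p in status.get("partitions") or []
    ]
    partitions.sort(key=lambda item: item[2], reverse=True)
    return status["status"], status.get("totallag", 0), partitions


if __name__ == "__main__":
    # Hypothetical response; against a live Burrow you would instead call
    # payload = fetch_group_lag("http://your.burrow-endpoint.com", "product-elk", "logstash")
    payload = {
        "error": False,
        "message": "consumer status returned",
        "status": {
            "cluster": "product-elk", "group": "logstash", "status": "WARN",
            "partitions": [
                {"topic": "order", "partition": 0, "current_lag": 10},
                {"topic": "order", "partition": 1, "current_lag": 300},
            ],
            "totallag": 310,
        },
    }
    print(summarize_lag(payload))  # ('WARN', 310, [('order', 1, 300), ('order', 0, 10)])
```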
Measurement
Consumer-lag measurement with Burrow
MSK (Kafka) → Burrow → Telegraf → Elasticsearch → Grafana
Measurement
Burrow config code snippet

# ...

[zookeeper]
# Your ZooKeeper endpoint
servers=[ "z-3.elk.abc.kafka.ap-northeast-2.amazonaws.com:2181", "z-2.elk.kafka.ap-northeast-2.amazonaws.com:2181", "z-1.product-elk-msk-abc.kafka.ap-northeast-2.amazonaws.com:2181" ]
timeout=6
root-path="/burrow"

[consumer.product-elk]
class-name="kafka"
cluster="product-elk"
# Your bootstrap server endpoint
servers=[ "b-2.elk.kafka.ap-northeast-2.amazonaws.com:9094", "b-1.elk.kafka.ap-northeast-2.amazonaws.com:9094" ]
# Must match a [client-profile.<name>] section (elided here)
client-profile="your_profile"
group-denylist="^(some-group-|python-kafka-consumer-|quick-).*$"
group-allowlist=""

[cluster.product-elk]
class-name="kafka"
# Your bootstrap server endpoint
servers=[ "b-2.elk.abc.kafka.ap-northeast-2.amazonaws.com:9094", "b-1.elk.abc.kafka.ap-northeast-2.amazonaws.com:9094" ]
client-profile="test"
topic-refresh=60
offset-refresh=30

# If you use client / broker encryption
[tls.msk-mTLS]
cafile="/etc/burrow/truststore.pem"
noverify=true

# ...

Burrow configuration - /etc/burrow/burrow.toml
Measurement
Telegraf config code snippet

[[inputs.burrow]]
  servers = ["https://your.burrow-endpoint.com"]
  topics_exclude = ["__consumer_offsets"]
  groups_exclude = ["console-*"]
  # Use a tag if you collect other metrics as well
  [inputs.burrow.tags]
    burrow = "burrow"

[[outputs.elasticsearch]]
  urls = ["http://your-elasticsearch-endpoint:9200"]
  timeout = "5s"
  enable_sniffer = false
  health_check_interval = "10s"
  index_name = "burrow-%Y.%m.%d"
  manage_template = true
  template_name = "telegraf-burrow"
  # Filter metrics by tag: only metrics tagged burrow = "burrow" reach this output
  [outputs.elasticsearch.tagpass]
    burrow = ["burrow"]

Telegraf configuration - /etc/telegraf/telegraf.d/burrow.conf
Measurement
Data from the burrow index
[Screenshot: indexed documents showing topic name, partition, and lag information]
Measurement
Visualization with Grafana
[Screenshot: consumer-lag graph per topic ("Some Topic Lag")]
Measurement
Logstash Processing Rate
Measurement
Visualization with Timelion
input {
  kafka {
    bootstrap_servers => "b-2.elk.abc.kafka.ap-northeast-2.amazonaws.com:9094,b-1.elk.abc.kafka.ap-northeast-2.amazonaws.com:9094"
    # topics_pattern is a Java regex: a bare "*" does not compile, ".*" matches every topic
    topics_pattern => ".*"
    consumer_threads => 1
    codec => "json"
    decorate_events => true
    group_id => "logstash"
    security_protocol => "SSL"
    ssl_truststore_location => "/logstash/kafka.client.truststore.jks"
    enable_auto_commit => "true"
  }
}

# ...

filter {
  # ...
  # The meter periodically emits a separate metric event (count and 1m/5m/15m rates),
  # tagged "metric" so the output section can route it to its own index
  metrics {
    meter => "events"
    add_tag => "metric"
    add_field => {
      "lsname" => "some-logstash"
    }
  }
}

# ...

output {
  # ... (the preceding `if` branch for regular log events is elided)
  else if "metric" in [tags] {
    elasticsearch {
      hosts => ["eskibana.prd.in.musinsa.com:9200"]
      index => "logstash-metric-%{+yyyy.MM.dd}"
    }
  }
  # ...
}

Add logstash metric
Logstash pipeline configuration - ./logstash/pipeline/logstash.conf
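The `meter` option of the Logstash `metrics` filter reports an event count together with 1, 5 and 15 minute rates (for example `events.rate_1m`), and the processing-rate graph plots the 1-minute rate. Those rates are exponentially weighted moving averages. A minimal sketch of a 1-minute EWMA meter, assuming a fixed 5-second tick; it illustrates the technique and is not the plugin's own code:

```python
import math


class EWMAMeter:
    """1-minute exponentially weighted moving average of an event rate.

    Events are counted between fixed ticks; each tick folds the observed
    rate into a decaying average. The 5 s tick is an assumption.
    """

    def __init__(self, window_seconds: float = 60.0, tick_seconds: float = 5.0) -> None:
        self.tick_seconds = tick_seconds
        # Weight given to the newest observation on each tick.
        self.alpha = 1.0 - math.exp(-tick_seconds / window_seconds)
        self.count = 0        # total events seen, like events.count
        self._uncounted = 0   # events since the last tick
        self.rate = None      # events/second; None until the first tick

    def mark(self, n: int = 1) -> None:
        self.count += n
        self._uncounted += n

    def tick(self) -> None:
        instant_rate = self._uncounted / self.tick_seconds
        self._uncounted = 0
        if self.rate is None:
            self.rate = instant_rate  # seed with the first observation
        else:
            self.rate += self.alpha * (instant_rate - self.rate)


if __name__ == "__main__":
    meter = EWMAMeter()
    for _ in range(12):  # one minute of a steady 100 events/second
        meter.mark(500)
        meter.tick()
    print(meter.count, meter.rate)  # 6000 100.0
    meter.tick()                    # one 5 s tick with no events
    print(round(meter.rate, 2))     # 92.0
```

A single quiet tick lowers the 1-minute rate by only about 8%, so the graph smooths over short stalls; a sustained drop is what signals a slow Logstash.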
Measurement
Data from the logstash-metric index
[Screenshot: Logstash name ("lsname") and 1-minute event processing rate]
Measurement
Visualization with Timelion
Problems & Solutions
Logstash grok performance
Logstash filter performance
grok grok grok!
• Some log messages might cause parsing problems
  • Some special characters
  • Long log messages
  • Etc.
Example of a problematic request URL:
http://some-domain/app/product/goodsview_stats/1474978/0?utm_source=naver_jisicshopping&utm_medium=sh&source=NVSH&NaPm=ct%3Dkvyxfobc%7Cci%3Dd4151183d55ce2828c56f84eb392eab7338b2026%7Ctr%3Dslct%7Csn%3D204973%7Chkab6de6182e50b01b182e15ae740bcb84ce&menu=view&3Dcee524ab6de6182e50b01b182e15ae740bcb84ce&q=b3Dcee524ab6de6182e50b01b182e15ae740bcb84ce.....................
Logstash filter performance
grok grok grok!
[2021-09-03T17:26:25,923][WARN ][logstash.filters.grok ][main][8c1ed634e6ffe7026b0a684399b6a4893634d376554d997095836bd11d71a1c7] Timeout executing grok '%{IPORHOST:[nginx][access][remote_ip]} ......................'
https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html#plugins-filters-grok-timeout_millis
Logstash filter performance
grok grok grok!
# ...
filter {
  if [event][dataset] == "nginx.access" {
    grok {
      match => { "message" => ["%{IPORHOST:[nginx][access][remote_ip]} - ................"] }
      remove_field => "message"
      # Give up on a single message after 300 ms instead of the 30000 ms default;
      # events that time out are tagged _groktimeout
      timeout_millis => 300
    }
    # ...
  }
  # ...
}

Add short grok parsing timeout
Logstash pipeline configuration - ./logstash/pipeline/logstash.conf
Problems & Solutions
Logstash pipeline & batch
Logstash pipeline & batch
Too many topics to ingest
• The number of workers and CPU cores (pipeline.workers)
  • Same as the number of CPU cores, or a little more
• How many messages to fetch each time (pipeline.batch.size)
  • The default value is 125; the new value is 1000
• How long to wait for an undersized batch (pipeline.batch.delay)
  • 50 ms here, which is the default value
https://www.elastic.co/guide/en/logstash/6.8/logstash-settings-file.html#logstash-settings-file

- pipeline.id: main
  path.config: "/usr/share/logstash/pipeline"
  pipeline.workers: 4
  pipeline.batch.size: 1000
  pipeline.batch.delay: 50

Logstash configuration - logstash.yaml
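Each worker pulls up to `pipeline.batch.size` events at a time, so the number of events in flight is roughly `pipeline.workers` × `pipeline.batch.size`: 4 × 1000 = 4000 with these settings, versus 4 × 125 = 500 with the default batch size. That product is also a rough guide to the extra heap a larger batch needs. A simplified model; the dispatch rule paraphrases the documented meaning of `pipeline.batch.delay` (how long to wait for each event before sending an undersized batch) and is not Logstash's internal code:

```python
from dataclasses import dataclass


@dataclass
class PipelineSettings:
    workers: int         # pipeline.workers
    batch_size: int      # pipeline.batch.size
    batch_delay_ms: int  # pipeline.batch.delay

    def max_inflight(self) -> int:
        # Each worker holds at most one batch at a time.
        return self.workers * self.batch_size


def should_dispatch(buffered: int, ms_since_last_event: float, s: PipelineSettings) -> bool:
    """Simplified rule: send a full batch at once, or an undersized batch once
    no new event has arrived within batch_delay_ms."""
    return buffered >= s.batch_size or ms_since_last_event >= s.batch_delay_ms


if __name__ == "__main__":
    tuned = PipelineSettings(workers=4, batch_size=1000, batch_delay_ms=50)
    default_batch = PipelineSettings(workers=4, batch_size=125, batch_delay_ms=50)
    print(tuned.max_inflight(), default_batch.max_inflight())  # 4000 500
    print(should_dispatch(10, 20, tuned), should_dispatch(10, 50, tuned))  # False True
```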
Problems & Solutions
Kafka Partitions
Unbalanced input messages. It's natural.
[Figure: the Order, Auth and Inventory services each write to their own topic (Order Topic, Auth Topic, Inventory Topic); some topics receive heavy log traffic, others much less]
• Logstash ingests about the same amount of logs per topic, so heavy topics have a high possibility of consumer lag
• Increase the number of partitions
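A back-of-the-envelope way to choose a partition count for a heavy topic is to divide its target event rate by what one consumer thread can process, since each partition is read by at most one consumer in a group. A minimal sketch; only the 1 million requests/minute figure comes from this talk, while one log line per request and 5,000 events/second per consumer thread are hypothetical assumptions:

```python
import math


def partitions_needed(events_per_minute: float,
                      events_per_sec_per_consumer: float,
                      headroom: float = 1.0) -> int:
    """Smallest partition count that lets one consumer thread per partition
    keep up with the target rate (times an optional safety headroom)."""
    target_per_sec = events_per_minute * headroom / 60.0
    return max(1, math.ceil(target_per_sec / events_per_sec_per_consumer))


if __name__ == "__main__":
    # ~1 million requests/minute, assuming one log line each, and a
    # hypothetical 5,000 events/second per consumer thread.
    print(partitions_needed(1_000_000, 5_000))                # 4
    print(partitions_needed(1_000_000, 5_000, headroom=1.5))  # 5
```

Because traffic is spread unevenly across topics, the same arithmetic is best applied per topic with that topic's own rate, measured from the lag and processing-rate dashboards.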
Kafka Partitions
Wait. What are partitions?
https://medium.com/event-driven-utopia/understanding-kafka-topic-partitions-ae40f80552e8
[Figure: topic with one partition: writes → Partition 0 → ingest]
Kafka Partitions
Wait. What are partitions?
[Figure: topic with multiple partitions: writes → Partition 0, Partition 1, Partition 2 → ingest]
Kafka Partitions
Wait. What are partitions?
#!/bin/bash

## get topics (internal topics such as __consumer_offsets are filtered out)
ZOOKEEPER=z-3.elk.abc.kafka.ap-northeast-2.amazonaws.com:2181
bin/kafka-topics.sh --list --zookeeper "$ZOOKEEPER" | grep -v '^__' > topiclist.txt

## increase partitions
# Partition counts can only be increased, never decreased.
# --zookeeper was removed from kafka-topics.sh in Kafka 3.0; use --bootstrap-server there.
while read -r line; do
  echo "$line"
  bin/kafka-topics.sh --zookeeper "$ZOOKEEPER" --alter --topic "$line" --partitions 3
  sleep 1
done < topiclist.txt

• Increase partitions of all existing topics

# ...
default.replication.factor=2
num.partitions=3
log.retention.hours=48
delete.topic.enable=true
# ...

• Increase the default partition count in the Kafka broker settings (this has no effect on existing topics)
Kafka Partitions
Partitions / Consumers
[Figure: topic with multiple partitions (Partition 0, 1, 2) read by a single consumer thread: sequential ingest]
input {
  kafka {
    # ...
    bootstrap_servers => "..."
    topics_pattern => ".*"
    consumer_threads => 1
    # ...
  }
}
https://www.elastic.co/guide/en/logstash/current/plugins-inputs-kafka.html
Kafka Partitions
Partitions / Consumers
[Figure: Partition 0, 1 and 2 each read by its own consumer thread: parallel ingest]
input {
  kafka {
    # ...
    bootstrap_servers => "..."
    topics_pattern => ".*"
    consumer_threads => 3
    # ...
  }
}
Kafka Partitions
Partitions / Consumers
[Figure: topic with multiple partitions (Partition 0, 1, 2) → ingest]
input {
  kafka {
    # ...
    bootstrap_servers => "..."
    topics_pattern => ".*"
    consumer_threads => 1
    # ...
  }
}
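Within one consumer group, Kafka gives each partition to at most one consumer, so consumer threads beyond the partition count stay idle; the Logstash kafka input documentation recommends as many threads as partitions for a perfect balance. A simplified round-robin sketch of that assignment (Kafka's real assignors, such as range, round-robin and sticky, apply more rules than this):

```python
from typing import Dict, List, Tuple

TopicPartition = Tuple[str, int]


def assign_round_robin(partitions: List[TopicPartition],
                       consumers: List[str]) -> Dict[str, List[TopicPartition]]:
    """Deal partitions to consumers in turn; each partition gets exactly one owner."""
    assignment: Dict[str, List[TopicPartition]] = {c: [] for c in consumers}
    for i, tp in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(tp)
    return assignment


if __name__ == "__main__":
    order_topic = [("order", 0), ("order", 1), ("order", 2)]
    print(assign_round_robin(order_topic, ["thread-0"]))
    # {'thread-0': [('order', 0), ('order', 1), ('order', 2)]}  (sequential ingest)
    print(assign_round_robin(order_topic, ["thread-0", "thread-1", "thread-2"]))
    # one partition per thread (parallel ingest)
    print(assign_round_robin(order_topic, ["t0", "t1", "t2", "t3"])["t3"])  # []
```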
Live demo
My architecture
ELK Stack (Elastic Stack)
[Figure: App (A), Disk volume (Di) and Filebeat (F) on each of five hosts → MSK (Kafka) → Logstash → Elasticsearch; S3 also appears in the diagram]
Improvements:
• Improve partition settings
• Increase consumers
• Improve grok parser
Wrap-up
• First of all, measure it!
• Log forwarder (in my case Logstash)
  • Improve parsing performance (grok)
  • Increase the number of forwarders
• Message stream (in my case Kafka)
  • Partitioning
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
High Profile Call Girls Dahisar Arpita 9907093804 Independent Escort Service ...
High Profile Call Girls Dahisar Arpita 9907093804 Independent Escort Service ...High Profile Call Girls Dahisar Arpita 9907093804 Independent Escort Service ...
High Profile Call Girls Dahisar Arpita 9907093804 Independent Escort Service ...
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 

How to improve ELK log pipeline performance

  • 1. 2021/11/14 Hojin Shim / Site Reliability Engineer. ELK Stack - Improving log processing speed. Requests averaged about 1 million per minute, and logs started to lag.
  • 3. Logging Patterns: Well-known patterns • Remote logging • File logging & cron backup • Logging pipeline without stream • Logging pipeline with stream
  • 4. Logging Patterns: Remote Logging. The app sends logs over the network to somewhere else (DB, storage, etc.), e.g. Logback / log4j in Java. • Low risk of losing records • High risk of lag / limited throughput
  • 5. Logging Patterns: File Logging & Cron Backup. The app writes logs to a disk volume, and a cron job uploads them to S3 (PutObject). • High risk of losing records • Depends on the deployment pattern • Difficult to analyse • Simple
  • 6. Logging Patterns: Logging Pipeline Pattern (without stream). App → disk volume → forwarder (pre-processor) → forwarder (post-processor) → search engine. • Risky under high throughput • Risk of losing records
  • 7. Logging Patterns: Logging Pipeline Pattern (with stream). App → disk volume → forwarder (pre-processor) → stream → forwarder (post-processor) → search engine. • Low risk under high throughput • Low risk of losing records • High cost
  • 8. Logging Patterns: ELK Stack (Elastic Stack). App → disk volume → Filebeat → MSK (Kafka) → Logstash → Elasticsearch & Kibana. • Low risk under high throughput and of losing records • High cost ($$$) • Requires deep and wide technical knowledge
  • 12. What is the problem? Many things could be the cause: • Filebeat I/O problems • Kafka performance problems • Logstash slow ingestion / processing • Elasticsearch performance problems • etc.
  • 14. Measurement: What to measure? • Filebeat: basic system metrics, etc. • MSK (Kafka): basic system metrics, burst balance, bandwidth throttling, lag per topic, etc. • Logstash: basic system metrics, number of events processed, etc. • Elasticsearch: basic system metrics, indexing rate / latency, etc.
  • 15. Measurement: How to measure? (based on my experience) • Filebeat: Telegraf, InfluxDB, Grafana • MSK (Kafka): CloudWatch, Burrow / Prometheus, Elasticsearch, Grafana • Logstash: Telegraf, Elasticsearch, Grafana • Elasticsearch: CloudWatch, Grafana
  • 16. Measurement: How to measure? (based on my experience) Same tools as slide 15, focusing on: • consumer-lag monitoring for MSK (Kafka) • processing-rate monitoring for Logstash
  • 19. Measurement: Consumer-lag measurement • Kubernetes-friendly way: Open Monitoring with Prometheus • Always-available way (demonstrated in this session): Burrow / Telegraf
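Whatever tool collects it, consumer lag comes down to one subtraction per partition: the newest offset written to the partition (the log-end offset) minus the offset the consumer group has committed. A minimal Python sketch of that arithmetic, with hypothetical topic names and offsets; Burrow additionally evaluates lag over a sliding window of commits to decide a group status, which this sketch does not model.

```python
from typing import Dict, Tuple

# A partition is identified by (topic name, partition number).
Partition = Tuple[str, int]


def partition_lag(end_offsets: Dict[Partition, int],
                  committed: Dict[Partition, int]) -> Dict[Partition, int]:
    """Lag per partition = log-end offset minus the group's committed offset.

    Assumption: a partition with no committed offset counts from offset 0.
    """
    return {tp: max(end - committed.get(tp, 0), 0)
            for tp, end in end_offsets.items()}


def lag_per_topic(lags: Dict[Partition, int]) -> Dict[str, int]:
    """Sum partition lags into one number per topic (what a lag graph plots)."""
    totals: Dict[str, int] = {}
    for (topic, _partition), lag in lags.items():
        totals[topic] = totals.get(topic, 0) + lag
    return totals


# Hypothetical offsets, not real measurements.
end_offsets = {("order", 0): 1_000, ("order", 1): 900, ("auth", 0): 50}
committed = {("order", 0): 400, ("order", 1): 850, ("auth", 0): 50}
print(lag_per_topic(partition_lag(end_offsets, committed)))  # {'order': 650, 'auth': 0}
```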
  • 20. Measurement: Burrow / Telegraf • Burrow: open source, developed by LinkedIn; an Apache Kafka monitoring tool; exposes its information through an HTTP endpoint • Telegraf: open source, developed by InfluxData; a general-purpose metrics collection agent; plugin system
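Because Burrow exposes its results over HTTP, anything that can issue a GET request can read consumer lag, which is what Telegraf's Burrow input does. A hedged Python sketch of reading a group's total lag directly; it assumes Burrow's v3 API layout (`/v3/kafka/<cluster>/consumer/<group>/lag`) and a response with an `error` flag and a `status` object containing `totallag`, so verify both against your Burrow version before relying on it.

```python
import json
import urllib.request
from urllib.parse import quote


def burrow_lag_url(base: str, cluster: str, group: str) -> str:
    """Build the consumer-group lag URL (assumed Burrow v3 path layout)."""
    return (f"{base.rstrip('/')}/v3/kafka/{quote(cluster, safe='')}"
            f"/consumer/{quote(group, safe='')}/lag")


def total_lag(body: str) -> int:
    """Extract the group's total lag from a lag response body.

    Assumes the JSON carries an "error" flag and a "status" object with a
    "totallag" field; check this against your Burrow version.
    """
    payload = json.loads(body)
    if payload.get("error"):
        raise RuntimeError(payload.get("message", "Burrow returned an error"))
    return int(payload["status"]["totallag"])


def fetch_total_lag(base: str, cluster: str, group: str, timeout: float = 5.0) -> int:
    """Query a running Burrow instance (requires network access to it)."""
    url = burrow_lag_url(base, cluster, group)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return total_lag(resp.read().decode("utf-8"))
```

With the names used elsewhere in this deck, a call would look like `fetch_total_lag("https://your.burrow-endpoint.com", "product-elk", "logstash")`, where `product-elk` is the Burrow cluster name and `logstash` is the Logstash `group_id`.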
  • 21. Measurement: Consumer-lag measurement with Burrow. MSK (Kafka) → Burrow / Telegraf → Elasticsearch → Grafana
  • 22. Measurement: Burrow config code snippet (Burrow configuration - /etc/burrow/burrow.toml)
      ...
      # Your zookeeper endpoint
      [zookeeper]
      servers=[ "z-3.elk.abc.kafka.ap-northeast-2.amazonaws.com:2181", "z-2.elk.kafka.ap-northeast-2.amazonaws.com:2181", "z-1.product-elk-msk-abc.kafka.ap-northeast-2.amazonaws.com:2181" ]
      timeout=6
      root-path="/burrow"

      [consumer.product-elk]
      class-name="kafka"
      cluster="product-elk"
      # Your bootstrap server endpoint
      servers=[ "b-2.elk.kafka.ap-northeast-2.amazonaws.com:9094", "b-1.elk.kafka.ap-northeast-2.amazonaws.com:9094" ]
      client-profile="your_profile"
      group-denylist="^(some-group-|python-kafka-consumer-|quick-).*$"
      group-allowlist=""

      [cluster.product-elk]
      class-name="kafka"
      servers=[ "b-2.elk.abc.kafka.ap-northeast-2.amazonaws.com:9094", "b-1.elk.abc.kafka.ap-northeast-2.amazonaws.com:9094" ]
      client-profile="test"
      topic-refresh=60
      offset-refresh=30

      # If you use client / broker encryption
      [tls.msk-mTLS]
      cafile="/etc/burrow/truststore.pem"
      noverify=true
      ...
  • 23. Measurement: Telegraf config code snippet (Telegraf configuration - /etc/telegraf/telegraf.d/burrow.conf)
      [[inputs.burrow]]
        servers = ["https://your.burrow-endpoint.com"]
        topics_exclude = [ "__consumer_offsets" ]
        groups_exclude = ["console-*"]
        # Use a tag if you also collect other metrics
        [inputs.burrow.tags]
          burrow = "burrow"

      [[outputs.elasticsearch]]
        urls = [ "http://your-elasticsearch-endpoint:9200" ]
        timeout = "5s"
        enable_sniffer = false
        health_check_interval = "10s"
        index_name = "burrow-%Y.%m.%d"
        manage_template = true
        template_name = "telegraf-burrow"
        # Filter metrics by tag
        [outputs.elasticsearch.tagpass]
          burrow = ["burrow"]
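Two Telegraf settings keep the Burrow data separate: `index_name`, which gives each day its own index, and `tagpass`, which only lets metrics carrying the `burrow` tag reach this output. A small Python model of both, as a sketch: it assumes Telegraf's `%Y.%m.%d` placeholders behave like `strftime` on the metric's timestamp (UTC here), and it simplifies tagpass matching to glob patterns checked with `fnmatch`.

```python
from datetime import datetime, timezone
from fnmatch import fnmatchcase
from typing import Dict, List


def daily_index(template: str, ts: datetime) -> str:
    """Resolve %Y/%m/%d in an index template for a metric's timestamp."""
    return ts.strftime(template)


def passes_tagpass(tags: Dict[str, str], tagpass: Dict[str, List[str]]) -> bool:
    """Simplified tagpass: keep the metric if some listed tag key is present
    and its value matches one of that key's glob patterns."""
    return any(
        key in tags and any(fnmatchcase(tags[key], pattern) for pattern in patterns)
        for key, patterns in tagpass.items()
    )


ts = datetime(2021, 11, 14, 9, 30, tzinfo=timezone.utc)
print(daily_index("burrow-%Y.%m.%d", ts))                    # burrow-2021.11.14
print(passes_tagpass({"burrow": "burrow"}, {"burrow": ["burrow"]}))  # True
print(passes_tagpass({"host": "app-1"}, {"burrow": ["burrow"]}))     # False
```

Without the tag and the tagpass filter, every metric Telegraf collects on that host would also land in the `burrow-*` indices.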
  • 24. Measurement: Data from the burrow index (screenshot: topic name, partition, and lag information)
  • 25. Measurement: Visualization with Grafana (screenshot: lag graphs for individual topics)
  • 27. Measurement: Visualization with Timelion. Add Logstash metrics (Logstash pipeline configuration - ./logstash/pipeline/logstash.conf):
      input {
        kafka {
          bootstrap_servers => "b-2.elk.abc.kafka.ap-northeast-2.amazonaws.com:9094,b-1.elk.abc.kafka.ap-northeast-2.amazonaws.com:9094"
          # topics_pattern is a regular expression; ".*" subscribes to every topic
          topics_pattern => ".*"
          consumer_threads => 1
          codec => "json"
          decorate_events => true
          group_id => "logstash"
          security_protocol => "SSL"
          ssl_truststore_location => "/logstash/kafka.client.truststore.jks"
          enable_auto_commit => "true"
        }
      }
      ...
      filter {
        ...
        # Emit a metric event (tagged "metric") that counts and rates processed events
        metrics {
          meter => "events"
          add_tag => "metric"
          add_field => { "lsname" => "some-logstash" }
        }
      }
      ...
      output {
        # ... the preceding "if" branch is elided ...
        else if "metric" in [tags] {
          elasticsearch {
            hosts => ["eskibana.prd.in.musinsa.com:9200"]
            index => "logstash-metric-%{+yyyy.MM.dd}"
          }
          ...
        }
      }
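The `meter => "events"` setting makes the metrics filter emit fields such as `events.count` and `events.rate_1m`, and the 1-minute rate is what a processing-rate panel usually plots. The filter's exact internals are not shown in this deck, so as an assumption this Python sketch uses the common Dropwizard-style exponentially weighted moving average with a 5-second tick, to illustrate why `rate_1m` reacts smoothly rather than instantly when throughput drops.

```python
import math
from typing import Optional

TICK_SECONDS = 5.0  # assumed update interval of the meter
ALPHA_1M = 1.0 - math.exp(-TICK_SECONDS / 60.0)  # smoothing factor for a 1-minute window


class OneMinuteRate:
    """Exponentially weighted moving average of events per second."""

    def __init__(self) -> None:
        self.uncounted = 0
        self.rate: Optional[float] = None  # events/second; None until the first tick

    def mark(self, n: int = 1) -> None:
        """Record n processed events."""
        self.uncounted += n

    def tick(self) -> float:
        """Fold the events seen since the last tick into the moving rate."""
        instant = self.uncounted / TICK_SECONDS
        self.uncounted = 0
        if self.rate is None:
            self.rate = instant  # first tick seeds the average
        else:
            self.rate += ALPHA_1M * (instant - self.rate)
        return self.rate


meter = OneMinuteRate()
meter.mark(500)                # 500 events in the first 5 s = 100 events/s
print(meter.tick())            # 100.0
print(round(meter.tick(), 1))  # no new events: decays to about 92.0
```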
  • 28. Measurement: Data from the Logstash metric index (screenshot: Logstash name and 1-minute event processing rate)
  • 30. Problems & Solutions: Logstash grok performance
  • 31. Logstash filter performance: grok grok grok! • Some log messages can cause parsing problems: • special characters • long log messages • etc. Example of such a message: http://some-domain/app/product/goodsview_stats/1474978/0?utm_source=naver_jisicshopping&utm_medium=sh&source=NVSH&NaPm=ct%3Dkvyxfobc%7Cci%3Dd4151183d55ce2828c56f84eb392eab7338b2026%7Ctr%3Dslct%7Csn%3D204973%7Chk ab6de6182e50b01b182e15ae740bcb84ce&menu=view&3Dcee524ab6de6182e50b01b182e15ae740bcb84ce&q=b3Dcee524ab6de6182e50b01b182e15ae740bcb84ce.....................
  • 32. Logstash filter performance: grok grok grok! Logstash log:
      [2021-09-03T17:26:25,923][WARN ][logstash.filters.grok ][main][8c1ed634e6ffe7026b0a684399b6a4893634d376554d997095836bd11d71a1c7] Timeout executing grok '%{IPORHOST:[nginx][access][remote_ip]} ......................'
    https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html#plugins-filters-grok-timeout_millis
  • 33. Logstash filter performance: grok grok grok! Add a short grok parsing timeout (Logstash pipeline configuration - ./logstash/pipeline/logstash.conf):
      ...
      filter {
        if [event][dataset] == "nginx.access" {
          grok {
            match => { "message" => ["%{IPORHOST:[nginx][access][remote_ip]} - ................"] }
            remove_field => "message"
            # Abort matching after 300 ms (default: 30000 ms); timed-out events get the "_groktimeout" tag
            timeout_millis => 300
          }
          ...
        ...
      ...
  • 34. Problems & Solutions: Logstash pipeline & batch
  • 35. Logstash pipeline & batch: Too many topics to ingest • The number of workers vs. CPU cores • How many messages to fetch each time • How long to wait for an undersized batch https://www.elastic.co/guide/en/logstash/6.8/logstash-settings-file.html#logstash-settings-file
  • 36. Logstash pipeline & batch: Too many topics to ingest • The number of workers vs. CPU cores (pipeline.workers): the same as the number of CPU cores, or a little more • How many messages to fetch each time (pipeline.batch.size): the default is 125; the new value is 1000 • How long to wait for an undersized batch (pipeline.batch.delay): 50 ms, the default https://www.elastic.co/guide/en/logstash/6.8/logstash-settings-file.html#logstash-settings-file
    Logstash configuration - logstash.yaml:
      - pipeline.id: main
        path.config: "/usr/share/logstash/pipeline"
        pipeline.workers: 4
        pipeline.batch.size: 1000
        pipeline.batch.delay: 50
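Raising `pipeline.batch.size` from 125 to 1000 has a cost that is easy to estimate: the Logstash documentation describes the number of in-flight events as the product of `pipeline.workers` and `pipeline.batch.size`, and those events live on the JVM heap. A back-of-the-envelope Python sketch; the 2 KiB average event size is a made-up assumption, not a figure from this setup.

```python
import os

DEFAULT_BATCH_SIZE = 125     # Logstash default for pipeline.batch.size
DEFAULT_BATCH_DELAY_MS = 50  # Logstash default for pipeline.batch.delay


def default_workers() -> int:
    """Logstash defaults pipeline.workers to the host's CPU core count."""
    return os.cpu_count() or 1


def in_flight_events(workers: int, batch_size: int) -> int:
    """Events the workers can hold at once: one batch per worker."""
    return workers * batch_size


def rough_batch_memory_mib(workers: int, batch_size: int, avg_event_bytes: int) -> float:
    """Very rough heap needed for in-flight events; avg_event_bytes is an
    assumption you should measure for your own logs."""
    return in_flight_events(workers, batch_size) * avg_event_bytes / (1024 * 1024)


print(in_flight_events(4, DEFAULT_BATCH_SIZE))  # 500
print(in_flight_events(4, 1000))                # 4000
print(rough_batch_memory_mib(4, 1000, 2048))    # 7.8125 (assuming ~2 KiB per event)
```

An eightfold larger batch means eightfold more events held in memory per worker, so heap size is worth checking whenever the batch size goes up.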
  • 38. Kafka Partitions: Unbalanced input messages are natural. • Order Service → Order Topic, Auth Service → Auth Topic, Inventory Service → Inventory Topic; some services produce few log messages, others a heavy volume • If every topic is ingested at the same rate, the heavy topics are likely to build up consumer lag • Fix: increase the number of partitions
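Why more partitions help a heavy topic: within one consumer group a partition is consumed by at most one consumer, so the partition count caps how many consumers can share that topic's load. A toy Python model with made-up rates (not measurements from this setup); it assumes one consumer thread per partition and ignores rebalancing, batching, and per-message cost differences.

```python
import math


def lag_after(seconds: float, produce_rate: float, partitions: int,
              per_partition_rate: float, start_lag: float = 0.0) -> float:
    """Consumer lag after `seconds`, assuming one consumer thread per partition.

    In a consumer group each partition is read by at most one consumer, so
    consume capacity is roughly partitions * per_partition_rate.
    """
    capacity = partitions * per_partition_rate
    return max(start_lag + (produce_rate - capacity) * seconds, 0.0)


def min_partitions(produce_rate: float, per_partition_rate: float) -> int:
    """Smallest partition count whose capacity keeps up with the producer."""
    return max(1, math.ceil(produce_rate / per_partition_rate))


# Hypothetical: a heavy topic at 2,500 events/s, 1,000 events/s per consumer thread.
print(lag_after(60, 2_500, 1, 1_000))  # 90000.0 events behind after one minute
print(lag_after(60, 2_500, 3, 1_000))  # 0.0
print(min_partitions(2_500, 1_000))    # 3
```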
  • 39. Kafka Partitions: Wait, what are partitions? https://medium.com/event-driven-utopia/understanding-kafka-topic-partitions-ae40f80552e8 Topic with one partition: writes → Partition 0 → ingest
  • 40. Kafka Partitions: Wait, what are partitions? https://medium.com/event-driven-utopia/understanding-kafka-topic-partitions-ae40f80552e8 Topic with multiple partitions: writes → Partition 0 / Partition 1 / Partition 2 → ingest
  • 41. Kafka Partitions: Wait, what are partitions? https://medium.com/event-driven-utopia/understanding-kafka-topic-partitions-ae40f80552e8
    • Increase the partitions of all existing topics:
      #!/bin/bash
      ## get topics
      ZOOKEEPER=z-3.elk.abc.kafka.ap-northeast-2.amazonaws.com:2181
      # --zookeeper was removed in Kafka 3.0; newer clusters use --bootstrap-server instead
      bin/kafka-topics.sh --list --zookeeper "$ZOOKEEPER" > topiclist.txt
      ## increase partitions (skip internal topics such as __consumer_offsets)
      grep -v '^__' topiclist.txt | while read -r line; do
        echo "$line"
        # Partition counts can only be increased, never decreased
        bin/kafka-topics.sh --zookeeper "$ZOOKEEPER" --alter --topic "$line" --partitions 3
        sleep 1
      done
    • Increase partitions via the Kafka default setting (this has no effect on existing topics):
      ...
      default.replication.factor=2
      num.partitions=3
      log.retention.hours=48
      delete.topic.enable=true
      ...
  • 42. Kafka Partitions: Partitions / Consumers. Topic with multiple partitions: writes → Partition 0 / Partition 1 / Partition 2 → sequential ingest with one consumer thread:
      input {
        kafka {
          ...
          bootstrap_servers => "..."
          topics_pattern => ".*"
          consumer_threads => 1
          ...
        }
      }
    https://www.elastic.co/guide/en/logstash/current/plugins-inputs-kafka.html
  • 43. Kafka Partitions: Partitions / Consumers. Topic with multiple partitions: writes → Partition 0 / Partition 1 / Partition 2 → parallel ingest with three consumer threads:
      input {
        kafka {
          ...
          bootstrap_servers => "..."
          topics_pattern => ".*"
          consumer_threads => 3
          ...
        }
      }
    https://www.elastic.co/guide/en/logstash/current/plugins-inputs-kafka.html
  • 44. Kafka Partitions: Partitions / Consumers. Topic with multiple partitions: writes → Partition 0 / Partition 1 / Partition 2 → ingest:
      input {
        kafka {
          ...
          bootstrap_servers => "..."
          topics_pattern => ".*"
          consumer_threads => 1
          ...
        }
      }
    https://www.elastic.co/guide/en/logstash/current/plugins-inputs-kafka.html
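The `consumer_threads` setting matters because Kafka hands each partition of a subscribed topic to at most one consumer in the group; the Logstash Kafka input documentation recommends as many threads as partitions and notes that extra threads sit idle. A simplified Python sketch of that effect using plain round-robin dealing (Kafka's real assignors, such as range, round-robin, and sticky, differ in detail). The same arithmetic applies when several Logstash instances share one `group_id`.

```python
from typing import Dict, List, Tuple

Partition = Tuple[str, int]


def round_robin_assign(partitions: List[Partition], threads: int) -> Dict[int, List[Partition]]:
    """Deal partitions out to consumer threads one at a time."""
    if threads < 1:
        raise ValueError("need at least one consumer thread")
    assignment: Dict[int, List[Partition]] = {t: [] for t in range(threads)}
    for i, tp in enumerate(sorted(partitions)):
        assignment[i % threads].append(tp)
    return assignment


def idle_threads(assignment: Dict[int, List[Partition]]) -> List[int]:
    """Threads that received no partition and therefore consume nothing."""
    return [t for t, tps in assignment.items() if not tps]


topic = [("order", p) for p in range(3)]
print(round_robin_assign(topic, 1))  # {0: [('order', 0), ('order', 1), ('order', 2)]}
print(round_robin_assign(topic, 3))  # {0: [('order', 0)], 1: [('order', 1)], 2: [('order', 2)]}
print(idle_threads(round_robin_assign(topic, 4)))  # [3]
```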
  • 46. My architecture: ELK Stack (Elastic Stack). Several App / Disk volume / Filebeat nodes → MSK (Kafka) → Logstash → Elasticsearch, plus S3. Improvements: improve partition settings (MSK), improve the grok parser (Logstash), increase consumers (Logstash).
  • 48. Wrap-up • First of all, measure it! • Log forwarder (in my case Logstash): improve parsing performance (grok); increase the number of forwarders • Message stream (in my case Kafka): partitioning