Choosing the Right Stream Storage:
Kinesis Data Streams vs MSK
Sungmin, Kim
Solutions Architect, AWS
Agenda
• Key Components of Real-time Analytics
• Anatomy of Amazon Kinesis Data Streams and MSK
• Comparing Amazon Kinesis Data Streams to MSK
• Monitoring Metrics
• Reference Architecture
• Key Takeaways
Key Components of Real-time
Analytics
From Batch to Real-time: Lambda Architecture
[Diagram: Lambda Architecture with a Speed Layer, a Batch Layer, and a Service Layer. Labels: Data Source, Stream Ingestion, Stream Storage, Stream Delivery, Streaming Data, Stream Process, Real-time View, Raw Data Storage, Batch Process, Batch View, Query & Merge Results, Consumer]
Lambda Architecture
[Diagram: Batch Layer (Raw Data, Batch Process, Batch View), Speed Layer (Streaming Data, Stream Process, Real-time View), Serving Layer (Batch View, Real-time View, Query)]
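The "Query & Merge Results" step of the serving layer can be sketched as follows. This is a minimal illustration, assuming both views are simple per-key counters; the view contents and the `merge_views` helper are hypothetical and not taken from the slides.

```python
from collections import Counter


def merge_views(batch_view: dict, realtime_view: dict) -> dict:
    """Combine precomputed batch counts with counts from the speed layer."""
    merged = Counter(batch_view)
    merged.update(realtime_view)  # adds the counts key by key
    return dict(merged)


# Hypothetical page-view counts: the batch view covers older data,
# the real-time view covers records that arrived after the last batch run.
batch_view = {"page_a": 100, "page_b": 40}
realtime_view = {"page_a": 3, "page_c": 1}

result = merge_views(batch_view, realtime_view)
print(result)
```

A query is answered from the merged result, so consumers see both the complete history and the most recent records.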
Key Components of Real-time Analytics
• Data Source: devices and/or applications that produce real-time data at high velocity
• Stream Ingestion: data from tens of thousands of data sources can be written to a single stream
• Stream Storage: data are stored in the order they were received for a set duration of time and can be replayed any number of times during that period
• Stream Process: records are read in the order they are produced, enabling real-time analytics or streaming ETL
• Data Sink: data lake (most common) or database (least common)
Stream Storage
[Diagram: Data Source, Stream Ingestion, Stream Storage, Stream Process, Data Sink]
• Amazon Kinesis Data Streams
• Amazon Managed Streaming for Kafka
Anatomy of Amazon Kinesis
Data Streams and MSK
Key Features of Kinesis Data Streams and MSK
• Distributed Queue
• Stream Storage
#Queue #Distributed #Storage
#Queue: FIFO, Scale-Up vs Scale-Out
[Diagram: Producers write numbered records (0 to 5) into queues; a Consumer reads them in order, from oldest data to newest data]
#Distributed: Scale-Out
[Diagram: Producers send records through a Hash Function to shard/partition-1, shard/partition-2, and shard/partition-3; each shard/partition holds its own ordered records (oldest data to newest data), read by a Consumer or by the Consumers of a Consumer Group]
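The Hash Function in the diagram can be sketched as follows. Kinesis Data Streams maps the MD5 hash of a record's partition key to a 128-bit integer and routes the record to the shard whose hash key range contains it; Kafka's default partitioner uses a different hash. The sketch assumes the hash key space is split evenly across shards, and `shard_for_key` is a hypothetical helper name.

```python
import hashlib


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Map a partition key to a shard index by hashing (illustrative only)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")  # 128-bit integer
    # Assumption: shards own equal-sized, contiguous hash key ranges.
    range_size = 2**128 // shard_count
    return min(hash_value // range_size, shard_count - 1)


for key in ("device-1", "device-2", "device-3"):
    print(key, "-> shard index", shard_for_key(key, shard_count=3))
```

Because the same key always hashes to the same shard, records that share a partition key keep their relative order.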
#Storage: Stream Buffer
[Diagram: Producers, Hash Function, shard/partition-1 to shard/partition-3, and a Consumer Group of three Consumers; a marker in each shard/partition shows the next consumer offset between oldest data and newest data]
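The stream buffer idea (records stay in the shard/partition, and each consumer group only advances its own offset) can be sketched as below. This is an in-memory toy, not the Kinesis or Kafka client API; the `PartitionLog` class and its method names are hypothetical.

```python
class PartitionLog:
    """Append-only log for one shard/partition with per-group offsets."""

    def __init__(self):
        self._records = []
        self._next_offset = {}  # consumer group -> next offset to read

    def append(self, record) -> int:
        """Producer side: store the record and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, group: str, max_records: int = 10) -> list:
        """Consumer side: read from the group's offset without deleting data."""
        start = self._next_offset.get(group, 0)
        batch = self._records[start:start + max_records]
        self._next_offset[group] = start + len(batch)
        return batch

    def seek(self, group: str, offset: int) -> None:
        """Move a group's offset, for example back to 0 to replay."""
        self._next_offset[group] = offset


log = PartitionLog()
for i in range(5):
    log.append(f"record-{i}")

first = log.poll("analytics", max_records=3)   # reads records 0..2
other = log.poll("archiver")                   # independent group reads 0..4
log.seek("analytics", 0)
replayed = log.poll("analytics")               # replay from the oldest record
```

Reading does not remove records, which is what lets several consumer groups read the same data and lets one group replay it.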
Anatomy of Amazon Kinesis Data Streams and Amazon Managed Streaming for Kafka
[Diagram: Producers, Hash Function, shard/partition-1 to shard/partition-3, and a Consumer Group of three Consumers; a marker shows the next consumer offset between oldest data and newest data]
Benefits of Stream Storage
• Decouple producers & consumers
• Persistent buffer
• Collect multiple streams
• Preserve client ordering
• Parallel consumption
• Streaming MapReduce
Comparing Amazon Kinesis
Data Streams to MSK
Comparing Kinesis Data Streams to MSK
[Diagram: Amazon Kinesis Data Streams and Amazon Managed Streaming for Kafka; label: Topic]
Amazon Kinesis Data Streams
• Operational Perspective
  • Number of Kinesis Data Streams?
  • Number of shards per stream?
• Throughput provisioning model
• Increase/Decrease number of shards
• Full integration with AWS services such as Lambda functions, Kinesis Data Analytics, etc.
Amazon Managed Streaming for Kafka
• Operational Perspective
  • Number of clusters?
  • Number of brokers per cluster?
  • Number of topics per broker?
  • Number of partitions per topic?
• Cluster provisioning model
• Only increase number of partitions; can’t decrease
• Integration with a few AWS services such as Kinesis Data Analytics for Apache Flink
Monitoring Metrics
Metrics to Monitor: MSK (Kafka)
• RequestQueue: Length, WaitTime
• ResponseQueue: Length, WaitTime
• Network: packet drops?
• Produce/consume rate imbalance
• Who is the leader?
• Disk full?
• Too many topics?
Metrics to Monitor: MSK (Kafka)
Metric | Level | Description
ActiveControllerCount | DEFAULT | Only one controller per cluster should be active at any given time.
OfflinePartitionsCount | DEFAULT | Total number of partitions that are offline in the cluster.
GlobalPartitionCount | DEFAULT | Total number of partitions across all brokers in the cluster.
GlobalTopicCount | DEFAULT | Total number of topics across all brokers in the cluster.
KafkaAppLogsDiskUsed | DEFAULT | The percentage of disk space used for application logs.
KafkaDataLogsDiskUsed | DEFAULT | The percentage of disk space used for data logs.
RootDiskUsed | DEFAULT | The percentage of the root disk used by the broker.
PartitionCount | PER_BROKER | The number of partitions for the broker.
LeaderCount | PER_BROKER | The number of leader replicas.
UnderMinIsrPartitionCount | PER_BROKER | The number of under minIsr partitions for the broker.
UnderReplicatedPartitions | PER_BROKER | The number of under-replicated partitions for the broker.
FetchConsumerTotalTimeMsMean | PER_BROKER | The mean total time in milliseconds that consumers spend on fetching data from the broker.
ProduceTotalTimeMsMean | PER_BROKER | The mean produce time in milliseconds.
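A few of the metrics above translate directly into health checks. The sketch below assumes a metrics snapshot has already been fetched (for example from CloudWatch) into a plain dictionary; the `check_msk_health` function and the 80% disk threshold are illustrative assumptions, not recommendations from the slides.

```python
def check_msk_health(metrics: dict, disk_threshold_pct: float = 80.0) -> list:
    """Return a list of human-readable problems found in a metrics snapshot."""
    problems = []
    if metrics.get("ActiveControllerCount") != 1:
        problems.append("expected exactly one active controller")
    if metrics.get("OfflinePartitionsCount", 0) > 0:
        problems.append("some partitions are offline")
    if metrics.get("UnderReplicatedPartitions", 0) > 0:
        problems.append("some partitions are under-replicated")
    if metrics.get("UnderMinIsrPartitionCount", 0) > 0:
        problems.append("some partitions are below minIsr")
    # Assumed threshold: warn before the data log disk fills up.
    if metrics.get("KafkaDataLogsDiskUsed", 0.0) >= disk_threshold_pct:
        problems.append("data log disk usage is high")
    return problems


# Hypothetical snapshots for illustration.
healthy = {"ActiveControllerCount": 1, "OfflinePartitionsCount": 0,
           "UnderReplicatedPartitions": 0, "KafkaDataLogsDiskUsed": 35.0}
degraded = {"ActiveControllerCount": 1, "OfflinePartitionsCount": 2,
            "UnderReplicatedPartitions": 4, "KafkaDataLogsDiskUsed": 91.5}

print(check_msk_health(healthy))
print(check_msk_health(degraded))
```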
How about monitoring Kinesis Data Streams?
How long does a record stay in a shard?
• Writes: up to 1 MB or 1,000 records per second, per shard
• Reads: 5 transactions per second, per shard
• With only one consumer application, records can be retrieved every 200 ms
• The Consumer Application calls GetRecords(): up to 10 MB or 10,000 records per call (sustained reads are limited to 2 MB per second, per shard)
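The per-shard limits above lend themselves to simple capacity arithmetic. The sketch below uses only the write limits (1 MB or 1,000 records per second, per shard) and the read limit of 5 transactions per second, per shard; the function names and the sample workload are hypothetical.

```python
import math

WRITE_MB_PER_SEC_PER_SHARD = 1.0
WRITE_RECORDS_PER_SEC_PER_SHARD = 1000
READ_CALLS_PER_SEC_PER_SHARD = 5


def shards_needed_for_writes(mb_per_sec: float, records_per_sec: int) -> int:
    """Smallest shard count that satisfies both write limits."""
    by_bytes = math.ceil(mb_per_sec / WRITE_MB_PER_SEC_PER_SHARD)
    by_records = math.ceil(records_per_sec / WRITE_RECORDS_PER_SEC_PER_SHARD)
    return max(1, by_bytes, by_records)


def min_poll_interval_ms(consumer_apps: int) -> float:
    """Polling interval per application when they share 5 reads/s on a shard."""
    return consumer_apps * 1000 / READ_CALLS_PER_SEC_PER_SHARD


# Hypothetical workload: 4.5 MB/s made of 3,000 records/s.
print(shards_needed_for_writes(4.5, 3000))   # limited by bytes, not record count
print(min_poll_interval_ms(1))               # one application: every 200 ms
print(min_poll_interval_ms(2))               # two applications: every 400 ms each
```

The second function shows why adding consumer applications that poll the same shard increases the delay each of them sees.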
Metrics to Monitor: Kinesis Data Streams
Metric | Description
GetRecords.IteratorAgeMilliseconds | Age of the last record in all GetRecords calls
ReadProvisionedThroughputExceeded | Number of GetRecords calls throttled
WriteProvisionedThroughputExceeded | Number of PutRecord(s) calls throttled
PutRecord.Success, PutRecords.Success | Number of successful PutRecord(s) operations
GetRecords.Success | Number of successful GetRecords operations
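GetRecords.IteratorAgeMilliseconds is most useful when compared with the stream's retention period: once the age reaches the retention period, unread records start to expire. The sketch below assumes the default retention of 24 hours; the `iterator_age_status` function, its status labels, and the 50% warning fraction are hypothetical choices for illustration.

```python
def iterator_age_status(iterator_age_ms: int,
                        retention_hours: int = 24,
                        warn_fraction: float = 0.5) -> str:
    """Classify consumer lag relative to the stream's retention period."""
    retention_ms = retention_hours * 60 * 60 * 1000
    if iterator_age_ms >= retention_ms:
        return "DATA_LOSS_RISK"   # records may expire before they are read
    if iterator_age_ms >= warn_fraction * retention_ms:
        return "FALLING_BEHIND"   # consumer is far behind the newest data
    return "OK"


one_hour_ms = 60 * 60 * 1000
print(iterator_age_status(500))
print(iterator_age_status(13 * one_hour_ms))
print(iterator_age_status(24 * one_hour_ms))
```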
Choosing the Right Metrics
Too Much = Useless = Too Little
Kafka vs MSK vs Kinesis Data Streams
[Diagram: Kinesis Data Streams, Amazon MSK, and Kafka positioned on two axes: Operational Excellence and Degree of Freedom (≈ Complexity)]
Comparison Summary
Attribute | Apache Kafka | Kinesis Streams | Managed Streaming for Kafka
Cost | $$$ | $ (pay for what you use) | $$ (pay for infrastructure)
Ease of use | Advanced setup required | Get started in minutes | Get started in minutes
Management Overhead | High | Low | Low
Scalability | Difficult to scale | Scale in seconds with one click | Scale in minutes with one click
Throughput | Infinite | Scales with shards, supports up to 1 MB payloads | Infinite
Durability | Configurable | 3x by default | Configurable
Infrastructure | You manage | AWS manages | AWS manages
Write-to-Read Latency | <100 ms is achievable | <100 ms (with HTTP/2) | <100 ms is achievable
Open Sourced? | Yes | No | Yes
Reference Architecture
Data Hub: (Asynchronous) Event-Bus
[Diagram: Kinesis Data Streams, Kinesis Data Firehose, Amazon S3, Amazon EC2, AWS Lambda, Amazon ECS, Kinesis Data Analytics, Amazon ES, Amazon Athena, Amazon CloudWatch]
https://aws.amazon.com/solutions/case-studies/autodesk-log-analytics/
Example Usage Pattern 1: Data Hub
Amazon MSK
Log Aggregation
[Diagram: Web servers access log, Aggregated logs]
Example Usage Pattern 2: Web Analytics and Leaderboards
[Diagram: Amazon Cognito, Lightweight JS client code OR Web server on Amazon EC2, Amazon Kinesis Data Streams, Amazon Kinesis Data Analytics, Lambda function, Amazon DynamoDB]
• Ingest web app data
• Compute top 10 users
• Persist to feed live apps
https://aws.amazon.com/solutions/implementations/real-time-web-analytics-with-kinesis/
IoT
[Diagram: Amazon MSK with IoT Things, Remote Control, Prediction/Fraud Detection, Device Monitoring, Quality Control]
https://aws.amazon.com/blogs/aws/new-serverless-streaming-etl-with-aws-glue/
Example Usage Pattern 3: Monitoring IoT Devices
[Diagram: Raspberry PI + Sense HAT, MQTT Topic, AWS IoT Core, IoT rule, Amazon Kinesis Data Streams, AWS Glue Streaming Job, S3 Bucket, Glue Crawler, Glue Data Catalog, Amazon Athena; label: AWS Cloud]
• Ingest sensor data
• Convert JSON to Parquet
• Store all data points in an S3 data lake
Event Sourcing and CQRS
https://www.confluent.io/blog/event-sourcing-cqrs-stream-processing-apache-kafka-whats-connection/
[Diagram labels: App Write Interface, App Read Interface, Event Queue, Event Store, Event Handler, Application State, Kafka Topic, Kafka Streams Topology, Kafka Streams State Store, Event Handler + App State]
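The roles in the diagram (write interface, event store, event handler, application state, read interface) can be sketched in a few lines. This is a single-process toy, assuming account-balance events; it does not use Kafka or Kafka Streams, and the `EventSourcedApp` class and event fields are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class EventSourcedApp:
    event_store: list = field(default_factory=list)  # append-only event log
    state: dict = field(default_factory=dict)        # derived application state

    def write(self, event: dict) -> None:
        """Write interface: append the event, then let the handler apply it."""
        self.event_store.append(event)
        self._handle(event)

    def _handle(self, event: dict) -> None:
        """Event handler: fold one event into the application state."""
        account = event["account"]
        self.state[account] = self.state.get(account, 0) + event["amount"]

    def read(self, account: str) -> int:
        """Read interface: query the derived state, never the raw events."""
        return self.state.get(account, 0)

    def rebuild(self) -> None:
        """Recreate the state from scratch by replaying the event store."""
        self.state = {}
        for event in self.event_store:
            self._handle(event)


app = EventSourcedApp()
app.write({"account": "alice", "amount": 50})
app.write({"account": "alice", "amount": -20})
app.write({"account": "bob", "amount": 10})
print(app.read("alice"), app.read("bob"))
```

Separating `write` from `read` is the CQRS part; being able to call `rebuild` at any time is the event sourcing part.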
Example Usage Pattern 4: Streaming SQL
[Diagram labels: Amazon Kinesis Data Streams, Amazon Kinesis Data Analytics (SQL), S3 Bucket, Event Store, Event Handler + App State, App Write Interface, App Read Interface]
• Continuous filter
• Aggregate function
• Data enrichment (join)
• Anomaly Detection
Reference data for the join:
Ticker, Company
AMZN, Amazon
ASD, SomeCompanyA
BAC, SomeCompanyB
CRM, SomeCompanyC
Sample stream records:
{"TICKER_SYMBOL": "CVB", "SECTOR": "TECHNOLOGY", "CHANGE": 0.81, "PRICE": 53.63}
{"TICKER_SYMBOL": "ABC", "SECTOR": "RETAIL", "CHANGE": -1.14, "PRICE": 23.64}
{"TICKER_SYMBOL": "JKL", "SECTOR": "TECHNOLOGY", "CHANGE": 0.22, "PRICE": 15.32}
https://docs.aws.amazon.com/kinesisanalytics/latest/dev/examples.html
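The three streaming SQL operations (continuous filter, aggregate function, data enrichment by join) can be mimicked in plain Python over the sample records. This is a sketch of the ideas, not Kinesis Data Analytics SQL; the helper function names are hypothetical, and the AMZN record is a made-up addition so that the join has at least one match.

```python
from collections import defaultdict

records = [
    {"TICKER_SYMBOL": "CVB", "SECTOR": "TECHNOLOGY", "CHANGE": 0.81, "PRICE": 53.63},
    {"TICKER_SYMBOL": "ABC", "SECTOR": "RETAIL", "CHANGE": -1.14, "PRICE": 23.64},
    {"TICKER_SYMBOL": "JKL", "SECTOR": "TECHNOLOGY", "CHANGE": 0.22, "PRICE": 15.32},
    # Hypothetical record, not from the slides, so the join finds a match.
    {"TICKER_SYMBOL": "AMZN", "SECTOR": "RETAIL", "CHANGE": 1.0, "PRICE": 100.0},
]
companies = {"AMZN": "Amazon", "ASD": "SomeCompanyA",
             "BAC": "SomeCompanyB", "CRM": "SomeCompanyC"}


def continuous_filter(stream, sector):
    """Pass through only the records of one sector."""
    return (r for r in stream if r["SECTOR"] == sector)


def average_price_by_sector(stream):
    """Aggregate: average PRICE per SECTOR over the records seen."""
    totals = defaultdict(lambda: [0.0, 0])
    for r in stream:
        totals[r["SECTOR"]][0] += r["PRICE"]
        totals[r["SECTOR"]][1] += 1
    return {sector: total / count for sector, (total, count) in totals.items()}


def enrich(stream, lookup):
    """Join: add COMPANY from the reference data (None when unknown)."""
    for r in stream:
        yield {**r, "COMPANY": lookup.get(r["TICKER_SYMBOL"])}


tech_tickers = [r["TICKER_SYMBOL"] for r in continuous_filter(records, "TECHNOLOGY")]
averages = average_price_by_sector(records)
enriched = list(enrich(records, companies))
```

In a real streaming job the aggregate would run over a time window rather than over a finite list.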
Takeaways
Lambda vs Kappa Architecture
[Diagram: Lambda, Kappa]
Key Takeaways
• Distributed Queue as Stream Storage
• Preserve Ordering
• Parallel Consumption
• Persistent Buffer
• Decouple producers & consumers
• Trade-off: Operational Excellence vs Degree of Freedom
• MUST keep an eye on the right monitoring metrics
• Architectural Patterns
• Data Hub: (Asynchronous) Event-Bus
• Log Aggregation
• IoT
• Event Sourcing and CQRS
Where To Go Next?
• Amazon MSK Labs
https://amazonmsk-labs.workshop.aws/
• Amazon Managed Streaming for Kafka: Best Practices
https://docs.aws.amazon.com/msk/latest/developerguide/bestpractices.html
• Monitoring Kafka performance metrics (2020-04-16)
https://tinyurl.com/y6hrhwbq
• Understanding and Optimizing Metrics for Apache Kafka Monitoring (2018-11, in Korean)
https://tinyurl.com/y4uwyenx
• AWS Analytics Immersion Day - Build BI System from Scratch
• Workshop - https://tinyurl.com/yapgwv77
• Slides - https://tinyurl.com/ybxkb74b
• Realtime Analytics on AWS
https://tinyurl.com/y3evwm3v
• Writing SQL on Streaming Data with Amazon Kinesis Analytics – Part 1, 2
• Part1 - https://tinyurl.com/y8vo8q7o
• Part2 - https://tinyurl.com/ycbv7wel

Amazon Kinesis Data Streams Vs Msk (1).pptx

  • 1. Choose Right Stream Storage: Kinesis Data Streams vs MSK Sungmin, Kim Solutions Architect, AWS
  • 2. Agenda • Key Components of Real-time Analytics • Anatomy of Amazon Kinesis Data Streams and MSK • Comparing Amazon Kinesis Data Streams to MSK • Monitoring Metrics • Reference Architecture • Key Takeaways
  • 3. Key Components of Real-time Analytics
  • 4. From Batch to Real-time: Lambda Architecture Data Source Stream Storage Speed Layer Batch Layer Batch Process Batch View Real- time View Consumer Query & Merge Results Service Layer Stream Ingestion Raw Data Storage Streaming Data Stream Delivery Stream Process
  • 5. Lambda Architecture Streaming Data Batch View Stream Process Real-time View Query Query Batch View Real-time View Raw Data Batch Process Batch Layer Serving Layer Speed Layer
  • 6. Key Components of Real-time Analytics Data Source Stream Storage Stream Process Stream Ingestion Data Sink Devices and/or applications that produce real-time data at high velocity Data from tens of thousands of data sources can be written to a single stream Data are stored in the order they were received for a set duration of time and can be replayed indefinitely during that time Records are read in the order they are produced, enabling real-time analytics or streaming ETL Data lake (most common) Database (least common)
  • 8. Anatomy of Amazon Kinesis Data Streams and MSK
  • 9. Key Features of Kinesis Data Streams and MSK • Distributed Queue • Stream Storage #Queue #Distributed #Storage
  • 10. Consumer oldest data newest data 5 4 3 2 1 0 3 2 1 0 #Queue: FIFO, Scale-Up vs Scale-Out 5 4 4 3 2 1 0 5 Producers
  • 11. Hash Function Consumer oldest data newest data Producers shard/partition-1 shard/partition-2 3 2 1 0 5 4 3 2 1 0 4 3 2 1 0 shard/partition-3 #Distributed: Scale-Out Consumer Consumer 0 Consumer Group 4 3 2 1 0
  • 12. Hash Function Consumer Consumer Consumer Consumer Group = next consumer offset oldest data newest data Producers shard/partition-1 shard/partition-2 5 4 3 2 1 0 3 2 1 0 4 3 2 1 0 shard/partition-3 #Storage: Stream Buffer 2 1 0 4 3 2 1 0 0
  • 13. Hash Function Consumer Consumer Consumer Consumer Group = next consumer offset oldest data newest data Amazon Kinesis Data Streams Amazon Managed Streaming for Kafka Producers shard/partition-1 shard/partition-2 5 4 3 2 1 0 3 2 1 0 4 3 2 1 0 shard/partition-3 Anatomy of
  • 14. Benefits of Stream Storage • Decouple producers & consumers • Persistent buffer • Collect multiple streams • Preserve client ordering • Parallel consumption • Streaming MapReduce
  • 16. Topi c Amazon Kinesis Data Streams Amazon Managed Streaming for Kafka Comparing Kinesis Data Streams to MSK
  • 17. Operational Perspective
      • Amazon Kinesis Data Streams: Number of Kinesis data streams? Number of shards per stream? Throughput provisioning model. The number of shards can be increased or decreased. Full integration with AWS services such as Lambda functions, Kinesis Data Analytics, etc.
      • Amazon Managed Streaming for Kafka: Number of clusters? Number of brokers per cluster? Number of topics per broker? Number of partitions per topic? Cluster provisioning model. The number of partitions can only be increased, not decreased. Integration with a few AWS services, such as Kinesis Data Analytics for Apache Flink.
  • 19. Metrics to Monitor: MSK (Kafka) • RequestQueue: Length, WaitTime • ResponseQueue: Length, WaitTime • Network: Packet Drop? • Produce/Consume Rate Imbalance • Who is Leader? • Disk Full? • Too many topics?
  • 20. Metrics to Monitor: MSK (Kafka), listed as Metric (Level): Description
      • ActiveControllerCount (DEFAULT): Only one controller per cluster should be active at any given time.
      • OfflinePartitionsCount (DEFAULT): Total number of partitions that are offline in the cluster.
      • GlobalPartitionCount (DEFAULT): Total number of partitions across all brokers in the cluster.
      • GlobalTopicCount (DEFAULT): Total number of topics across all brokers in the cluster.
      • KafkaAppLogsDiskUsed (DEFAULT): The percentage of disk space used for application logs.
      • KafkaDataLogsDiskUsed (DEFAULT): The percentage of disk space used for data logs.
      • RootDiskUsed (DEFAULT): The percentage of the root disk used by the broker.
      • PartitionCount (PER_BROKER): The number of partitions for the broker.
      • LeaderCount (PER_BROKER): The number of leader replicas.
      • UnderMinIsrPartitionCount (PER_BROKER): The number of under minIsr partitions for the broker.
      • UnderReplicatedPartitions (PER_BROKER): The number of under-replicated partitions for the broker.
      • FetchConsumerTotalTimeMsMean (PER_BROKER): The mean total time in milliseconds that consumers spend on fetching data from the broker.
      • ProduceTotalTimeMsMean (PER_BROKER): The mean produce time in milliseconds.
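A few of the MSK metrics above translate directly into simple health rules. The sketch below is illustrative only: `msk_alerts` is a hypothetical helper, the metric values are assumed to have been fetched already (for example from CloudWatch) into a plain dictionary, and the 85% disk threshold is an assumption rather than a value taken from the slides.

```python
def msk_alerts(metrics: dict, disk_used_threshold: float = 85.0) -> list:
    """Return human-readable alerts for a snapshot of MSK metric values."""
    alerts = []

    # Exactly one active controller is expected per cluster.
    # A missing value is also reported, since it cannot be confirmed as 1.
    if metrics.get("ActiveControllerCount") != 1:
        alerts.append("ActiveControllerCount should be exactly 1")

    # Any offline, under-replicated, or under-minIsr partition needs attention.
    for name in (
        "OfflinePartitionsCount",
        "UnderReplicatedPartitions",
        "UnderMinIsrPartitionCount",
    ):
        if metrics.get(name, 0) > 0:
            alerts.append(f"{name} is above 0")

    # Disk usage is a percentage; the threshold is an assumed default.
    if metrics.get("KafkaDataLogsDiskUsed", 0.0) >= disk_used_threshold:
        alerts.append("KafkaDataLogsDiskUsed is at or above the threshold")

    return alerts


if __name__ == "__main__":
    snapshot = {
        "ActiveControllerCount": 1,
        "OfflinePartitionsCount": 0,
        "UnderReplicatedPartitions": 2,
        "UnderMinIsrPartitionCount": 0,
        "KafkaDataLogsDiskUsed": 40.0,
    }
    print(msk_alerts(snapshot))
```

Keeping the rule set this small follows the point made later in the deck: too many metrics are as unhelpful as too few.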
  • 21. How about monitoring Kinesis Data Streams? How long does a record stay in a shard?
      • Reads (Consumer Application calling GetRecords()): 5 transactions per second, per shard, so with only one consumer application, records can be retrieved every 200 ms; up to 10 MB or 10,000 records per call
      • Writes: up to 1 MB or 1,000 records per second, per shard
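The per-shard limits on this slide lend themselves to a quick sizing calculation. The sketch below is a rough estimate under stated assumptions: the function names are hypothetical, 1 MB is treated as 1,024 KB, and only the limits quoted on the slide are modeled.

```python
import math

# Per-shard limits quoted on the slide.
WRITE_MB_PER_SEC = 1.0
WRITE_RECORDS_PER_SEC = 1_000
READ_CALLS_PER_SEC = 5
MAX_RECORDS_PER_CALL = 10_000


def shards_for_writes(records_per_sec: float, avg_record_kb: float) -> int:
    """Shards needed so that neither the record nor the byte limit is hit."""
    by_count = records_per_sec / WRITE_RECORDS_PER_SEC
    by_bytes = records_per_sec * avg_record_kb / 1024 / WRITE_MB_PER_SEC
    return max(1, math.ceil(max(by_count, by_bytes)))


def min_poll_interval_ms(consumer_apps: int) -> float:
    """Shortest polling interval per application when the applications
    share the 5 read transactions per second of one shard."""
    return 1000.0 * consumer_apps / READ_CALLS_PER_SEC


if __name__ == "__main__":
    print(shards_for_writes(5_000, 0.5))   # limited by record count
    print(shards_for_writes(2_000, 2.0))   # limited by bytes
    print(min_poll_interval_ms(1))         # one application: 200 ms
    print(READ_CALLS_PER_SEC * MAX_RECORDS_PER_CALL)  # call-count ceiling
```

With one consumer application the interval works out to 200 ms, matching the slide; adding a second polling application doubles it, which is one reason the iterator age metric is worth watching.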
  • 22. Metrics to Monitor: Kinesis Data Streams, listed as Metric: Description
      • GetRecords.IteratorAgeMilliseconds: Age of the last record in all GetRecords calls
      • ReadProvisionedThroughputExceeded: Number of GetRecords calls throttled
      • WriteProvisionedThroughputExceeded: Number of PutRecord(s) calls throttled
      • PutRecord.Success, PutRecords.Success: Number of successful PutRecord(s) operations
      • GetRecords.Success: Number of successful GetRecords operations
  • 23. Choosing Right Metrics Too Much = Useless = Too Little
  • 24. Kafka vs MSK vs Kinesis Data Streams Operational Excellence Kinesis Data Streams Kafka Amazon MSK Degree of Freedom ≈ Complexity
  • 25. Comparison Summary, listed as Attribute: Apache Kafka | Kinesis Streams | Managed Streaming for Kafka
      • Cost: $$$ | $ (pay for what you use) | $$ (pay for infrastructure)
      • Ease of use: Advanced setup required | Get started in minutes | Get started in minutes
      • Management Overhead: High | Low | Low
      • Scalability: Difficult to scale | Scale in seconds with one click | Scale in minutes with one click
      • Throughput: Infinite | Scales with shards, supports up to 1 MB payloads | Infinite
      • Durability: Configurable | 3x by default | Configurable
      • Infrastructure: You manage | AWS manages | AWS manages
      • Write-to-Read Latency: <100 ms is achievable | <100 ms (with HTTP/2) | <100 ms is achievable
      • Open Sourced?: Yes | No | Yes
  • 28. Example Usage Pattern 1: Data Hub (diagram labels: Kinesis Data Streams, Amazon MSK, Kinesis Data Firehose, Amazon S3, Amazon EC2, AWS Lambda, Amazon ECS, Kinesis Data Analytics, Amazon ES, Amazon Athena, Amazon CloudWatch) https://aws.amazon.com/solutions/case-studies/autodesk-log-analytics/
  • 29. Log Aggregation Web servers access log Aggregated logs
  • 30. Example Usage Pattern 2: Web Analytics and Leaderboards Amazon DynamoDB Amazon Kinesis Data Analytics Amazon Kinesis Data Streams Amazon Cognito Lightweight JS client code Web server on Amazon EC2 OR Compute top 10 users Ingest web app data Persist to feed live apps Lambda function https://aws.amazon.com/solutions/implementations/real-time-web-analytics-with-kinesis/ Amazon MSK
  • 32. https://aws.amazon.com/blogs/aws/new-serverless-streaming-etl-with-aws-glue/ Example Usage Pattern 3: Monitoring IoT Devices Ingest sensor data Convert json to parquet Store all data points in an S3 data lake AWS IoT Core IoT rule AWS Glue Streaming Job Amazon Athena Glue Crawler Glue Data Catalog S3 Bucket AWS Cloud MQTT Topic Amazon Kinesis Data Streams Raspberry PI + Sense HAT
  • 33. Event Sourcing and CQRS https://www.confluent.io/blog/event-sourcing-cqrs-stream-processing-apache-kafka-whats-connection/ App Write Interface App Read Interface Event Queue Application State Kafka Streams Topology Kafka Topic Event Handler App Write Interface App Read Interface Kafka Streams State Store Event Store Event Handler + App State Event Store
  • 34. Example Usage Pattern 4: Streaming SQL with Amazon Kinesis Data Streams and Amazon Kinesis Data Analytics (SQL) https://docs.aws.amazon.com/kinesisanalytics/latest/dev/examples.html
      • Operations: Continuous filter, Aggregate function, Data enrichment (join), Anomaly Detection
      • Reference data for the join, from an S3 Bucket (Ticker, Company): AMZN, Amazon; ASD, SomeCompanyA; BAC, SomeCompanyB; CRM, SomeCompanyC
      • Sample records: {"TICKER_SYMBOL": "CVB", "SECTOR": "TECHNOLOGY", "CHANGE": 0.81, "PRICE": 53.63} {"TICKER_SYMBOL": "ABC", "SECTOR": "RETAIL", "CHANGE": -1.14, "PRICE": 23.64} {"TICKER_SYMBOL": "JKL", "SECTOR": "TECHNOLOGY", "CHANGE": 0.22, "PRICE": 15.32}
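The three streaming SQL operations named on this slide (continuous filter, aggregate function, and data enrichment by join) can be illustrated without any streaming engine. The sketch below uses plain Python generators over the sample records and reference table from the slide; the function names are hypothetical, and this is not how Kinesis Data Analytics is implemented, only what each operation computes.

```python
from collections import defaultdict

# Sample records and reference data copied from the slide.
RECORDS = [
    {"TICKER_SYMBOL": "CVB", "SECTOR": "TECHNOLOGY", "CHANGE": 0.81, "PRICE": 53.63},
    {"TICKER_SYMBOL": "ABC", "SECTOR": "RETAIL", "CHANGE": -1.14, "PRICE": 23.64},
    {"TICKER_SYMBOL": "JKL", "SECTOR": "TECHNOLOGY", "CHANGE": 0.22, "PRICE": 15.32},
]
COMPANIES = {
    "AMZN": "Amazon",
    "ASD": "SomeCompanyA",
    "BAC": "SomeCompanyB",
    "CRM": "SomeCompanyC",
}


def continuous_filter(stream, sector):
    """Yield only records from one sector (like a streaming WHERE clause)."""
    for record in stream:
        if record["SECTOR"] == sector:
            yield record


def average_price_by_sector(window):
    """Aggregate one window of records (like GROUP BY SECTOR with AVG)."""
    totals = defaultdict(lambda: [0.0, 0])  # sector -> [price sum, count]
    for record in window:
        totals[record["SECTOR"]][0] += record["PRICE"]
        totals[record["SECTOR"]][1] += 1
    return {sector: total / count for sector, (total, count) in totals.items()}


def enrich(stream, reference):
    """Left-join each record with a static reference table.
    Tickers missing from the table get COMPANY set to None."""
    for record in stream:
        yield {**record, "COMPANY": reference.get(record["TICKER_SYMBOL"])}


if __name__ == "__main__":
    print([r["TICKER_SYMBOL"] for r in continuous_filter(RECORDS, "TECHNOLOGY")])
    print(average_price_by_sector(RECORDS))
    print(list(enrich(RECORDS, COMPANIES)))
```

None of the three sample tickers appears in the slide's reference table, so the join here yields `None` for each company; a record with ticker `AMZN` would be enriched with `Amazon`.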
  • 37. Key Takeaways • Distributed Queue as Stream Storage • Preserve Ordering • Parallel Consumption • Persistent Buffer • Decouple producers & consumers • Trade-off: Operational Excellence vs Degree of Freedom • MUST keep an eye on the right monitoring metrics • Architectural Patterns • Data Hub: (Asynchronous) Event-Bus • Log Aggregation • IoT • Event Sourcing and CQRS
  • 38. Where To Go Next?
      • Amazon MSK Labs https://amazonmsk-labs.workshop.aws/
      • Amazon Managed Streaming for Kafka: Best Practices https://docs.aws.amazon.com/msk/latest/developerguide/bestpractices.html
      • Monitoring Kafka performance metrics (2020-04-16) https://tinyurl.com/y6hrhwbq
      • Understanding Metrics for Apache Kafka Monitoring and Optimization Approaches (in Korean; original title: Apache Kafka 모니터링을 위한 Metrics 이해 및 최적화 방안) (2018-11) https://tinyurl.com/y4uwyenx
      • AWS Analytics Immersion Day - Build BI System from Scratch • Workshop - https://tinyurl.com/yapgwv77 • Slides - https://tinyurl.com/ybxkb74b
      • Realtime Analytics on AWS https://tinyurl.com/y3evwm3v
      • Writing SQL on Streaming Data with Amazon Kinesis Analytics – Part 1, 2 • Part1 - https://tinyurl.com/y8vo8q7o • Part2 - https://tinyurl.com/ycbv7wel

Editor's Notes

  1. [Kinesis Data Streams] Streams and shards; AWS API experience; Throughput provisioning model; Seamless scaling; Typically lower costs; Deep AWS integrations. [Amazon MSK] Topics and partitions; Open-source compatibility; Strong third-party tooling; Cluster provisioning model; Kafka scaling isn’t seamless to clients; Raw performance. https://aws.highspot.com/items/5cb120e7429d7b4ed26391e3?lfrm=irel.7#29
  2. Maximum throughput per shard: 10,000 records / 200 ms = 50,000 records/sec
  3. Frame the message from the perspective of the Shared Responsibility Model. Under the hood: Scaling your Kinesis data streams https://aws.amazon.com/ko/blogs/big-data/under-the-hood-scaling-your-kinesis-data-streams/
  4. Too much information can be just as useless as too little
  5. https://aws.highspot.com/items/5cb120e7429d7b4ed26391e3?lfrm=irel.7#30
  6. Event Sourcing and Command Query Responsibility Segregation (CQRS). Event sourcing involves modeling the state changes made by applications as an immutable sequence or “log” of events. Instead of modifying the state of the application in place, event sourcing stores the event that triggers the state change in an immutable log and models the state changes as responses to the events in the log. The event sourcing and CQRS application architecture patterns are related: CQRS is the application architecture pattern most commonly used with event sourcing. CQRS involves splitting an application into two parts internally: the command side, which orders the system to update state, and the query side, which gets information without changing state. CQRS provides separation of concerns. The command or write side is all about the business; it does not care about the queries, the different materialized views over the data, the optimal storage of those views for performance, and so on. The query or read side, on the other hand, is all about read access; its main purpose is making queries fast and efficient.
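To make the note on event sourcing and CQRS concrete, here is a minimal in-memory sketch. Everything in it is hypothetical (the account domain, `EventStore`, `CommandHandler`, `BalanceView`); in the architecture on the slide the log would be a Kafka topic or a stream rather than a Python list.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    """An immutable fact: an amount was deposited into an account."""
    account: str
    amount: int


@dataclass
class EventStore:
    """Append-only log; events are never updated or deleted."""
    log: list = field(default_factory=list)

    def append(self, event: Event) -> None:
        self.log.append(event)


class CommandHandler:
    """Write side: validates a command and records the resulting event."""

    def __init__(self, store: EventStore) -> None:
        self.store = store

    def deposit(self, account: str, amount: int) -> None:
        if amount <= 0:
            raise ValueError("deposit amount must be positive")
        self.store.append(Event(account, amount))


class BalanceView:
    """Read side: a materialized view built by replaying the log."""

    def __init__(self) -> None:
        self.balances = {}
        self.position = 0  # index of the next event to apply

    def catch_up(self, store: EventStore) -> None:
        for event in store.log[self.position:]:
            self.balances[event.account] = (
                self.balances.get(event.account, 0) + event.amount
            )
        self.position = len(store.log)

    def balance(self, account: str) -> int:
        return self.balances.get(account, 0)


if __name__ == "__main__":
    store = EventStore()
    commands = CommandHandler(store)
    view = BalanceView()
    commands.deposit("alice", 50)
    commands.deposit("alice", 25)
    view.catch_up(store)
    print(view.balance("alice"))
```

The write side never touches the balances and the read side never appends events, so a new view can be added later and populated simply by replaying the log from position 0.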