4. From Batch to Real-time: Lambda Architecture
[Diagram: a Data Source feeds Stream Ingestion into Stream Storage. Batch Layer: Stream Delivery writes the Streaming Data to Raw Data Storage, and a Batch Process builds the Batch View. Speed Layer: a Stream Process builds the Real-time View. Service Layer: the Consumer's query merges results from both views (Query & Merge Results).]
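The "Query & Merge Results" step of the service layer can be sketched as follows. This is a minimal illustration, assuming both views hold simple per-key counts; the function name `merge_views` and the page-view data are hypothetical and not from the slides.

```python
from collections import Counter


def merge_views(batch_view: dict[str, int], realtime_view: dict[str, int]) -> dict[str, int]:
    """Serving-layer query: add the speed layer's recent increments
    to the counts precomputed by the batch layer."""
    merged = Counter(batch_view)
    merged.update(realtime_view)  # Counter.update adds counts key by key
    return dict(merged)


# Hypothetical page-view counts. The batch view covers data up to the last
# batch run; the real-time view covers only events that arrived afterwards.
batch_view = {"/home": 1000, "/pricing": 250}
realtime_view = {"/home": 12, "/signup": 3}

result = merge_views(batch_view, realtime_view)
print(result)
```

The sketch assumes the two views never cover the same events; in practice the speed layer's state is discarded once a batch run has absorbed that time range.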
6. Key Components of Real-time Analytics
• Data Source: Devices and/or applications that produce real-time data at high velocity
• Stream Ingestion: Data from tens of thousands of data sources can be written to a single stream
• Stream Storage: Data are stored in the order they were received for a set duration of time and can be replayed indefinitely during that time
• Stream Process: Records are read in the order they are produced, enabling real-time analytics or streaming ETL
• Data Sink: Data lake (most common), Database (least common)
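The Stream Storage behavior described above (ordered storage, a retention window, replay within that window) can be sketched with an in-memory log. `StreamLog` and its methods are hypothetical names for illustration only; real services such as Kinesis Data Streams or Kafka persist and replicate the data.

```python
from dataclasses import dataclass, field


@dataclass
class StreamLog:
    """In-memory stand-in for stream storage: an append-only, ordered log
    with time-based retention."""
    retention_seconds: float
    _records: list[tuple[float, int, str]] = field(default_factory=list)
    _next_seq: int = 0

    def put(self, data: str, now: float) -> int:
        """Append a record and return its sequence number."""
        seq = self._next_seq
        self._records.append((now, seq, data))
        self._next_seq += 1
        return seq

    def read_from(self, seq: int, now: float) -> list[tuple[int, str]]:
        """Return records with sequence number >= seq in arrival order,
        skipping anything older than the retention window."""
        cutoff = now - self.retention_seconds
        return [(s, d) for (t, s, d) in self._records if s >= seq and t >= cutoff]


log = StreamLog(retention_seconds=60)
log.put("a", now=0)
log.put("b", now=10)
log.put("c", now=50)

first = log.read_from(0, now=55)   # all three records, in order
replay = log.read_from(0, now=55)  # reading does not consume: same result
late = log.read_from(0, now=65)    # "a" has aged out of the 60 s window
```

Reading never removes records, which is what lets several consumers, or the same consumer after a failure, replay the stream independently.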
17. Amazon Kinesis Data Streams
• Operational Perspective
  • Number of Kinesis Data Streams?
  • Number of shards per stream?
• Throughput provisioning model
• Increase/decrease number of shards
• Full integration with AWS services such as Lambda functions, Kinesis Data Analytics, etc.
Amazon Managed Streaming for Kafka
• Operational Perspective
  • Number of clusters?
  • Number of brokers per cluster?
  • Number of topics per broker?
  • Number of partitions per topic?
• Cluster provisioning model
• Can only increase the number of partitions; can't decrease
• Integration with a few AWS services such as Kinesis Data Analytics for Apache Flink
19. Metrics to Monitor: MSK (Kafka)
• RequestQueue: Length, WaitTime
• ResponseQueue: Length, WaitTime
• Network: Packet drop?
• Produce/consume rate imbalance
• Who is the leader?
• Disk full?
• Too many topics?
20. Metrics to Monitor: MSK (Kafka)
Metric | Level | Description
ActiveControllerCount | DEFAULT | Only one controller per cluster should be active at any given time.
OfflinePartitionsCount | DEFAULT | Total number of partitions that are offline in the cluster.
GlobalPartitionCount | DEFAULT | Total number of partitions across all brokers in the cluster.
GlobalTopicCount | DEFAULT | Total number of topics across all brokers in the cluster.
KafkaAppLogsDiskUsed | DEFAULT | The percentage of disk space used for application logs.
KafkaDataLogsDiskUsed | DEFAULT | The percentage of disk space used for data logs.
RootDiskUsed | DEFAULT | The percentage of the root disk used by the broker.
PartitionCount | PER_BROKER | The number of partitions for the broker.
LeaderCount | PER_BROKER | The number of leader replicas.
UnderMinIsrPartitionCount | PER_BROKER | The number of under minIsr partitions for the broker.
UnderReplicatedPartitions | PER_BROKER | The number of under-replicated partitions for the broker.
FetchConsumerTotalTimeMsMean | PER_BROKER | The mean total time in milliseconds that consumers spend on fetching data from the broker.
ProduceTotalTimeMsMean | PER_BROKER | The mean produce time in milliseconds.
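A few of the metrics above translate directly into alert rules. The sketch below assumes the metric values have already been fetched and aggregated into a plain dictionary; the function name `check_msk_metrics` and the 80% disk threshold are assumptions for illustration, not values from the slides.

```python
def check_msk_metrics(metrics: dict[str, float], disk_threshold_pct: float = 80.0) -> list[str]:
    """Return human-readable alerts for a snapshot of MSK metrics.
    Metrics missing from the snapshot are skipped."""
    alerts = []

    # Exactly one active controller is expected per cluster.
    controllers = metrics.get("ActiveControllerCount")
    if controllers is not None and controllers != 1:
        alerts.append(f"ActiveControllerCount is {controllers:g}, expected exactly 1")

    # These counts should stay at zero in a healthy cluster.
    for name in ("OfflinePartitionsCount", "UnderReplicatedPartitions", "UnderMinIsrPartitionCount"):
        value = metrics.get(name, 0)
        if value > 0:
            alerts.append(f"{name} is {value:g}, expected 0")

    # Disk usage metrics are percentages; the threshold is an assumed value.
    for name in ("KafkaDataLogsDiskUsed", "KafkaAppLogsDiskUsed", "RootDiskUsed"):
        value = metrics.get(name, 0.0)
        if value >= disk_threshold_pct:
            alerts.append(f"{name} is {value:g}%, at or above {disk_threshold_pct:g}%")

    return alerts


# Hypothetical snapshot: under-replicated partitions and a nearly full data disk.
snapshot = {
    "ActiveControllerCount": 1,
    "OfflinePartitionsCount": 0,
    "UnderReplicatedPartitions": 4,
    "KafkaDataLogsDiskUsed": 91.5,
}
alerts = check_msk_metrics(snapshot)
for line in alerts:
    print(line)
```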
21. How about monitoring Kinesis Data Streams?
How long does a record stay in a shard?
[Diagram: Data is written to a shard; a Consumer Application reads it with GetRecords()]
• Writes: up to 1 MB or 1,000 records per second, per shard
• Reads: 5 transactions per second, per shard. With only one consumer application, records can be retrieved every 200 ms
• GetRecords(): up to 10 MB and up to 10,000 records per call; sustained reads are capped at 2 MB per second, per shard
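The per-shard limits on this slide lend themselves to a quick sizing calculation. The sketch below uses only the write limits (1 MB or 1,000 records per second, per shard) and the read limit (5 transactions per second, per shard); the function names and the example workload are hypothetical.

```python
import math

# Per-shard limits as stated on the slide.
WRITE_MB_PER_SEC_PER_SHARD = 1.0
WRITE_RECORDS_PER_SEC_PER_SHARD = 1000
READ_CALLS_PER_SEC_PER_SHARD = 5


def shards_for_writes(mb_per_sec: float, records_per_sec: float) -> int:
    """Smallest shard count that satisfies both write limits."""
    by_bytes = math.ceil(mb_per_sec / WRITE_MB_PER_SEC_PER_SHARD)
    by_count = math.ceil(records_per_sec / WRITE_RECORDS_PER_SEC_PER_SHARD)
    return max(1, by_bytes, by_count)


def min_poll_interval_ms(consumer_apps: int) -> float:
    """Shortest interval at which each of N polling consumer applications can
    call GetRecords on one shard without exceeding 5 reads per second."""
    return 1000 * consumer_apps / READ_CALLS_PER_SEC_PER_SHARD


# Hypothetical workload: 3.5 MB/s made of 12,000 small records per second.
needed = shards_for_writes(mb_per_sec=3.5, records_per_sec=12_000)
print(needed)                   # record count, not bytes, drives the result
print(min_poll_interval_ms(1))  # one consumer application
print(min_poll_interval_ms(2))  # two applications share the read budget
```

With many small records the records-per-second limit dominates, which is one reason producers often aggregate records before writing.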
22. Metrics to Monitor: Kinesis Data Streams
Metric | Description
GetRecords.IteratorAgeMilliseconds | Age of the last record returned by GetRecords calls, i.e. how far consumers lag behind the newest data
ReadProvisionedThroughputExceeded | Number of GetRecords calls throttled
WriteProvisionedThroughputExceeded | Number of PutRecord(s) calls throttled
PutRecord.Success, PutRecords.Success | Number of successful PutRecord(s) operations
GetRecords.Success | Number of successful GetRecords operations
24. Kafka vs MSK vs Kinesis Data Streams
[Diagram: Kinesis Data Streams, Amazon MSK, and Kafka positioned on two axes, Operational Excellence versus Degree of Freedom (≈ Complexity).]
25. Comparison Summary
Attribute | Apache Kafka | Kinesis Data Streams | Managed Streaming for Kafka
Cost | $$$ | $ (pay for what you use) | $$ (pay for infrastructure)
Ease of use | Advanced setup required | Get started in minutes | Get started in minutes
Management Overhead | High | Low | Low
Scalability | Difficult to scale | Scale in seconds with one click | Scale in minutes with one click
Throughput | Infinite | Scales with shards, supports up to 1 MB payloads | Infinite
Durability | Configurable | 3x by default | Configurable
Infrastructure | You manage | AWS manages | AWS manages
Write-to-Read Latency | <100 ms is achievable | <100 ms (with HTTP/2) | <100 ms is achievable
Open Sourced? | Yes | No | Yes
28. Example Usage Pattern 1: Data Hub
[Diagram: Kinesis Data Streams or Amazon MSK as the central hub, connected to Kinesis Data Firehose, Kinesis Data Analytics, Amazon EC2, AWS Lambda, Amazon ECS, Amazon S3, Amazon ES, Amazon Athena, and Amazon CloudWatch.]
https://aws.amazon.com/solutions/case-studies/autodesk-log-analytics/
30. Example Usage Pattern 2: Web Analytics and Leaderboards
• Ingest web app data: lightweight JS client code (with Amazon Cognito) OR a web server on Amazon EC2 writes to Amazon Kinesis Data Streams (or Amazon MSK)
• Compute top 10 users: Amazon Kinesis Data Analytics
• Persist to feed live apps: a Lambda function writes to Amazon DynamoDB
https://aws.amazon.com/solutions/implementations/real-time-web-analytics-with-kinesis/
37. Key Takeaways
• Distributed Queue as Stream Storage
  • Preserve Ordering
  • Parallel Consumption
  • Persistent Buffer
  • Decouple producers & consumers
• Trade-off: Operational Excellence vs Degree of Freedom
• MUST keep an eye on the right monitoring metrics
• Architectural Patterns
  • Data Hub: (Asynchronous) Event-Bus
  • Log Aggregation
  • IoT
  • Event Sourcing and CQRS
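"Preserve Ordering" and "Parallel Consumption" can coexist because records are routed to shards or partitions by key: ordering is kept per key while different shards are consumed in parallel. The sketch below is a deliberate simplification that hashes the key with MD5 and takes it modulo the shard count. Kinesis actually maps the MD5 hash onto per-shard hash key ranges and Kafka's default partitioner uses a different hash, so `shard_for` and `route` are illustrative names only.

```python
import hashlib
from collections import defaultdict


def shard_for(partition_key: str, num_shards: int) -> int:
    """Map a partition key to a shard index (simplified: MD5 modulo N)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_shards


def route(events: list[tuple[str, str]], num_shards: int) -> dict[int, list[tuple[str, str]]]:
    """Append each (key, payload) event to its shard, preserving arrival order."""
    shards: dict[int, list[tuple[str, str]]] = defaultdict(list)
    for key, payload in events:
        shards[shard_for(key, num_shards)].append((key, payload))
    return shards


# Hypothetical click events keyed by user id.
events = [
    ("user-1", "login"),
    ("user-2", "login"),
    ("user-1", "click"),
    ("user-1", "logout"),
    ("user-2", "click"),
]
shards = route(events, num_shards=4)

# Every event for a given user lands in one shard, in its original order.
for shard_id, records in sorted(shards.items()):
    print(shard_id, records)
```

Ordering across different keys is not guaranteed, which is the price paid for consuming shards in parallel.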
38. Where To Go Next?
• Amazon MSK Labs
https://amazonmsk-labs.workshop.aws/
• Amazon Managed Streaming for Kafka: Best Practices
https://docs.aws.amazon.com/msk/latest/developerguide/bestpractices.html
• Monitoring Kafka performance metrics (2020-04-16)
https://tinyurl.com/y6hrhwbq
• Understanding Metrics for Apache Kafka Monitoring and How to Optimize (2018-11, in Korean)
https://tinyurl.com/y4uwyenx
• AWS Analytics Immersion Day - Build BI System from Scratch
• Workshop - https://tinyurl.com/yapgwv77
• Slides - https://tinyurl.com/ybxkb74b
• Realtime Analytics on AWS
https://tinyurl.com/y3evwm3v
• Writing SQL on Streaming Data with Amazon Kinesis Analytics – Part 1, 2
• Part1 - https://tinyurl.com/y8vo8q7o
• Part2 - https://tinyurl.com/ycbv7wel
Editor's Notes
[Kinesis Data Streams]
------------------
Streams and shards
AWS API experience
Throughput provisioning model
Seamless scaling
Typically lower costs
Deep AWS integrations
[Amazon MSK]
------------
Topics and partitions
Open-source compatibility
Strong third-party tooling
Cluster provisioning model
Kafka scaling isn’t seamless to clients
Raw performance
-----------
https://aws.highspot.com/items/5cb120e7429d7b4ed26391e3?lfrm=irel.7#29
Maximum read throughput per shard: 10,000 records / 200 ms = 50,000 records/sec
Frame the message from the perspective of the Shared Responsibility Model
Under the hood: Scaling your Kinesis data streams
https://aws.amazon.com/ko/blogs/big-data/under-the-hood-scaling-your-kinesis-data-streams/
Too much information can be just as useless as too little
Command Query Responsibility Segregation (CQRS)
Event sourcing involves modeling the state changes made by applications as an immutable sequence or “log” of events. Instead of modifying the state of the application in-place, event sourcing involves storing the event that triggers the state change in an immutable log and modeling the state changes as responses to the events in the log.
Event Sourcing and CQRS
Furthermore, the event sourcing and CQRS application architecture patterns are also related. Command Query Responsibility Segregation (CQRS) is an application architecture pattern most commonly used with event sourcing. CQRS involves splitting an application into two parts internally — the command side ordering the system to update state and the query side that gets information without changing state. CQRS provides separation of concerns – The command or write side is all about the business; it does not care about the queries, different materialized views over the data, optimal storage of the materialized views for performance and so on. On the other hand, the query or read side is all about the read access; its main purpose is making queries fast and efficient.
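The two patterns described in these notes can be sketched together in a few lines. This is a minimal illustration under stated assumptions: the bank-account domain, the `Event` and `CommandSide` classes, and `project_balances` are all hypothetical, and a plain Python list stands in for the immutable event log that a stream such as Kafka or Kinesis would provide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """An immutable fact about something that happened."""
    account: str
    kind: str    # "deposited" or "withdrew"
    amount: int


class CommandSide:
    """Write side: records state changes as events and never updates
    balances in place."""

    def __init__(self, log: list[Event]) -> None:
        self.log = log

    def deposit(self, account: str, amount: int) -> None:
        self.log.append(Event(account, "deposited", amount))

    def withdraw(self, account: str, amount: int) -> None:
        self.log.append(Event(account, "withdrew", amount))


def project_balances(log: list[Event]) -> dict[str, int]:
    """Query side: build a materialized view by replaying the log."""
    balances: dict[str, int] = {}
    for event in log:
        delta = event.amount if event.kind == "deposited" else -event.amount
        balances[event.account] = balances.get(event.account, 0) + delta
    return balances


event_log: list[Event] = []
commands = CommandSide(event_log)
commands.deposit("alice", 100)
commands.withdraw("alice", 30)
commands.deposit("bob", 50)

balances = project_balances(event_log)
print(balances)
```

Because the log is the source of truth, a new read model (for example, total deposits per day) can be added later by replaying the same events through a different projection.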