More Related Content
Similar to Stream processing on AWS (20)
Stream processing on AWS
- 1. ©2016, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Stream processing on AWS
アマゾンウェブサービスジャパン株式会社
半場光晴
- 2. ©2016, Amazon Web Services, Inc. or its affiliates. All rights reserved.
目次
• このセッションの対象
• Stream processingとは?
• シンプルにするための原則
• アーキテクチャの例示
• デザインパターン
• まとめ
- 3. ©2016, Amazon Web Services, Inc. or its affiliates. All rights reserved.
このセッションの対象
• このような方々
– Stream processingを何に生かせばよいのだろう?
– AWSでStream processingをどのように組めば良いのだろう?
– ラムダアーキテクチャに憧れている
– サービスの今を常に把握し続けたい
- 4. ©2016, Amazon Web Services, Inc. or its affiliates. All rights reserved.
目次
• このセッションの対象
• Stream processingとは?
• シンプルにするための原則
• アーキテクチャの例示
• デザインパターン
• まとめ
- 5. ©2016, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Stream processingのストリームとは?
• 連続するイベント、レコード、データ
• 主な特徴
– 高スループット
– 継続的な取り込み
• 例えば
– ログ集計、イベント集計
– センサー
例 1 KB * 10K record/s * 365 days
= 300 TB / year
ストリームからのBig
Data?!
- 6. ©2016, Amazon Web Services, Inc. or its affiliates. All rights reserved.
バッチ処理とストリーム処理の比較
• Batch processing
– エラーレポート
– 請求書
– CVRポート
– セキュリティレポート
• Stream processing
– メトリクス監視
– 合計請求額アラート
– クリック分析
– 異常検知
- 7. ©2016, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Stream processingを大別すると
• イベント駆動型
– イベントが発生するたびに処理を実行する
• マイクロバッチ型
– 細かく区切ったイベント群に対して、バッチ処
理を連続して実行する
- 8. ©2016, Amazon Web Services, Inc. or its affiliates. All rights reserved.
マイクロバッチの例 Sliding window
• Sliding window
– Treat events as a window
• 例 30 seconds window
– Slide the window
• 例 Every 20 seconds
• ユースケース
– 直近1分間に発生したイベント
を10秒ごとに処理して異常検
知する
http://spark.apache.org/docs/latest/streaming-programming-guide.html
• Window length: 3
• The duration of the window
• Sliding interval: 2
• The interval at which the
window operation is performed
- 9. ©2016, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Stream processing活用の鍵
• Stream processingを使いこなすためのキーポ
イントは何だろう?
– シンプルが一番
- 10. ©2016, Amazon Web Services, Inc. or its affiliates. All rights reserved.
目次
• このセッションの対象
• Stream processingとは?
• シンプルにするための原則
• アーキテクチャの例示
• デザインパターン
• まとめ
- 11. ©2016, Amazon Web Services, Inc. or its affiliates. All rights reserved.
ありあまるほどの選択肢
Amazon
Glacier
Amazon S3
Amazon
DynamoDB
Amazon RDS
Amazon EMR
Amazon
Redshift
Amazon
Kinesis
Amazon Kinesis-
enabled app
AWS Lambda Amazon ML
Amazon SQS
Amazon
ElastiCache
DynamoDB
Streams Amazon Elasticsearch
Service
AWS IoT
- 12. ©2016, Amazon Web Services, Inc. or its affiliates. All rights reserved.
シンプルにするための原則
• データバスを疎結合にしよう
– Data > store > Process > Store > Process > Answers
• 目的に適したソフトウェアを選択しよう
– データ構造、レイテンシ、スループット、アクセスパターン
• ラムダアーキテクチャの発想を取り入れよう
– バッチ層 + スピード層 + サービング層
• AWSのサービスを活用しよう
– レジリエント、低い管理コスト
- 13. ©2016, Amazon Web Services, Inc. or its affiliates. All rights reserved.
目次
• このセッションの対象
• Stream processingとは?
• シンプルにするための原則
• アーキテクチャの例示
• デザインパターン
• まとめ
- 14. ©2016, Amazon Web Services, Inc. or its affiliates. All rights reserved.
アーキテクチャの例示 – 基本
• レイテンシ (結果が得られるまでの時間)
• スループット
• アクセスパターン
• コスト
Collect /
Store
Process
Store /
Analyze
データ 結果
- 15. ©2016, Amazon Web Services, Inc. or its affiliates. All rights reserved.
データソース層
• APIコール
• DB書き込み
• センサー、IoT
• ログ
• メッセージ
• データ転送
Data Source
Stream
Files
Disks
AWS Direct Connect
Mobile Apps
Web Apps
AWS
Import/
Export
Logging
Data
Centers
Sensors &
IoT
Devices
CloudWatch
Snowball
Messages
Messaging
AWS IoT
AWS Direct Connect
Message
Transactions
Hot
Cold
- 16. ©2016, Amazon Web Services, Inc. or its affiliates. All rights reserved.
データコレクション層
• KVS
• Streaming
• Messaging, Queue
• Storage
Data Source
Collect /
Store
Transactions
Stream
Files
Messages
Disks
Amazon
Kinesis
Streams
Amazon
Kinesis
Firehose
Amazon
DynamoDB
Streams
Amazon
SQS
Apache
Kafka
AWS Direct Connect
Snowball
Mobile Apps
Web Apps
AWS
Import/
Export
Messaging
Logging
Data
Centers
AWS IoT
Devices
Sensors
CloudWatch
HotCold
Amazon
S3
Events
WarmHotHot
- 17. ©2016, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Amazon
DynamoDB
Streams
Amazon
Kinesis
Streams
Amazon
Kinesis
Firehose
Apache
Kafka
Amazon
SQS
Amazon S3
Event
Notifications
AWS Managed
Service
Yes Yes Yes No Yes Yes
Guaranteed
Ordering
Yes Yes Yes Yes No No
Delivery exactly-once at-least-once exactly-once at-least-once at-least-once at-least-once
Data Retention
Period
24 hours 7 days N/A Configurable 14 days 24 hours retry
Availability 3 AZ 3 AZ 3 AZ Configurable 3 AZ 3 AZ
Scale /
Throughput
No limit /
~ Table IOPS
No Limit /
~ Shards
No limit /
Automatic
No limit /
~ Nodes
No Limits /
Automatic
No Limits
Parallel Clients Yes Yes No Yes No No
Stream MapReduce Yes Yes N/A Yes N/A N/A
Record/Object size 400KB 1MB Amazon Redshift
row size
Configurable 256KB 5TB / S3
Object
Cost Higher (table
cost)
Low Low Low (+admin) Low-Medium Low
各データコレクションの特徴
- 18. ©2016, Amazon Web Services, Inc. or its affiliates. All rights reserved.
プロセス層
• Serverless
• Stream
processing
• Distributed
computing
• Job chain
Data Source
Collect /
Store
Process
Amazon Kinesis
Apps
AWS
Lambda
Transactions
Stream
Files
Disks
Amazon Kinesis
Analytics
Amazon
Kinesis
Streams
Amazon
Kinesis
Firehose
Amazon
DynamoDB
Streams
Apache
Kafka
AWS Direct Connect
Mobile Apps
Web Apps
AWS
Import/
Export
Logging
Data
Centers
Sensors &
IoT
Devices
CloudWatch
Snowball
Messages
Amazon
SQS
Messaging
AWS IoT
AWS Direct Connect
Message
Amazon
S3
Events
Streaming EMR
SQS Apps
EC2
EC2
FastFast
- 19. ©2016, Amazon Web Services, Inc. or its affiliates. All rights reserved.
各プロセスの特徴
Spark
Streaming
Apache
Storm
Kinesis KCL
Application
AWS Lambda Amazon SQS
Client Apps
Scale ~ Nodes ~ Nodes ~ Nodes Automatic ~ Nodes
Micro-Batch or
Real-time
Micro-batch Real-time Near-real-time Near-real-time Near-real-time
AWS Managed
Service
Yes (EMR) No (EC2) No (KCL + EC2
+ Auto Scaling)
Yes No (EC2 + Auto
Scaling)
Scalability No Limits
~ Nodes
No Limits
~ Nodes
No Limits
~ Nodes
No Limits No Limits
Availability Single AZ Configurable Multi-AZ Multi-AZ Multi-AZ
Programming
languages
Java, Python,
Scala
Any language
via Thrift
Java, via
MultiLang
Daemon (.NET,
Python, Ruby,
Node.js)
Node.js, Java,
Python
AWS SDK
languages
(Java, .NET,
Python, …)
- 20. ©2016, Amazon Web Services, Inc. or its affiliates. All rights reserved.
• Cache
• KVS
• RDB
• Search
• Storage
• DWH
• Distribut
ed
• ML
Process
Analyze
Store
Amazon EMR
Amazon
ML
Amazon
RDS
Amazon
DynamoDB
Amazon
ElastiCache
Data Source
Collect /
Store
Amazon Kinesis
Apps
AWS
Lambda
Transactions
Stream
Files
Disks
Amazon
S3
Amazon Kinesis
Analytics (new)
Amazon
Kinesis
Streams
Amazon
Kinesis
Firehose
Amazon
DynamoDB
Streams
Apache
Kafka
AWS Direct Connect
Amazon
Elasticsearch
Service
SearchSQLNoSQLCacheML
Amazon
Redshift
DWFileHadoop&Spark
Mobile Apps
Web Apps
AWS
Import/
Export
Logging
Data
Centers
Sensors &
IoT
Devices
CloudWatch
Snowball
SQS Apps
Messages
Amazon
SQS
Messaging
AWS IoT
AWS Direct Connect
Message
Amazon
S3
Events
Streaming EMR
EC2
EC2
HotWarmFastSlow
データストア層
- 21. ©2016, Amazon Web Services, Inc. or its affiliates. All rights reserved.
各データストアの特徴
Amazon ElastiCache Amazon
DynamoDB
Amazon
RDS/Aurora
Amazon
Elasticsearch
Amazon S3 Amazon Glacier
Average
latency
ms ms ms, sec ms,sec ms,sec,min
(~ size)
hrs
Typical
data stored
GB GB–TBs
(no limit)
GB–TB
(64 TB Max)
GB–TB MB–PB
(no limit)
GB–PB
(no limit)
Typical
item size
B-KB KB
(400 KB max)
KB
(64 KB max)
KB
(2 GB max)
KB-TB
(5 TB max)
GB
(40 TB max)
Request
Rate
High - Very High Very High
(no limit)
High High Low – High
(no limit)
Very Low
Storage cost
GB/month
$$ ¢¢ ¢¢ ¢¢ ¢ ¢/10
Durability Low - Moderate Very High Very High High Very High Very High
Availability High
2 AZ
Very High
3 AZ
Very High
3 AZ
High
2 AZ
Very High
3 AZ
Very High
3 AZ
Hot Data Warm Data Cold Data
- 22. ©2016, Amazon Web Services, Inc. or its affiliates. All rights reserved.
HotWarm
FastFast
Process
Analyze
Store
Amazon EMR
Amazon
ML
Amazon
RDS
Amazon
DynamoDB
Amazon
ElastiCache
Data Source
Collect /
Store
Consume
Amazon Kinesis
Apps
AWS
Lambda
Transactions
Stream
Files
Disks
Amazon
S3
Amazon Kinesis new
Analytics (new)
Amazon
Kinesis
Streams
Amazon
Kinesis
Firehose
Amazon
DynamoDB
Streams Amazon
QuickSight
Apps & Services
Analysis&VisualizationNotebooksIDE
Apache
Kafka
AWS Direct Connect
Amazon
Elasticsearch
Service
SearchSQLNoSQLCacheML
Amazon
Redshift
DWFileHadoop&Spark
Mobile Apps
Web Apps
AWS
Import/
Export
Logging
Data
Centers
Sensors
IoT
Devices
CloudWatch
Snowball
API
SQS Apps
Messages
Amazon
SQS
Messaging
AWS IoT
AWS Direct Connect
Message
Reference
Architecture
Amazon
S3
Events
Streaming EMR
EC2
EC2
FastSlow
WarmHotCold
- 23. ©2016, Amazon Web Services, Inc. or its affiliates. All rights reserved.
目次
• このセッションの対象
• Stream processingとは?
• シンプルにするための原則
• アーキテクチャの例示
• デザインパターン
• まとめ
- 24. ©2016, Amazon Web Services, Inc. or its affiliates. All rights reserved.
デザインパターン – 基本1
• 何段階かにデータストアを分けて、複数のプロ
セスの関連を疎にする
Store Process Store ProcessData Answers
process
store
- 25. ©2016, Amazon Web Services, Inc. or its affiliates. All rights reserved.
デザインパターン – 基本2
• 例えば、Kinesisを介して、複数のプロセスにス
トリームをフォークして処理する
Amazon
Kinesis
AWS
Lambda
Data
Amazon
DynamoDB
Amazon
Kinesis S3
Connector
process
store
Amazon S3
- 26. ©2016, Amazon Web Services, Inc. or its affiliates. All rights reserved.
デザインパターン – 基本3
• 例えば、複数のストリームの出力をEMRに集め
て、また処理する
Amazon EMR
Amazon
Kinesis
AWS
Lambda
Amazon S3
Data
Amazon
DynamoDB
AnswerSpark
Streaming
Amazon
Kinesis S3
Connector
Process
store
Spark
SQL
- 27. ©2016, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Spark Streaming
Apache Storm
AWS Lambda
KCL
Amazon
Redshift
Amazon
Redshift
Hive
Spark
Presto
Amazon Kinesis
Apache Kafka
Amazon
DynamoDB
Amazon S3data
Hot Cold
Data Temperature
ProcessingSpeed
Fast
Slow Answers
Hive
Native
KCL
AWS Lambda
データの鮮度と処理速度
- 28. ©2016, Amazon Web Services, Inc. or its affiliates. All rights reserved.
デザインパターン – ニアリアルタイム
Amazon EMR
Apache
Kafka
KCL
AWS Lambda
Spark
Streaming
Apache
Storm
Amazon
SNS
Amazon
ML
Notifications
Amazon
ElastiCache
(Redis)
Amazon
DynamoDB
Amazon
RDS
Amazon
ES
Alert
App state
Real-time Prediction
KPI
process
store
DynamoDB
Streams
Amazon
Kinesis
Data
Stream
- 29. ©2016, Amazon Web Services, Inc. or its affiliates. All rights reserved.
バッチ&アドホック
Amazon S3
Amazon EMR
Hive
Pig
Spark
Amazon
ML
process
store
Consume
Amazon
Redshift
Amazon EMR
Presto
Spark
Batch
Interactive
Batch Prediction
Real-time Prediction
Data
Stream
Amazon
Kinesis
Firehose
- 30. ©2016, Amazon Web Services, Inc. or its affiliates. All rights reserved.
ラムダアーキテクチャ
Batch Layer
Amazon
Kinesis
Data
Stream
process
store
Amazon
Kinesis S3
Connector
Amazon S3
A
p
p
l
i
c
a
t
i
o
n
s
Amazon
Redshift
Amazon EMR
Presto
Hive
Pig
Spark
answer
Speed Layer
answer
Serving
Layer
Amazon
ElastiCache
Amazon
DynamoDB
Amazon
RDS
Amazon
ES
answer
Amazon
ML
KCL
AWS Lambda
Storm
Spark
Streaming on
Amazon EMR
- 31. ©2016, Amazon Web Services, Inc. or its affiliates. All rights reserved.
目次
• このセッションの対象
• Stream processingとは?
• シンプルにするための原則
• アーキテクチャの例示
• デザインパターン
• まとめ
- 32. ©2016, Amazon Web Services, Inc. or its affiliates. All rights reserved.
まとめ
• 構成要素を疎結合にしよう
– データ、ストア、プロセス
• ラムダアーキテクチャの発想を取り入れよう
– バッチ、スピード、サービング
• AWSマネージドサービスを活用しよう
– データから価値を生み出すことに注力しよう
- 33. ©2016, Amazon Web Services, Inc. or its affiliates. All rights reserved.
ご相談ください
• Stream processingのみならず、Big data関連
の内容も、お気軽に、ご相談ください
- 34. ©2016, Amazon Web Services, Inc. or its affiliates. All rights reserved.
TECHNICAL &
BUSINESS
SUPPORT
Account
Management
Support
Professional
Services
Solutions
Architects
Training &
Certification
Security
& Pricing
Reports
Partner
Ecosystem
AWS
MARKETPLACE
Backup
Big Data
& HPC
Business
Apps
Databases
Development
Industry
Solutions
Security
APPLICATION
SERVICES
Queuing
Notifications
Search
Orchestration
Email
ENTERPRISE
APPS
Virtual
Desktops
Storage
Gateway
Sharing &
Collaboration
Email &
Calendaring
Directories
HYBRID CLOUD
MANAGEMENT
Backups
Deployment
Direct
Connect
Identity
Federation
Integrated
Management
SECURITY &
MANAGEMENT
Virtual Private
Networks
Identity &
Access
Encryption
Keys
Configuration Monitoring Dedicated
INFRASTRUCTURE
SERVICES
Regions
Availability
Zones
Compute Storage
Databases
SQL, NoSQL,
Caching
CDNNetworking
PLATFORM
SERVICES
App
Mobile
& Web
Front-end
Functions
Identity
Data Store
Real-time
Development
Containers
Source
Code
Build
Tools
Deployment
DevOps
Mobile
Sync
Identity
Push
Notifications
Mobile
Analytics
Mobile
Backend
Analytics
Data
Warehousing
Hadoop
Streaming
Data
Pipelines
Machine
Learning
Editor's Notes
- 元ネタ
http://cdn.oreillystatic.com/en/assets/1/event/144/Building%20a%20scalable%20architecture%20for%20processing%20streaming%20data%20on%20AWS%20Presentation.pdf