SlideShare a Scribd company logo
1 of 20
The Stream Processor as a Database
The evolution of realtime analytics architecture
Jamie Grier
@jamiegrier
jamie@data-artisans.com
Who am I?
• Director of Applications Engineering at data Artisans
• Previously working on streaming computation at Twitter, Gnip and
Boulder Imaging
• Involved in various kinds of stream processing for about a decade
• Now I spend my time helping people to be successful with
Apache Flink in production applications
@jamiegrier#ff16 #ApacheFlink
Introduction
• Evolution of software architecture for real-time analytics at scale
• Pros and cons of each architecture
• New possibilities with robust stateful stream processing and
Queryable State!
• Introduce the idea of using the stream processor itself as the DB
• Demo of Queryable State in Apache Flink 1.2-SNAPSHOT!
@jamiegrier#ff16 #ApacheFlink
Motivating Example
• Tweet Impressions: 1 Million+ Impressions /
Second
• 100 Million+ unique tweet impressions per
hour
• Computing hourly aggegates for each
tweet and storing in key/value store
• Low latency access to the current in-flight
aggregates
@jamiegrier#ff16 #ApacheFlink
Tweet Impressions
Tweet Impressions
0
25
50
75
100
01:00:00 02:00:00 03:00:00 04:00:00
Tweet 1 Tweet 2
@jamiegrier#ff16 #ApacheFlink
Batch Architecture
@jamiegrier#ff16 #ApacheFlinkBatch Architecture
Batch Architecture
Bulk load of data into KV store can be very
efficient
Robust against failures — just restart failed
partitions
No resource usage between batches
@jamiegrier#ff16 #ApacheFlink
Pros Cons
Data Availability = Batch interval
Could be hours or more than a day
Not robust to out of order data issues
Batch boundary errors introduced
Hard to compute things like session
windows
@jamiegrier#ff16 #ApacheFlinkAlpha Architecture (not really)
Alpha Architecture :)
@jamiegrier#ff16 #ApacheFlink
Pros Cons
Data is available for query instantly
Very simple architecture
Handles out of order data naturally
We can always get the best data so far for
any given hour
Key value store becomes the bottleneck
very quickly
DB Load directly proportional to input rate
(2x ?)
Not robust to failures - failures can lead to
multiple counting, etc
@jamiegrier#ff16 #ApacheFlinkLamba Architecture
Lambda Architecture
Load on KV store is load, effectively bulk load of
pre-computed aggregates
Instant access to data
Can be built by cobbling together various
existing systems
Some of the best of two worlds
@jamiegrier#ff16 #ApacheFlink
Pros Cons
High load on speed layer cache, new bottleneck
Hard to reason about data correctness in speed layer
Correct data comes very late
Still subjest to batch boundary errors and hard to
compute sessions, etc
Complex and expensive!
Also the worst of two worlds
@jamiegrier#ff16 #ApacheFlinkBeta Architecture (not really)
Beta Architecture :)
@jamiegrier#ff16 #ApacheFlink
Dramatically reduced load on the KV Store
DB load is now relative to key cardinality not
message input rate
Correct counts even in failure cases
Get’s rid of correctness issues caused by batch
boundaries
We can tighten up our aggregate frequency as
compared to a batch system
Pros Cons
Time until data available increases - same
as window size!
Writes to key value store must be
idempotent to achieve robust semantics
The above is not always possible
@jamiegrier#ff16 #ApacheFlinkOmega Architecture (not really)
Omega Architecture :)
@jamiegrier#ff16 #ApacheFlink
Dramatically less load on the KV store
KV Store load proportional to unique key rate not
input rate
Correct data is available instantly!
Still very simple
Correct in failure cases
Only need current window state in Flink
Pros Cons
Still need a separate KV Store and Query
Service to merge results
In current implementation older data can be
served in some failure scenarios
Omega Prime Architecture :)
@jamiegrier#ff16 #ApacheFlink
“Optimus Prime was forced into a combination with his
brother. The result of their combination is Omega Prime, a
seemingly unique individual who not only combines the best
of his component parts abilities but adds a considerable
amount of power to the total. His importance in the fight
against evil cannot be over-estimated, stretching beyond even
his own universe.” — Wikipedia
@jamiegrier#ff16 #ApacheFlinkOmega Prime Architecture
Omega Prime Architecture :)
@jamiegrier#ff16 #ApacheFlink
No external key value store
Easy to scale because all state is local
Data is available instantly
Simple
Correct in failure cases
Build whatever stateful stream applications you
can think of and still have strong correctness
guarantees
Pros Cons
Total state size must fit in Flink State. This is
a limitation — for now.
See “very large state” talk by Stephan Ewen
Are people ready to consider using the
stream processor state as the only data
store?
Demo!
• Flink 1.2-SNAPSHOT (master)
• Created a plug-in for Grafana to query Flink
state directly
• Created a simple REST server to serve
requests from Grafana
• Queries window state in Flink directly
@jamiegrier#ff16 #ApacheFlink
@jamiegrier#ff16 #ApacheFlink
We’re Hiring!
http://data-artisans.com/careers

More Related Content

Viewers also liked

Kostas Tzoumas - Stream Processing with Apache Flink®
Kostas Tzoumas - Stream Processing with Apache Flink®Kostas Tzoumas - Stream Processing with Apache Flink®
Kostas Tzoumas - Stream Processing with Apache Flink®Ververica
 
Ted Dunning-Faster and Furiouser- Flink Drift
Ted Dunning-Faster and Furiouser- Flink DriftTed Dunning-Faster and Furiouser- Flink Drift
Ted Dunning-Faster and Furiouser- Flink DriftFlink Forward
 
Julian Hyde - Streaming SQL
Julian Hyde - Streaming SQLJulian Hyde - Streaming SQL
Julian Hyde - Streaming SQLFlink Forward
 
Stephan Ewen - Running Flink Everywhere
Stephan Ewen - Running Flink EverywhereStephan Ewen - Running Flink Everywhere
Stephan Ewen - Running Flink EverywhereFlink Forward
 
Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink
Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with FlinkSanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink
Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with FlinkFlink Forward
 
Márton Balassi Streaming ML with Flink-
Márton Balassi Streaming ML with Flink- Márton Balassi Streaming ML with Flink-
Márton Balassi Streaming ML with Flink- Flink Forward
 
Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apac...
Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apac...Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apac...
Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apac...Ververica
 
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)Robert Metzger
 
Apache Flink Community Updates November 2016 @ Berlin Meetup
Apache Flink Community Updates November 2016 @ Berlin MeetupApache Flink Community Updates November 2016 @ Berlin Meetup
Apache Flink Community Updates November 2016 @ Berlin MeetupRobert Metzger
 
Streaming Analytics & CEP - Two sides of the same coin?
Streaming Analytics & CEP - Two sides of the same coin?Streaming Analytics & CEP - Two sides of the same coin?
Streaming Analytics & CEP - Two sides of the same coin?Till Rohrmann
 
The Stream Processor as the Database - Apache Flink @ Berlin buzzwords
The Stream Processor as the Database - Apache Flink @ Berlin buzzwords   The Stream Processor as the Database - Apache Flink @ Berlin buzzwords
The Stream Processor as the Database - Apache Flink @ Berlin buzzwords Stephan Ewen
 
Taking a look under the hood of Apache Flink's relational APIs.
Taking a look under the hood of Apache Flink's relational APIs.Taking a look under the hood of Apache Flink's relational APIs.
Taking a look under the hood of Apache Flink's relational APIs.Fabian Hueske
 
Automatic Detection of Web Trackers by Vasia Kalavri
Automatic Detection of Web Trackers by Vasia KalavriAutomatic Detection of Web Trackers by Vasia Kalavri
Automatic Detection of Web Trackers by Vasia KalavriFlink Forward
 
Trevor Grant - Apache Zeppelin - A friendlier way to Flink
Trevor Grant - Apache Zeppelin - A friendlier way to FlinkTrevor Grant - Apache Zeppelin - A friendlier way to Flink
Trevor Grant - Apache Zeppelin - A friendlier way to FlinkFlink Forward
 
Alexander Kolb - Flinkspector – Taming the squirrel
Alexander Kolb - Flinkspector – Taming the squirrelAlexander Kolb - Flinkspector – Taming the squirrel
Alexander Kolb - Flinkspector – Taming the squirrelFlink Forward
 
Ana M Martinez - AMIDST Toolbox- Scalable probabilistic machine learning with...
Ana M Martinez - AMIDST Toolbox- Scalable probabilistic machine learning with...Ana M Martinez - AMIDST Toolbox- Scalable probabilistic machine learning with...
Ana M Martinez - AMIDST Toolbox- Scalable probabilistic machine learning with...Flink Forward
 
Maxim Fateev - Beyond the Watermark- On-Demand Backfilling in Flink
Maxim Fateev - Beyond the Watermark- On-Demand Backfilling in FlinkMaxim Fateev - Beyond the Watermark- On-Demand Backfilling in Flink
Maxim Fateev - Beyond the Watermark- On-Demand Backfilling in FlinkFlink Forward
 
Eron Wright - Introducing Flink on Mesos
Eron Wright - Introducing Flink on MesosEron Wright - Introducing Flink on Mesos
Eron Wright - Introducing Flink on MesosFlink Forward
 

Viewers also liked (20)

Kostas Tzoumas - Stream Processing with Apache Flink®
Kostas Tzoumas - Stream Processing with Apache Flink®Kostas Tzoumas - Stream Processing with Apache Flink®
Kostas Tzoumas - Stream Processing with Apache Flink®
 
Ted Dunning-Faster and Furiouser- Flink Drift
Ted Dunning-Faster and Furiouser- Flink DriftTed Dunning-Faster and Furiouser- Flink Drift
Ted Dunning-Faster and Furiouser- Flink Drift
 
Julian Hyde - Streaming SQL
Julian Hyde - Streaming SQLJulian Hyde - Streaming SQL
Julian Hyde - Streaming SQL
 
Stephan Ewen - Running Flink Everywhere
Stephan Ewen - Running Flink EverywhereStephan Ewen - Running Flink Everywhere
Stephan Ewen - Running Flink Everywhere
 
Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink
Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with FlinkSanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink
Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink
 
Márton Balassi Streaming ML with Flink-
Márton Balassi Streaming ML with Flink- Márton Balassi Streaming ML with Flink-
Márton Balassi Streaming ML with Flink-
 
Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apac...
Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apac...Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apac...
Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apac...
 
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)
 
Apache Flink Community Updates November 2016 @ Berlin Meetup
Apache Flink Community Updates November 2016 @ Berlin MeetupApache Flink Community Updates November 2016 @ Berlin Meetup
Apache Flink Community Updates November 2016 @ Berlin Meetup
 
Streaming Analytics & CEP - Two sides of the same coin?
Streaming Analytics & CEP - Two sides of the same coin?Streaming Analytics & CEP - Two sides of the same coin?
Streaming Analytics & CEP - Two sides of the same coin?
 
The Stream Processor as the Database - Apache Flink @ Berlin buzzwords
The Stream Processor as the Database - Apache Flink @ Berlin buzzwords   The Stream Processor as the Database - Apache Flink @ Berlin buzzwords
The Stream Processor as the Database - Apache Flink @ Berlin buzzwords
 
A look at Flink 1.2
A look at Flink 1.2A look at Flink 1.2
A look at Flink 1.2
 
Taking a look under the hood of Apache Flink's relational APIs.
Taking a look under the hood of Apache Flink's relational APIs.Taking a look under the hood of Apache Flink's relational APIs.
Taking a look under the hood of Apache Flink's relational APIs.
 
Unified Stream and Batch Processing with Apache Flink
Unified Stream and Batch Processing with Apache FlinkUnified Stream and Batch Processing with Apache Flink
Unified Stream and Batch Processing with Apache Flink
 
Automatic Detection of Web Trackers by Vasia Kalavri
Automatic Detection of Web Trackers by Vasia KalavriAutomatic Detection of Web Trackers by Vasia Kalavri
Automatic Detection of Web Trackers by Vasia Kalavri
 
Trevor Grant - Apache Zeppelin - A friendlier way to Flink
Trevor Grant - Apache Zeppelin - A friendlier way to FlinkTrevor Grant - Apache Zeppelin - A friendlier way to Flink
Trevor Grant - Apache Zeppelin - A friendlier way to Flink
 
Alexander Kolb - Flinkspector – Taming the squirrel
Alexander Kolb - Flinkspector – Taming the squirrelAlexander Kolb - Flinkspector – Taming the squirrel
Alexander Kolb - Flinkspector – Taming the squirrel
 
Ana M Martinez - AMIDST Toolbox- Scalable probabilistic machine learning with...
Ana M Martinez - AMIDST Toolbox- Scalable probabilistic machine learning with...Ana M Martinez - AMIDST Toolbox- Scalable probabilistic machine learning with...
Ana M Martinez - AMIDST Toolbox- Scalable probabilistic machine learning with...
 
Maxim Fateev - Beyond the Watermark- On-Demand Backfilling in Flink
Maxim Fateev - Beyond the Watermark- On-Demand Backfilling in FlinkMaxim Fateev - Beyond the Watermark- On-Demand Backfilling in Flink
Maxim Fateev - Beyond the Watermark- On-Demand Backfilling in Flink
 
Eron Wright - Introducing Flink on Mesos
Eron Wright - Introducing Flink on MesosEron Wright - Introducing Flink on Mesos
Eron Wright - Introducing Flink on Mesos
 

More from Flink Forward

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Flink Forward
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkFlink Forward
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...Flink Forward
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Flink Forward
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorFlink Forward
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeFlink Forward
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Flink Forward
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkFlink Forward
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxFlink Forward
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink Forward
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraFlink Forward
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkFlink Forward
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentFlink Forward
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022Flink Forward
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink Forward
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsFlink Forward
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotFlink Forward
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesFlink Forward
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Flink Forward
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergFlink Forward
 

More from Flink Forward (20)

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async Sink
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in Flink
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easy
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data Alerts
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial Services
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
 

Recently uploaded

Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Vladislav Solodkiy
 
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxTINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxDwiAyuSitiHartinah
 
YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.JasonViviers2
 
Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...PrithaVashisht1
 
Master's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationMaster's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationGiorgio Carbone
 
ChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics InfrastructureChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics Infrastructuresonikadigital1
 
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Guido X Jansen
 
The Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerThe Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerPavel Šabatka
 
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityStrategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityAggregage
 
AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)Data & Analytics Magazin
 
How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?sonikadigital1
 
Mapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxMapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxVenkatasubramani13
 
CI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionCI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionajayrajaganeshkayala
 
5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best PracticesDataArchiva
 
SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024Becky Burwell
 
MEASURES OF DISPERSION I BSc Botany .ppt
MEASURES OF DISPERSION I BSc Botany .pptMEASURES OF DISPERSION I BSc Botany .ppt
MEASURES OF DISPERSION I BSc Botany .pptaigil2
 
Virtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product IntroductionVirtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product Introductionsanjaymuralee1
 

Recently uploaded (17)

Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023
 
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxTINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
 
YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.
 
Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...
 
Master's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationMaster's Thesis - Data Science - Presentation
Master's Thesis - Data Science - Presentation
 
ChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics InfrastructureChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics Infrastructure
 
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
 
The Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerThe Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayer
 
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityStrategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
 
AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)
 
How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?
 
Mapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxMapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptx
 
CI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionCI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual intervention
 
5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices
 
SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024
 
MEASURES OF DISPERSION I BSc Botany .ppt
MEASURES OF DISPERSION I BSc Botany .pptMEASURES OF DISPERSION I BSc Botany .ppt
MEASURES OF DISPERSION I BSc Botany .ppt
 
Virtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product IntroductionVirtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product Introduction
 

Jamie Grier - The Stream Processor as a Database- Building Online Applications directly on Streams

  • 1. The Stream Processor as a Database The evolution of realtime analytics architecture Jamie Grier @jamiegrier jamie@data-artisans.com
  • 2. Who am I? • Director of Applications Engineering at data Artisans • Previously working on streaming computation at Twitter, Gnip and Boulder Imaging • Involved in various kinds of stream processing for about a decade • Now I spend my time helping people to be successful with Apache Flink in production applications @jamiegrier#ff16 #ApacheFlink
  • 3. Introduction • Evolution of software architecture for real-time analytics at scale • Pros and cons of each architecture • New possibilities with robust stateful stream processing and Queryable State! • Introduce the idea of using the stream processor itself as the DB • Demo of Queryable State in Apache Flink 1.2-SNAPSHOT! @jamiegrier#ff16 #ApacheFlink
  • 4. Motivating Example • Tweet Impressions: 1 Million+ Impressions / Second • 100 Million+ unique tweet impressions per hour • Computing hourly aggegates for each tweet and storing in key/value store • Low latency access to the current in-flight aggregates @jamiegrier#ff16 #ApacheFlink
  • 5. Tweet Impressions Tweet Impressions 0 25 50 75 100 01:00:00 02:00:00 03:00:00 04:00:00 Tweet 1 Tweet 2 @jamiegrier#ff16 #ApacheFlink
  • 7. Batch Architecture Bulk load of data into KV store can be very efficient Robust against failures — just restart failed partitions No resource usage between batches @jamiegrier#ff16 #ApacheFlink Pros Cons Data Availability = Batch interval Could be hours or more than a day Not robust to out of order data issues Batch boundary errors introduced Hard to compute things like session windows
  • 9. Alpha Architecture :) @jamiegrier#ff16 #ApacheFlink Pros Cons Data is available for query instantly Very simple architecture Handles out of order data naturally We can always get the best data so far for any given hour Key value store becomes the bottleneck very quickly DB Load directly proportional to input rate (2x ?) Not robust to failures - failures can lead to multiple counting, etc
  • 11. Lambda Architecture Load on KV store is load, effectively bulk load of pre-computed aggregates Instant access to data Can be built by cobbling together various existing systems Some of the best of two worlds @jamiegrier#ff16 #ApacheFlink Pros Cons High load on speed layer cache, new bottleneck Hard to reason about data correctness in speed layer Correct data comes very late Still subjest to batch boundary errors and hard to compute sessions, etc Complex and expensive! Also the worst of two worlds
  • 13. Beta Architecture :) @jamiegrier#ff16 #ApacheFlink Dramatically reduced load on the KV Store DB load is now relative to key cardinality not message input rate Correct counts even in failure cases Get’s rid of correctness issues caused by batch boundaries We can tighten up our aggregate frequency as compared to a batch system Pros Cons Time until data available increases - same as window size! Writes to key value store must be idempotent to achieve robust semantics The above is not always possible
  • 15. Omega Architecture :) @jamiegrier#ff16 #ApacheFlink Dramatically less load on the KV store KV Store load proportional to unique key rate not input rate Correct data is available instantly! Still very simple Correct in failure cases Only need current window state in Flink Pros Cons Still need a separate KV Store and Query Service to merge results In current implementation older data can be served in some failure scenarios
  • 16. Omega Prime Architecture :) @jamiegrier#ff16 #ApacheFlink “Optimus Prime was forced into a combination with his brother. The result of their combination is Omega Prime, a seemingly unique individual who not only combines the best of his component parts abilities but adds a considerable amount of power to the total. His importance in the fight against evil cannot be over-estimated, stretching beyond even his own universe.” — Wikipedia
  • 18. Omega Prime Architecture :) @jamiegrier#ff16 #ApacheFlink No external key value store Easy to scale because all state is local Data is available instantly Simple Correct in failure cases Build whatever stateful stream applications you can think of and still have strong correctness guarantees Pros Cons Total state size must fit in Flink State. This is a limitation — for now. See “very large state” talk by Stephan Ewen Are people ready to consider using the stream processor state as the only data store?
  • 19. Demo! • Flink 1.2-SNAPSHOT (master) • Created a plug-in for Grafana to query Flink state directly • Created a simple REST server to serve requests from Grafana • Queries window state in Flink directly @jamiegrier#ff16 #ApacheFlink