Apache Mesos lets operators run distributed applications across an entire datacenter and is attracting ever-increasing interest. Just as Mesos makes distributed applications easier to adopt, Mesos itself sees growing use thanks to an expanding ecosystem of well-integrated applications. One of the latest additions to the Mesos family is Apache Flink. Flink is one of the most popular open source systems for real-time, high-scale data processing, and it lets users run low-latency streaming analytics workloads on Mesos.
In this talk we explain the challenges solved while integrating Flink with Mesos, including how Flink’s distributed architecture can be modeled as a Mesos framework, and how Flink was integrated with Fenzo. Next, we describe how Flink was packaged to easily run on DC/OS.
6. Evolution of Data Analytics
Processing models: Batch, Micro-Batch, Event Processing
Latency scale: Days, Hours, Minutes, Seconds, Microseconds
▪ Descriptive analytics: reports what has happened
▪ Predictive and prescriptive analytics: solves problems
Example use cases:
▪ Billing, Chargeback
▪ Product recommendations
▪ Real-time Pricing and Routing
▪ Real-time Advertising
▪ Predictive User Interface
7. FMACK Stack
▪ EVENTS: ubiquitous data streams from connected devices (sensors, devices, clients)
▪ INGEST (Apache Kafka): ingest millions of events per second
▪ STORE (Apache Cassandra): distributed & highly scalable database
▪ ANALYZE (Apache Flink): process data in real time and in batch
▪ ACT (Akka): visualize data & build data-driven apps
▪ All components run on Mesos / DC/OS
11. Apache Mesos
▪ Typical Datacenter: siloed, over-provisioned servers, low utilization (industry average: 12-15% utilization)
▪ Example workloads: MySQL, microservice, Cassandra, Flink, Kafka
▪ Mesos: automated schedulers, workload multiplexing onto the same machines
12. Apache Mesos
Why Mesos?
▪ 2-level scheduling
▪ Fault-tolerant, battle-tested
▪ Scalable to 10,000+ nodes
▪ Created by Mesosphere founder @ UC Berkeley; used in production by 100+ web-scale companies [1]
[1] http://mesos.apache.org/documentation/latest/powered-by-mesos/
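The "2-level scheduling" bullet can be illustrated with a small toy model. This is only a sketch under simplifying assumptions, not Mesos code: the master (first level) offers each agent's free resources to frameworks in a fixed order, and each framework (second level) decides how many of its own tasks to launch on an offer. The names `Offer`, `Framework`, `on_offer`, and `master_offer_round` are hypothetical, and the real Mesos allocator decides which framework receives offers using a fairness policy rather than a fixed order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """Free resources on one agent, as offered by the master (toy model)."""
    agent: str
    cpus: float
    mem_mb: int


@dataclass
class Framework:
    """Second level: the framework itself decides what to run on an offer."""
    name: str
    cpus_per_task: float
    mem_per_task_mb: int
    pending_tasks: int

    def on_offer(self, offer: Offer) -> int:
        """Return how many tasks to launch on this offer (0 means decline)."""
        fit_cpu = int(offer.cpus // self.cpus_per_task)
        fit_mem = offer.mem_mb // self.mem_per_task_mb
        launched = min(fit_cpu, fit_mem, self.pending_tasks)
        self.pending_tasks -= launched
        return launched


def master_offer_round(agents, frameworks):
    """First level: offer each agent's free resources to every framework in
    turn; whatever one framework leaves unused is offered to the next."""
    placements = []
    for agent, (cpus, mem_mb) in agents.items():
        for fw in frameworks:
            count = fw.on_offer(Offer(agent, cpus, mem_mb))
            if count:
                placements.append((fw.name, agent, count))
                cpus -= count * fw.cpus_per_task
                mem_mb -= count * fw.mem_per_task_mb
    return placements


# Hypothetical cluster: two agents, two frameworks with pending tasks.
agents = {"agent-1": (4.0, 8192), "agent-2": (2.0, 4096)}
frameworks = [
    Framework("flink", cpus_per_task=2.0, mem_per_task_mb=4096, pending_tasks=2),
    Framework("kafka", cpus_per_task=1.0, mem_per_task_mb=2048, pending_tasks=3),
]
print(master_offer_round(agents, frameworks))
```

The point of the split is that the master never needs to understand Flink or Kafka; it only hands out resources, while placement logic stays inside each framework.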
14. Why Apache Mesos?
▪ Mesos offers full functionality to implement fault-tolerant and elastic distributed applications
▪ 30% of survey respondents were running Flink on Mesos (prior to proper Mesos support, September 2016)
18. Fenzo
▪ Generic task scheduler for Mesos frameworks
▪ Developed by Netflix
▪ Matching between tasks and resource offers
▪ Pluggable fitness evaluator
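Flow between Mesos, the Launch Coordinator, and Fenzo:
▪ Mesos sends periodic resource offers to the Launch Coordinator
▪ The Launch Coordinator tells Fenzo the offered resources & tasks
▪ Fenzo returns resource-task matchings
▪ The tasks to launch are passed back to Mesos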
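The idea of matching tasks against resource offers with a pluggable fitness evaluator can be sketched in a few lines. The sketch below does not use Fenzo's actual Java API; `Task`, `Offer`, `schedule`, `cpu_bin_packing`, and `cpu_worst_fit` are hypothetical names chosen for illustration, and the greedy per-task loop is an assumed simplification of what a real scheduler does.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Task:
    name: str
    cpus: float  # assumed to be > 0
    mem_mb: int


@dataclass
class Offer:
    host: str
    cpus: float
    mem_mb: int


# A fitness evaluator scores how well a task fits an offer (higher is better).
FitnessFn = Callable[[Task, Offer], float]


def cpu_bin_packing(task: Task, offer: Offer) -> float:
    """Prefer the offer that the task fills most completely."""
    return task.cpus / offer.cpus


def cpu_worst_fit(task: Task, offer: Offer) -> float:
    """Prefer the offer that keeps the largest share of CPU free."""
    return 1.0 - task.cpus / offer.cpus


def schedule(tasks: List[Task], offers: List[Offer], fitness: FitnessFn) -> Dict[str, str]:
    """Greedily assign each task to the feasible offer with the best fitness."""
    remaining = [Offer(o.host, o.cpus, o.mem_mb) for o in offers]  # work on copies
    assignments: Dict[str, str] = {}
    for task in tasks:
        feasible = [o for o in remaining
                    if o.cpus >= task.cpus and o.mem_mb >= task.mem_mb]
        if not feasible:
            continue  # task stays pending until the next round of offers
        best = max(feasible, key=lambda o: fitness(task, o))
        best.cpus -= task.cpus
        best.mem_mb -= task.mem_mb
        assignments[task.name] = best.host
    return assignments


offers = [Offer("host-a", 8.0, 16384), Offer("host-b", 2.0, 4096)]
tasks = [Task("taskmanager-1", 2.0, 4096), Task("taskmanager-2", 2.0, 4096)]
print(schedule(tasks, offers, cpu_bin_packing))
print(schedule(tasks, offers, cpu_worst_fit))
```

Swapping the fitness function changes placement without touching the matching loop: with these inputs, bin packing fills the small host first, while the worst-fit evaluator keeps both tasks on the large host.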
19. New Distributed Architecture
Components:
▪ Client
▪ Marathon
▪ Mesos Master (Mesos Cluster)
▪ Flink Mesos Dispatcher
▪ Flink Master Process: Flink Mesos ResourceManager, JobManager
▪ TaskManagers, each running as a Mesos Task
Steps:
(1) Marathon: start and monitor dispatcher
(2) Client: HTTP POST JobGraph/Jars
(3) Allocate container for Flink master
(4) Start process (and supervise)
(5) Request slots
(6) Allocate containers for TaskManagers
(7) Register
(8) Deploy tasks
21. DC/OS
Datacenter Operating System (DC/OS)
Distributed Systems Kernel (Mesos)
Modern App Components:
▪ Big Data + Analytics Engines: Streaming, Batch, Machine Learning, Analytics
▪ Microservices (in containers): Functions & Logic
▪ Databases: Search, Time Series, SQL / NoSQL
Any Infrastructure (Physical, Virtual, Cloud)
22. Demo Time
Generator
▪ Financial data generated by generator
▪ Written to Kafka topics
▪ Kafka topics consumed by Flink
▪ Flink pipeline operates on Kafka data
▪ Results written back into Kafka
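The data flow of the demo can be mimicked without a cluster. The sketch below is a stand-in only: in-memory queues replace the Kafka topics, a plain function replaces the Flink pipeline, and the record layout, the topic names `transactions` and `totals`, and the per-account windowed sum are all assumptions made for illustration, since the slides do not say what the demo pipeline computes.

```python
import random
from collections import defaultdict, deque


def generate(topic, count, seed=0):
    """Generator: write synthetic financial records to the input topic."""
    rng = random.Random(seed)
    for i in range(count):
        topic.append({
            "id": i,
            "account": rng.choice(["A", "B", "C"]),
            "amount": rng.randint(1, 100),
        })


def run_pipeline(source, sink, window_size):
    """Pipeline stand-in: consume the input topic in count-based tumbling
    windows and write per-account totals to the output topic."""
    while len(source) >= window_size:
        window = [source.popleft() for _ in range(window_size)]
        totals = defaultdict(int)
        for record in window:
            totals[record["account"]] += record["amount"]
        sink.append(dict(totals))


# Queues standing in for two Kafka topics (hypothetical names).
topics = {"transactions": deque(), "totals": deque()}
generate(topics["transactions"], count=10)
run_pipeline(topics["transactions"], topics["totals"], window_size=5)
for result in topics["totals"]:
    print(result)
```

In the real demo, the generator, Kafka, and Flink each run as separate services on DC/OS; the sketch only shows the shape of the loop from input topic through processing to output topic.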