Getting data into and out of Flink is an everyday requirement of building Flink applications, and arguably the most important one. Doing so with end-to-end exactly-once guarantees, however, can be tricky. A stream processor must reliably consume data from the outside world without duplicate processing, keep its distributed state consistent, and deliver computed results back to the outside world without introducing duplicates. All of this is crucial for the consistency and correctness of applications built on stream processors. In this talk, we explain how end-to-end exactly-once guarantees can be achieved with Apache Flink. We cover Flink’s checkpointing mechanism and how to leverage it when consuming and producing data in your Flink streaming pipelines. In particular, we review in detail how the supported connectors do so, with the aim of providing reference implementations for your own custom consumers and sinks.
7. Exactly-Once on a single node
▪ Simple transactions
▪ Write data in transaction
▪ Include read offsets in transaction
▪ On success - commit transaction
▪ On failure - abort transaction and restart from last committed read offset
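The steps on this slide can be sketched as follows. This is a minimal illustration, not Flink code: it assumes an in-memory SQLite database stands in for the output system, and the table names (`output`, `offsets`) and the `fail_at` parameter are hypothetical.

```python
import sqlite3

def process_batch(conn, records, fail_at=None):
    """Write outputs and the new read offset in ONE transaction."""
    start = conn.execute("SELECT position FROM offsets").fetchone()[0]
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for i in range(start, len(records)):
                if i == fail_at:
                    raise RuntimeError("simulated crash")
                conn.execute("INSERT INTO output(value) VALUES (?)", (records[i],))
            # The read offset is part of the same transaction as the data.
            conn.execute("UPDATE offsets SET position = ?", (len(records),))
    except RuntimeError:
        pass  # transaction aborted: neither data nor offset was persisted
    return conn.execute("SELECT position FROM offsets").fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE output(value INTEGER)")
conn.execute("CREATE TABLE offsets(position INTEGER)")
conn.execute("INSERT INTO offsets VALUES (0)")
conn.commit()

records = [10, 20, 30, 40]
offset_after_failure = process_batch(conn, records, fail_at=2)  # aborts
final_offset = process_batch(conn, records)  # restarts from committed offset
rows = [r[0] for r in conn.execute("SELECT value FROM output ORDER BY rowid")]
```

Because the aborted attempt rolls back both the written rows and the offset, the retry starts from the last committed offset and no record appears twice in `output`.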
8. Exactly-Once on a single node
▪ What about application state?
▪ For example, a running average
▪ Include it in the transaction by adding it as a separate file/table just before commit
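As a sketch of the idea above, the running-average state can live in its own table and be updated in the same transaction as the read offset. This assumes SQLite as the transactional store; the table names (`avg_state`, `offsets`) and the `crash` flag are hypothetical and only serve the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE offsets(position INTEGER)")
conn.execute("CREATE TABLE avg_state(count INTEGER, total REAL)")
conn.execute("INSERT INTO offsets VALUES (0)")
conn.execute("INSERT INTO avg_state VALUES (0, 0.0)")
conn.commit()

def process(conn, records, crash=False):
    """Fold new records into the running average; persist state + offset atomically."""
    position = conn.execute("SELECT position FROM offsets").fetchone()[0]
    count, total = conn.execute("SELECT count, total FROM avg_state").fetchone()
    for value in records[position:]:
        count += 1
        total += value
    try:
        with conn:  # one transaction for application state and read offset
            conn.execute("UPDATE avg_state SET count = ?, total = ?", (count, total))
            if crash:
                raise RuntimeError("simulated crash before commit")
            conn.execute("UPDATE offsets SET position = ?", (len(records),))
    except RuntimeError:
        return None  # rolled back: stored state and offset are unchanged
    return total / count if count else None

crashed = process(conn, [2, 4, 6], crash=True)
avg_first = process(conn, [2, 4, 6])
avg_second = process(conn, [2, 4, 6, 8])
```

After the simulated crash the stored state is still empty, so reprocessing the same input does not count any record twice.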
9. Exactly-Once in a distributed system
▪ Persistent communication channels
10. Exactly-Once in a distributed system
▪ Persistent communication channels
▪ Allow re-processing of messages from the last committed read offsets
▪ High costs of communication
▪ Processes operate independently of each other
12. Exactly-Once two phase commit
▪ Can we do better? Yes! Two phase commit protocol
[Figure: Input data → Process 1 → Process 2 → Output data]
13. Exactly-Once two phase commit
[Figure: Input data → Process 1 → Process 2 → Output data, with a state backend and a job manager]
14. Exactly-Once two phase commit
[Figure: Input data → Process 1 → Process 2 → Output data, with a state backend and a job manager. Label: Start external transaction.]
15. Exactly-Once two phase commit
[Figure: same pipeline, pre-commit phase (checkpoint starts). Labels: (1) Inject checkpoint barrier.]
16. Exactly-Once two phase commit
[Figure: same pipeline, pre-commit phase. Labels: (1) Inject checkpoint barrier; (2) Snapshot offsets & state; (2) Pass checkpoint barrier.]
17. Exactly-Once two phase commit
[Figure: same pipeline, pre-commit phase. Labels: (1) Inject checkpoint barrier; (2) Snapshot offsets & state; (2) Pass checkpoint barrier; (3) Snapshot state; (3) Pre-commit external transaction.]
18. Exactly-Once two phase commit
▪ Without side effects (only internal state)
•Example: calculating running average
•State managed by Flink
▪ With side effects
•Operator must manage external state on its own
19. Exactly-Once two phase commit
[Figure: same diagram as slide 17.]
20. Exactly-Once two phase commit
[Figure: same pipeline, commit phase. Labels: (1) Notify checkpoint completed.]
21. Exactly-Once two phase commit
[Figure: same pipeline, commit phase. Labels: (1) Notify checkpoint completed; (2) Commit external transaction.]
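The pre-commit and commit phases from the diagrams can be walked through in a small single-threaded simulation. This is only a sketch under simplifying assumptions: the barrier is handled synchronously in one loop rather than flowing between parallel tasks, and all class and method names (`Source`, `RunningSum`, `TransactionalSink`, `run`) are hypothetical, not Flink's API.

```python
class Source:
    def __init__(self, records):
        self.records, self.offset = records, 0
    def poll(self):
        value = self.records[self.offset]
        self.offset += 1
        return value
    def snapshot(self):
        return {"offset": self.offset}

class RunningSum:
    def __init__(self):
        self.total = 0
    def process(self, value):
        self.total += value
        return self.total
    def snapshot(self):
        return {"total": self.total}

class TransactionalSink:
    def __init__(self):
        self.open_txn, self.pre_committed, self.committed = [], {}, []
    def write(self, value):
        self.open_txn.append(value)
    def pre_commit(self, checkpoint_id):
        # Close the current external transaction and start a new one.
        self.pre_committed[checkpoint_id] = self.open_txn
        self.open_txn = []
    def notify_checkpoint_complete(self, checkpoint_id):
        self.committed.extend(self.pre_committed.pop(checkpoint_id))

def run(records, barrier_after):
    source, operator, sink = Source(records), RunningSum(), TransactionalSink()
    state_backend = {}
    for step in range(len(records)):
        sink.write(operator.process(source.poll()))
        if step + 1 == barrier_after:
            checkpoint_id = 1
            # Pre-commit: snapshot offsets and state, pre-commit the sink.
            snapshot = {"source": source.snapshot(), "operator": operator.snapshot()}
            sink.pre_commit(checkpoint_id)
            state_backend[checkpoint_id] = snapshot
            # Commit: every participant has acknowledged, so notify completion.
            sink.notify_checkpoint_complete(checkpoint_id)
    return state_backend, sink

state_backend, sink = run([1, 2, 3, 4], barrier_after=2)
```

Results written after the barrier stay in the sink's open transaction and become visible only when a later checkpoint completes.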
22. Exactly-Once two phase commit
▪ Once all processes successfully complete pre-commit, commit is issued
▪ If at least one pre-commit fails, all others are aborted
▪ After a successful pre-commit, commit MUST be guaranteed to eventually succeed
▪ This guarantees that all processes agree on the final outcome
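The rules above can be sketched as a tiny coordinator. This is an illustrative model only: `Participant` and `two_phase_commit` are hypothetical names, and `commit` here simply cannot fail, which stands in for the requirement that commit must eventually succeed after a successful pre-commit.

```python
class Participant:
    def __init__(self, name, fail_pre_commit=False):
        self.name = name
        self.fail_pre_commit = fail_pre_commit
        self.state = "open"

    def pre_commit(self):
        if self.fail_pre_commit:
            raise RuntimeError(f"{self.name}: pre-commit failed")
        self.state = "pre-committed"

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

def two_phase_commit(participants):
    """Commit only if every participant pre-commits; otherwise abort all."""
    try:
        for p in participants:
            p.pre_commit()
    except RuntimeError:
        for p in participants:
            p.abort()
        return "aborted"
    for p in participants:
        p.commit()
    return "committed"

healthy = [Participant("process-1"), Participant("process-2")]
outcome_ok = two_phase_commit(healthy)

broken = [Participant("process-1"), Participant("process-2", fail_pre_commit=True)]
outcome_failed = two_phase_commit(broken)
```

In both runs every participant ends in the same state, which is the agreement property the last bullet describes.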
23. Two phase commit example implementation
▪ Begin transaction - create a temporary file
▪ Write element - write data to that file
▪ Pre-commit - flush the file
▪ And begin a new transaction for subsequent writes
▪ Commit - move the file to a target directory
▪ Can increase latency
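A minimal sketch of the file-based scheme listed above, assuming local files and a hypothetical `FileTransactionSink` class with a made-up `part-N.tmp` naming scheme. The move uses `os.replace`, which is atomic only when both directories are on the same filesystem.

```python
import os
import tempfile

class FileTransactionSink:
    def __init__(self, tmp_dir, target_dir):
        self.tmp_dir, self.target_dir = tmp_dir, target_dir
        self.counter = 0
        self.begin_transaction()

    def begin_transaction(self):
        # Begin transaction: create a temporary file.
        self.counter += 1
        self.path = os.path.join(self.tmp_dir, f"part-{self.counter}.tmp")
        self.file = open(self.path, "w")

    def write(self, element):
        # Write element: append to the current temporary file.
        self.file.write(f"{element}\n")

    def pre_commit(self):
        # Pre-commit: flush to disk, then start a new transaction.
        self.file.flush()
        os.fsync(self.file.fileno())
        self.file.close()
        pending = self.path
        self.begin_transaction()
        return pending

    def commit(self, pending):
        # Commit: move the flushed file into the target directory.
        name = os.path.basename(pending)[: -len(".tmp")]
        target = os.path.join(self.target_dir, name)
        os.replace(pending, target)
        return target

root = tempfile.mkdtemp()
tmp_dir, target_dir = os.path.join(root, "tmp"), os.path.join(root, "out")
os.makedirs(tmp_dir)
os.makedirs(target_dir)

sink = FileTransactionSink(tmp_dir, target_dir)
sink.write("a")
sink.write("b")
pending = sink.pre_commit()   # data is durable but not yet visible
sink.write("c")               # goes to the next transaction
committed_path = sink.commit(pending)
sink.file.close()
with open(committed_path) as f:
    committed_lines = f.read().splitlines()
```

The latency cost mentioned on the slide shows up here as well: element "c" is not visible in the target directory until its own transaction is pre-committed and committed.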
24. Exactly-Once two phase commit
▪ In case of a system crash, we can restore the state of the application to the latest snapshot
▪ We could/should abort previous “pending” transactions
•Delete temporary files
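For the file-based example, aborting pending transactions after a restart could look like the sketch below. It assumes every file with a `.tmp` suffix in the temporary directory belongs to a transaction that was never committed; the function name and the suffix convention are hypothetical.

```python
import os
import tempfile

def abort_pending_transactions(tmp_dir, suffix=".tmp"):
    """Delete leftover temporary files from transactions that never committed."""
    removed = []
    for name in sorted(os.listdir(tmp_dir)):
        if name.endswith(suffix):
            os.remove(os.path.join(tmp_dir, name))
            removed.append(name)
    return removed

# Simulate a directory left behind by a crashed run.
tmp_dir = tempfile.mkdtemp()
for name in ("part-3.tmp", "part-4.tmp", "notes.txt"):
    open(os.path.join(tmp_dir, name), "w").close()

removed = abort_pending_transactions(tmp_dir)
remaining = os.listdir(tmp_dir)
```

After the cleanup, the application would resume from the offsets and state stored in the latest snapshot and rewrite the aborted data in a fresh transaction.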
26. Exactly-Once producers in Flink
▪ BucketingSink
▪ Pravega connector
▪ Kafka 0.11 connector
▪ Kafka 0.11 introduced transactions
▪ Implemented on top of the TwoPhaseCommitSinkFunction
▪ Low overhead
27. Summary
▪ Many ways to achieve Exactly-Once
▪ Flink checkpointing is similar to the two phase commit protocol
▪ No materialization of the data in transit
▪ Lower latency
▪ Higher throughput