Dask + Prefect
1. Prefect + Dask: Parallel / Distributed Workflows
Chris White
February 26, 2020
Chris White Workflows on Dask February 26, 2020 1 / 6
11. Workflows
What do we mean by “Workflows”?
Broadly speaking, when we talk about “workflow semantics” we mean:
“Tasks” represent units of business logic
Identification of failure (alerting)
Recovery from failure
Triggering logic (e.g., some tasks should be triggered by failed jobs)
Retrying tasks is a first-class operation
Run-once guarantees
Audit trails, lineage, access controls, an API
Scheduling features for both batch and ad-hoc runs
... and many more
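A few of the semantics listed above (retries as a first-class operation, identification of failure, and triggering a task because an upstream task failed) can be illustrated with a toy runner. This is a minimal sketch in plain Python, not Prefect's API: the `Task` class, the `run_flow` function, the trigger names, and the state strings are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Task:
    """Hypothetical unit of business logic with retry and trigger settings."""
    name: str
    fn: Callable[[], object]
    max_retries: int = 0
    upstream: List[str] = field(default_factory=list)
    trigger: str = "all_successful"  # or "any_failed"


def run_flow(tasks: List[Task]) -> Dict[str, str]:
    """Run tasks in list order (assumed to be topologically sorted already)."""
    states: Dict[str, str] = {}
    for task in tasks:
        upstream_states = [states[name] for name in task.upstream]
        if task.trigger == "all_successful":
            should_run = all(s == "Success" for s in upstream_states)
        else:  # "any_failed"
            should_run = any(s == "Failed" for s in upstream_states)
        if not should_run:
            states[task.name] = "Skipped"
            continue
        # One initial attempt plus up to max_retries further attempts.
        for _ in range(task.max_retries + 1):
            try:
                task.fn()
                states[task.name] = "Success"
                break
            except Exception:
                states[task.name] = "Failed"
    return states


attempts = {"extract": 0}
alerts: List[str] = []


def flaky_extract() -> str:
    """Fails twice with a transient error, then succeeds."""
    attempts["extract"] += 1
    if attempts["extract"] < 3:
        raise RuntimeError("transient error")
    return "rows"


def always_fails() -> None:
    raise ValueError("bad data")


flow = [
    Task("extract", flaky_extract, max_retries=2),
    Task("load", always_fails, upstream=["extract"]),
    # Runs only because an upstream task failed (alerting / recovery).
    Task("alert", lambda: alerts.append("load failed"),
         upstream=["load"], trigger="any_failed"),
    Task("report", lambda: None, upstream=["load"]),
]
states = run_flow(flow)
```

Here `extract` recovers through retries, `load` ends in a failed state, `alert` fires because of that failure, and `report` is skipped. A real workflow system would also persist these states, which is where audit trails and scheduling come in.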
18. Workflows
Where does Dask come in?
Asynchronous scheduling of tasks
Parallelizing task execution
Distributing task execution
Submitting tasks to heterogeneous workers (worker resources)
Creating clusters on demand / per run (dask-kubernetes)
... all off the shelf
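The first two items above, asynchronous scheduling and parallel execution, follow a submit-and-get-a-future pattern. The sketch below uses the standard library's `concurrent.futures` as a stand-in so that it runs without a cluster; it is not Dask code. The Dask distributed `Client` offers a comparable `submit` call that returns futures, and the functions `inc` and the worker count here are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def inc(x: int) -> int:
    return x + 1


with ThreadPoolExecutor(max_workers=4) as pool:
    # Independent tasks are submitted without blocking; each call returns a future.
    futures = [pool.submit(inc, i) for i in range(4)]
    # Gather the intermediate results, then submit a downstream task that depends on them.
    partials = [f.result() for f in futures]
    total = pool.submit(sum, partials).result()
```

With a distributed scheduler, the same pattern extends across machines rather than threads, which is the part a workflow system gets "off the shelf".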
28. Some fun problems
Complications Opportunities
Approximately half of our community is not familiar with Dask
Dask is more willing to rerun tasks
Sharing futures between Clients would be great
Prefect currently submits large payloads to the scheduler
Creating Dask aware objects is hard
Resource configuration is an art, not a science
Prefect abdicates process control to Dask
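The tension between run-once guarantees and an executor that may rerun tasks can be made concrete with an idempotency guard. The following is a minimal sketch under stated assumptions, not how Prefect or Dask handle this: `RunOnceRegistry` is a hypothetical in-memory stand-in for a durable store, and `charge_customer` is an invented task with a side effect.

```python
import threading
from typing import Callable, Dict, Hashable, List


class RunOnceRegistry:
    """Hypothetical in-memory stand-in for a durable record of completed runs."""

    def __init__(self) -> None:
        self._results: Dict[Hashable, object] = {}
        self._lock = threading.Lock()

    def run_once(self, key: Hashable, fn: Callable[..., object], *args: object) -> object:
        # Holding the lock while fn runs serializes all guarded tasks;
        # that keeps the sketch simple at the cost of parallelism.
        with self._lock:
            if key in self._results:
                return self._results[key]
            result = fn(*args)
            self._results[key] = result
            return result


side_effects: List[int] = []


def charge_customer(order_id: int) -> str:
    """Invented task whose side effect must not happen twice."""
    side_effects.append(order_id)
    return f"charged {order_id}"


registry = RunOnceRegistry()
first = registry.run_once(("charge", 42), charge_customer, 42)
# Simulate the executor rerunning the same task.
second = registry.run_once(("charge", 42), charge_customer, 42)
```

The second call returns the recorded result without repeating the side effect. In a distributed setting the registry would need to live outside the worker processes, since a rerun may land on a different machine.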