AI pipelines powered by Jupyter notebooks
Luciano Resende
Open Source AI Platform Architect
@lresende1975
About me - Luciano Resende
Open Source AI Platform Architect – IBM – CODAIT
• Senior Technical Staff Member at IBM, contributing to open source for over 10 years
• Currently contributing to the Jupyter Notebook ecosystem, Apache Bahir, Apache
Toree, and Apache Spark, among other projects related to AI/ML platforms
lresende@us.ibm.com
https://www.linkedin.com/in/lresende
@lresende1975
https://github.com/lresende
IBM Developer / © 2019 IBM Corporation 2
IBM Open Source Participation
Open Source @ IBM
Learn
• Program touches 78,000 IBMers annually
Consume
• Virtually all IBM products contain some open source
• 40,363 packages per year
Contribute
• >62K OS certs per year
• ~10K IBM commits per month
Connect
• >1,000 active IBM contributors working in key OS projects
IBM Open Source
Participation
IBM generated open source innovation
• 137 IBM Open Code projects with 1,000+ GitHub
projects
• Projects graduate into full open governance:
Node-RED, OpenWhisk, SystemML, Blockchain
Fabric, among others
• developer.ibm.com/code/open/code/
Community
• IBM focused on 18 strategic communities
• Drive open governance in “Centers of Gravity”
• IBM Leaders drive key technologies and assure
freedom of action
The IBM OS Way is now open sourced
• Training, Recognition, Tooling
• Organization, Consuming, Contributing
Technology leaders do more than just consume OSS
1998
“For more than 20 years, IBM and Red Hat have paved the
way for open communities to power innovative IT solutions.”
– Red Hat
Long IBM history of actively fostering balanced community participation
Center for Open Source Data and AI Technologies
CODAIT aims to make AI solutions
dramatically easier to create, deploy,
and manage in the enterprise
Relaunch of the Spark Technology
Center (STC) to reflect expanded
mission
CODAIT
codait.org
codait (French)
= coder/coded
https://m.interglot.com/fr/en/codait
IBM Data Asset eXchange (DAX)
• Curated free and open datasets under open data licenses
• Standardized dataset formats and metadata
• Ready for use in enterprise AI applications
• Complement to the Model Asset eXchange (MAX)
Data Asset eXchange
ibm.biz/data-asset-exchange
Model Asset eXchange
ibm.biz/model-exchange
AGENDA
Jupyter Notebooks
Analytic Workloads Pipelines
• IPython %run magic
• Jupyter NBConvert
• Papermill
• Apache Airflow
AI/Deep Learning Workloads Pipelines
• AI Platforms
• Kubeflow and Kubeflow Pipelines
Announcements
Resources
Jupyter Notebooks
Notebooks are interactive
computational environments, in
which you can combine code
execution, rich text, mathematics,
plots and rich media.
Jupyter Notebook
Simple, but Powerful
As simple as opening a web
page, with the capabilities of
a powerful, multilingual,
development environment.
Interactive widgets
Code can produce rich
outputs such as images,
videos, markdown, LaTeX
and JavaScript. Interactive
widgets can be used to
manipulate and visualize
data in real-time.
Language of choice
Jupyter Notebooks have
support for over 50
programming languages,
including those popular in
Data Science, Data
Engineering, and AI such as
Python, R, Julia and Scala.
Big Data Integration
Leverage Big Data platforms
such as Apache Spark from
Python, R and Scala.
Explore the same data with
pandas, scikit-learn,
ggplot2, dplyr, etc.
Share Notebooks
Notebooks can be shared
with others using e-mail,
Dropbox, Google Drive,
GitHub, etc
Jupyter Notebook Platform Architecture
The Notebook UI runs in the browser
The Notebook Server serves the notebooks
Kernels interpret and execute cell contents
– Are responsible for code execution
– Abstract different languages
– Have a 1:1 relationship with a notebook
– Run and consume resources as long as the
notebook is running
Jupyter Notebook
Analytic Workloads
Analytic Workloads
Large amounts of data
Shared across the organization in data
lakes
Multiple workload types
– Data cleansing
– Data Warehouse
– Machine Learning and Insights
Analytic Workloads
Decompose Schedule/Run
Homegrown pipelines
Notebook Pipelines
using %run
%run built-in IPython magic
- Enables execution of notebooks or Python
scripts
[Diagram: an orchestrator notebook invokes other notebooks through %run]
Limitations
- Available in the IPython kernel only
- Static
- No command line integration
Notebook Pipelines
using NBConvert
[Diagram: an orchestrator passes input notebook(s) through NBConvert to produce output file(s) in ipynb, html, or pdf format, e.g. result_1.ipynb, result_2.ipynb, result_3.html, result_4.pdf]
Jupyter NBConvert
https://nbconvert.readthedocs.io/en/latest/
Jupyter NBConvert enables executing
and converting notebooks to different
file formats.
Notebook Pipelines
using NBConvert
$ pip install nbconvert
$ jupyter nbconvert --to html --execute overview_with_run.ipynb
[NbConvertApp] Converting notebook overview_with_run.ipynb to html
[NbConvertApp] Executing notebook with kernel: python3
[NbConvertApp] Writing 300558 bytes to overview_with_run.html
$ open overview_with_run.html
Advantages
– Support notebook chaining
– Convert results to immutable formats
Limitations
– No support for parameters
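The notebook chaining mentioned above can be sketched by driving the same CLI call from a small script. This is only an illustration, assuming the `jupyter nbconvert` command is on the PATH; the helper names `nbconvert_command` and `run_chain` and the notebook file names are hypothetical, not part of NBConvert.

```python
import subprocess


def nbconvert_command(notebook, to="html"):
    """Build the CLI call that executes `notebook` and converts the result."""
    return ["jupyter", "nbconvert", "--to", to, "--execute", str(notebook)]


def run_chain(notebooks, to="html", runner=subprocess.run):
    """Execute notebooks in order, stopping at the first failing step.

    `runner` defaults to subprocess.run; check=True makes a non-zero exit
    status raise CalledProcessError, so later notebooks are not started.
    """
    executed = []
    for notebook in notebooks:
        runner(nbconvert_command(notebook, to), check=True)
        executed.append(notebook)
    return executed


# Hypothetical usage:
# run_chain(["clean.ipynb", "train.ipynb", "report.ipynb"])
```

Because each step is just a command line, the same chain could also be scheduled from cron or any other external tool.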
Notebook Pipelines
with Papermill
Papermill
Papermill is an open source tool
contributed by Netflix which enables
parameterizing, executing, and
analyzing Jupyter Notebooks.
Papermill lets you:
- Parameterize notebooks
- Execute notebooks
[Diagram: an orchestrator runs an input notebook and produces output file(s) in ipynb, html, or pdf format, e.g. result_1.ipynb, result_2.ipynb, result_3.html, result_4.pdf]
Papermill
Papermill provides a programmatic
interface so you can integrate it with your
applications
import os
from pprint import pprint

import papermill as pm

# Execute the input notebook and save the executed copy under a dated name
pm.execute_notebook('input_nb.ipynb',
                    'outputs/20190402_run.ipynb')
...
# Each run can be placed in a unique / sortable path
pprint(sorted(os.listdir('outputs')))
# outputs/ ...
#   20190401_run.ipynb
#   20190402_run.ipynb
Papermill
Papermill provides a CLI that enables
easy integration with external tools and
simple schedulers such as crontab.
$ papermill input_notebook.ipynb \
    outputs/{run_id}_out.ipynb
$ papermill input.ipynb report.ipynb -y '{"foo":"bar"}' && \
    jupyter nbconvert --to html report.ipynb
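The same parameterized run can be sketched through the Python interface. This assumes papermill is installed and that the input notebook has a cell tagged `parameters`, which is where Papermill injects the supplied values. The helper names `dated_output_path` and `run_parameterized` are hypothetical; the notebook name and the `foo` parameter simply mirror the CLI example above.

```python
from datetime import date


def dated_output_path(run_date, directory="outputs"):
    """Return a sortable output path such as outputs/20190402_run.ipynb."""
    return f"{directory}/{run_date:%Y%m%d}_run.ipynb"


def run_parameterized(input_nb, run_date, parameters):
    """Execute `input_nb` with `parameters` injected, writing a dated copy."""
    # Imported lazily so the path helper can be used without papermill installed.
    import papermill as pm

    output_nb = dated_output_path(run_date)
    pm.execute_notebook(input_nb, output_nb, parameters=parameters)
    return output_nb


# Hypothetical usage:
# run_parameterized("input.ipynb", date.today(), {"foo": "bar"})
```

Keeping the date in the file name means every run leaves its own executed notebook behind, which is what makes the outputs directory listing sortable.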
Notebook Pipelines with
Apache Airflow
Apache Airflow
Airflow is a platform to
programmatically author, schedule and
monitor workflows. It’s enterprise
ready and used to build large and
complex workload pipelines.
[Diagram: Python code defines a DAG (workflow)]
The Airflow Papermill operator enables
Jupyter Notebooks to be integrated into
Airflow workflows/pipelines.
More information: https://airflow.readthedocs.io/en/latest/howto/operator/papermill.html
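To make the DAG idea concrete without requiring an Airflow installation, here is a minimal sketch that uses only the standard library (`graphlib`, Python 3.9+). It is not Airflow's API: the notebook names and the `execution_order` helper are hypothetical. In Airflow, each node would instead be a task, for example one that runs a notebook through the Papermill operator.

```python
from graphlib import TopologicalSorter

# Hypothetical notebook steps, each mapped to the steps it depends on.
dependencies = {
    "clean.ipynb": set(),
    "features.ipynb": {"clean.ipynb"},
    "train.ipynb": {"features.ipynb"},
    "report.ipynb": {"train.ipynb", "features.ipynb"},
}


def execution_order(deps):
    """Return a run order in which every step follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
```

A scheduler such as Airflow adds what this sketch leaves out: scheduling, retries, monitoring, and running independent steps in parallel.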
Analytic Workloads Pipelines Summary
                                    %run      NBConvert   Papermill   Apache Airflow
Notebook Kernels                    IPython   Multiple    Multiple    Multiple
Static versus Dynamic               Static    Dynamic     Dynamic     Dynamic
Programmatic APIs                   -         -           Yes         Yes
Notebook Parameters                 -         -           Yes         Yes
Heterogeneous pipelines/workflows   -         -           -           Yes
Jupyter Notebook
AI / Deep Learning Workloads
AI / Deep Learning Workloads
Resource-intensive workloads
Require expensive hardware (GPU,
TPU)
Long-running training jobs
– Simple MNIST training takes over one hour
WITHOUT a decent GPU
– Training other, non-complex deep learning
models can easily take over a
day WITH GPUs
Training/Deploying Models requires a lot of DevOps
• Model Serving
• Monitoring
• Resource Management
• Configuration
• Hyperparameter Optimization
• Reproducibility
AI / Deep Learning Workloads Challenges
• Ability to isolate training environments so that multiple jobs,
based on different deep learning frameworks (and/or
releases), can be submitted and trained at the same time.
• Ability to allocate individual system-level resources such as
GPUs, TPUs, etc. to different kernels for a period of time.
• Ability to allocate and free up system-level resources such as
GPUs, TPUs, etc. as they stop being used or when they are idle
for a period of time.
AI / Deep Learning Workloads
Source: https://github.com/Langhalsdino/Kubernetes-GPU-Guide
Containers and Kubernetes Platform
- Containers simplify management of
complicated and heterogeneous AI/Deep
Learning infrastructure, providing the required
isolation layer between pods running
different Deep Learning frameworks
- Containers provide a flexible way to deploy
applications and are here to stay
- Kubernetes enables easy management of
containerized applications and resources,
with the benefits of elasticity and quality of
service
AI Platforms
AI/Deep Learning Platforms aim to
abstract the DevOps tasks away from the
Data Scientist, providing a consistent
way to develop AI models independent
of the toolkit/framework being used.
FfDL
Kubeflow
• ML Toolkit for Kubernetes
• Open source and community driven
• Supports multiple ML frameworks
• End-to-end workflows that can be
shared, scaled and deployed
Kubeflow Pipelines
Kubeflow Pipelines is a platform for
building and deploying portable,
scalable machine learning (ML)
workflows based on Docker containers.
• End-to-end orchestration: enabling and simplifying the
orchestration of machine learning pipelines.
• Easy experimentation: making it easy for you to try
numerous ideas and techniques and manage your
various trials/experiments.
• Easy re-use: enabling you to re-use components and
pipelines to quickly create end-to-end solutions without
having to rebuild each time.
Two key takeaways: a Pipeline and a
Pipeline Component
A pipeline is a description of a machine
learning (ML) workflow, including all of
the components of the workflow and
how they work together.
A pipeline component is an
implementation of a pipeline task.
A component represents a step in the
workflow.
Each pipeline component is a container
that contains a program to perform the
task required for that particular step of
your workflow.
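The relationship between a pipeline and its components can be modelled in a few lines of plain Python. This is a conceptual sketch only, not the Kubeflow Pipelines SDK: the `Component` and `Pipeline` classes, the image names, and the commands are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Component:
    """One step of the workflow: a container image plus the program it runs."""
    name: str
    image: str
    command: tuple


@dataclass
class Pipeline:
    """A workflow description: its components in the order they run."""
    name: str
    steps: list = field(default_factory=list)

    def add(self, component):
        """Append a step and return the pipeline so calls can be chained."""
        self.steps.append(component)
        return self

    def describe(self):
        """Summarise each step as 'name: image command'."""
        return [f"{c.name}: {c.image} {' '.join(c.command)}" for c in self.steps]


# Hypothetical two-step pipeline.
pipeline = (
    Pipeline("mnist-demo")
    .add(Component("train", "example/train:latest", ("python", "train.py")))
    .add(Component("serve", "example/serve:latest", ("python", "serve.py")))
)
```

Because every component is a self-contained container, the same step can be reused in other pipelines without being rebuilt.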
AI Workloads and Kubeflow Pipelines
Decompose Schedule/Run
Learn more about Kubeflow Pipelines
Building a secure and transparent ML pipeline
using open source technologies
Animesh Singh (IBM), Svetlana Levitan (IBM), Tommy Li (IBM)
1:30pm–5:00pm Tuesday, July 16, 2019
Incorporating Artificial Intelligence
Location: C123-124
Community Announcements
Jupyter Notebook 6.0
Release Availability
pip install --upgrade notebook
Community Resources
Jupyter.org
https://jupyter.org/
JupyterLab
https://jupyterlab.readthedocs.io/en/stable/
Papermill
https://github.com/nteract/papermill
Kubeflow
https://kubeflow.org
https://github.com/kubeflow/
Thank you!
@lresende1975
Fabric for Deep Learning
FfDL provides a scalable, resilient, and
fault-tolerant deep learning framework
• Fabric for Deep Learning, or FfDL (pronounced "fiddle"), is an
open source project that aims to make Deep Learning easily
accessible to the people to whom it matters most, i.e. data scientists
and AI developers.
• FfDL provides a consistent way to deploy, train, and visualize Deep
Learning jobs across multiple frameworks such as TensorFlow, Caffe,
PyTorch, Keras, etc.
• FfDL is being developed in close collaboration with IBM Research
and IBM Watson. It forms the core of Watson's Deep Learning
service in open source.
FfDL Github Page
https://github.com/IBM/FfDL
FfDL Technical Architecture Blog
http://developer.ibm.com/code/2018/03/20/democratize-ai-with-fabric-for-deep-learning
Deep Learning as a Service within Watson Studio
https://www.ibm.com/cloud/deep-learning
Research paper: “Scalable Multi-Framework Management of Deep Learning Training Jobs”
http://learningsys.org/nips17/assets/papers/paper_29.pdf
FfDL: Architecture
FfDL: Research Papers
https://arxiv.org/abs/1709.05871

Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
PLE-statistics document for primary schs
PLE-statistics document for primary schsPLE-statistics document for primary schs
PLE-statistics document for primary schs
 
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
SR-101-01012024-EN.docx  Federal Constitution  of the Swiss ConfederationSR-101-01012024-EN.docx  Federal Constitution  of the Swiss Confederation
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
 
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit RiyadhCytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 

AI pipelines powered by Jupyter notebooks

  • 1. AI pipelines powered by Jupyter notebooks Luciano Resende Open Source AI Platform Architect @lresende1975
  • 2. About me - Luciano Resende Open Source AI Platform Architect – IBM – CODAIT • Senior Technical Staff Member at IBM, contributing to open source for over 10 years • Currently contributing to : Jupyter Notebook ecosystem, Apache Bahir, Apache Toree, Apache Spark among other projects related to AI/ML platforms lresende@us.ibm.com https://www.linkedin.com/in/lresende @lresende1975 https://github.com/lresende IBM Developer / © 2019 IBM Corporation 2
  • 3. IBM Open Source Participation. Learn: Open Source @ IBM Program touches 78,000 IBMers annually. Consume: virtually all IBM products contain some open source (40,363 packages per year). Contribute: >62K OS Certs per year; ~10K IBM commits per month. Connect: >1000 active IBM contributors working in key OS projects.
  • 4. IBM Open Source Participation. IBM generated open source innovation • 137 IBM Open Code projects w/1000+ GitHub projects • Projects graduate into full open governance: Node-Red, OpenWhisk, SystemML, Blockchain fabric among others • developer.ibm.com/code/open/code/ Community • IBM focused on 18 strategic communities • Drive open governance in “Centers of Gravity” • IBM Leaders drive key technologies and assure freedom of action The IBM OS Way is now open sourced • Training, Recognition, Tooling • Organization, Consuming, Contributing
  • 5. Technology leaders do more than just consume OSS 19 1998 “For more than 20 years, IBM and Red Hat have paved the way for open communities to power innovative IT solutions.” – Red Hat Long IBM history of actively fostering balanced community participation
  • 6. Center for Open Source Data and AI Technologies (CODAIT). CODAIT aims to make AI solutions dramatically easier to create, deploy, and manage in the enterprise. Relaunch of the Spark Technology Center (STC) to reflect expanded mission. codait.org. codait (French) = coder/coded https://m.interglot.com/fr/en/codait
  • 7. IBM Data Asset eXchange (DAX) • Curated free and open datasets under open data licenses • Standardized dataset formats and metadata • Ready for use in enterprise AI applications • Complement to the Model Asset eXchange (MAX) Data Asset eXchange: ibm.biz/data-asset-exchange Model Asset eXchange: ibm.biz/model-exchange
  • 8. AGENDA Jupyter Notebooks Analytic Workloads Pipelines • IPython %run magic • Jupyter NBConvert • Papermill • Apache Airflow AI/Deep Learning Workloads Pipelines • AI Platforms • Kubeflow and Kubeflow Pipelines Announcements Resources
  • 9. Jupyter Notebooks
  • 10. Jupyter Notebooks Notebooks are interactive computational environments, in which you can combine code execution, rich text, mathematics, plots and rich media.
  • 11. Jupyter Notebook. Simple, but Powerful: As simple as opening a web page, with the capabilities of a powerful, multilingual development environment. Interactive widgets: Code can produce rich outputs such as images, videos, markdown, LaTeX and JavaScript. Interactive widgets can be used to manipulate and visualize data in real time. Language of choice: Jupyter Notebooks have support for over 50 programming languages, including those popular in Data Science, Data Engineering, and AI such as Python, R, Julia and Scala. Big Data Integration: Leverage Big Data platforms such as Apache Spark from Python, R and Scala. Explore the same data with pandas, scikit-learn, ggplot2, dplyr, etc. Share Notebooks: Notebooks can be shared with others using e-mail, Dropbox, Google Drive, GitHub, etc.
  • 12. Jupyter Notebook Platform Architecture. The Notebook UI runs in the browser. The Notebook Server serves the 'Notebooks'. Kernels interpret/execute cell contents: – Are responsible for code execution – Abstract different languages – Have a 1:1 relationship with a Notebook – Run and consume resources as long as the notebook is running
  • 13. Jupyter Notebook Analytic Workloads
  • 14. Analytic Workloads Large amount of data Shared across organization in Data Lakes Multiple workload types – Data cleansing – Data Warehouse – Machine Learning and Insights
  • 16. Homegrown pipelines
  • 17. Notebook Pipelines using %run. %run built-in IPython magic - Enables execution of notebooks or Python scripts. Diagram: a notebook orchestrator invoking other notebooks via %run.
  • 18. Notebook Pipelines using %run. %run built-in IPython magic - Enables execution of notebooks or Python scripts. Limitations - Available in the IPython kernel only - Static - No command line integration
  • 19. Notebook Pipelines using NBConvert. Jupyter NBConvert (https://nbconvert.readthedocs.io/en/latest/) enables executing and converting notebooks to different file formats. Diagram: an orchestrator passes input notebook(s) to NBConvert, which writes output file(s) in ipynb, html, or pdf format (result_1.ipynb, result_2.ipynb, result_3.html, result_4.pdf).
  • 20. Notebook Pipelines using NBConvert. Jupyter NBConvert (https://nbconvert.readthedocs.io/en/latest/) enables executing and converting notebooks to different file formats. Advantages – Supports notebook chaining – Converts results to immutable formats. Limitations – No support for parameters. Example:
      $ pip install nbconvert
      $ jupyter nbconvert --to html --execute overview_with_run.ipynb
      [NbConvertApp] Converting notebook overview_with_run.ipynb to html
      [NbConvertApp] Executing notebook with kernel: python3
      [NbConvertApp] Writing 300558 bytes to overview_with_run.html
      $ open overview_with_run.html
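The notebook chaining mentioned on this slide can be driven from a small script. The following is a minimal sketch, not the presenter's code: it only assembles the same `jupyter nbconvert --to html --execute` command shown above for each notebook in a list and, when `dry_run` is turned off, runs them in order. The function names and the notebook file names in the usage note are hypothetical.

```python
import subprocess
from typing import List


def nbconvert_command(notebook: str, output_format: str = "html") -> List[str]:
    """Build the argv for executing one notebook and converting the result."""
    return ["jupyter", "nbconvert", "--to", output_format, "--execute", notebook]


def run_chain(notebooks: List[str], output_format: str = "html",
              dry_run: bool = True) -> List[List[str]]:
    """Return the commands for a chain of notebooks, running them unless dry_run.

    With dry_run=False each command is executed in order and the chain stops
    at the first failure, because check=True raises CalledProcessError.
    """
    commands = [nbconvert_command(nb, output_format) for nb in notebooks]
    if not dry_run:
        for command in commands:
            subprocess.run(command, check=True)
    return commands
```

For example, `run_chain(["01_clean.ipynb", "02_train.ipynb"], dry_run=False)` would execute both (hypothetical) notebooks one after the other. Because NBConvert has no parameter support, every run executes the notebooks exactly as saved.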
  • 21. Notebook Pipelines with Papermill
  • 22. Papermill. Papermill is an open source tool contributed by Netflix which enables parameterizing, executing, and analyzing Jupyter Notebooks. Papermill lets you: - Parameterize notebooks - Execute notebooks. Diagram: an orchestrator passes an input notebook to Papermill, which writes output file(s) in ipynb, html, or pdf format (result_1.ipynb, result_2.ipynb, result_3.html, result_4.pdf).
  • 23. Papermill. Papermill provides a programmatic interface so you can integrate it with your applications:
      import papermill as pm
      pm.execute_notebook('input_nb.ipynb', 'outputs/20190402_run.ipynb')
      ...
      # Each run can be placed in a unique / sortable path
      pprint(files_in_directory('outputs'))
      outputs/
      ...
      20190401_run.ipynb
      20190402_run.ipynb
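To make the "unique / sortable path" idea concrete, here is a minimal sketch that derives a dated output path and passes parameters to `pm.execute_notebook`. The helper names, the `outputs` directory, and the `alpha` parameter in the usage note are assumptions for illustration. Papermill is imported lazily so the path helper can be used without it installed; parameters are injected into the notebook after the cell tagged `parameters`.

```python
from datetime import date
from pathlib import PurePosixPath


def dated_output_path(output_dir: str, run_date: date) -> str:
    """Return a sortable output path such as outputs/20190402_run.ipynb."""
    return str(PurePosixPath(output_dir) / f"{run_date:%Y%m%d}_run.ipynb")


def run_parameterized(input_nb: str, output_dir: str, run_date: date,
                      parameters: dict) -> str:
    """Execute input_nb with the given parameters and return the output path."""
    import papermill as pm  # imported here so the helper above has no dependency

    output_path = dated_output_path(output_dir, run_date)
    pm.execute_notebook(input_nb, output_path, parameters=parameters)
    return output_path
```

A call such as `run_parameterized("input_nb.ipynb", "outputs", date.today(), {"alpha": 0.6})` would leave one executed notebook per day in `outputs/`, each recording the parameter values it ran with.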
  • 24. Papermill. Papermill provides a CLI that enables easy integration with external tools and simple schedulers such as crontab:
      $ papermill input_notebook.ipynb outputs/{run_id}_out.ipynb
      $ papermill input.ipynb report.ipynb -y '{"foo":"bar"}' && jupyter nbconvert --to html report.ipynb
  • 25. Notebook Pipelines with Apache Airflow
  • 26. Apache Airflow. Airflow is a platform to programmatically author, schedule and monitor workflows. It’s enterprise ready and used to build large and complex workload pipelines. Diagram: Python code defining a DAG (workflow).
  • 27. Apache Airflow. Airflow is a platform to programmatically author, schedule and monitor workflows. It’s enterprise ready and used to build large and complex workload pipelines. The Airflow Papermill operator enables Jupyter Notebooks to be integrated into Airflow workflows/pipelines. More information: https://airflow.readthedocs.io/en/latest/howto/operator/papermill.html
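The DAG (directed acyclic graph) model behind these workflows can be illustrated without Airflow itself. The sketch below is not Airflow's API: it uses Python's standard `graphlib` module (Python 3.9+) to compute a valid execution order for a hypothetical set of notebook tasks, which is the ordering guarantee a DAG scheduler provides. The task names are made up for the example.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each key is a task, each value is the set of tasks
# that must finish before it can start.
workflow = {
    "clean_data": set(),
    "build_features": {"clean_data"},
    "train_model": {"build_features"},
    "publish_report": {"build_features", "train_model"},
}


def execution_order(dag: dict) -> list:
    """Return the tasks in an order that respects every dependency.

    TopologicalSorter raises graphlib.CycleError if the graph has a cycle,
    which is why such workflows must be acyclic.
    """
    return list(TopologicalSorter(dag).static_order())
```

In an Airflow deployment each of these tasks could be a notebook run through the Papermill operator, with Airflow handling scheduling, retries, and monitoring on top of the ordering shown here.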
  • 30. Analytic Workloads Pipelines Summary %run NBConvert Papermill Apache Airflow Notebook Kernels IPython Multiple Multiple Multiple Static versus Dynamic Static Dynamic Dynamic Dynamic Programmatic APIs Yes Yes Notebook Parameters Yes Yes Heterogeneous pipelines/workflows Yes
  • 31. Jupyter Notebook AI / Deep Learning Workloads
  • 32. AI / Deep Learning Workloads. Resource intensive workloads. Require expensive hardware (GPU, TPU). Long-running training jobs – Simple MNIST takes over one hour WITHOUT a decent GPU – Training other, non-complex deep learning models can easily take over a day WITH GPUs
  • 33. Training/Deploying Models requires a lot of DevOps: Model Serving, Monitoring, Resource Management, Configuration, Hyperparameter Optimization, Reproducibility
  • 34. AI / Deep Learning Workloads Challenges • How to isolate the training environments so that multiple jobs, based on different deep learning frameworks (and/or releases), can be submitted/trained at the same time. • Ability to allocate individual system level resources such as GPUs, TPUs, etc. to different kernels for a period of time. • Ability to allocate and free up system level resources such as GPUs, TPUs, etc. as they stop being used or when they are idle for a period of time.
  • 35. AI / Deep Learning Workloads: Containers and Kubernetes Platform - Containers simplify management of complicated and heterogeneous AI/Deep Learning infrastructure, providing a required isolation layer between pods running different Deep Learning frameworks - Containers provide a flexible way to deploy applications and are here to stay - Kubernetes enables easy management of containerized applications and resources with the benefit of Elasticity and Quality of Service. Source: https://github.com/Langhalsdino/Kubernetes-GPU-Guide
  • 36. AI Platforms. AI/Deep Learning Platforms aim to abstract the DevOps tasks from the Data Scientist, providing a consistent way to develop AI models independent of the toolkit/framework being used. FfDL
  • 37. Kubeflow • ML Toolkit for Kubernetes • Open source and community driven • Support multiple ML Frameworks • End-to-end workflows that can be shared, scaled and deployed
  • 38. Kubeflow Pipelines Kubeflow Pipelines is a platform for building and deploying portable, scalable machine learning (ML) workflows based on Docker containers. • End-to-end orchestration: enabling and simplifying the orchestration of machine learning pipelines. • Easy experimentation: making it easy for you to try numerous ideas and techniques and manage your various trials/experiments. • Easy re-use: enabling you to re-use components and pipelines to quickly create end-to-end solutions without having to rebuild each time.
  • 39. Kubeflow Pipelines. Two key takeaways: a Pipeline and a Pipeline Component. A pipeline is a description of a machine learning (ML) workflow, including all of the components of the workflow and how they work together.
  • 40. Kubeflow Pipelines. A pipeline component is an implementation of a pipeline task. A component represents a step in the workflow.
  • 41. Kubeflow Pipelines. Each pipeline component is a container that contains a program to perform the task required for that particular step of your workflow.
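The pipeline/component relationship described on these slides can be modelled in a few lines. This is a conceptual sketch only and does not use the Kubeflow Pipelines SDK: `Component`, `Pipeline`, the image names, and the commands are all hypothetical, chosen to show that a component is a container image plus the program it runs, and that a pipeline records the components and how they depend on each other.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Component:
    """One workflow step: a container image and the program it runs."""
    name: str
    image: str
    command: Tuple[str, ...]
    after: Tuple[str, ...] = ()  # names of steps that must run first


@dataclass
class Pipeline:
    """A description of the workflow: its components and their ordering."""
    name: str
    components: List[Component] = field(default_factory=list)

    def add(self, component: Component) -> Component:
        # Reject a step whose dependencies have not been declared yet.
        known = {c.name for c in self.components}
        missing = [dep for dep in component.after if dep not in known]
        if missing:
            raise ValueError(f"{component.name} depends on unknown steps: {missing}")
        self.components.append(component)
        return component

    def describe(self) -> List[str]:
        """Summarize each step as 'name: image command'."""
        return [f"{c.name}: {c.image} {' '.join(c.command)}" for c in self.components]
```

For instance, adding a `preprocess` component and then a `train` component declared with `after=("preprocess",)` yields a two-step description; Kubeflow Pipelines would run each such step as its own container on Kubernetes.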
  • 42. Kubeflow Pipelines
  • 43. AI Workloads and Kubeflow Pipelines Decompose Schedule/Run
  • 44. Learn more about Kubeflow Pipelines: "Building a secure and transparent ML pipeline using open source technologies", Animesh Singh (IBM), Svetlana Levitan (IBM), Tommy Li (IBM). 1:30pm–5:00pm Tuesday, July 16, 2019. Incorporating Artificial Intelligence. Location: C123-124
  • 45. Community Announcements: Jupyter Notebook 6.0 Release Availability. pip install --upgrade notebook
  • 46. Community Resources. Jupyter.org: https://jupyter.org/ JupyterLab: https://jupyterlab.readthedocs.io/en/stable/ Papermill: https://github.com/nteract/papermill Kubeflow: https://kubeflow.org https://github.com/kubeflow/
  • 47. Thank you! @lresende1975
  • 48. Fabric for Deep Learning. FfDL provides a scalable, resilient, and fault tolerant deep-learning framework • Fabric for Deep Learning or FfDL (pronounced as ‘fiddle’) is an open source project which aims at making Deep Learning easily accessible to the people it matters to the most, i.e. Data Scientists and AI developers. • FfDL provides a consistent way to deploy, train and visualize Deep Learning jobs across multiple frameworks like TensorFlow, Caffe, PyTorch, Keras etc. • FfDL is being developed in close collaboration with IBM Research and IBM Watson. It forms the core of Watson's Deep Learning service in open source. FfDL GitHub page: https://github.com/IBM/FfDL FfDL Technical Architecture Blog: http://developer.ibm.com/code/2018/03/20/democratize-ai-with-fabric-for-deep-learning Deep Learning as a Service within Watson Studio: https://www.ibm.com/cloud/deep-learning Research paper: “Scalable Multi-Framework Management of Deep Learning Training Jobs” http://learningsys.org/nips17/assets/papers/paper_29.pdf
  • 49. FfDL: Architecture 2018 / © 2018 IBM Corporation