MLflow 1.0 is coming soon as the first stable release of MLflow. It also packs many cleanups and improvements, such as simpler metadata management, search APIs and HDFS support. In this talk, we’ll present these new features in detail, and then discuss additional MLflow components that Databricks and other companies are working on for the rest of 2019. These new tools include a model registry to share and track models, as well as a multi-step workflow abstraction, both of which were announced at Spark + AI Summit 2019.
10. What Does the 1.0 Release Mean?
API stability of the original components
• Safe to build apps and integrations around them long term
Time to start adding some new features!
11. MLflow Components
• Tracking: record and query experiments (code, data, config, results)
• Projects: packaging format for reproducible runs on any platform
• Models: general model format that supports diverse deployment tools
mlflow.org github.com/mlflow twitter.com/MLflow databricks.com/mlflow
13. Key Concepts in Tracking
Parameters: key-value inputs to your code
Metrics: numeric values (can update over time)
Artifacts: arbitrary files, including models
Source: what code ran?
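The four concepts above can be pictured as fields of a single run record. The sketch below is a stand-in written with only the Python standard library; the `RunRecord` class and its field names are hypothetical and are not MLflow's internal types.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RunRecord:
    """Hypothetical container mirroring the four tracking concepts."""
    source: str                                                     # what code ran
    params: Dict[str, str] = field(default_factory=dict)            # key-value inputs
    metrics: Dict[str, List[float]] = field(default_factory=dict)   # numeric values over time
    artifacts: List[str] = field(default_factory=list)              # paths of arbitrary files

    def log_param(self, key: str, value: object) -> None:
        # Parameters are stored once, as strings.
        self.params[key] = str(value)

    def log_metric(self, key: str, value: float) -> None:
        # Metrics keep their full history because they can update over time.
        self.metrics.setdefault(key, []).append(float(value))

    def log_artifact(self, path: str) -> None:
        # Artifacts are referenced by file path; a saved model is just one more file.
        self.artifacts.append(path)


run = RunRecord(source="train.py")
run.log_param("alpha", 0.5)
for loss in (0.9, 0.4, 0.2):
    run.log_metric("loss", loss)
run.log_artifact("model.pkl")
```

Keeping a list per metric key, rather than a single value, is what allows a metric to be plotted as a curve later.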
16. Model Format
MLflow Models: simple model flavors usable by many tools
[Diagram: Run Sources → Model Format (Flavor 1, Flavor 2) → Inference Code, Batch & Stream Scoring, Cloud Serving Tools]
17. Example MLflow Model
my_model/
├── MLmodel
└── estimator/
    ├── saved_model.pb
    └── variables/
        ...

Contents of MLmodel:
run_id: 769915006efd4c4bbd662461
time_created: 2018-06-28T12:34
flavors:
  tensorflow:        # usable by tools that understand the TensorFlow model format
    saved_model_dir: estimator
    signature_def_key: predict
  python_function:   # usable by any tool that can run Python (Docker, Spark, etc.)
    loader_module: mlflow.tensorflow
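One way a deployment tool could use the `flavors` section is to scan it for a flavor it understands and fall back to `python_function` otherwise. The sketch below assumes the MLmodel file has already been parsed into a dictionary; the `pick_flavor` helper is hypothetical and is not part of MLflow.

```python
from typing import Any, Dict, Sequence

# The MLmodel metadata from the slide, as it might look after YAML parsing.
mlmodel: Dict[str, Any] = {
    "run_id": "769915006efd4c4bbd662461",
    "time_created": "2018-06-28T12:34",
    "flavors": {
        "tensorflow": {
            "saved_model_dir": "estimator",
            "signature_def_key": "predict",
        },
        "python_function": {
            "loader_module": "mlflow.tensorflow",
        },
    },
}


def pick_flavor(model: Dict[str, Any], preferred: Sequence[str]) -> str:
    """Return the first flavor in `preferred` that the model provides."""
    flavors = model["flavors"]
    for name in preferred:
        if name in flavors:
            return name
    raise ValueError(f"model offers none of {list(preferred)}")


# A TensorFlow-aware tool asks for the native flavor first.
tf_tool_choice = pick_flavor(mlmodel, ["tensorflow", "python_function"])
# A generic tool that only runs Python falls back to python_function.
generic_choice = pick_flavor(mlmodel, ["sklearn", "python_function"])
```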
20. Selected New Features in MLflow 1.0
• Support for logging metrics per user-defined step
• Improved search
• HDFS support for artifacts
• ONNX Model Flavor [experimental]
• Deploying an MLflow Model as a Docker Image [experimental]
21. Support for logging metrics per user-defined step
Metrics logged at the end of a run, e.g.:
● Overall accuracy
● Overall AUC
● Overall loss
Metrics logged while training, e.g.:
● Accuracy per minibatch
● AUC per minibatch
● Loss per minibatch
Currently visualized by logging order:
22. Support for logging metrics per user-defined step
New step argument for log_metric:
log_metric(key, value, step=None)
● Define the x coordinate for the metric
● Define ordering and scale of the horizontal axis in visualizations
Example:
log_metric("exp", 1, 10)
log_metric("exp", 2, 1000)
log_metric("exp", 4, 10000)
log_metric("exp", 8, 100000)
log_metric("exp", 16, 1000000)
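To see what the step argument changes, the sketch below mimics the `log_metric(key, value, step=None)` signature with an in-memory recorder and returns points ordered by step rather than by logging order. The recorder and the `history` helper are hypothetical stand-ins, not MLflow code, and treating a missing step as 0 is an assumption of this sketch.

```python
from collections import defaultdict
from typing import DefaultDict, List, Optional, Tuple

# Hypothetical in-memory store: metric key -> list of (step, value) points.
_points: DefaultDict[str, List[Tuple[int, float]]] = defaultdict(list)


def log_metric(key: str, value: float, step: Optional[int] = None) -> None:
    """Record (step, value); the step is the x coordinate of the point."""
    _points[key].append((0 if step is None else step, float(value)))


def history(key: str) -> List[Tuple[int, float]]:
    """Return the points for `key` sorted by step, not by logging order."""
    return sorted(_points[key])


# Log out of order on purpose: the step, not the call order, fixes the x axis.
log_metric("exp", 4, 10000)
log_metric("exp", 1, 10)
log_metric("exp", 2, 1000)

xs = [step for step, _ in history("exp")]
ys = [value for _, value in history("exp")]
```

Because the steps here are spaced unevenly, a plot over `xs` also shows the scale of the horizontal axis, which a plot by logging order cannot.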
23. Improved Search
Search API supports a simplified version of the SQL WHERE clause, e.g.:
params.model = "LogisticRegression" and metrics.error <= 0.05
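To make the filter semantics concrete, here is a minimal sketch that evaluates such an expression against runs held as plain dictionaries. It handles only `and`-joined comparisons on `params.` and `metrics.` keys, and the `matches` helper and its regular expression are illustrative assumptions, not MLflow's actual parser.

```python
import operator
import re
from typing import Dict

_OPS = {
    "<=": operator.le, ">=": operator.ge, "!=": operator.ne,
    "=": operator.eq, "<": operator.lt, ">": operator.gt,
}
# Two-character operators come first so "<=" is not read as "<".
_CLAUSE = re.compile(r"^(params|metrics)\.(\w+)\s*(<=|>=|!=|=|<|>)\s*(.+)$")


def matches(run: Dict[str, dict], filter_string: str) -> bool:
    """Return True if `run` satisfies every clause of the filter."""
    clauses = re.split(r"\s+and\s+", filter_string.strip(), flags=re.IGNORECASE)
    for clause in clauses:
        m = _CLAUSE.match(clause.strip())
        if m is None:
            raise ValueError(f"cannot parse clause: {clause!r}")
        kind, key, op, raw = m.groups()
        if key not in run[kind]:
            return False
        if kind == "metrics":
            expected = float(raw)                 # metrics compare numerically
            actual = float(run[kind][key])
        else:
            expected = raw.strip().strip("'\"")   # params compare as strings
            actual = str(run[kind][key])
        if not _OPS[op](actual, expected):
            return False
    return True


runs = [
    {"params": {"model": "LogisticRegression"}, "metrics": {"error": 0.03}},
    {"params": {"model": "LogisticRegression"}, "metrics": {"error": 0.20}},
    {"params": {"model": "RandomForest"}, "metrics": {"error": 0.01}},
]
query = 'params.model = "LogisticRegression" and metrics.error <= 0.05'
selected = [r for r in runs if matches(r, query)]
```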
24. Improved Search
Python API example:
from mlflow.entities import ViewType
from mlflow.tracking import MlflowClient

client = MlflowClient()
# Collect the IDs of all experiments, then search across them.
all_experiments = [exp.experiment_id for exp in client.list_experiments()]
runs = client.search_runs(
    all_experiments,
    "params.model='LogisticRegression' and metrics.error<=0.05",
    ViewType.ALL)
25. Improved Search
UI example:
params.model = "LogisticRegression" and metrics.error <= 0.05
26. HDFS Support for Artifacts
mlflow.log_artifact(local_path, artifact_path=None)
Supported Artifact Stores:
● AWS S3
● Azure Blob Store
● Google Cloud Storage
● HDFS
● DBFS
● NFS
● FTP
● SFTP
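Artifact locations are typically given as URIs, so a store can be chosen from the URI scheme. The sketch below shows that dispatch with `urllib.parse`; the scheme-to-store table and the `store_for` helper are illustrative assumptions rather than MLflow's own registry, a scheme-less path is treated as a local or NFS-mounted directory, and the host and bucket names are made up.

```python
from urllib.parse import urlparse

# Illustrative mapping from URI scheme to the stores listed on the slide.
_STORES = {
    "s3": "AWS S3",
    "wasbs": "Azure Blob Store",
    "gs": "Google Cloud Storage",
    "hdfs": "HDFS",
    "dbfs": "DBFS",
    "ftp": "FTP",
    "sftp": "SFTP",
    "file": "local or NFS path",
    "": "local or NFS path",   # plain paths such as /mnt/nfs/mlruns
}


def store_for(artifact_uri: str) -> str:
    """Return the name of the store that would handle `artifact_uri`."""
    scheme = urlparse(artifact_uri).scheme
    if scheme not in _STORES:
        raise ValueError(f"unsupported artifact URI scheme: {scheme!r}")
    return _STORES[scheme]


hdfs_store = store_for("hdfs://namenode:8020/mlflow/artifacts")
s3_store = store_for("s3://my-bucket/mlflow/artifacts")
nfs_store = store_for("/mnt/nfs/mlruns")
```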
27. ONNX Model Flavor
[Experimental]
ONNX models are exported in both:
• ONNX native format
• Pyfunc
mlflow.onnx.load_model(model_uri)
mlflow.onnx.log_model(onnx_model, artifact_path, conda_env=None)
mlflow.onnx.save_model(onnx_model, path, conda_env=None,
mlflow_model=<mlflow.models.Model object>)
Supported Model Flavors:
Scikit, TensorFlow, MLlib, H2O, PyTorch, Keras, MLeap, Python Function, R Function, ONNX
28. Docker Build
[Experimental]
$ mlflow models build-docker -m "runs:/some-run-uuid/my-model" -n "my-image-name"
$ docker run -p 5001:8080 "my-image-name"
Builds a Docker image whose default entrypoint serves the
specified MLflow model at port 8080 within the container.
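Once the container is running, the model can be queried over HTTP on the mapped host port. The sketch below only builds the request with the standard library and does not send it; it assumes the `/invocations` endpoint and the split-oriented pandas JSON input accepted by MLflow 1.x scoring servers, and the column names and values are made up for illustration.

```python
import json
import urllib.request

# Hypothetical input: two rows with two feature columns.
payload = {
    "columns": ["x1", "x2"],
    "data": [[0.1, 0.2], [0.3, 0.4]],
}

request = urllib.request.Request(
    url="http://localhost:5001/invocations",  # host port 5001 maps to container port 8080
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json; format=pandas-split"},
    method="POST",
)

# To actually score, uncomment the lines below while the container is running:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```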
36. What’s coming soon
• New component: Model Registry
  • Version-controlled registry of models
  • Model lifecycle management
  • Model monitoring
• Auto-logging from common frameworks
• Parallel coordinates plot
• Kubernetes remote run
• Delta Lake integration (Delta.io) for Data Versioning
• And more...