Want to deploy Machine Learning and Deep Learning models using MLflow, REST APIs, or through cloud platforms? Join the top-rated course on Machine Learning and Deep Learning model deployment.
MLflow provides a convenient way to build end-to-end Machine Learning pipelines in production, and in this guide, you will learn everything you need to know about the platform. By the end of this guide, you will be able to use MLflow across the Machine Learning pipeline, starting from model experimentation and going all the way to model deployment.
Ready to get started? Let's dive into how you can get the best out of MLflow for your next Machine Learning project.
MLflow is an open-source platform for managing the end-to-end machine learning lifecycle or pipeline. It supports multiple Machine Learning libraries, algorithms, deployment tools, and programming languages.
The platform was created by Databricks and has over 10,000 stars on GitHub, with more than 300 contributors updating the platform on a daily basis.
The MLflow platform provides four major components:
- MLflow Tracking, for recording experiments, parameters, metrics, and artifacts
- MLflow Projects, for packaging data science code in a reproducible format
- MLflow Models, for packaging trained models in a standard, deployable format
- MLflow Model Registry, for centrally storing and managing the lifecycle of models
Based on these components, MLflow is designed to be useful for everyone from an individual practitioner to a large team. Some of its applications are as follows:
- Individual data scientists can track experiments locally and package models for deployment
- Data science teams can compare results from different members working on the same problem
- Large organizations can share, reproduce, and reuse models and workflows across teams
If you're still undecided about learning how to use the platform, you can go over the MLflow components again and figure out if the platform is for you or not.
Remember that MLflow supports multiple programming languages and tools, such as R and Python. It also comes with a graphical user interface that you can access from your browser once you have successfully installed MLflow.
To keep this guide concise and easy to digest, we'll show you how you can install MLflow to use with Python. To install MLflow, open up your command line/terminal and write the following command:
pip install mlflow
Note: You must have Python installed on your system to use pip, which is Python's package manager.
Once you execute the command, MLflow will be installed on your system. You can check whether the installation was successful by importing MLflow in Python using the following line of code:
import mlflow
If this line of Python code doesn't give you an error, you've successfully installed MLflow to use with Python.
MLflow Tracking is used to keep track of each individual code run in an experiment. By definition, a 'run' is an individual execution of model code, whereas an 'experiment' is a named group of runs.
Here's a list of the things that you can track using MLflow:
- Parameters: key-value inputs to your code, such as hyperparameters
- Metrics: numeric values that can be updated throughout the run, such as accuracy or loss
- Artifacts: output files in any format, such as models, images, or data files
- Source and code version: the code that was run to produce the results
- Start and end time: when the run started and finished
Once these pieces of information are recorded, they can be queried using the MLflow Tracking UI (user interface) or MLflow Python API.
Given below is how you can track metrics using MLflow and Python:
# Importing the os library to work with operating system functionalities
import os

# Importing tracking functions from MLflow
from mlflow import log_metric, log_param, log_artifacts

# Logging a parameter (key-value pair)
log_param("param1", 0)

# Logging a metric; metrics can be updated throughout the run
log_metric("foo", 100)
log_metric("foo", 200)
log_metric("foo", 300)

# Create a file called test.txt in the outputs directory
if not os.path.exists("outputs"):
    os.makedirs("outputs")
with open("outputs/test.txt", "w") as f:
    f.write("hello world!")

# Logging an artifact (output file)
log_artifacts("outputs")
Here, the three imported functions do the following:
- log_param() logs a parameter as a key-value pair
- log_metric() logs a metric, a numeric value that can be updated throughout the run
- log_artifacts() logs all the files in a given directory as artifacts
In the above code, we logged all of the given parameters and metrics as key-value pairs, along with the artifacts in the outputs directory. To view this logged information, we can use the MLflow Tracking UI.
The MLflow Tracking UI is a user interface created by MLflow that displays your tracked information. You can view it by writing the following command in your command line/terminal and visiting the address that it outputs:
mlflow ui
By default, the MLflow UI is served at http://127.0.0.1:5000.
As you can see, we have an experiment ID created by default and our runs in the given UI.
Also, you can browse each individual run and see the tracked parameters, metrics, and artifacts associated with the run.
Here, you can see the parameter that we recorded (param1), the metric we recorded and updated (foo), and the artifact we logged from the outputs directory (test.txt).
Now that you know how to track information in MLflow, you can try creating different runs and trying out different model hyperparameters while recording changes in model performance for each new run.
To create a run and to stop one, you can use the start_run() and end_run() functions from MLflow:
# Importing the os library to work with operating system functionalities
import os

# Importing tracking functions from MLflow
from mlflow import log_metric, log_param, log_artifacts, start_run, end_run

# Ending any previously active run
end_run()

# Starting a new run
start_run()

# Logging a parameter (key-value pair)
log_param("param1", 0)

# Logging a metric; metrics can be updated throughout the run
log_metric("foo", 100)
log_metric("foo", 200)
log_metric("foo", 300)

# Create a file called test.txt in the outputs directory
if not os.path.exists("outputs"):
    os.makedirs("outputs")
with open("outputs/test.txt", "w") as f:
    f.write("hello world!")

# Logging an artifact (output file)
log_artifacts("outputs")

# Ending the run
end_run()
If you again view the MLflow UI, you can see that a new run has been created under the same experiment:
An MLflow Project is a format for packaging data science code in a reusable and reproducible way. This format is described using a YAML file which is called an MLproject file.
The MLproject file consists of three basic components, as listed below:
- Name: the name of the project
- Environment: the software environment used to run the project, such as a conda environment
- Entry points: the commands that can be run within the project, along with their parameters
Here's an example of an MLproject file:
name: tutorial
conda_env: conda.yaml
entry_points:
  main:
    parameters:
      alpha: float
      l1_ratio: {type: float, default: 0.1}
    command: "python train.py {alpha} {l1_ratio}"
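To make the entry point concrete, here is a hypothetical sketch of the train.py script referenced by the command above ("python train.py {alpha} {l1_ratio}"). The actual training logic is omitted; only the positional argument handling is shown, with l1_ratio defaulting to 0.1 as in the MLproject file.

```python
# A hypothetical sketch of the train.py entry point referenced by the
# MLproject command above. Only argument handling is shown here.
import sys

def parse_cli(argv):
    """Parse alpha (required) and l1_ratio (default 0.1, matching the
    MLproject file) from the positional command-line arguments."""
    alpha = float(argv[1])
    l1_ratio = float(argv[2]) if len(argv) > 2 else 0.1
    return alpha, l1_ratio

if __name__ == "__main__" and len(sys.argv) > 1:
    alpha, l1_ratio = parse_cli(sys.argv)
    print(f"Training with alpha={alpha}, l1_ratio={l1_ratio}")
```

When MLflow runs the project, it substitutes the parameter values you pass with -P into the command template before executing it.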
You can view MLflow's official GitHub repository to see how the MLproject file is kept inside a directory. To run this MLproject file, all you have to do is write the following command:
mlflow run git@github.com:mlflow/mlflow-example.git -P alpha=0.5 --no-conda
As you can observe, when running the command above, we specify the value of the alpha parameter that is mentioned in the MLproject file. We also disable the use of a conda environment with the --no-conda flag. The output of train.py is then displayed.
Pretty useful, isn't it? You can run any script off of GitHub or another cloud-hosted repository using the MLflow Projects component.
An MLflow Model is a standard format for packaging Machine Learning models for batch inferencing, real-time inferencing, and much more.
The format defines a convention that lets you save a model in different flavors that can be understood by different downstream tasks such as batch inferencing or real-time inferencing.
Flavors are the key concept that makes MLflow Models powerful: they are a convention that deployment tools can use to understand the model, which makes it possible to write tools that work with models from any ML library without having to integrate each tool with each library. All of the flavors that a particular model supports are defined in its MLmodel file in YAML format.
For example, mlflow.sklearn outputs models as follows:
# Directory written by mlflow.sklearn.save_model(model, "my_model")
my_model/
├── MLmodel
├── model.pkl
├── conda.yaml
└── requirements.txt
And its MLmodel file describes two flavors:
time_created: 2018-05-25T17:28:53.35
flavors:
  sklearn:
    sklearn_version: 0.19.1
    pickled_model: model.pkl
  python_function:
    loader_module: mlflow.sklearn
This model can then be used with any tool that supports either the sklearn or python_function model flavor. You can learn more about the fields in an MLmodel file from MLflow's documentation.
Now, using this format, you can serve the MLflow model on a local machine with the following command:
mlflow models serve -m my_model --no-conda
When the above command is run, it selects an appropriate backend flavor, such as python_function, and listens on the specified host and port. Here, it uses the default host 127.0.0.1 and port 5000. You can remove the --no-conda flag if you are using conda and have a conda.yaml file in your directory.
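Once the server is running, you can score data by POSTing JSON to its /invocations endpoint. Below is a sketch using only the Python standard library; the column names and values are illustrative, and the "dataframe_split" payload format shown is the one accepted by recent MLflow versions (older versions used a different layout).

```python
# A sketch of querying a model served by `mlflow models serve`.
# Assumes the server runs on the default 127.0.0.1:5000; the input
# columns and values below are illustrative, not from the guide.
import json
from urllib import request

def build_payload(columns, rows):
    """Build the JSON body for the /invocations endpoint using the
    "dataframe_split" input format."""
    return json.dumps({"dataframe_split": {"columns": columns, "data": rows}})

payload = build_payload(["x1", "x2"], [[1.0, 2.0]])

# Uncomment to send the request against a running server:
# req = request.Request(
#     "http://127.0.0.1:5000/invocations",
#     data=payload.encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(request.urlopen(req).read())
```

The same payload also works with curl by passing it as the request body with a Content-Type of application/json.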
The MLflow Model Registry is a centralized model store, set of APIs, and UI, which manage the full lifecycle of an MLflow Model. It provides model lineage, model versioning, stage transitions, and annotations. Once an MLflow model is logged by logging APIs, this model can then be registered with the Model Registry.
A few of the concepts that the Model Registry describes to facilitate the full lifecycle of an MLflow Model are:
- Registered model: an MLflow Model that has been registered with the Model Registry under a unique name
- Model version: each registered model can have one or many versions, incremented automatically as new models are registered under the same name
- Model stage: each model version can be assigned a stage, such as Staging, Production, or Archived
- Annotations and descriptions: notes that document a model, such as the algorithm used or the dataset it was trained on
To learn about the different APIs used to work with the MLflow Model Registry, please refer to the documentation.
Let us see how you can use the Python API from MLflow to use the Model Registry:
First, register a model during a run using the mlflow.<model_flavor>.log_model() method. Once registered, the model can be fetched by name and version:

# Importing MLflow in Python
import mlflow

# Setting the model name
model_name = "LinearModel"

# Setting the model version
model_version = 1

# Fetching the model based on the model version
model = mlflow.pyfunc.load_model(model_uri=f"models:/{model_name}/{model_version}")
You can also serve a registered model in a given stage straight from the Model Registry:

export MLFLOW_TRACKING_URI=http://localhost:5000
mlflow models serve -m "models:/LinearModel/Production"
# Importing MLflow in Python
import mlflow

# Initializing the MlflowClient
client = mlflow.tracking.MlflowClient()

# Transitioning the model into either Staging, Production, or Archived based on version number
client.transition_model_version_stage(name="LinearModel", version=1, stage="Production")
You've successfully made it to the end of this ultimate guide on MLflow for the Machine Learning lifecycle and pipelines. If you have any questions related to MLflow, please feel free to write them down in the comments.
Also, if you are looking to fast-track your data science career by learning other tools such as MLFlow, make sure to join our Python for Data Science Fast-Track 500 or Machine Learning Fast-Track 500.
P.S.: Don't forget to bookmark this guide and share it with your friends on LinkedIn, Twitter, and Facebook.