MLOps using Python


Machine learning applications have risen sharply of late, but are they really useful to the industry? The actual value of these applications is determined by successful deployments and effective production-level operations.


According to a survey by Algorithmia, 55% of companies have never deployed a machine learning model. Moreover, 85% of models never make it to production. Some of the main reasons for this failure are a lack of talent, the absence of processes that can manage change, and the absence of automated systems. To tackle these challenges, it is necessary to bring DevOps and operations practices into machine learning development, which is what MLOps is all about.

What is MLOps?

MLOps, also known as Machine Learning Operations for Production, is a set of standardized practices used to build, deploy, and govern the lifecycle of ML models. In simple words, MLOps is a bunch of engineering and operational tasks that allow your machine learning model to be used by other users and applications across the organization.

MLOps lifecycle

There are seven stages in an MLOps lifecycle. They execute iteratively, and the success of a machine learning application depends on the success of each individual stage. A problem found at one stage can force a return to the previous stage to check for bugs introduced there. Let’s understand what happens at every stage in the MLOps lifecycle:

  • ML Development: This is the basic step that involves creating a complete pipeline, from data processing through to the model training and evaluation code.
  • Model Training: Once the setup is ready, the next logical step is to train the model. Continuous training is also needed here so the model can adapt to new data or address specific changes.
  • Model Evaluation: Running inference with the trained model and checking the accuracy and correctness of its outputs.
  • Model Deployment: Once the proof-of-concept stage is complete, the next step is to deploy the model according to industry requirements so that it can face real-life data.
  • Prediction Serving: After deployment, the model is ready to serve predictions on incoming data.
  • Model Monitoring: Over time, problems such as concept drift can make the results inaccurate, so continuous monitoring of the model is essential to ensure it keeps working properly.
  • Data and Model Management: This is the part of the central system that manages the data and models. It includes maintaining storage, keeping track of different versions, and ensuring accessibility, security, and consistent configuration across cross-functional teams.
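
To make the monitoring stage concrete, here is a minimal sketch in plain Python. It assumes that true labels eventually arrive for served predictions, so accuracy over a recent window can be compared with the accuracy measured at evaluation time. The AccuracyMonitor class, its parameters, and the numbers used are hypothetical and chosen for illustration only; a production system would typically rely on dedicated monitoring tooling.

```python
from collections import deque


class AccuracyMonitor:
    """Track accuracy over the most recent labelled predictions (hypothetical helper)."""

    def __init__(self, baseline_accuracy, window_size=100, tolerance=0.05):
        self.baseline_accuracy = baseline_accuracy
        self.tolerance = tolerance
        # A deque with maxlen silently discards the oldest outcome when full.
        self.outcomes = deque(maxlen=window_size)

    def record(self, predicted, actual):
        """Store whether one served prediction matched its true label."""
        self.outcomes.append(predicted == actual)

    def rolling_accuracy(self):
        """Return accuracy over the current window, or None if it is empty."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self):
        """Flag the model when accuracy falls below baseline minus tolerance."""
        accuracy = self.rolling_accuracy()
        if accuracy is None:
            return False
        return accuracy < self.baseline_accuracy - self.tolerance


# Made-up (predicted, actual) pairs, for illustration only.
monitor = AccuracyMonitor(baseline_accuracy=0.85, window_size=4)
for predicted, actual in [(1, 1), (0, 1), (1, 0), (0, 0)]:
    monitor.record(predicted, actual)

print(monitor.rolling_accuracy(), monitor.needs_retraining())
```

A flag like this could be what triggers the continuous training mentioned under Model Training, closing the loop between the monitoring and training stages.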

PyCaret and MLflow

PyCaret is an open source, low-code machine learning library in Python that allows you to go from preparing your data to deploying your model within minutes in your choice of notebook environment.


MLflow is an open source platform to manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry. MLflow currently offers four components: MLflow Tracking, MLflow Projects, MLflow Models, and the Model Registry.


Let’s get started

The MLOps process, PyCaret, and MLflow are easier to understand with an example. For this exercise we’ll use the heart disease dataset at https://www.kaggle.com/ronitf/heart-disease-uci . This database contains 76 attributes, but all published experiments refer to using a subset of 14 of them. In particular, the Cleveland database is the only one that has been used by ML researchers to date. The “goal” field refers to the presence of heart disease in the patient. It is integer valued from 0 (no presence) to 4. The code in this walkthrough uses the target column of heart.csv as the label. First, we’ll install PyCaret, import the libraries, and load the data:

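# Notebook cell: the leading "!" runs pip as a shell command
!pip install pycaret pandas shap
import pandas as pd
# The star import brings in setup, compare_models, save_model, load_model, etc.
from pycaret.classification import *
# heart.csv is the file downloaded from the Kaggle page above
df = pd.read_csv('heart.csv')
# Preview the first five rows
df.head()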

Common to all modules in PyCaret, setup is the first and only mandatory step in any machine learning experiment using PyCaret. This function takes care of all the data preparation required before training models. Here we pass log_experiment = True and experiment_name = 'diamond', which tells PyCaret to automatically log all the metrics, hyperparameters, and model artifacts behind the scenes as you progress through the modeling phase. This is possible thanks to its integration with MLflow.

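# Columns to treat as categorical rather than numeric
cat_features = ['sex', 'cp', 'fbs', 'restecg', 'exang', 'thal']
# log_experiment=True sends metrics, hyperparameters, and artifacts to MLflow
experiment = setup(df, target='target', categorical_features=cat_features, log_experiment=True, experiment_name='diamond')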
[Figure: output from setup, truncated for display]
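
To get a feel for what experiment logging records, here is a toy sketch in plain Python. It is not MLflow’s API and not how PyCaret logs internally; the log_run and best_run helpers, the JSON-lines file format, and the parameter and metric values are all made up for illustration.

```python
import json
import os
import tempfile
import time


def log_run(log_path, experiment_name, params, metrics):
    """Append one run record to a JSON-lines file (hypothetical format)."""
    record = {
        "experiment": experiment_name,
        "timestamp": time.time(),
        "params": params,
        "metrics": metrics,
    }
    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")
    return record


def best_run(log_path, metric):
    """Return the logged run with the highest value for the given metric."""
    with open(log_path, encoding="utf-8") as handle:
        runs = [json.loads(line) for line in handle if line.strip()]
    return max(runs, key=lambda run: run["metrics"][metric])


# Illustrative values only; these are not results from the heart disease data.
log_path = os.path.join(tempfile.mkdtemp(), "runs.jsonl")
log_run(log_path, "diamond", {"model": "lr", "C": 1.0}, {"accuracy": 0.80})
log_run(log_path, "diamond", {"model": "ridge", "alpha": 1.0}, {"accuracy": 0.82})

print(best_run(log_path, "accuracy")["params"])
```

A real tracker stores the same kind of information per run (parameters, metrics, and artifacts), which is what makes runs comparable after the fact.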

Now that the data is ready, let’s train models using the compare_models function. It trains all the algorithms available in the model library, evaluates multiple performance metrics using k-fold cross-validation, and returns the best-performing model.

best_model = compare_models()
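
The idea behind k-fold cross-validation can be sketched in a few lines of plain Python. This is a simplified illustration, not PyCaret’s implementation: the k_fold_indices helper is hypothetical, it does no shuffling or stratification, and the fold strategy PyCaret actually uses may differ.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        # Everything outside the validation slice is used for training.
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


# Each sample lands in the validation set exactly once across the folds.
for train, validation in k_fold_indices(n_samples=6, k=3):
    print("train:", train, "validate:", validation)
```

Each candidate model is trained k times, once per fold, and its metrics are averaged over the validation folds, which gives a steadier estimate than a single train/test split.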

Let’s now finalize the best model, i.e. train it on the entire dataset including the test set, and then save the pipeline as a pickle file. The finalize_model function refits the model on the full dataset, and the save_model function saves the entire pipeline (including the model) as a pickle file on your local disk.

final_model = finalize_model(best_model)
save_model(final_model, model_name='ridge-model')
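
Why save the whole pipeline rather than just the model? The following sketch, which uses only Python’s pickle module, shows the idea. The dictionary-based pipeline, the predict helper, and all the numbers are hypothetical; PyCaret’s saved pipeline is a much richer object than this.

```python
import os
import pickle
import tempfile

# Hypothetical fitted "pipeline": preprocessing state plus model parameters.
pipeline = {
    "feature_means": [54.0, 130.0],  # learned during training, used to centre inputs
    "weights": [0.02, 0.01],
    "bias": -0.1,
}


def predict(pipeline, row):
    """Centre a raw row with the stored means, then apply a linear decision rule."""
    centred = [x - mean for x, mean in zip(row, pipeline["feature_means"])]
    score = sum(w * x for w, x in zip(pipeline["weights"], centred)) + pipeline["bias"]
    return 1 if score > 0 else 0


# Save the whole pipeline to disk.
path = os.path.join(tempfile.mkdtemp(), "toy-pipeline.pkl")
with open(path, "wb") as handle:
    pickle.dump(pipeline, handle)

# Later, possibly in another process: load it back and score raw rows.
with open(path, "rb") as handle:
    restored = pickle.load(handle)

print(predict(restored, [64.0, 140.0]), predict(restored, [44.0, 120.0]))
```

Because the preprocessing state travels with the model, the restored object can score raw rows directly. Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.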

Remember that we passed log_experiment = True to the setup function along with experiment_name = 'diamond'. Now we can start the MLflow UI to see the logs of all the models and the pipeline. Run the following command in a terminal from the directory where the experiment was run:

mlflow ui

Now open your browser and go to “localhost:5000”. The MLflow UI opens and lists the logged runs of the experiment.


Now we can load this model at any time and use it to make predictions:

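# Load the saved pipeline from ridge-model.pkl
model = load_model('ridge-model')
# Drop the label column so that only feature columns are passed to the pipeline
model.predict(df.tail().drop(columns=['target']))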

So, that’s how an end-to-end machine learning model is saved, deployed, and made available for industrial use.

