NVIDIA's NVTabular is a feature engineering and preprocessing library for tabular data, designed to quickly and easily manipulate the terabyte-scale datasets used to train deep-learning-based recommender systems.
The library boasts up to a 95x speedup over CPU-based ETL tools such as NumPy. It was built specifically to accelerate the data pipeline of NVIDIA Merlin for building recommender systems, which it does by scaling ETL across multiple GPUs and nodes.
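To give a sense of the API, here is a minimal sketch of an NVTabular preprocessing workflow. The column names and file paths are placeholders, and the operator-graph syntax shown applies to more recent releases, so treat it as illustrative rather than version-exact:

import nvtabular as nvt
from nvtabular import ops

# Declare an operator graph: categorical columns are label-encoded on the GPU,
# continuous columns are standardized. Column names here are hypothetical.
cat_features = ["userId", "movieId"] >> ops.Categorify()
cont_features = ["timestamp"] >> ops.Normalize()

workflow = nvt.Workflow(cat_features + cont_features)

# nvt.Dataset streams the data in GPU-sized chunks, so the full dataset never
# has to fit in memory at once.
dataset = nvt.Dataset("train.parquet")

workflow.fit(dataset)  # compute statistics such as category mappings, means, stds
workflow.transform(dataset).to_parquet("processed/")  # apply the ops and write out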
Before installing NVTabular, make sure your system meets the prerequisites: a Linux machine with an NVIDIA GPU and a CUDA toolkit version matching the package you install (the command below targets CUDA 10.2).
NVTabular can be installed with Anaconda from the nvidia channel by running the following command:
conda install -c nvidia -c rapidsai -c numba -c conda-forge nvtabular python=3.7 cudatoolkit=10.2
This should install NVTabular on your system.
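Once the command completes, a quick way to confirm the installation is to import the library and print its version:

python -c "import nvtabular; print(nvtabular.__version__)"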
NOTE: At the moment, NVTabular only runs on Linux; other operating systems are not supported.
To get started with NVTabular, there are multiple example notebooks hosted in NVIDIA's GitHub repository.
We suggest checking out the Jupyter notebooks in 'examples/getting-started-movielens' as an entry point for understanding the library. MovieLens 25M is a popular dataset for recommender systems and is widely used in academic publications.
As laid out in the repository, the example notebooks are structured to be reviewed in order: data download and conversion, ETL, model training, and inference. Their goal is to show how NVIDIA Merlin uses NVTabular to perform ETL, subsequently train TensorFlow, PyTorch, or HugeCTR models on the processed data, and then serve inferences using Triton.
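As an illustration of how the preprocessed output feeds into training, the sketch below loads NVTabular-processed Parquet files into TensorFlow with NVTabular's Keras data loader. The paths, column names, and batch size are placeholders, and the loader's module path has moved between releases, so check the version you installed:

import glob
from nvtabular.loader.tensorflow import KerasSequenceLoader

# Hypothetical location of Parquet files written by an NVTabular workflow.
train_paths = glob.glob("processed/train/*.parquet")

# The loader reads Parquet chunks on the GPU and yields batches directly to
# Keras, avoiding a CPU-side input bottleneck during training.
train_loader = KerasSequenceLoader(
    train_paths,
    batch_size=65536,
    label_names=["rating"],
    cat_names=["userId", "movieId"],
    cont_names=[],
    engine="parquet",
    shuffle=True,
)

# model.fit(train_loader, epochs=1)  # any Keras model with matching inputs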