Inspecting data using Pandas


When working with data, it is important to inspect it first. Summary information such as counts, means, standard deviations, minimum and maximum values, and data types tells us a lot about the data we are working with. Pandas provides simple methods for getting these basic insights about a DataFrame. In this chapter, you will learn how to inspect data using pandas to extract this kind of information.

For this chapter, we will be using the COVID-19 Dataset from Kaggle. Download the data from this link and save the file as data.csv in the same folder as your Jupyter Notebook. Then, load the data into your notebook as follows:

# Making necessary imports
import pandas as pd

# Loading the dataset
df = pd.read_csv("data.csv")

Note: This dataset is updated frequently, so the values shown in this example may differ slightly from what you see when you try the dataset yourself. The process remains the same.
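If the download is not at hand, a tiny stand-in DataFrame is enough to try the methods in this chapter. The sketch below is not the Kaggle data: the three rows and their values are made up for illustration, and only a few of the dataset's column names are reused. It also uses the shape attribute, a quick way to confirm how many rows and columns were loaded.

```python
import pandas as pd

# Hypothetical stand-in for data.csv: the values are invented for illustration
sample_df = pd.DataFrame(
    {
        "Country/Region": ["A", "B", "C"],
        "Confirmed": [100, 250, 40],
        "Deaths": [5, 10, 1],
        "WHO Region": ["Europe", "Africa", "Americas"],
    }
)

# shape is an attribute (no parentheses): (number of rows, number of columns)
n_rows, n_cols = sample_df.shape
print(n_rows, n_cols)
```

The same check works on the real dataset with df.shape once data.csv has been loaded.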

Display top n rows of a Pandas DataFrame

The pandas.DataFrame.head method is used to display the top n rows of the DataFrame.

# Display top 3 rows
df.head(n=3)

If the number of rows (n) is not specified, the top 5 rows are displayed by default.

# Displays top 5 rows by default
df.head()
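The behavior of head for different values of n can be checked on a small example. This is a minimal sketch on a made-up, seven-row DataFrame (the name toy and its values are hypothetical, not part of the COVID-19 dataset). It also shows that a negative n returns every row except the last |n| rows.

```python
import pandas as pd

# Hypothetical toy data, not the COVID-19 dataset
toy = pd.DataFrame({"Confirmed": [10, 20, 30, 40, 50, 60, 70]})

first_two = toy.head(2)           # n can also be passed positionally
default_rows = toy.head()         # no argument: the first 5 rows
all_but_last_two = toy.head(-2)   # negative n: every row except the last 2

print(len(first_two), len(default_rows), len(all_but_last_two))
```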

Display bottom n rows of a Pandas DataFrame

The pandas.DataFrame.tail method is used to display the bottom n rows of the DataFrame. Like pandas.DataFrame.head, it displays 5 rows by default if no number is passed to it.

# Display bottom 5 rows
df.tail()
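As a small illustration, the sketch below calls tail on a made-up DataFrame (toy and its values are hypothetical). Note that the returned rows keep their original index labels, which makes it easy to see where in the DataFrame they came from.

```python
import pandas as pd

# Hypothetical toy data, not the COVID-19 dataset
toy = pd.DataFrame({"Confirmed": [10, 20, 30, 40, 50, 60, 70]})

last_three = toy.tail(3)            # the bottom 3 rows
all_but_first_two = toy.tail(-2)    # negative n: every row except the first 2

# The original row labels are preserved in the result
print(last_three.index.tolist())
```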

Display all the Column Names

Sometimes it is not practical to print the whole DataFrame just to see the names of its columns, especially when there are many of them. In such cases, we can use the pandas.DataFrame.columns attribute to get all the column names. Since it is an attribute and not a method, it is accessed without parentheses.

# Display all the column names
df.columns
Index(['Country/Region', 'Confirmed', 'Deaths', 'Recovered', 'Active',
'New cases', 'New deaths', 'New recovered', 'Deaths / 100 Cases',
'Recovered / 100 Cases', 'Deaths / 100 Recovered', 'Confirmed last week',
'1 week change', '1 week % increase', 'WHO Region'],
dtype='object')
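The returned Index object can be used much like a Python sequence. The following sketch, on a made-up one-row DataFrame (toy is a hypothetical name), shows a few common follow-ups: converting the names to a plain list, counting them, and checking whether a given column exists.

```python
import pandas as pd

# Hypothetical toy data reusing a few of the dataset's column names
toy = pd.DataFrame(
    {"Country/Region": ["A"], "Confirmed": [100], "WHO Region": ["Europe"]}
)

column_names = toy.columns.tolist()    # plain Python list of names
n_columns = len(toy.columns)           # number of columns
has_deaths = "Deaths" in toy.columns   # membership test on the Index

print(column_names, n_columns, has_deaths)
```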

Display Descriptive Statistics of the DataFrame

The pandas.DataFrame.describe method is used to display descriptive statistics of the columns of the DataFrame, such as the count, mean, standard deviation, minimum value, quartiles, and maximum value. By default, only the numeric columns are summarized. These statistics give a quick sense of the scale and spread of each column.

# Display descriptive statistics of the DataFrame
df.describe()
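The output of describe is itself a DataFrame, so individual statistics can be looked up by label. The sketch below uses a made-up four-row DataFrame (toy and its values are hypothetical) to show the default numeric-only summary, the include="all" option that also covers text columns, and describe called on a single column.

```python
import pandas as pd

# Hypothetical toy data, not the COVID-19 dataset
toy = pd.DataFrame(
    {
        "Country/Region": ["A", "B", "C", "D"],
        "Confirmed": [10, 20, 30, 40],
    }
)

numeric_summary = toy.describe()               # numeric columns only
full_summary = toy.describe(include="all")     # text columns are included too
single_column = toy["Confirmed"].describe()    # summary of one column (a Series)

# Look up one statistic by row label and column name
print(numeric_summary.loc["mean", "Confirmed"])
```

Statistics that do not apply to a column, such as the mean of a text column, appear as NaN in the include="all" output.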

Display Data Type, Non-Null Values and Memory Usage about a Pandas DataFrame

The pandas.DataFrame.info method is used to display the index data type and column data type, the number of non-null values, and memory usage.

# Display further information
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 187 entries, 0 to 186
Data columns (total 15 columns):

#    Column                 Non-Null Count    Dtype
---  ------                 --------------    -----
0    Country/Region         187 non-null      object
1    Confirmed              187 non-null      int64
2    Deaths                 187 non-null      int64
3    Recovered              187 non-null      int64
4    Active                 187 non-null      int64
5    New cases              187 non-null      int64
6    New deaths             187 non-null      int64
7    New recovered          187 non-null      int64
8    Deaths / 100 Cases     187 non-null      float64
9    Recovered / 100 Cases  187 non-null      float64
10   Deaths / 100 Recovered 187 non-null      float64
11   Confirmed last week    187 non-null      int64
12   1 week change          187 non-null      int64
13   1 week % increase      187 non-null      float64
14   WHO Region             187 non-null      object
dtypes: float64(4), int64(9), object(2)
memory usage: 22.0+ KB
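The info method prints its report instead of returning it, so when the same details are needed as values, the individual pieces can be requested separately. This is a minimal sketch on a made-up DataFrame with one missing value (toy and its contents are hypothetical). It uses dtypes, isna().sum(), and memory_usage(deep=True); the deep option also measures the contents of object columns, which is the part a trailing "+" in the info output indicates was not fully counted.

```python
import pandas as pd

# Hypothetical toy data with one missing value, not the COVID-19 dataset
toy = pd.DataFrame(
    {
        "Country/Region": ["A", "B", "C"],
        "Confirmed": [10, 20, 30],
        "Deaths / 100 Cases": [1.5, None, 0.5],
    }
)

column_types = toy.dtypes                  # data type of each column
missing_counts = toy.isna().sum()          # number of missing values per column
bytes_used = toy.memory_usage(deep=True)   # bytes per column, plus the index

print(column_types)
print(missing_counts)
print(bytes_used.sum())
```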

Now that you know how to inspect data using Pandas, head over to the next lesson where you will be learning how to visualize data using Pandas.

