In the previous chapter, we predicted a continuous-valued label using linear regression. In this chapter, we will discuss logistic regression, which is useful for classification problems, where the output is discrete rather than continuous. Logistic regression models the input-output relationship with an S-shaped curve (the logistic function), which gives the probability of the input belonging to a particular class.
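Before touching TensorFlow, here is a minimal NumPy sketch (not part of the tutorial code) of the two functions behind this idea: the logistic function, which squashes a single score into the (0, 1) range, and softmax, its multi-class generalization, which turns a vector of scores into a probability distribution.

import numpy as np

def logistic(z):
    # Squash a real-valued score into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Turn a vector of scores into probabilities that sum to 1
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

print(logistic(0.0))                        # 0.5
print(softmax(np.array([2.0, 1.0, 0.1])))   # approximately [0.66 0.24 0.10]

For the 10-class MNIST problem below, it is the softmax form that the model will use.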
In this chapter, we will be using the MNIST handwritten digits dataset. The dataset contains 60,000 examples for training and 10,000 examples for testing. The digits have been size-normalized and centered in fixed-size images (28x28 pixels) with pixel values ranging from 0 to 255.
We will start by loading the MNIST dataset from tf.keras.datasets, loading both the training and testing sets. Since the data are images, we flatten each 28x28 image into a 1-D array of 784 pixel values using NumPy's reshape method. We also normalize the pixel intensities so that they lie between 0 and 1.
import tensorflow as tf
from tensorflow.keras.datasets import mnist
import numpy as np

# Load train and test data
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Convert the images to float32
x_train, x_test = np.array(x_train, np.float32), np.array(x_test, np.float32)

# Flatten images to 1-D vectors of 784 features (28*28)
x_train, x_test = x_train.reshape(x_train.shape[0], -1), x_test.reshape(x_test.shape[0], -1)

# Normalize image values from [0, 255] to [0, 1]
x_train, x_test = x_train / 255., x_test / 255.
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz 11493376/11490434 [==============================] - 0s 0us/step
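If the cell above ran as shown, a quick sanity check (not part of the original output) confirms that the preprocessing produced flat, normalized feature vectors:

# Each image is now a flat vector of 784 values in [0, 1]
print(x_train.shape, x_test.shape)   # (60000, 784) (10000, 784)
print(x_train.min(), x_train.max())  # 0.0 1.0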
Because of the large number of training images, it is advisable to train on batches of data rather than feeding the entire dataset at once. We will use the tf.data API to shuffle the data and create batches.
# Build a tf.data pipeline that shuffles the training data and serves batches of 256
train_data = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_data = train_data.repeat().shuffle(5000).batch(256).prefetch(1)
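To see what the pipeline produces, you can pull a single batch from it (a quick check under the settings above, not a required step):

# Take one batch from the pipeline and inspect its shape
batch_x, batch_y = next(iter(train_data))
print(batch_x.shape, batch_y.shape)  # (256, 784) (256,)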
Now, we define the logistic regression model as a Python class with two methods: __init__ and __call__. As with the linear regression model, the weights and biases are defined in the __init__ method, whereas the formula is defined in the __call__ method.
Since each input feature vector has 784 pixel values and there are 10 classes (the digits 0-9), the weights should have shape [784, 10] and the bias should be a 1-D vector of 10 values. We multiply the input vector by the weights and add the bias to obtain the logits. Finally, a softmax function is applied to normalize the logits into a probability distribution over the 10 classes.
class Model:
    def __init__(self):
        self.W = tf.Variable(tf.ones([784, 10]), name="weight")
        self.b = tf.Variable(tf.zeros([10]), name="bias")

    def __call__(self, x):
        # Softmax turns the logits (x @ W + b) into class probabilities
        return tf.nn.softmax(tf.matmul(x, self.W) + self.b)
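As a quick illustration of the shapes involved (using a throwaway instance, not the model we will train later), calling the untrained model on a few images returns one probability per class, and each row sums to 1:

demo_model = Model()                  # throwaway instance just for this check
probs = demo_model(x_train[:5])
print(probs.shape)                    # (5, 10)
print(tf.reduce_sum(probs, axis=1))   # each value is ~1.0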
Now, we will pass the predictions obtained from the model to a loss function in order to evaluate the model's performance. We first one-hot encode the labels using TensorFlow's one_hot() function.
Then, we compute the cross-entropy loss between the predicted probabilities and the actual one-hot encoded labels. We also define a second function that computes the accuracy of our model. To update the weights and biases on each training iteration, we will use the stochastic gradient descent (SGD) optimizer.
def loss(y_pred, y_true):
    # Encode the label as a one-hot vector
    y_true = tf.one_hot(y_true, depth=10)
    # Clip prediction values to avoid a log(0) error
    y_pred = tf.clip_by_value(y_pred, 1e-9, 1.)
    # Compute cross-entropy
    return tf.reduce_mean(-tf.reduce_sum(y_true * tf.math.log(y_pred), 1))

def accuracy(y_pred, y_true):
    # Predicted class is the index of the highest score in the prediction vector (i.e. argmax)
    correct_prediction = tf.equal(tf.argmax(y_pred, 1), tf.cast(y_true, tf.int64))
    return tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# Stochastic gradient descent optimizer
optimizer = tf.optimizers.SGD(learning_rate=0.1)
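To make the cross-entropy loss concrete, here is a tiny hand-made example (the numbers are chosen purely for illustration): a confident correct prediction contributes a small loss, while a confident wrong one contributes a large loss.

# Two hypothetical predictions for examples whose true label is 3
y_true_demo = tf.constant([3, 3])
y_pred_demo = tf.constant([
    [0.01] * 3 + [0.91] + [0.01] * 6,  # confident and correct -> low loss
    [0.91] + [0.01] * 9,               # confident but wrong  -> high loss
])
print(loss(y_pred_demo, y_true_demo))  # (-log(0.91) - log(0.01)) / 2, roughly 2.35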
Now, for each training iteration, we need to:
1. Compute the model's predictions for the current batch of images.
2. Compute the loss between those predictions and the true labels.
3. Compute the gradients of the loss with respect to the weights and bias using tf.GradientTape.
4. Update the weights and bias by applying the gradients with the optimizer.
The train function below performs these steps.
def train(model, x, y):
    with tf.GradientTape() as t:
        pred = model(x)
        current_loss = loss(pred, y)
    # Compute gradients with respect to W and b
    gradients = t.gradient(current_loss, [model.W, model.b])
    # Update W and b following the gradients
    optimizer.apply_gradients(zip(gradients, [model.W, model.b]))
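As a quick check (using a throwaway model instance, assuming train_data, loss, and train are defined as above), a single training step on one batch should already lower the loss:

check_model = Model()  # throwaway instance just for this check
batch_x, batch_y = next(iter(train_data))
print("loss before:", loss(check_model(batch_x), batch_y).numpy())
train(check_model, batch_x, batch_y)
print("loss after: ", loss(check_model(batch_x), batch_y).numpy())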
Finally, the model is initialized and trained for 60 iterations, each on a batch of images drawn from train_data.
# Initialize the model
model = Model()

# Number of training iterations; each iteration uses one batch of 256 images
iterations = 60
losses = []

for batch_x, batch_y in train_data.take(iterations):
    # Record the loss on the current batch before updating the parameters
    current_loss = loss(model(batch_x), batch_y)
    losses.append(current_loss.numpy())
    # Train the model on the current batch
    train(model, batch_x, batch_y)
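We defined an accuracy function earlier but have not used it yet; a natural follow-up (a quick check, and the exact number will vary from run to run) is to measure the trained model's accuracy on the held-out test set:

# Evaluate the trained model on the test set
test_pred = model(x_test)
print("Test accuracy:", accuracy(test_pred, y_test).numpy())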
Finally, we can see how the loss decreases over the training iterations by plotting the recorded loss values with the matplotlib library.
import matplotlib.pyplot as plt

# Visualize the loss over the training iterations
plt.plot(losses)
plt.xlabel('Number of iterations')
plt.ylabel('Loss')
plt.show()
From the above graph, we can clearly see that the loss decreases as training progresses. Running the training for more iterations may decrease the loss even further, so feel free to try it out!
In the next chapter, you will be introduced to building neural networks in a more TensorFlow-ic way.