# Deep Learning Theoretical Course (Course VII)

July 19, 2020 (updated 2020-08-04 10:52)

### Activation functions

In the last lesson, we skipped activation functions because they deserve a lesson of their own. So let’s cover them before learning how to train a neuron.

An activation function is a non-linear function that determines the output of a neuron. It takes the linear combination of the neuron’s inputs and weights and maps it, non-linearly, to the neuron’s output.

#### What are linear and non-linear functions?

Let us quickly refresh our memory about linear and non-linear functions.

A linear function is a function that has a constant slope.

In the above figure, the graph plots the outputs of a linear function. If you take the derivative at any point on the straight line, the slope is always the same.

On the other hand, a non-linear function is a function whose slope varies from point to point.

In the above figure, the graph plots the outputs of the function $y = x^2$. If you take the derivative at the top of the curve (2, 4) and at the bottom of the curve (0, 0), you find that the slope changes: the derivative $2x$ is 4 at (2, 4) but 0 at (0, 0).
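The difference between the two kinds of functions can be checked numerically. The sketch below estimates slopes with a central difference. The linear function $y = 2x$ is a hypothetical stand-in, since the original figure’s linear function is not recoverable. The non-linear $y = x^2$ matches the points (0, 0) and (2, 4) mentioned above.

```python
# Estimate slopes numerically to compare a linear and a non-linear function.
# linear(x) = 2x is a hypothetical example; quadratic(x) = x**2 passes
# through the points (0, 0) and (2, 4) discussed in the text.

def slope(f, x, h=1e-5):
    """Approximate the derivative f'(x) with a central difference."""
    return (f(x + h) - f(x - h)) / (2 * h)

def linear(x):
    return 2 * x      # constant slope of 2 everywhere

def quadratic(x):
    return x ** 2     # slope is 2x, so it changes with x

for x in (0.0, 2.0):
    print(f"x={x}: linear slope={slope(linear, x):.3f}, "
          f"quadratic slope={slope(quadratic, x):.3f}")
# x=0.0: linear slope=2.000, quadratic slope=0.000
# x=2.0: linear slope=2.000, quadratic slope=4.000
```

The linear function reports the same slope at both points, while the quadratic’s slope grows from 0 to 4.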

#### Why are activation functions important?

Consider a classification problem in which we want to distinguish class A (blue) data points from class B (red) data points, and the dataset looks like the following when plotted on a graph.

Now, let’s try to separate these classes with the straight line drawn by a linear function.

If we assume that anything above the straight line is class A and anything below it is class B, then we can see that many of the data points are misclassified. Moreover, even if we draw a different straight line on the graph, we still cannot classify all of the data points correctly.

However, if we use a non-linear function instead, we can bend the boundary into a curve that separates the dataset perfectly, as shown in the example below:

If we assume that any point inside the boundary drawn by the non-linear function is class A and anything outside it is class B, then every data point is classified correctly, giving 100% accuracy on this dataset.
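To make this concrete, here is a minimal sketch on a made-up dataset. The data is an assumption, not the dataset from the figure. Class A points lie on a circle of radius 0.5, and class B points lie on a circle of radius 2, so class B surrounds class A. The sketch tries many straight lines and compares the best of them with the circular (non-linear) boundary $x^2 + y^2 < 1$.

```python
import math

def make_dataset(n=16):
    """Hypothetical toy data: class A (label 1) inside, class B (label 0) around it."""
    points, labels = [], []
    for i in range(n):
        angle = 2 * math.pi * i / n
        points.append((0.5 * math.cos(angle), 0.5 * math.sin(angle)))  # class A
        labels.append(1)
        points.append((2.0 * math.cos(angle), 2.0 * math.sin(angle)))  # class B
        labels.append(0)
    return points, labels

def accuracy(predict, points, labels):
    correct = sum(predict(x, y) == label for (x, y), label in zip(points, labels))
    return correct / len(points)

def linear_rule(a, b, c):
    # Class A if the point lies on the positive side of the line a*x + b*y + c = 0
    return lambda x, y: int(a * x + b * y + c > 0)

def circle_rule(x, y):
    # Class A if the point lies inside the circle x^2 + y^2 = 1
    return int(x * x + y * y < 1.0)

points, labels = make_dataset()

# Try 36 line orientations combined with several offsets and keep the best score.
best_line = max(
    accuracy(linear_rule(math.cos(t), math.sin(t), c), points, labels)
    for t in [2 * math.pi * k / 36 for k in range(36)]
    for c in [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
)
circle_acc = accuracy(circle_rule, points, labels)
print("best straight line:", best_line)
print("circular boundary:", circle_acc)
```

No line can reach full accuracy here. Any half-plane that contains all the inner class A points also contains some of the surrounding class B points. The circular rule classifies every point correctly.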

In a neuron, the linear combination of inputs and their weights is a linear function. To make the neuron’s output non-linear, we pass this linear combination through an activation function.
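Putting these two steps together gives a single neuron: first the linear combination, then the activation. This is a minimal sketch. The bias term, the input and weight values, and the choice of sigmoid as the activation are illustrative assumptions.

```python
import math

def weighted_sum(inputs, weights, bias):
    # Linear part: z = w1*x1 + w2*x2 + ... + b (bias assumed for illustration)
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def sigmoid(z):
    # One possible non-linear activation; squashes z into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias, activation=sigmoid):
    # Apply the activation function to the linear combination
    return activation(weighted_sum(inputs, weights, bias))

z = weighted_sum([1.0, 2.0], [0.5, -0.25], 0.1)   # 0.5 - 0.5 + 0.1 = 0.1
out = neuron([1.0, 2.0], [0.5, -0.25], 0.1)
print(z, out)
```

Swapping `activation` for another function changes the neuron’s non-linearity without touching the linear part.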

#### Commonly used activation functions

Choosing a suitable activation function plays a big role in how accurately a neuron makes predictions. So let us discuss some activation functions that are commonly used in deep learning:

1. **Sigmoid activation function** – The sigmoid activation function gives a value between 0 and 1. It is denoted by $\sigma$ and, for an input $z$, is calculated as $\sigma(z) = \frac{1}{1 + e^{-z}}$.

2. **Tanh (Hyperbolic Tangent) activation function** – The tanh activation function gives a value between -1 and 1. It is denoted by $\tanh$ and, for an input $z$, is calculated as $\tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$.

3. **ReLU (Rectified Linear Unit) activation function** – The ReLU activation function outputs the input directly if it is positive; otherwise, it outputs zero. For an input $z$, it is calculated as $\text{ReLU}(z) = \max(0, z)$.

4. **Leaky ReLU activation function** – The Leaky ReLU activation function outputs the input directly if it is positive; otherwise, it outputs a small fraction of the input. For an input $z$, it is calculated as $\text{LeakyReLU}(z) = \max(\alpha z, z)$, where $\alpha$ is normally 0.01.
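The four formulas above translate directly into code. This sketch works on a single scalar input. It uses a numerically stable form of the sigmoid so that `math.exp` does not overflow for large negative inputs. It also calls `math.tanh` rather than evaluating the exponential formula directly; the two are mathematically equivalent. The default `alpha=0.01` follows the text.

```python
import math

def sigmoid(z):
    # 1 / (1 + e^(-z)); rewritten as e^z / (1 + e^z) for z < 0 to avoid overflow
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def tanh(z):
    # (e^z - e^(-z)) / (e^z + e^(-z)); math.tanh computes the same value safely
    return math.tanh(z)

def relu(z):
    # z if z > 0, otherwise 0
    return max(0.0, z)

def leaky_relu(z, alpha=0.01):
    # z if z > 0, otherwise alpha * z
    return z if z > 0 else alpha * z

for z in (-2.0, 0.0, 2.0):
    print(z, sigmoid(z), tanh(z), relu(z), leaky_relu(z))
```

Note how ReLU maps every negative input to 0, while Leaky ReLU keeps a small negative output (for example, -0.02 for an input of -2).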

And that’s it for activation functions! There is much more to say about the pros and cons of each of the activation functions listed above, but that is a lesson for another day.

Next, we will understand how a neuron learns using an algorithm called gradient descent.