Deep Learning Theoretical Course (Course VII)

Activation functions

In the last lesson, we skipped activation functions because they deserve a lesson of their own. So let’s cover them before learning how to train a neuron.

An activation function is a non-linear function that determines the output of a neuron. It is applied to the linear combination of inputs and weights, so the neuron's output becomes a non-linear function of its inputs.

What are linear and non-linear functions?

Let us quickly refresh our memory about linear and non-linear functions.

A linear function is a function that has a constant slope.

[Figure: graph of the linear function f(x) = x + 2, a straight line]

In the above figure, the graph plots the outputs of the function f(x) = x + 2. If you take the derivative at any point on the straight line, the slope is the same: it is always 1.

On the other hand, a non-linear function is a function whose slope varies from point to point.

[Figure: graph of the non-linear function f(x) = x^2, a curve]

In the above figure, the graph plots the outputs of the function f(x) = x^2. The derivative is 2x, so the slope changes from point to point: it is 0 at the bottom of the curve (0,0) and 4 at the point (2,4).
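To make the difference concrete, here is a minimal sketch that estimates the slope of both functions numerically. The helper name `slope` and the step size `h` are assumptions chosen for illustration; they are not part of the lesson.

```python
def slope(f, x, h=1e-6):
    """Approximate the derivative of f at x with a central difference."""
    return (f(x + h) - f(x - h)) / (2 * h)

def linear(x):
    return x + 2

def quadratic(x):
    return x ** 2

points = [0.0, 1.0, 2.0]

# Round to hide the tiny floating-point error of the approximation.
linear_slopes = [round(slope(linear, x), 3) for x in points]
quadratic_slopes = [round(slope(quadratic, x), 3) for x in points]

print("f(x) = x + 2:", linear_slopes)     # the same slope at every point
print("f(x) = x^2:  ", quadratic_slopes)  # the slope changes with x
```

The linear function returns the same slope at every point, while the slope of the quadratic function grows as x moves away from 0.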

Why are activation functions important?

Consider a classification problem where we are trying to differentiate between class A (blue) and class B (red) data points, and the dataset looks like the following when plotted onto a graph.

[Figure: scatter plot of class A (blue) and class B (red) data points]

Now, let’s try to differentiate these classes using the straight line made by the linear function, f(x) = x + 2.

[Figure: the same data points with the straight line f(x) = x + 2 drawn through them]

If we assume that anything above the straight line is class A and anything below the straight line is class B, then we can see that we have misclassified many of the data points. Even if we draw a different straight line on the graph, we still cannot classify all of the data points correctly.

However, if we use a non-linear function, say f(x) = x^2, the boundary can curve to follow the data and separate the two classes, as shown in the example below:

[Figure: the same data points with a curved boundary made by f(x) = x^2]

If we assume that any point within the boundary line made by the non-linear function is class A and anything outside the boundary line is class B, then we achieve 100% accuracy on this dataset.
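The comparison can be sketched in a few lines of code. The points below are hypothetical toy data, built so that class A lies inside the parabola; they are not the dataset from the figure, and the function names are assumptions for illustration only.

```python
# Hypothetical toy data as (x, y, label); NOT the dataset from the figure.
# Class A points lie inside the parabola y = x^2, class B points outside it.
points = [
    (0.0, 1.0, "A"), (1.0, 2.0, "A"), (-1.0, 3.0, "A"), (2.0, 5.0, "A"),
    (2.0, 1.0, "B"), (-2.0, 1.0, "B"), (3.0, 4.0, "B"), (-3.0, 6.0, "B"),
]

def linear_rule(x, y):
    # Above the line y = x + 2 is class A, otherwise class B.
    return "A" if y > x + 2 else "B"

def quadratic_rule(x, y):
    # Inside (above) the parabola y = x^2 is class A, otherwise class B.
    return "A" if y > x ** 2 else "B"

def accuracy(rule):
    # Fraction of points whose predicted label matches the true label.
    correct = sum(1 for x, y, label in points if rule(x, y) == label)
    return correct / len(points)

print("linear boundary:   ", accuracy(linear_rule))
print("quadratic boundary:", accuracy(quadratic_rule))
```

On this toy data the straight line gets only half of the points right, while the curved boundary classifies all of them correctly. Real datasets are rarely this clean, so treat the result as an illustration rather than a guarantee.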

In a neuron, the linear combination of inputs and their weights is a linear function. So, to make the output of the linear function non-linear, we make use of an activation function.
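As a rough sketch of this idea, the code below computes the linear combination first and then passes it through an activation function. The function name `neuron` and the `bias` parameter are assumptions for illustration (the bias defaults to zero so that the linear part matches the description above), and sigmoid is used only as an example activation.

```python
import math

def sigmoid(z):
    # Example activation: squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, activation, bias=0.0):
    # Step 1: linear combination of inputs and weights (plus an assumed bias).
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Step 2: the activation function makes the output non-linear.
    return activation(z)

output = neuron([1.0, 2.0], [0.5, -0.25], sigmoid)
print(output)
```

Here the linear combination is 1.0 * 0.5 + 2.0 * (-0.25) = 0, and the sigmoid of 0 is 0.5.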

Commonly used activation functions

Choosing a proper activation function plays a huge role in determining a neuron's prediction accuracy. So, let us discuss some activation functions which are commonly used in deep learning:

1. Sigmoid activation function – The sigmoid activation function gives a value between 0 and 1. It is denoted by \sigma(z) and is calculated in the following way:

    \[f(z) = \sigma(z) =  \dfrac{1}{1+e^{-z}}\]

2. Tanh (Hyperbolic Tangent) activation function – The tanh activation function gives a value between -1 and 1. It is denoted by tanh(z) and is calculated in the following way:

    \[f(z) = \tanh(z) = \dfrac{e^z - e^{-z}}{e^z + e^{-z}}\]

3. ReLU (Rectified Linear Unit) activation function – The ReLU activation function outputs the input unchanged if it is positive and outputs zero otherwise. It is calculated in the following way:

    \[f(z) = \left\{ \begin{array}{rcl} 0 & \mbox{for} & z < 0\\ z & \mbox{for} & z \ge 0 \end{array} \right.\]

4. Leaky ReLU activation function – The Leaky ReLU activation function outputs the input unchanged if it is positive; otherwise, it outputs the input scaled down by a small factor \alpha. It is calculated in the following way:

    \[f(z) = \left\{ \begin{array}{rcl} \alpha z & \mbox{for} & z < 0\\ z & \mbox{for} & z \ge 0 \end{array} \right.\]

where \alpha is a small positive constant, typically 0.01.
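The four formulas above translate almost directly into code. This is a minimal sketch using only Python's standard library; the function names are assumptions for illustration, and the plain formulas can overflow for inputs of very large magnitude, which a production library would guard against.

```python
import math

def sigmoid(z):
    # 1 / (1 + e^(-z)); output lies between 0 and 1.
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    # (e^z - e^(-z)) / (e^z + e^(-z)); output lies between -1 and 1.
    return (math.exp(z) - math.exp(-z)) / (math.exp(z) + math.exp(-z))

def relu(z):
    # The input itself when it is non-negative, zero otherwise.
    return z if z >= 0 else 0.0

def leaky_relu(z, alpha=0.01):
    # The input itself when it is non-negative, alpha * z otherwise.
    return z if z >= 0 else alpha * z

# Compare the four functions on a negative, a zero, and a positive input.
for z in [-2.0, 0.0, 2.0]:
    print(z, sigmoid(z), tanh(z), relu(z), leaky_relu(z))
```

Running the loop shows the key difference between the last two functions: for a negative input, `relu` returns zero while `leaky_relu` returns a small negative value.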

That is it for activation functions! There is a lot more to say about the pros and cons of each activation function listed above, but that is a lesson for another day.

Next, we will understand how a neuron learns using an algorithm called gradient descent.
