Activation functions

In the last lesson, we left activation functions out because they deserve a lesson of their own. So let’s dive into them before learning how to train a neuron.

An activation function is a non-linear function that determines the output of a neuron. It takes the linear combination of the neuron's inputs and weights and transforms it non-linearly to produce the neuron's final output.

What are linear and non-linear functions?

Let us quickly refresh our memory about linear and non-linear functions.

A linear function is a function that has a constant slope.

[Figure: Graph of the linear function $f(x) = x + 2$]

In the above figure, the graph plots the outputs of the function $f(x) = x + 2$. If you take the derivative at any point on the straight line, the slope is the same everywhere: $f'(x) = 1$.

On the other hand, a non-linear function is a function whose slope varies from point to point.

[Figure: Graph of the non-linear function $f(x) = x^2$]

In the above figure, the graph plots the outputs of the function $f(x) = x^2$. If you take the derivative at the point (2, 4) and at the bottom of the curve (0, 0), you find that the slope changes: since $f'(x) = 2x$, the slope is 4 at (2, 4) but 0 at (0, 0).
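To see the difference in slopes numerically, here is a small Python sketch that estimates the slope of each function with a central finite difference, $(f(x+h) - f(x-h)) / 2h$. The helper names and the step size `h` are illustrative choices for this example, not part of the lesson.

```python
def slope(f, x, h=1e-6):
    # Central finite-difference approximation of the derivative f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

def linear(x):
    return x + 2      # f(x) = x + 2

def quadratic(x):
    return x ** 2     # f(x) = x^2

for x in (0.0, 1.0, 2.0):
    print(f"x={x}: slope of x+2 = {slope(linear, x):.3f}, "
          f"slope of x^2 = {slope(quadratic, x):.3f}")
```

Running it prints a slope of 1.000 for $x + 2$ at every point, while the slope of $x^2$ grows from 0.000 at $x = 0$ to 2.000 at $x = 1$ and 4.000 at $x = 2$.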


Why are activation functions important?

Consider a classification problem where we are trying to differentiate between class A (blue) and class B (red) data points, and the dataset looks like the following when plotted on a graph:

[Figure: Class A (blue) and class B (red) data points]

Now, let’s try to separate these classes using the straight line of the linear function $f(x) = x + 2$.

[Figure: The straight line of the linear function $f(x) = x + 2$ drawn over the data points]

If we assume that anything above the straight line is class A and anything below it is class B, we can see that many of the data points are misclassified. Moreover, even if we draw a different straight line on the graph, we still cannot classify all of the data points correctly.

However, if we use a non-linear function, say $f(x) = x^2$, we can bend the boundary into a curve that separates the two classes perfectly, as shown in the example below:

[Figure: Non-linear function separating the classes]

If we assume that any point within the boundary made by the non-linear function is class A and anything outside it is class B, then we achieve 100% accuracy on this dataset.
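The two decision rules described above can be compared in a few lines of Python. The eight points below are a hypothetical toy dataset invented for this sketch (they are not the points in the figures): class A points lie inside the parabola $y = x^2$ and class B points lie outside it.

```python
# Hypothetical toy dataset (x, y, label), NOT the data shown in the figures above.
# Class "A" points lie inside the parabola y = x**2; class "B" points lie outside it.
points = [
    (0.0, 1.0, "A"), (1.0, 2.0, "A"), (-1.0, 2.0, "A"), (0.5, 3.0, "A"),
    (2.0, 1.0, "B"), (-2.0, 1.0, "B"), (3.0, 3.0, "B"), (-3.0, 2.0, "B"),
]

def linear_rule(x, y):
    # Predict class A if the point lies above the straight line f(x) = x + 2
    return "A" if y > x + 2 else "B"

def quadratic_rule(x, y):
    # Predict class A if the point lies inside (above) the parabola f(x) = x^2
    return "A" if y > x ** 2 else "B"

def accuracy(rule):
    # Fraction of points whose predicted label matches the true label
    correct = sum(rule(x, y) == label for x, y, label in points)
    return correct / len(points)

print("linear boundary:   ", accuracy(linear_rule))
print("quadratic boundary:", accuracy(quadratic_rule))
```

On this toy data the straight-line rule gets only half of the points right (accuracy 0.5), while the parabola rule classifies every point correctly (accuracy 1.0).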

In a neuron, the linear combination of inputs and their weights is a linear function. So, to make the neuron's output non-linear, we pass this linear combination through an activation function.
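A single neuron can be sketched in Python as a weighted sum followed by an activation function. This is a minimal illustration, not a full implementation: the function names, the input values, and the bias term `b` are assumptions added for the example (set `b = 0.0` to get the plain weighted sum of inputs and weights), and the sigmoid is used only as one possible activation.

```python
import math

def sigmoid(z):
    # One possible activation function: maps any real z into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias, activation):
    # Linear part: weighted sum of the inputs plus a bias term (an assumption here)
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Non-linear part: the activation function transforms the linear output z
    return activation(z)

# Hypothetical example values, chosen only for illustration
x = [0.5, -1.0, 2.0]
w = [0.4, 0.3, -0.2]
b = 0.1

print(f"{neuron(x, w, b, sigmoid):.4f}")  # z = -0.4, so this prints 0.4013
```

Swapping `sigmoid` for any other activation function changes only the non-linear step; the weighted sum stays the same.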


Commonly used activation functions

Choosing a proper activation function plays a huge role in determining the accuracy of a neuron's predictions. So, let us discuss some activation functions that are commonly used in deep learning:

1. Sigmoid activation function – The sigmoid activation function gives a value between 0 and 1. It is denoted by $\sigma(z)$ and is calculated in the following way:

$$f(z) = \sigma(z) = \dfrac{1}{1+e^{-z}} $$

2. Tanh (Hyperbolic Tangent) activation function – The tanh activation function gives a value between -1 and 1. It is denoted by $\tanh(z)$ and is calculated in the following way:

$$f(z) = \tanh(z) = \dfrac{e^z - e^{-z}}{e^z + e^{-z}}$$

3. ReLU (Rectified Linear Unit) activation function – The ReLU activation function returns the input unchanged if it is positive; otherwise, it outputs zero. It is calculated in the following way:

$$f(z) = \left\{ \begin{array}{rcl}
0 & \mbox{for} & z < 0\\ z & \mbox{for} & z \ge 0\end{array}\right.$$

4. Leaky ReLU activation function – The Leaky ReLU activation function returns the input unchanged if it is positive; otherwise, it outputs the input scaled down by a small factor $\alpha$. It is calculated in the following way:

$$f(z) = \left\{ \begin{array}{rcl} \alpha z & \mbox{for} & z < 0\\ z & \mbox{for} & z \ge 0\end{array}\right.$$

where $\alpha$ is a small positive constant, typically 0.01.
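The four formulas above translate directly into plain Python. This is a sketch for scalar inputs only (deep learning libraries apply these functions element-wise to whole arrays); the function names and the default `alpha=0.01` follow the text but are otherwise our own choices. The sigmoid is written with two algebraically equal branches so that `math.exp` never receives a large positive argument, which would overflow for very negative $z$.

```python
import math

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^(-z)), output in (0, 1).
    # Both branches are algebraically equal; each keeps the exp() argument <= 0.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def tanh(z):
    # tanh(z) = (e^z - e^(-z)) / (e^z + e^(-z)), output in (-1, 1).
    # Written out to mirror the formula; math.tanh(z) is the safer choice for large |z|.
    return (math.exp(z) - math.exp(-z)) / (math.exp(z) + math.exp(-z))

def relu(z):
    # Positive inputs pass through unchanged; negative inputs become 0
    return z if z >= 0 else 0.0

def leaky_relu(z, alpha=0.01):
    # Like ReLU, but negative inputs are multiplied by alpha instead of zeroed
    return z if z >= 0 else alpha * z

for z in (-2.0, 0.0, 2.0):
    print(f"z={z:+.1f}  sigmoid={sigmoid(z):.4f}  tanh={tanh(z):+.4f}  "
          f"relu={relu(z):.2f}  leaky_relu={leaky_relu(z):+.3f}")
```

The loop evaluates each function at $z = -2, 0, 2$. For instance, at $z = -2$ ReLU outputs 0, while Leaky ReLU outputs $0.01 \times (-2) = -0.02$.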

And that’s it for activation functions! There is much more to say about the pros and cons of each of the activation functions listed above, but that is a lesson for another day.

