A deep neural network is a collection of interconnected neurons working together. It is made up of three different types of layers:
- Input Layer: The input layer is responsible for feeding the data into the network.
- Hidden Layer(s): Each hidden layer uses multiple neurons to perform computations on the inputs received from the previous layer and passes the output of each neuron on to the subsequent layer.
- Output Layer: The output layer is responsible for outputting the prediction from the network.
Here is a sample architecture of a deep neural network, which has an input layer, two hidden layers (with three neurons each) and one output layer.
This may look overwhelming at first glance, but let us compare this diagram with that of a single neuron.
In a deep neural network, each neuron in a hidden layer computes the activation function applied to a linear combination of its inputs and their respective weights, just like the single neuron we saw earlier.
However, since we cannot expect different results from neurons that receive the same inputs and the same weights, each neuron is given its own set of weights for the same set of inputs. The entire network therefore has more parameters to learn, which allows it to capture far more complex patterns than a single neuron could.
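Here is a minimal sketch of what this sample architecture could look like in code, assuming TensorFlow's Keras API; the input size, the ReLU activations, and the variable names are illustrative choices rather than anything prescribed above:

```python
# A sketch of the sample architecture: an input layer, two hidden layers
# with three neurons each, and an output layer with a single neuron.
# TensorFlow/Keras, the input size n, and the ReLU activations are assumptions.
import tensorflow as tf

n = 4  # number of input features (illustrative)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(n,)),                   # input layer
    tf.keras.layers.Dense(3, activation="relu"),  # hidden layer 1 (3 neurons)
    tf.keras.layers.Dense(3, activation="relu"),  # hidden layer 2 (3 neurons)
    tf.keras.layers.Dense(1),                     # output layer (1 neuron)
])

model.summary()
```

Each Dense layer owns its own weight matrix and bias vector, which is exactly the 'different set of weights for each neuron' idea described above.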
How does data flow in a Deep Neural Network?
Let us continue with a more detailed diagram of the above deep neural network architecture to understand how data flows through the network.
In the beginning, each input ($x_1, x_2, x_3, …, x_n$) is assigned its own set of weights, one for each neuron present in hidden layer 1: {($w_{1,1}^{(1)}, w_{1,2}^{(1)}, w_{1,3}^{(1)}$), ($w_{2,1}^{(1)}, w_{2,2}^{(1)}, w_{2,3}^{(1)}$), …, ($w_{n,1}^{(1)}, w_{n,2}^{(1)}, w_{n,3}^{(1)}$)}, where $n$ is a positive integer.
To help you understand this notation, the subscript of $w_{1,1}^{(1)}$, i.e., $1,1$, indicates that the weight connects the first input to the first neuron, whereas the superscript, i.e., $(1)$, indicates the hidden layer number. For example, $w_{2,3}^{(1)}$ is the weight on the second input going into the third neuron of hidden layer 1.
Then, the weighted inputs are passed into each neuron of hidden layer 1 for individual computation. The following mathematical equation represents the calculation done by neuron 1 of hidden layer 1,
$$a^{(1)}_{1} = f(z_{1}^{(1)})= f(w_{1,1}^{(1)}x_{1} + w_{2,1}^{(1)}x_{2} + … + w_{n,1}^{(1)}x_{n} + b^{(1)})$$
We can also write this equation in matrix notation as,
$$a^{(1)}_{1} = f(z_{1}^{(1)}) = f({\textbf{w}_{1}^{(1)}}^{T}\textbf{x} + b^{(1)})$$
To keep the notation simple, we use a single bias term $b^{(1)}$ shared by every neuron in the hidden layer; in practice, each neuron usually learns its own bias, while all neurons in the same layer typically use the same activation function.
In the same way, every other neuron in hidden layer 1 applies the activation function to its own linear combination of the weighted inputs from the input layer.
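To make this concrete, here is a minimal NumPy sketch of hidden layer 1's computation; the input values, the random weights, and the choice of a sigmoid activation are all illustrative assumptions:

```python
import numpy as np

def f(z):
    """Activation function; a sigmoid is assumed here purely for illustration."""
    return 1.0 / (1.0 + np.exp(-z))

x  = np.array([0.5, -1.2, 3.0])  # inputs x1..xn (here n = 3, made-up values)
W1 = np.random.randn(3, 3)       # column j holds the weights w_{i,j}^{(1)} of neuron j
b1 = 0.1                         # shared bias b^{(1)}

z1 = W1.T @ x + b1               # weighted sum z_j^{(1)} for each of the 3 neurons
a1 = f(z1)                       # activations a_1^{(1)}, a_2^{(1)}, a_3^{(1)}
print(a1)
```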
The set of inputs for hidden layer 2 is now ($a^{(1)}_{1}, a^{(1)}_{2}, a^{(1)}_{3}$) and a new set of weights is initialized for each neuron present in hidden layer 2, i.e., {($w_{1,1}^{(2)}, w_{1,2}^{(2)}, w_{1,3}^{(2)}$), ($w_{2,1}^{(2)}, w_{2,2}^{(2)}, w_{2,3}^{(2)}$), ($w_{3,1}^{(2)}, w_{3,2}^{(2)}, w_{3,3}^{(2)}$)}.
Then, the weighted inputs are passed into each neuron of hidden layer 2 for individual computation. The following mathematical equation represents the calculation done by neuron 1 of hidden layer 2,
$$a^{(2)}_{1} = f(z_{1}^{(2)}) = f(w_{1,1}^{(2)}a^{(1)}_{1} + w_{2,1}^{(2)}a^{(1)}_{2} + w_{3,1}^{(2)}a^{(1)}_{3} + b^{(2)})$$
We can also write this equation in matrix notation as,
$$a^{(2)}_{1} = f(z_{1}^{(2)}) = f({\textbf{w}_{1}^{(2)}}^{T}\textbf{a}^{(1)} + b^{(2)})$$
Similarly, every other neuron in hidden layer 2 applies the activation function to its own linear combination of the weighted outputs of hidden layer 1.
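The same pattern repeats for hidden layer 2, with the layer-1 activations now playing the role of the inputs; the numbers below are again made up purely for illustration:

```python
import numpy as np

def f(z):
    return 1.0 / (1.0 + np.exp(-z))  # sigmoid activation (assumed)

a1 = np.array([0.62, 0.31, 0.88])    # outputs of hidden layer 1 (illustrative)
W2 = np.random.randn(3, 3)           # column j holds the weights w_{i,j}^{(2)} of neuron j
b2 = 0.2                             # shared bias b^{(2)}

a2 = f(W2.T @ a1 + b2)               # activations a_1^{(2)}, a_2^{(2)}, a_3^{(2)}
print(a2)
```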
Finally, the set of inputs for the output layer is ($a^{(2)}_{1}, a^{(2)}_{2}, a^{(2)}_{3}$) and a new set of weights is initialized for each input, ($w_{1}^{(3)}, w_{2}^{(3)}, w_{3}^{(3)}$).
In the output layer, the computation is rather simple since there is only one neuron.
$$\hat{y} = f(z^{(3)}) = f({\textbf{w}^{(3)}}^{T}\textbf{a}^{(2)} + b^{(3)})$$
This is how data flows in a deep neural network.
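Putting the three steps together, a complete forward pass through this network can be sketched in a few lines of NumPy; the input values, the random weights, and the sigmoid activation are again assumptions made purely for illustration:

```python
import numpy as np

def f(z):
    return 1.0 / (1.0 + np.exp(-z))    # sigmoid activation (assumed)

rng = np.random.default_rng(0)

x = np.array([0.5, -1.2, 3.0])         # input layer (n = 3 here, made-up values)

W1, b1 = rng.normal(size=(3, 3)), 0.1  # hidden layer 1 weights and bias
W2, b2 = rng.normal(size=(3, 3)), 0.2  # hidden layer 2 weights and bias
w3, b3 = rng.normal(size=3), 0.3       # output layer weights and bias

a1 = f(W1.T @ x + b1)                  # hidden layer 1 activations
a2 = f(W2.T @ a1 + b2)                 # hidden layer 2 activations
y_hat = f(w3 @ a2 + b3)                # prediction from the output layer
print(y_hat)
```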
Some insights about Deep Neural Networks
Before we move onto the next chapter, here are some insights about Deep Neural Networks:
- The term ‘deep’ comes from the fact that as we add more hidden layers to a neural network, the data has to flow through a deeper stack of layers.
- A neural network with only one hidden layer is called a shallow neural network.
- In this chapter, we studied Dense Neural Networks. In a Dense Neural Network, every neuron in a layer is connected to every neuron in the subsequent layer, and hence it gets the term ‘Dense’. There are multiple other variants of deep neural networks such as Convolutional Neural Networks, Autoencoders, Recurrent Neural Networks, etc.
Hope this proves to be useful! Now, let’s understand how a deep neural network learns in the next and final chapter of this course.
Do you want to learn Python, Data Science, and Machine Learning while getting certified? Here are some best-selling DataCamp courses that we recommend you enroll in:
- Introduction to Python (Free Course) - 1,000,000+ students already enrolled!
- Introduction to Data Science in Python- 400,000+ students already enrolled!
- Introduction to TensorFlow for Deep Learning with Python - 90,000+ students already enrolled!
- Data Science and Machine Learning Bootcamp with R - 70,000+ students already enrolled!