A deep neural network is a collection of interconnected neurons working together. It is made up of three different types of layers: the input layer, the hidden layers, and the output layer.
Here is a sample architecture of a deep neural network with an input layer, two hidden layers (of three neurons each), and one output layer.
This may look overwhelming at first glance, but let us compare this diagram with that of a single neuron.
In a deep neural network, each neuron in a hidden layer computes the activation function applied to the linear combination of its inputs and their respective weights.
However, since we cannot expect different results from neurons that receive identical inputs and weights, each neuron is given its own set of weights for the same set of inputs. The entire network therefore has more parameters to learn, which makes it more accurate than a single neuron.
Let us continue with a more detailed diagram of the above deep neural architecture to understand how data flows through the network.
In the beginning, each input ($x_1, x_2, x_3, ..., x_n$) is assigned its own set of weights for each neuron present in hidden layer 1, {($w_{1,1}^{(1)}, w_{1,2}^{(1)}, w_{1,3}^{(1)}$), ($w_{2,1}^{(1)}, w_{2,2}^{(1)}, w_{2,3}^{(1)}$), ..., ($w_{n,1}^{(1)}, w_{n,2}^{(1)}, w_{n,3}^{(1)}$)}, where $n$ is a positive integer.
To help you understand this notation, take $w_{1,1}^{(1)}$: the subscript $1,1$ indicates that the weight is assigned to the first input going into the first neuron, whereas the superscript $(1)$ indicates the hidden layer number.
Then, the weighted inputs are passed into each neuron of hidden layer 1 for individual computation. The following mathematical equation represents the calculation done by neuron 1 of hidden layer 1,
$$a^{(1)}_{1} = f(z_{1}^{(1)})= f(w_{1,1}^{(1)}x_{1} + w_{2,1}^{(1)}x_{2} + ... + w_{n,1}^{(1)}x_{n} + b^{(1)})$$
We can also write this equation in matrix notation as,
$$a^{(1)}_{1} = f(z_{1}^{(1)}) = f((\textbf{w}_{1}^{(1)})^{T}\textbf{x} + b^{(1)})$$
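The computation above can be sketched in a few lines of NumPy. This is a minimal illustration, not a full implementation: the sigmoid activation and all input, weight, and bias values here are assumptions chosen for the example.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation, one common choice for f."""
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative values (n = 3 inputs)
x = np.array([0.5, -1.2, 3.0])   # inputs x_1, x_2, x_3
w = np.array([0.4, 0.1, -0.6])   # weights w_{1,1}, w_{2,1}, w_{3,1} for neuron 1
b = 0.2                          # bias b^{(1)}

z = w @ x + b                    # linear combination z_1^{(1)} = w^T x + b
a = sigmoid(z)                   # activation a_1^{(1)} = f(z_1^{(1)})
```

The dot product `w @ x` is exactly the sum $w_{1,1}^{(1)}x_1 + w_{2,1}^{(1)}x_2 + w_{3,1}^{(1)}x_3$ from the expanded equation.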
In general practice, the value of the bias, $b$, is kept the same for each neuron in a hidden layer, and different neurons may use different activation functions even within the same hidden layer.
Similarly, each neuron in hidden layer 1 applies the activation function to its own linear combination of the weighted inputs from the input layer.
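Because every neuron in the layer performs the same kind of computation with its own weight vector, the whole layer can be computed at once with a matrix product. In the sketch below, column $j$ of `W1` holds the weights $(w_{1,j}^{(1)}, ..., w_{n,j}^{(1)})$ for neuron $j$; the shapes and random values are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n = 4                              # number of inputs (illustrative)
x = rng.normal(size=n)             # input vector (x_1, ..., x_n)
W1 = rng.normal(size=(n, 3))       # one weight column per neuron in hidden layer 1
b1 = 0.1                           # shared bias b^{(1)}

# a^{(1)} = f(W1^T x + b^{(1)}): all three neuron activations in one step
a1 = sigmoid(W1.T @ x + b1)
```

Each entry of `a1` is one $a^{(1)}_j$, so `a1` is exactly the set of values that will feed hidden layer 2.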
The set of inputs for hidden layer 2 is now ($a^{(1)}_{1}, a^{(1)}_{2}, a^{(1)}_{3}$), and a new set of weights is initialized for each neuron present in hidden layer 2, i.e., {($w_{1,1}^{(2)}, w_{1,2}^{(2)}, w_{1,3}^{(2)}$), ($w_{2,1}^{(2)}, w_{2,2}^{(2)}, w_{2,3}^{(2)}$), ($w_{3,1}^{(2)}, w_{3,2}^{(2)}, w_{3,3}^{(2)}$)}.
Then, the weighted inputs are passed into each neuron of hidden layer 2 for individual computation. The following mathematical equation represents the calculation done by neuron 1 of hidden layer 2,
$$a^{(2)}_{1} = f(z_{1}^{(2)}) = f(w_{1,1}^{(2)}a^{(1)}_{1} + w_{2,1}^{(2)}a^{(1)}_{2} + w_{3,1}^{(2)}a^{(1)}_{3} + b^{(2)})$$
We can also write this equation in matrix notation as,
$$a^{(2)}_{1} = f(z_{1}^{(2)}) = f((\textbf{w}_{1}^{(2)})^{T}\textbf{a}^{(1)} + b^{(2)})$$
Similarly, each neuron in hidden layer 2 applies the activation function to its linear combination of the weighted outputs of hidden layer 1.
Now, the set of inputs for the output layer is ($a^{(2)}_{1}, a^{(2)}_{2}, a^{(2)}_{3}$), and a new set of weights is initialized, one for each input: ($w_{1}^{(3)}, w_{2}^{(3)}, w_{3}^{(3)}$).
In the output layer, the computation is rather simple since there is only one neuron.
$$\hat{y} = f(z^{(3)}) = f((\textbf{w}^{(3)})^{T}\textbf{a}^{(2)} + b^{(3)})$$
This is how data flows in a deep neural network.
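The full forward pass through the 3-3-1 example network can be sketched end to end. The weights, biases, and the sigmoid activation below are illustrative assumptions; each line mirrors one of the equations above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(42)
x = rng.normal(size=3)                  # input layer (x_1, x_2, x_3)
W1, b1 = rng.normal(size=(3, 3)), 0.1   # hidden layer 1: weight column per neuron
W2, b2 = rng.normal(size=(3, 3)), 0.1   # hidden layer 2
w3, b3 = rng.normal(size=3), 0.1        # output layer (single neuron)

a1 = sigmoid(W1.T @ x + b1)             # a^{(1)} = f(W1^T x + b^{(1)})
a2 = sigmoid(W2.T @ a1 + b2)            # a^{(2)} = f(W2^T a^{(1)} + b^{(2)})
y_hat = sigmoid(w3 @ a2 + b3)           # ŷ = f((w^{(3)})^T a^{(2)} + b^{(3)})
```

Note how each layer's output simply becomes the next layer's input; this is the data flow described above.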
Before we move on to the next chapter, here are some insights about deep neural networks:
Hope this proves to be useful! Now, let's understand how a deep neural network learns in the next and final chapter of this course.