In the last blog post we saw what a neural network is, its representation, and the maths behind it (to check that blog post, click here). We didn’t talk about how this neural network actually learns.

We have a design for a neural network, but how can it learn to do something? The first thing we will need is a data set to learn from, a so-called training dataset.

**Learning in brief:**

We initialize the weights of the network and then feed the data into it. This feeding of data forward through the network is called forward propagation.

When the network produces an output ŷ, we measure the error by comparing it to the actual result y. This error is then given as feedback to the network, and the weights are updated accordingly. That’s how our neural network learns. Now let’s discuss it in detail!

## Learning with gradient descent:

What we would like is an algorithm which lets us find weights and biases so that the output from the network approximates y for all training inputs x. To quantify how well we are achieving this goal, we define a cost function C(θ,b) = (1/2n) ∑ (ŷ − y)², where θ denotes the weights, b the biases, n the number of training examples, ŷ the output generated by the network and y the actual output. This cost function is sometimes called the mean squared error. Inspecting it, we see that C(θ,b) is non-negative, since every term in the sum is non-negative.

Furthermore, the cost becomes small (C(θ,b) tends to 0) precisely when ŷ is approximately equal to y for all training inputs x. So our training algorithm has done a good job if it can find weights and biases for which C(θ,b) is close to 0. If C(θ,b) is large, our algorithm is not doing well. The aim of our training algorithm is therefore to minimize the cost C(θ,b) as a function of the weights and biases. In simple words, we want to find a set of weights and biases which make the cost as small as possible. We will do that using an algorithm called gradient descent.
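To make the cost function concrete, here is a minimal sketch in Python with NumPy. The function name `cost` and the sample values are assumptions chosen for illustration; only the formula C = (1/2n) ∑ (ŷ − y)² comes from the text above.

```python
import numpy as np

def cost(y_hat, y):
    """Cost C = (1/2n) * sum((y_hat - y)^2) over n training examples."""
    y_hat = np.asarray(y_hat, dtype=float)
    y = np.asarray(y, dtype=float)
    n = y.size
    return np.sum((y_hat - y) ** 2) / (2 * n)

# Hypothetical actual outputs and two sets of predictions.
y = [1.0, 0.0, 1.0]
good = cost([0.9, 0.1, 0.8], y)   # predictions close to y -> small cost
bad = cost([0.1, 0.9, 0.2], y)    # predictions far from y -> large cost

print(good, bad)
```

The closer the predictions are to the actual outputs, the smaller the value returned, and the cost is exactly 0 only when every prediction matches.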

## Gradient descent:

Different weights produce different values of ŷ, and so different values of the cost function.

We have to find the ŷ, and therefore the weights, for which the cost function is minimum. One way to do that is brute force, i.e. trying many different values in the hope of finding the one for which the cost is minimum. This is not practical, because the computational cost and time are very high.

So we will use gradient descent, which is much faster.

Let’s say we start from a randomly chosen point and calculate the slope of the cost curve at that point.

If the slope is negative we go right, and if it is positive we go left.

Doing this repeatedly, we move towards the value which minimizes the cost function.

You can imagine this as a ball rolling down a slope. Gradient descent gets its name because we are descending to the minimum of the cost function.

There’s more to gradient descent, but for intuition this is more than sufficient.
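The "check the slope, then step the other way" idea can be sketched in a few lines of Python. Everything here is an assumption made for illustration: a single weight `w`, a toy cost C(w) = (w − 3)² whose minimum is at w = 3, and a fixed step size called `learning_rate`.

```python
def cost(w):
    # Toy cost curve (assumed for illustration), lowest at w = 3.
    return (w - 3.0) ** 2

def slope(w):
    # Derivative of the toy cost with respect to w.
    return 2.0 * (w - 3.0)

def gradient_descent(w_start, learning_rate=0.1, steps=100):
    w = w_start
    for _ in range(steps):
        # Negative slope -> w increases (go right);
        # positive slope -> w decreases (go left).
        w = w - learning_rate * slope(w)
    return w

w_best = gradient_descent(w_start=-4.0)
print(w_best, cost(w_best))
```

Starting from w = −4, each step moves w closer to 3, and the steps shrink as the slope flattens near the minimum.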

## Back-propagation:

Back propagation is an advanced algorithm, driven by sophisticated mathematics, which allows us to adjust all the weights at the same time. The weights are updated according to how much they are responsible for the error, and the learning rate decides by how much we update them. This is the reason why back propagation is a major breakthrough. To learn more about the maths behind it, click here.
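As a rough illustration of "updated according to how much they are responsible for the error", here is the update for the simplest possible case: a single linear neuron with two inputs and one training example. This is a simplification assumed for the example, not the full algorithm; in a multi-layer network the chain rule carries the error back through every layer. The values of `x`, `w`, `y` and `learning_rate` are hypothetical.

```python
import numpy as np

x = np.array([2.0, 0.1])     # two input features (hypothetical values)
w = np.array([0.5, 0.5])     # current weights
y = 1.0                      # actual output
learning_rate = 0.1

def cost(w):
    # Cost for one example: (1/2) * (y_hat - y)^2
    return 0.5 * (w @ x - y) ** 2

y_hat = w @ x                # forward propagation
error = y_hat - y            # how far off the prediction is

# Gradient of the cost with respect to each weight: error * input.
# The weight attached to the larger input contributed more to the error,
# so it receives the larger correction.
grad = error * x

# All weights are adjusted at the same time, scaled by the learning rate.
w_new = w - learning_rate * grad

print(grad, w_new, cost(w), cost(w_new))
```

A larger learning rate would move the weights further in one update, and a smaller one would move them less.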

## To summarize, here are the steps involved in the learning of neural networks:

1. Randomly initialize the weights to small numbers close to zero (but not zero).

2. Input the first observation of the data set into the input layer, each feature in one input node.

3. Forward propagation: from left to right, the neurons are activated in a way that the impact of each neuron’s activation is limited by the weights. Propagate the activations until getting the predicted result ŷ.

4. Compare the predicted result to the actual result and measure the error with the cost function.

5. Back propagation: from right to left, the error is back propagated.

6. Repeat steps 2 to 5 and update the weights after each observation or after each batch of observations.

7. When the whole training set has passed through the ANN, that’s an epoch. Redo more epochs.
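The steps above can be sketched as a small NumPy training loop. This is only an illustration under stated assumptions, not the Keras implementation linked below: the toy data set (logical OR), the network size (one hidden layer of 3 neurons), the sigmoid activation, the learning rate and the number of epochs are all choices made for the example. The whole data set is used as a single batch, so each pass of the loop is one epoch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Toy training set (assumed for illustration): logical OR of two inputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [1]], dtype=float)
n = X.shape[0]

# Step 1: initialize the weights to small random numbers close to zero.
W1 = rng.normal(0, 0.1, size=(2, 3))
b1 = np.zeros((1, 3))
W2 = rng.normal(0, 0.1, size=(3, 1))
b2 = np.zeros((1, 1))

learning_rate = 0.5
costs = []

for epoch in range(2000):                      # Step 7: many epochs
    # Steps 2-3: feed the observations in and propagate forward.
    a1 = sigmoid(X @ W1 + b1)
    y_hat = sigmoid(a1 @ W2 + b2)

    # Step 4: measure the error with the cost function (1/2n) * sum(...).
    costs.append(np.sum((y_hat - y) ** 2) / (2 * n))

    # Step 5: back-propagate the error from the output towards the input.
    delta2 = (y_hat - y) * y_hat * (1 - y_hat) / n
    delta1 = (delta2 @ W2.T) * a1 * (1 - a1)

    # Step 6: update all the weights, here once per batch.
    W2 -= learning_rate * (a1.T @ delta2)
    b2 -= learning_rate * delta2.sum(axis=0, keepdims=True)
    W1 -= learning_rate * (X.T @ delta1)
    b1 -= learning_rate * delta1.sum(axis=0, keepdims=True)

print("first cost:", costs[0], "last cost:", costs[-1])
```

Every pass through the loop contains both a forward propagation and a backward propagation, and the recorded cost should shrink as the epochs go on.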

Click here to see the implementation of an ANN to solve a business problem in Python using Keras. Please leave your feedback in the comment section and share the blog if you like it. You can also contact me with any queries or suggestions.

Thank You!

Happy Learning!

ANN algorithms learn over a loop of 100 or 1000 iterations, right? Is this loop for forward propagation, for backward propagation, or for both?

For both. After the forward propagation of each batch, there is a backward propagation to update the parameters.