Why do I need derivatives in neural networks?


I'm learning about backpropagation in neural networks, learning rates, and so on.

That's all clear, but I don't understand why I need to compute so many derivatives. I know the rules, so I can find the derivative of $f(x)$, but I don't know why I do this.

So my question is: why must I take the derivative of the activation functions, and why do I need differentiation in general?

For example, I was given this exercise:

[exercise image not shown]

I know how to calculate the exercise, but I don't understand the deeper reason why I need this, or why I must, for example, take the derivative of $\tanh(x)$.


2 Answers

Best answer:

In the theory of neural networks it is generally important to understand how the network's output changes when one changes the inputs.

It is particularly important to understand how the output changes when the change of the input is very small, and to study the ratio of output change to input change. For example, by examining this ratio you might discover that the network is "moving in the wrong direction" when you change the input a tiny amount, in which case you might decide to change your backpropagation algorithm.

The expectation from physical considerations is that a very small change of input will lead to a very small change of output, and the ratio of output change to input change is a very important quantity to understand.

In the simplest case, where the network has one input parameter $x$ and one output parameter $y$, the relation between input and output is given by a function $y = f(x)$. When an input value $x$ is changed by adding a small quantity $\Delta x$, the new input is $x + \Delta x$, and we have $$\frac{\text{output change}}{\text{input change}} = \frac{\text{new output} - \text{old output}}{\text{new input} - \text{old input}} = \frac{f(x+\Delta x) - f(x)}{(x + \Delta x) - x} = \frac{f(x+\Delta x) - f(x)}{\Delta x} $$

Further physical considerations suggest that for very tiny values of $\Delta x$, this ratio comes very close to a particular number, and from this we arrive at the concept of the limiting ratio, also known as the derivative of $f$ at $x$. The derivative of $f$ at $x$ is denoted $f'(x)$, and it is given by the limit expression $$f'(x) = \lim_{\Delta x \to 0} \frac{f(x+\Delta x) - f(x)}{\Delta x} $$

The point is that instead of studying the ratios of small changes, it is much easier to study the derivative, and it is very worthwhile to master the calculus tools needed to do that.
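To make this concrete, here is a small sketch (my own illustration, not part of the original answer) checking that the difference quotient above approaches the known closed-form derivative of $\tanh$, namely $\tanh'(x) = 1 - \tanh^2(x)$, as $\Delta x$ shrinks:

```python
import math

def diff_quotient(f, x, dx):
    """Ratio of output change to input change for a small step dx."""
    return (f(x + dx) - f(x)) / dx

x = 0.5
exact = 1 - math.tanh(x) ** 2  # closed form: tanh'(x) = 1 - tanh(x)^2

# As dx shrinks, the difference quotient converges to the derivative.
for dx in (1e-1, 1e-3, 1e-5):
    approx = diff_quotient(math.tanh, x, dx)
    print(f"dx={dx:g}: quotient={approx:.6f}, error={abs(approx - exact):.2e}")
```

The shrinking error column is exactly the limit process in the formula: the derivative is the number these ratios settle down to.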

Second answer:

Because training a neural network requires solving an optimization problem, typically with a method such as gradient descent, and you can't implement gradient descent without knowing how to evaluate these derivatives.
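As a minimal sketch of this point (a toy setup of my own choosing, not a full network): fit a single weight $w$ so that $\tanh(w x)$ matches a target, using the derivative $\tanh'(u) = 1 - \tanh^2(u)$ inside the chain rule. Without that derivative there would be no gradient to descend.

```python
import math

# Toy one-parameter "network": output = tanh(w * x), squared-error loss.
x, y_target = 1.0, 0.5
w = 0.0
lr = 0.1  # learning rate

for step in range(200):
    out = math.tanh(w * x)
    # Chain rule: d(loss)/dw = 2*(out - y) * tanh'(w*x) * x,
    # where tanh'(u) = 1 - tanh(u)^2 — this is why the derivative is needed.
    grad = 2 * (out - y_target) * (1 - out ** 2) * x
    w -= lr * grad  # gradient descent update

print(math.tanh(w * x))  # approaches the target 0.5
```

Every update step consumes the derivative of the activation function; that is the role the $\tanh'(x)$ computations in the exercise play inside backpropagation.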