Derivative of result with respect to its function


I'm trying to implement the simplest form of backpropagation. Backpropagation is an algorithm widely used in neural networks; the steps are, basically: Ⅰ) calculating the result of a formula with a variable called the "weight" (the forward pass), Ⅱ) finding the difference between the actual result and the wanted one, Ⅲ) finding the derivative of that difference with respect to the weight (to see how much the weight contributed to the miss), and Ⅳ) adding the derivative result to the weight (or subtracting it from the weight).

So, let $f(x,w) = (x+w)^2$, where $x$ is an input and $w$ is a weight. Then $f(3,5) = 64$. Now, suppose that with the same $x$ we want the result $80$ instead, so let $E = 80-64 = 16$, and then calculate $\frac{dE}{dw}$.

This is the step where I'm stuck. The derivative of $(x+w)^2$ is $2x+2w$, but I don't understand where I am supposed to substitute $E$. Perhaps I am calculating the wrong derivative? I mean, $\frac{df}{dw}=\frac{df}{dx}=2x+2w$, but probably $\frac{dE}{dw} \neq 2x+2w$? I have no idea where to go from here ☹


There are 2 best solutions below

4
On

I think you are implementing what is essentially Newton's method (a close relative of gradient descent):

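$E = f(x,w) \approx f(x,w_0) + \frac{\partial f(x,w)}{\partial w}\Big|_{w_0} \times \delta w$, where $E$ here denotes the wanted result. If you put $E = 80$ and $f(x,w_0)=64$, you can solve for $\delta w$; then in the next iteration replace $w_0$ by $w_0 + \delta w$. With $x = 3$ and $w_0 = 5$ the derivative is $2(x+w_0) = 16$, so $\delta w = (80-64)/16 = 1$.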
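The iteration described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `f`, `df_dw`, and `newton_steps` and the fixed step count are hypothetical and not part of the original answer. Each pass solves the linearized equation for $\delta w$ and then updates the weight.

```python
def f(x, w):
    """Forward pass from the question: f(x, w) = (x + w)^2."""
    return (x + w) ** 2


def df_dw(x, w):
    """Partial derivative of f with respect to the weight w."""
    return 2 * (x + w)


def newton_steps(x, w0, target, steps=5):
    """Repeatedly solve target ~ f(x, w) + df/dw * delta_w for delta_w.

    Returns the list of weights visited, starting with w0.
    The number of steps is an arbitrary choice for this sketch.
    """
    w = w0
    history = [w]
    for _ in range(steps):
        delta_w = (target - f(x, w)) / df_dw(x, w)
        w += delta_w  # replace w0 by w0 + delta_w
        history.append(w)
    return history


history = newton_steps(x=3.0, w0=5.0, target=80.0)
for i, w in enumerate(history):
    print(i, w, f(3.0, w))
```

The first update moves the weight from 5 to 6, which overshoots slightly since $f(3,6) = 81$; later updates correct this and the weight settles where $(3+w)^2 = 80$.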

0
On

Apparently, with $E = 16$, it wouldn't make mathematical sense to compute $$\frac {dE}{dw} (x + w)^2$$ as a derivative, because if you substitute $16$ for $E$ and treat $\frac{dE}{dw}$ as a fraction, you would get $$\frac {16x^2}{w} + 32x + 16w,$$ which is not a derivative of anything. Since you are working with a multivariable function, you can use the multivariable chain rule.

Multivariable Chain Rule

Let $z = f(x, w)$. Since $f(x, w) = (x + w)^2$, $z = (x + w)^2$. Before we do anything, we'll expand $(x + w)^2$ to get $x^2 + 2xw + w^2$. Now, we will use the following formula for the total differential of $z = f(x, w)$: $$dz = \frac {\partial z}{\partial x}dx + \frac{\partial z}{\partial w}dw$$ The partial derivative with respect to $x$ is $2x + 2w$, and the partial derivative with respect to $w$ is also $2x + 2w$. So, we have $$dz=(2x + 2w)\,dx + (2x + 2w)\,dw$$ We can factor out $2(x + w)$ to get: $$dz = 2(x + w)(dx + dw)$$ So, your final answer is $2(x + w)(dx + dw)$. In the question $x$ is held fixed, so $dx = 0$ and this reduces to $dz = 2(x + w)\,dw$.
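To tie the partial derivative with respect to $w$ back to the four steps in the question, here is a minimal gradient-descent sketch in Python. Several things in it are assumptions rather than anything stated in the thread: the error is written as a function of the weight, $E(w) = 80 - f(x, w)$, so that $\frac{dE}{dw} = -2(x + w)$; the quantity being minimized is $\frac{1}{2}E^2$; and the learning rate `lr`, the step count, and the function names are hypothetical choices for illustration.

```python
def f(x, w):
    """Forward pass from the question: f(x, w) = (x + w)^2."""
    return (x + w) ** 2


def df_dw(x, w):
    """Partial derivative of f with respect to w: 2(x + w)."""
    return 2 * (x + w)


def train(x, w, target, lr=0.001, steps=200):
    """Gradient descent on the assumed loss E^2 / 2, where E = target - f(x, w).

    lr and steps are illustrative values, not tuned recommendations.
    """
    for _ in range(steps):
        error = target - f(x, w)   # step II: E = wanted - actual
        dE_dw = -df_dw(x, w)       # step III: dE/dw = -2(x + w)
        grad = error * dE_dw       # chain rule: d(E^2 / 2)/dw = E * dE/dw
        w -= lr * grad             # step IV: move against the gradient
    return w


w_final = train(x=3.0, w=5.0, target=80.0)
print(w_final, f(3.0, w_final))
```

At the starting point $x = 3$, $w = 5$ the error is $16$ and $\frac{dE}{dw} = -16$, so the first update increases the weight, which is the direction that raises $f$ toward $80$. A much larger learning rate would overshoot for this function, which is why a small value is assumed here.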