I'm attempting to implement the Backpropagation algorithm, and I've run into a wall because the best descriptions I can find of the algorithm rely on calculus; and I don't know calculus.
My issue is interpreting the equation that describes how a weight should change:
$$w'_{(x1)1} = w_{(x1)1} + \eta \delta_1\frac{df_1(e)}{de}x_1$$
(source, about 4/5 of the way down the page) where:
- $w'_{(x1)1}$ = the new weight
- $w_{(x1)1}$ = the old weight
- $\eta$ = learning rate
- $\delta_1$ = the error of the edge target
- $\frac{df_1(e)}{de}$ = the derivative of the activation function
- $x_1$ = the output of the edge source
My problem is, I don't know how to interpret the "$\frac{df_1(e)}{de} x_1$" at the end. I've figured there are 2 ways I could interpret it:
- $x_1$ is the argument of the derivative function (`derivative(x1)`, in code)
- $x_1$ is multiplied by the result of the derivative, since multiplication is generally implicit
If it's option 2, what is the argument of the derivative? To the best of my knowledge, the derivative of the function (sigmoid in this case) will be a function itself, and it doesn't make any sense to try to multiply a function by a value.
The pattern $$\frac{d(\text{stuff})}{de}$$ means the derivative of "$(\text{stuff})$" with respect to $e$. So in this case, the function $f_1$ is the thing you take the derivative of.
That derivative is then evaluated at $e$ (the neuron's summed, weighted input), which gives you a number, and that number is multiplied by $x_1$. So it's your option 2, and the argument of the derivative is $e$, not $x_1$.
Another way to write this is $$\frac{d}{de}(\text{stuff})$$ and in that case, the thing you take the derivative of is the thing that comes after the $\frac{d}{de}$. But note that here, the $d$ on top of the fraction notation is by itself. There is nothing else on top. That's how you recognize this variant of the notation. You can see that this doesn't apply to the formula you're looking at because your formula has something besides the $d$ on top; namely, $f_1(e)$.
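In code, option 2 might look like the sketch below. The function and variable names, and the sample numbers, are my own for illustration, not from the article; the key point is that the derivative is evaluated at $e$, and the resulting *number* is multiplied by $x_1$:

```python
import math

def sigmoid(e):
    # logistic activation: f(e) = 1 / (1 + exp(-e))
    return 1.0 / (1.0 + math.exp(-e))

def sigmoid_derivative(e):
    # derivative of the sigmoid, evaluated at the same input e;
    # for the logistic function, f'(e) = f(e) * (1 - f(e))
    s = sigmoid(e)
    return s * (1.0 - s)

def updated_weight(w, eta, delta, e, x1):
    # w' = w + eta * delta * f'(e) * x1
    # sigmoid_derivative(e) is a plain float here, so the
    # multiplication by x1 is ordinary number-times-number
    return w + eta * delta * sigmoid_derivative(e) * x1

# hypothetical numbers, just to show the call
w, eta, delta, x1 = 0.5, 0.1, 0.2, 0.8
e = 0.9  # the neuron's summed, weighted input
print(updated_weight(w, eta, delta, e, x1))
```

Note that `e` here is the total input the neuron received on the forward pass, so you would normally cache it during the forward pass rather than recompute it.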