$$ \text{Loss}(y, \hat{y}) = \sum_{i=1}^n \left( y_i - \hat{y}_i \right)^2 $$ $$ \begin{split} \frac{\partial \text{Loss}(y, \hat{y})}{\partial W} &= \frac{\partial \text{Loss}(y, \hat{y})}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial z} \frac{\partial z}{\partial W} \quad \text{where}~z = Wx + b \\ & = 2(y-\hat{y}) \cdot \text{derivative of sigmoid function}\cdot x \\ & = 2(y - \hat{y})~ z(1-z)~ x \end{split} $$
Chain rule for the derivative of the loss function with respect to the weights.
Sigmoid: $\sigma(x) = \frac{1}{1 + e^{-x}}$

Sigmoid derivative: $\sigma'(x) = \sigma(x)\,(1 - \sigma(x))$; if $s = \sigma(x)$ denotes the sigmoid's output, this is often written as $s(1-s)$.
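The identity $\sigma'(x) = \sigma(x)(1-\sigma(x))$ can be verified numerically with a central finite difference; this is a quick sketch in Python with NumPy (the point $x = 0.3$ and step $h$ are arbitrary choices):

```python
import numpy as np

def sigmoid(x):
    # sigmoid(x) = 1 / (1 + e^{-x})
    return 1.0 / (1.0 + np.exp(-x))

x = 0.3       # arbitrary test point
h = 1e-6      # finite-difference step

# central difference approximation of the derivative
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)

# analytic derivative: sigma(x) * (1 - sigma(x))
analytic = sigmoid(x) * (1 - sigmoid(x))

print(abs(numeric - analytic) < 1e-8)  # True
```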
Here $y$ is the required (target) output and $\hat{y}$ is the calculated output:

$\hat{y} = \sigma(Wx + b)$, where $x$ is the input and $W$ is the weight.
Both $\hat{y}$ and $y$ are $1 \times 1$ matrices; the input and weight are also matrices. Can someone please explain the derivation in the picture?
I think the answer should be...
Let $$f(w) = \sum^n_{i=1}(y_i-\hat{y}_i)^2 = \lVert \mathbf{y}-\mathbf{\hat{y}} \rVert^2.$$ We know that $\mathbf{\hat{y}} = \sigma(Xw + b) = \frac{1}{1+e^{-(Xw+b)}}$.
First of all, please correct me if I am wrong, but I think $\frac{\partial f}{\partial \hat{y}} = -2(\mathbf{y}-\mathbf{\hat{y}} )$.
Writing $z = Xw + b$, $\frac{\partial \mathbf{\hat{y}}}{\partial z} = \frac{\partial (\mathbf{1} + e^{-z})^{-1}}{\partial z} = \frac{e^{-z}}{(1+e^{-z})^2} = (1-\mathbf{\hat{y}})\cdot \mathbf{\hat{y}}$.
$\frac{\partial z}{\partial w} = \frac{\partial (Xw +b)}{\partial w} = X$
$\therefore \frac{\partial f}{\partial w} = -2(\mathbf{y}-\mathbf{\hat{y}})\cdot (1-\mathbf{\hat{y}})\cdot \mathbf{\hat{y}} \cdot X$
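As a sanity check, the final expression can be compared against a finite-difference gradient. This is a sketch assuming $X$ is an $n \times d$ matrix and $w$ a length-$d$ vector (the shapes and random data are made up for illustration); with vector shapes the factors combine through $X^\top$, which reduces to plain multiplication by $x$ in the $1 \times 1$ case from the question:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))   # hypothetical data: 5 samples, 3 features
w = rng.normal(size=(3,))
b = 0.1
y = rng.normal(size=(5,))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w):
    # f(w) = sum_i (y_i - sigmoid(x_i . w + b))^2
    y_hat = sigmoid(X @ w + b)
    return np.sum((y - y_hat) ** 2)

# analytic gradient: df/dw = X^T [ -2 (y - y_hat) * y_hat * (1 - y_hat) ]
y_hat = sigmoid(X @ w + b)
grad = X.T @ (-2 * (y - y_hat) * y_hat * (1 - y_hat))

# central finite-difference check, one coordinate at a time
h = 1e-6
num = np.array([(loss(w + h * e) - loss(w - h * e)) / (2 * h)
                for e in np.eye(3)])

print(np.allclose(grad, num, atol=1e-5))  # True
```

If the analytic and numerical gradients agree, the sign $-2(\mathbf{y}-\mathbf{\hat{y}})$ and the $\mathbf{\hat{y}}(1-\mathbf{\hat{y}})$ factor (rather than the picture's $z(1-z)$) are confirmed.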