I am working on fitting a model using gradient descent. My initial equation is:
$MSE = \frac{1}{N} \sum_{i=1}^{N} \lVert M_i - \theta P_{i-1} - K Y_i \rVert^2$.
$M$ and $P$ are $5 \times N$ matrices, $\theta$ is $5 \times 5$, $K$ is $5 \times 96$, and $Y$ is $96 \times N$; a subscript $i$ denotes the $i$-th column.
I want to take the derivative of this with respect to $K$. I get:
$\frac{dMSE}{dK} = \frac{2}{N} \sum_{i=1}^{N} (-M_i + \theta P_{i-1} + K Y_i) Y_i^T$.
Can someone tell me whether this is correct and, if it isn't, what I did wrong?
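
One way to sanity-check an expression like the one above is to compare it against a finite-difference gradient. Below is a minimal NumPy sketch, not a definitive implementation. It assumes the square means the squared Euclidean norm of each column residual, uses random data with a hypothetical $N = 50$, and stores the shifted columns $P_{i-1}$ in an array I call `P_prev` (the function names are mine as well).

```python
import numpy as np

def mse(K, M, P_prev, Y, theta):
    """Mean over columns of the squared Euclidean norm of the residual."""
    R = M - theta @ P_prev - K @ Y          # 5 x N, one residual column per i
    return np.sum(R ** 2) / M.shape[1]

def grad_K(K, M, P_prev, Y, theta):
    """Candidate gradient: (2/N) * sum_i (-M_i + theta P_{i-1} + K Y_i) Y_i^T."""
    R = -M + theta @ P_prev + K @ Y         # 5 x N
    return 2.0 / M.shape[1] * (R @ Y.T)     # 5 x 96, same shape as K

def numerical_grad_K(K, M, P_prev, Y, theta, eps=1e-6):
    """Central finite differences, perturbing one entry of K at a time."""
    G = np.zeros_like(K)
    for idx in np.ndindex(*K.shape):
        E = np.zeros_like(K)
        E[idx] = eps
        f_plus = mse(K + E, M, P_prev, Y, theta)
        f_minus = mse(K - E, M, P_prev, Y, theta)
        G[idx] = (f_plus - f_minus) / (2 * eps)
    return G

# Random test data with the shapes from the question (N = 50 is an assumption).
rng = np.random.default_rng(0)
N = 50
M = rng.standard_normal((5, N))
P_prev = rng.standard_normal((5, N))        # column i holds P_{i-1}
theta = rng.standard_normal((5, 5))
K = rng.standard_normal((5, 96))
Y = rng.standard_normal((96, N))

analytic = grad_K(K, M, P_prev, Y, theta)
numeric = numerical_grad_K(K, M, P_prev, Y, theta)
max_abs_diff = np.max(np.abs(analytic - numeric))
print("largest entrywise difference:", max_abs_diff)
```

If the analytic expression matches the loss, the printed difference should be tiny compared with the size of the gradient entries. A large difference would point to a sign or transpose error.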