I've been tasked with deriving the gradient for the following non-convex function and I'd like to verify whether my solution / progress is correct so far (I'm not too confident with calculus):
$$F(U, V) = \dfrac{1}{2}\sum_{(i,j)\in{\Omega}}(M_{ij}-u_iv_{j}^T)^2 + \frac{\lambda}{2}(||U||^2_F + ||V||^2_F)$$
Where $M, U, V$ are matrices, $u_i$ and $v_j$ refer to the $i$th row of $U$ and the $j$th row of $V$ respectively, and $\lambda$ is a regularization parameter.
I've obtained the following so far for the partial derivative with respect to $U$:
$$\dfrac{\partial F(U,V)}{\partial U} =-\sum_{(i,j)\in{\Omega}}(M_{ij}-u_iv_{j}^T)^2 + \lambda(||U||^2_F)$$
But I'm a bit unsure about taking the derivative of $\frac{\lambda}{2}(||U||^2_F + ||V||^2_F)$. I think what I have above is correct, since $V$ is treated as a constant when we differentiate with respect to $U$. I assume the approach for taking the partial derivative with respect to $V$ would be symmetric.
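To sanity-check just the regularization term, I compared a finite-difference gradient of $\frac{\lambda}{2}||U||^2_F$ against $\lambda U$ (a NumPy sketch; the matrix size and $\lambda$ value are arbitrary placeholders), and they agree entrywise:

```python
import numpy as np

rng = np.random.default_rng(2)
U = rng.standard_normal((4, 3))
lam = 0.1  # arbitrary regularization strength

def reg(U):
    # (lambda/2) * ||U||_F^2
    return 0.5 * lam * np.sum(U**2)

# Central finite differences, one entry of U at a time
eps = 1e-6
G_fd = np.zeros_like(U)
for p in range(U.shape[0]):
    for q in range(U.shape[1]):
        E = np.zeros_like(U)
        E[p, q] = eps
        G_fd[p, q] = (reg(U + E) - reg(U - E)) / (2 * eps)

# The finite differences match lambda * U entrywise
assert np.allclose(G_fd, lam * U, atol=1e-6)
```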
The tricky part here is computing
$$ \frac{\partial}{\partial U}\sum_{(i,j)\in{\Omega}}(M_{ij}-u_iv_{j}^T)^2. $$
One approach is to go entry by entry. Note that the $p,q$ entry of this partial derivative is equal to the partial derivative of the sum with respect to the $p,q$ entry of $U$. We note that
$$ \begin{align} \frac{\partial}{\partial U_{pq}} \sum_{(i,j)\in{\Omega}}(M_{ij}-u_iv_{j}^T)^2 &= \sum_{(i,j)\in{\Omega}} \frac{\partial}{\partial U_{pq}}(M_{ij}-u_iv_{j}^T)^2 \\ & = \sum_{j \in S_p} \frac{\partial}{\partial U_{pq}}(M_{pj}-u_pv_{j}^T)^2 \\ & = \sum_{j \in S_p} 2(M_{pj}-u_pv_{j}^T)\frac{\partial}{\partial U_{pq}}(M_{pj}-u_pv_{j}^T) \\ & = \sum_{j \in S_p} 2(M_{pj}-u_pv_{j}^T)(-V_{jq}) = -2\sum_{j \in S_p} (M_{pj}-u_pv_{j}^T)V_{jq}. \end{align} $$
Here, $S_p = \{j: (p,j) \in \Omega\}$.
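The entrywise formula is easy to check numerically. The sketch below (NumPy; the sizes, seed, and the random choice of $\Omega$ are placeholders) accumulates $-2\sum_{j \in S_p}(M_{pj}-u_pv_j^T)V_{jq}$ one observed entry at a time and compares it with central finite differences of the sum:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 5, 4, 3
M = rng.standard_normal((m, n))
U = rng.standard_normal((m, k))
V = rng.standard_normal((n, k))
# Omega: a random subset of observed entries (i, j)
Omega = [(i, j) for i in range(m) for j in range(n) if rng.random() < 0.6]

def data_term(U):
    # sum over Omega of (M_ij - u_i v_j^T)^2, without the 1/2 factor
    return sum((M[i, j] - U[i] @ V[j]) ** 2 for (i, j) in Omega)

# Analytic gradient from the entrywise derivation:
# (dF/dU)_{pq} = -2 * sum_{j in S_p} (M_pj - u_p v_j^T) V_jq
G = np.zeros_like(U)
for (i, j) in Omega:
    G[i] -= 2 * (M[i, j] - U[i] @ V[j]) * V[j]

# Central finite differences for comparison
eps = 1e-5
G_fd = np.zeros_like(U)
for p in range(m):
    for q in range(k):
        E = np.zeros_like(U)
        E[p, q] = eps
        G_fd[p, q] = (data_term(U + E) - data_term(U - E)) / (2 * eps)

assert np.allclose(G, G_fd, atol=1e-4)
```

(With the $\frac12$ in front of the sum in $F$, the factor $-2$ becomes $-1$.)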
If the known entries form a submatrix, then let $J_1$ denote the matrix whose rows are the rows of the identity matrix corresponding to the selected rows of $M$, and let $J_2$ denote the matrix whose columns are the columns of the identity matrix corresponding to the selected columns. We can then write the sum as
$$ \sum_{(i,j)\in{\Omega}}(M_{ij}-u_iv_{j}^T)^2 = \|J_1(M - UV^T)J_2\|_F^2. $$
We compute
$$ \begin{align} \|J_1(M - &[U+H]V^T)J_2\|_F^2 = \operatorname{tr}([J_1(M - [U+H]V^T)J_2][J_1(M - [U+H]V^T)J_2]^T) \\ & = \|J_1(M - UV^T)J_2\|_F^2 -2 \operatorname{tr}([J_1HV^TJ_2][J_1(M - UV^T)J_2]^T) + o(\|H\|) \\ & = \|J_1(M - UV^T)J_2\|_F^2 -2 \operatorname{tr}([V^TJ_2][J_1(M - UV^T)J_2]^TJ_1H) + o(\|H\|) \\ & = \|J_1(M - UV^T)J_2\|_F^2 + \operatorname{tr}([-2J_1^TJ_1(M - UV^T)J_2J_2^TV]^TH) + o(\|H\|). \end{align} $$
Correspondingly, we conclude
$$ \frac{\partial}{\partial U}\|J_1(M - UV^T)J_2\|_F^2 = -2 J_1^TJ_1(M - UV^T)J_2J_2^TV. $$
To make some sense of this: we can write this product in the form $-2 J_1^T[J_1(M - UV^T)J_2][J_2^TV]$. Here $[J_1(M - UV^T)J_2]$ is the residual $M - UV^T$ restricted to the known submatrix, and $J_2^TV$ is the submatrix of $V$ obtained by taking only the rows of $V$ corresponding to the known columns of $M$.
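The matrix form can also be checked against finite differences; note the residual $M - UV^T$ in the gradient, which is consistent with the entrywise result above. A NumPy sketch (the sizes and the choice of selected rows/columns are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, k = 6, 5, 3
M = rng.standard_normal((m, n))
U = rng.standard_normal((m, k))
V = rng.standard_normal((n, k))

rows = [0, 2, 3]           # hypothetical selected rows of M
cols = [1, 2, 4]           # hypothetical selected columns of M
J1 = np.eye(m)[rows]       # rows of the identity -> shape (|rows|, m)
J2 = np.eye(n)[:, cols]    # columns of the identity -> shape (n, |cols|)

def f(U):
    # ||J1 (M - U V^T) J2||_F^2
    R = J1 @ (M - U @ V.T) @ J2
    return np.sum(R**2)

# Matrix-form gradient: -2 J1^T J1 (M - U V^T) J2 J2^T V
G = -2 * J1.T @ J1 @ (M - U @ V.T) @ J2 @ J2.T @ V

# Central finite differences for comparison
eps = 1e-5
G_fd = np.zeros_like(U)
for p in range(m):
    for q in range(k):
        E = np.zeros_like(U)
        E[p, q] = eps
        G_fd[p, q] = (f(U + E) - f(U - E)) / (2 * eps)

assert np.allclose(G, G_fd, atol=1e-4)
```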