Derivative of a sum of squares


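What are the partial derivatives of the following expression with respect to $U_i$, $V_j$, and $M$, respectively? $$L=\sum_{i=1}^m \sum_{j=1}^n\left(P_{ij} - g(U_i^T M V_j)\right)^2 $$ where $$ U \in \mathbb{R}^{d\times m} ,\ V \in \mathbb{R}^{d\times n},\ M \in \mathbb{R}^{d\times d} ,\ P\in \mathbb{R}^{m\times n} $$ are matrices, and $U_i$, $V_j$ denote the $i$-th column of $U$ and the $j$-th column of $V$.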

thanks.


There are 2 best solutions below


The full answer can be found in the text 'Linear Algebra with Applications' by Bernard Kolman, where $M$ is a constant matrix.


Assume we know the scalar function $g(x)$ and its first derivative $h=\frac{dg}{dx}$, which will be applied elementwise to matrix arguments.

For convenience, define the matrices $$\eqalign{ X &= U^TMV \cr H &= h(X) \cr G &= g(X) &\implies dG=H\odot dX \cr }$$ where $\odot$ represents the elementwise/Hadamard product. We will also use a colon to represent the trace/Frobenius product, i.e. $\,A:B={\rm tr}(A^TB)$.
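The conventions above can be sanity-checked numerically. The following is only a minimal sketch: the choice $g=\tanh$ (so $h=1-\tanh^2$), the sizes `d, m, n`, the random matrices, and the helper name `frob` are all assumptions made for illustration and do not come from the question.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, n = 3, 4, 5                      # illustrative sizes (assumption)
U = rng.standard_normal((d, m))
V = rng.standard_normal((d, n))
M = rng.standard_normal((d, d))

# Assumed example of g and its derivative h; any smooth scalar g would do.
g = np.tanh
def h(x):
    return 1.0 - np.tanh(x) ** 2

def frob(A, B):
    """Trace/Frobenius product A:B = tr(A^T B)."""
    return np.trace(A.T @ B)

X = U.T @ M @ V                        # m x n
H = h(X)                               # applied elementwise
G = g(X)                               # applied elementwise

# A:B is the same as the sum of elementwise products.
A, B, C = (rng.standard_normal((m, n)) for _ in range(3))
frob_matches_sum = np.isclose(frob(A, B), np.sum(A * B))

# Hadamard and Frobenius products commute: A:(B ⊙ C) = (A ⊙ B):C.
hadamard_swap_ok = np.isclose(frob(A, B * C), frob(A * B, C))

# First-order check of dG = H ⊙ dX for a small perturbation dX.
dX = 1e-6 * rng.standard_normal((m, n))
dG_exact = g(X + dX) - G
dG_linear = H * dX
first_order_ok = np.allclose(dG_exact, dG_linear, atol=1e-10)
```

The two identities checked here, `frob(A, B) == sum(A * B)` and the swap `A:(B ⊙ C) = (A ⊙ B):C`, are the ones the derivation relies on when it moves $H$ across the colon.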

Rewrite the function in terms of the above conventions and find its differential $$\eqalign{ L &= (G-P):(G-P) \cr dL &= 2\,(G-P):dG \cr&= 2\,(G-P):(H\odot dX) \cr&= 2\,\big((G-P)\odot H\big):dX \cr }$$ The last step uses the fact that the Hadamard and Frobenius products commute, i.e. $A:(B\odot C)=(A\odot B):C$.

Now let's say we want the gradient wrt $V$. Holding $U$ and $M$ fixed gives $dX=U^TM\,dV$. Substitute this into the differential and isolate $dV$ on the RHS of the Frobenius product, using $A:(BC)=B^TA:C$. $$\eqalign{ dL &= 2\,\big((G-P)\odot H\big):(U^TM\,dV) \cr &= 2\,M^TU\Big((G-P)\odot H\Big):dV \cr \frac{\partial L}{\partial V} &= 2\,M^TU\big(G\odot H-P\odot H\big) \cr }$$

To obtain the gradient wrt a particular column $v_j$ of $V$, multiply the full matrix result by the corresponding standard basis vector $$\frac{\partial L}{\partial v_j}=\bigg(\frac{\partial L}{\partial V}\bigg)\,e_j$$

One final "trick" is that the standard basis vectors (and only those vectors) distribute across a Hadamard product $$\frac{\partial L}{\partial v_j} = 2\,M^TU\,(g_j\odot h_j-p_j\odot h_j)$$ where $g_j=Ge_j$, $h_j=He_j$, and $p_j=Pe_j$ are the $j$-th columns of $G$, $H$, and $P$.
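A finite-difference comparison is one way to gain confidence in the result for $V$. The sketch below assumes $g=\tanh$, small random matrices, and a hypothetical helper `numeric_grad`; none of these come from the answer. It also includes gradients wrt $U$ and $M$, which the answer does not derive: they are obtained here by repeating the same steps with $dX=dU^TMV$ and $dX=U^T\,dM\,V$, and should be read as an assumption that the numerical check supports rather than as part of the original answer.

```python
import numpy as np

rng = np.random.default_rng(1)
d, m, n = 3, 4, 5                      # illustrative sizes (assumption)
U = rng.standard_normal((d, m))
V = rng.standard_normal((d, n))
M = rng.standard_normal((d, d))
P = rng.standard_normal((m, n))

# Assumed example of g and its derivative h.
g = np.tanh
def h(x):
    return 1.0 - np.tanh(x) ** 2

def loss(U, M, V):
    """L = sum_ij (P_ij - g(U_i^T M V_j))^2."""
    return np.sum((g(U.T @ M @ V) - P) ** 2)

def grads(U, M, V):
    X = U.T @ M @ V
    W = 2.0 * (g(X) - P) * h(X)        # 2 (G - P) ⊙ H, shape m x n
    grad_V = M.T @ U @ W               # formula from the answer, d x n
    grad_U = M @ V @ W.T               # same steps with dX = dU^T M V (not in the answer)
    grad_M = U @ W @ V.T               # same steps with dX = U^T dM V (not in the answer)
    return grad_U, grad_M, grad_V

def numeric_grad(f, A, eps=1e-6):
    """Central finite differences of scalar f with respect to each entry of A."""
    out = np.zeros_like(A)
    for idx in np.ndindex(A.shape):
        E = np.zeros_like(A)
        E[idx] = eps
        out[idx] = (f(A + E) - f(A - E)) / (2.0 * eps)
    return out

grad_U, grad_M, grad_V = grads(U, M, V)
err_V = np.max(np.abs(grad_V - numeric_grad(lambda A: loss(U, M, A), V)))
err_U = np.max(np.abs(grad_U - numeric_grad(lambda A: loss(A, M, V), U)))
err_M = np.max(np.abs(grad_M - numeric_grad(lambda A: loss(U, A, V), M)))

# Column formula: dL/dv_j = 2 M^T U (g_j ⊙ h_j - p_j ⊙ h_j).
j = 2
X = U.T @ M @ V
g_j, h_j, p_j = g(X)[:, j], h(X)[:, j], P[:, j]
col_j = 2.0 * M.T @ U @ (g_j * h_j - p_j * h_j)
col_ok = np.allclose(col_j, grad_V[:, j])
```

If the formulas are right, `err_V`, `err_U`, and `err_M` should all be tiny compared with the gradient entries, and `col_j` should match column `j` of `grad_V`. By the same basis-vector argument, column $i$ of `grad_U` would give the gradient wrt $U_i$.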