Matrix derivative of sum of squared errors

I have a matrix $\textbf{X}_{m\times n}$.

I am trying to take the derivative of the following expression with respect to $\textbf{X}_{m\times n}$:

$$\|\textbf{1}_{1\times m} \textbf{X}_{m\times n} - A_{1\times n}\|_2^2 + \|\textbf{1}_{1\times n} \textbf{X}^T_{m\times n} - B_{1\times m}\|_2^2 $$

When I take the derivative with respect to $\textbf{X}$, I get

$$ 2\cdot (\textbf{1}_{1\times m} \textbf{X}_{m\times n} - A_{1\times n})\cdot\textbf{1}_{1\times m} + 2\cdot (\textbf{1}_{1\times n} \textbf{X}^T_{m\times n} - B_{1\times m})\cdot\textbf{1}_{1\times n} $$

$A , B$ are constant matrices (or vectors).

I cannot figure out what I am doing wrong in the derivative: the matrix dimensions in my result do not match up for multiplication.

There is 1 best solution below

On BEST ANSWER

Write the norms in terms of the inner/Frobenius product, which I'll denote by a colon. To clarify things further, let's use lowercase letters for vectors and reserve uppercase for matrices, so that $a=A^T$ and $b=B^T$ are column vectors and $1_m$ is the all-ones column vector of length $m$. The second residual is transposed into a column vector, which leaves its norm unchanged.

$$f=(1^T_mX-a^T):(1^T_mX-a^T)+(X1_n-b):(X1_n-b)$$

In this form it is straightforward to find the differential and gradient of the function:

$$\eqalign{
df &= 2(1^T_mX-a^T):1^T_m\,dX + 2(X1_n-b):dX\,1_n \cr
&= 2\,1_m(1^T_mX-a^T):dX + 2(X1_n-b)1^T_n:dX \cr
&= 2\,\big(1_m1^T_mX-1_ma^T + X1_n1^T_n-b1^T_n\big):dX \cr
\frac{\partial f}{\partial X} &= 2\,\big(1_m1^T_mX-1_ma^T + X1_n1^T_n-b1^T_n\big) \cr
}$$

Every term in the gradient is $m\times n$, matching the shape of $X$.

Note that the Frobenius product is merely a convenient infix notation for the trace, i.e. $$A:B={\rm tr}(A^TB)$$
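If you want to sanity-check the gradient formula numerically, a rough sketch along the following lines should do. It assumes NumPy is available; the function names (`f`, `grad_f`, `numerical_grad`), the sizes `m = 4`, `n = 3`, and the random test data are all illustrative choices of mine, not part of the question.

```python
import numpy as np

def f(X, a, b):
    """Squared error of the column sums against a plus row sums against b."""
    col_err = X.sum(axis=0) - a   # entries of 1_m^T X - a^T, shape (n,)
    row_err = X.sum(axis=1) - b   # entries of X 1_n - b, shape (m,)
    return col_err @ col_err + row_err @ row_err

def grad_f(X, a, b):
    """Closed-form gradient 2(1_m 1_m^T X - 1_m a^T + X 1_n 1_n^T - b 1_n^T)."""
    m, n = X.shape
    ones_m = np.ones((m, 1))
    ones_n = np.ones((n, 1))
    return 2 * (ones_m @ ones_m.T @ X - ones_m @ a[None, :]
                + X @ ones_n @ ones_n.T - b[:, None] @ ones_n.T)

def numerical_grad(X, a, b, h=1e-6):
    """Central finite differences, one entry of X at a time."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = h
            G[i, j] = (f(X + E, a, b) - f(X - E, a, b)) / (2 * h)
    return G

# Illustrative random test data (assumed sizes, fixed seed)
rng = np.random.default_rng(0)
m, n = 4, 3
X = rng.standard_normal((m, n))
a = rng.standard_normal(n)
b = rng.standard_normal(m)

# Largest entrywise gap between the formula and the finite-difference estimate
max_diff = np.max(np.abs(grad_f(X, a, b) - numerical_grad(X, a, b)))
print(max_diff)
```

Since $f$ is quadratic in $X$, the central difference has no truncation error, so `max_diff` should be small and limited only by floating-point rounding.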