Gradient Descent wrt matrix

713 Views Asked by At

Suppose $\mathbf{X}$ is a p x n matrix, $\mathbf{Y}$ is q x n, $\mathbf{C}$ is an unknown q x p matrix. Can you minimize the following with gradient descent to find C? (multivariate regression)

$|| \mathbf{Y}-\mathbf{C}\mathbf{X}||^2_F$

Wouldn't the gradient wrt $\mathbf{C}$ just be

$\nabla g = -2 (\mathbf{Y}-\mathbf{C}\mathbf{X}) \mathbf{X}'$

3

There are 3 best solutions below

5
On

Your function can be written in the form $$ g(C) = \operatorname{tr}[(Y - CX)(Y - CX)'] = \operatorname{tr}[CXX'C] - 2\operatorname{tr}[CXY'] + [\text{const}] $$ Following this table, the gradient (either the transpose of the derivative in the left side column, or equivalently the derivative in the right-most column) will be $$ \frac{\partial g}{\partial C} = [(Y - CX)(-X') + (-X)(Y - CX)']' - [2XY']' \\ = -X(Y - CX)' - (Y - CX)X' - 2YX'. $$


For a more abstract approach, note that

$$ g(C+H) = \operatorname{tr}[(Y - [C+H]X)(Y - [C+H]X)']\\ = g(C) - \operatorname{tr}[HX(Y - CX)'] - \operatorname{tr}[(Y - CX)X'] + o(H)\\ = g(C) - 2\operatorname{tr}[HX(Y - CX)'] + o(H). $$ We therefore find that $dg = -\operatorname{tr}[2X(Y - CX)' dC]$, so that the gradient is equal to $$ -[X(Y - CX)']' = -2(Y - CX)X'. $$ This matches your result.

4
On

Or explicitly, if $g(C) = \langle Y-CX, Y-CX \rangle$, we have $g(C+H) -g(C) = -2 \langle Y-CX, HX \rangle+ O(\|H\|^2)$ and since $\langle Y-CX, HX \rangle = \operatorname{tr}((Y-CX)^T HX) = \operatorname{tr}(X(Y-CX)^T H) = \langle (Y-CX)X^T, H \rangle $.

Hence $\nabla g(C) = -2(Y-CX)X^T$.

0
On

You do not need gradient descent to solve a linear equation.
Simply use the Moore-Penrose inverse $X^+$ $$\eqalign{ CX &= Y \quad\implies\quad C = YX^+ \\ }$$ You can also include contributions from the nullspace (multiplied by an arbitrary matrix $A$) $$\eqalign{ C = YX^+ + A(I-XX^+) \\ }$$