Defining the Hessian of a function that takes general matrices as an input


I'm trying to analyze optimization problems of the form $$\text{Minimize: }||A - XY||^{2}_{F}$$ where $A$ is $m \times n$, $X$ is $m \times k$, $Y$ is $k \times n$, $k \le \min(m, n)$, and $A$, $X$, and $Y$ contain only real numbers. It is known that in the general case this problem is not convex, but several variants of it are (Principal Component Analysis can be formulated this way, for example). My question concerns formulating the Hessian matrix of this type of problem.

Using the Matrix Cookbook and several other online references, I have calculated the Hessian as follows: $$H(f(X,Y)) = \left( \begin{array}{ccc} -2YY^{T} & -2A + 4XY \\ -2A + 4XY & -2X^{T}X \end{array} \right)$$ with $f(X,Y) = ||A - XY||^{2}_{F}$. Here are the steps I used to arrive at this form. $$\frac{\partial f}{\partial X} = -2(A - XY)Y^{T}, \quad \frac{\partial f}{\partial Y} = -2X^{T}(A - XY) \\ \frac{\partial^{2} f}{\partial X^{2}} = -2YY^{T}, \quad \frac{\partial^{2} f}{\partial Y^{2}} = -2X^{T}X \\ \frac{\partial^{2} f}{\partial X \partial Y} = -2A + 4XY, \quad \frac{\partial^{2} f}{\partial Y \partial X} = -2A + 4XY$$
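As a sanity check, the two first-order (gradient) formulas can be compared against finite differences. A sketch using NumPy; the dimensions, seed, and tolerance are arbitrary choices of mine:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 4, 3, 2
A = rng.standard_normal((m, n))
X = rng.standard_normal((m, k))
Y = rng.standard_normal((k, n))

def f(X, Y):
    return np.linalg.norm(A - X @ Y, 'fro') ** 2

# Closed-form gradients from the derivation above
gX = -2 * (A - X @ Y) @ Y.T   # shape (m, k)
gY = -2 * X.T @ (A - X @ Y)   # shape (k, n)

# Central finite-difference approximation, entry by entry
eps = 1e-6
gX_fd = np.zeros_like(X)
for i in range(m):
    for j in range(k):
        E = np.zeros_like(X); E[i, j] = eps
        gX_fd[i, j] = (f(X + E, Y) - f(X - E, Y)) / (2 * eps)

gY_fd = np.zeros_like(Y)
for i in range(k):
    for j in range(n):
        E = np.zeros_like(Y); E[i, j] = eps
        gY_fd[i, j] = (f(X, Y + E) - f(X, Y - E)) / (2 * eps)

print(np.allclose(gX, gX_fd, atol=1e-5))  # True
print(np.allclose(gY, gY_fd, atol=1e-5))  # True
```

So the gradients themselves appear correct; my uncertainty is in the second-order terms.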

My first question is whether this is correct (I'm moderately confident that the partials with respect to only $X$ or only $Y$ are correct, but much less confident about the mixed ones). My second question: if the above is correct, does that mean the Hessian matrix is undefined when $A$ is not square? That appears to be the case, because when $A$ is rectangular, $-2YY^{T}$ and $-2X^{T}X$ are of different dimensions, which makes the Hessian a matrix of size $\left( \begin{array}{ccc} [m \times m] & [m \times n] \\ [m \times n] & [n \times n] \end{array} \right)$. I'm relatively new to the concepts behind matrix calculus, so I can't tell whether I missed something (for instance, is one of the mixed partials supposed to be transposed?).

I am afraid that your formula for the Hessian is not right.

I am going to show here a precise component-by-component solution in the particular case of $2 \times 2$ matrices, in disagreement with your general solution.

(I don't claim to provide a general solution in what follows.)

Notational conventions:

$$A=\begin{bmatrix} p & q\\ r & s \end{bmatrix} \ \ \ X=\begin{bmatrix} a & b\\ c & d \end{bmatrix} \ \ \ Y=\begin{bmatrix} e & f\\ g & h \end{bmatrix} $$

The expression to be minimized is the square of the Frobenius norm of $A-XY$; call it $F$ (the letter $s$ is already taken as an entry of $A$), i.e.,

$$F \ = \ (p - a e - b g)^2 + (q - a f - b h)^2 + (r - c e - d g)^2 + (s - c f - d h)^2$$

where the variables are $a,b,c,d,e,f,g,h$, and $p,q,r,s$ are fixed numbers.

The upper-left part of the Hessian (the $4 \times 4$ block of second-order partial derivatives with respect to the variables $a,b,c,d$) is obtained through straightforward computations (with Mathematica, I confess):

$$\begin{bmatrix} 2YY^T & 0\\ 0 & 2YY^T \end{bmatrix} \ \ \text{instead of} \ \ -2YY^T$$

The presence of the $0$ blocks is understandable: for example, $\dfrac{\partial^2 F}{\partial a \partial c}$ is clearly zero.
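This block structure is easy to confirm symbolically as well; here is a short, self-contained SymPy sketch (the name `H_XX` and the use of `F` for the objective are my own conventions):

```python
import sympy as sp

a, b, c, d, e, f, g, h = sp.symbols('a b c d e f g h')
p, q, r, s = sp.symbols('p q r s')

X = sp.Matrix([[a, b], [c, d]])
Y = sp.Matrix([[e, f], [g, h]])
A = sp.Matrix([[p, q], [r, s]])

R = A - X * Y
F = sum(R[i, j]**2 for i in range(2) for j in range(2))  # squared Frobenius norm

# 4x4 block of second derivatives with respect to a, b, c, d
H_XX = sp.hessian(F, [a, b, c, d])

# Expected block-diagonal structure: I_2 (x) 2 Y Y^T
expected = sp.diag(2 * Y * Y.T, 2 * Y * Y.T)
print(sp.expand(H_XX - expected))  # zero 4x4 matrix
```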

The upper-right block is

$$2\begin{bmatrix} 2 a e + b g - p & 2 a f + b h - q & b e & b f\\ a g & a h & a e + 2 b g - p & a f + 2 b h - q\\ 2 c e + d g - r & 2 c f + d h - s & d e & d f\\ c g & c h & c e + 2 d g - r & c f + 2 d h - s \end{bmatrix}$$

where everything looks scrambled and rather far from your results (even if one can recognize something similar to $XY-A$).
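The mixed block can be checked the same way; a self-contained SymPy sketch (the comparison matrix is copied from the display above, and the name `H_XY` is my own):

```python
import sympy as sp

a, b, c, d, e, f, g, h = sp.symbols('a b c d e f g h')
p, q, r, s = sp.symbols('p q r s')

X = sp.Matrix([[a, b], [c, d]])
Y = sp.Matrix([[e, f], [g, h]])
A = sp.Matrix([[p, q], [r, s]])

R = A - X * Y
F = sum(R[i, j]**2 for i in range(2) for j in range(2))  # squared Frobenius norm

# Mixed second derivatives: rows indexed by a,b,c,d, columns by e,f,g,h
H_XY = sp.Matrix(4, 4, lambda i, j: sp.diff(F, [a, b, c, d][i], [e, f, g, h][j]))

expected = 2 * sp.Matrix([
    [2*a*e + b*g - p, 2*a*f + b*h - q, b*e,             b*f],
    [a*g,             a*h,             a*e + 2*b*g - p, a*f + 2*b*h - q],
    [2*c*e + d*g - r, 2*c*f + d*h - s, d*e,             d*f],
    [c*g,             c*h,             c*e + 2*d*g - r, c*f + 2*d*h - s],
])
print(sp.expand(H_XY - expected))  # zero 4x4 matrix
```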

Question: do you have any assumptions about the matrices $A$, $X$, $Y$ (symmetric, positive definite, ...)?

Remark: I would have liked to follow your computation and spot a possible mistake but, honestly, I don't master the technique of differentiation with respect to matrices.