Analyzing the Hessian of this function

137 Views Asked by At

The problem I'm looking at is, given a matrix $A$ of size $m\times n$ where $A_{ij}\ge 0$ minimize $f(x, y) = \|A - xy\|^{2}_{F}$ where $x$ is a column vector of length $m$, $y$ is a row vector of length $n$, and $\|x\|_{2} = \|y\|_{2}$. My attempts to do this led to me doing the following calculations:

$\frac{\partial f}{\partial x} = -2(A - xy)y^{T} = -2Ay^{T} + 2x\|y\|_{2}, \frac{\partial f}{\partial y} = -2x^{T}(A - xy) = -2x^{T}A + 2y\|x\|_{2} \Rightarrow$ $\frac{\partial^{2} f}{\partial x^{2}} = 2\|y\|_{2}, \frac{\partial^{2} f}{\partial y^{2}} = 2\|x\|_{2}, \frac{\partial^{2} f}{\partial x \partial y} = -2A + 2x\frac{y}{\|y\|_{2}}, \frac{\partial^{2} f}{\partial y \partial x} = -2A + 2\frac{x}{\|x\|_{2}}y$

Which seems to indicate that the Hessian for this function $f(x,y)$ is $$H(f(x,y)) = -2 \begin{bmatrix} \|y\|_{2} & A - x\frac{y}{\|y\|_{2}} \\ A - \frac{x}{\|x\|_{2}}y & \|x\|_{2} \end{bmatrix}$$

I'm having trouble conceptualizing this object, which makes me think I made a mistake somewhere in my calculations (probably in $\frac{\partial^{2} f}{\partial x \partial y}$ if I had to guess). Basically my question is did I make a mistake and if so where? And conditionally, if I didn't make a mistake (or if the general structure of the Hessian is correct regardless of mistakes) what sort of mathematical object is this Hessian? A 3rd (or higher) order tensor (this reason this doesn't make sense to me is because the diagonal terms are scalars so their dimensionality doesn't match the off-diagonal terms)?

EDIT:

I went back over the calculations again and the thing that jumps out at me as a possible error on my part is going from $\frac{\partial f}{\partial x} \rightarrow \frac{\partial^{2} f}{\partial x \partial y}$; specifically, taking the derivative of the term $2x\|y\|_{2}$ (and the analogous step in calculating $\frac{\partial^{2} f}{\partial y \partial x}$). According to every reference I can find, $\frac{df}{dx}\|x\|_{2}=\frac{x^{T}}{\|x\|_{2}}$, but following that rule here results in a dimension mismatch because the product $x\frac{y^{T}}{\|y\|_{2}}$ is undefined when $x$ and $y$ are both column vectors. Am I missing something (I'm sure I am, so maybe I should just ask what I'm missing)?

1

There are 1 best solutions below

2
On BEST ANSWER

First of all, under your definition, $yy^T=||y||_2^2$ not $||y||_2$.

Then, $\partial f/ \partial x = -2Ay^T + 2Diag(||y||_2^2)x$, where $Diag(||y||_2^2)$ is a diagonal matrix whose dimension is $dim(x)\times dim(x)$. Therefore, $\partial^2 f/ \partial x^2 = Diag(||y||_2^2)$ and the dimension of you hessian becomes correct.

Hope this help.