How can I optimise a scalar function over a matrix?


I need to optimise the following scalar function with respect to a matrix $S$. $$ f(S) = \boldsymbol{y}^{T}\boldsymbol{X}w - \boldsymbol{1}_{n}^{T} \exp \left\{ \boldsymbol{X}w + \frac{1}{2} \operatorname{diag}(\boldsymbol{X}S\boldsymbol{X}^{T}) \right\} - \frac{1}{2}(w - \boldsymbol{\mu})^{T}\Sigma^{-1}(w-\boldsymbol{\mu}) - \frac{1}{2} \operatorname{tr}(\Sigma^{-1}S) + \frac{1}{2} \log \left| S \right| -\frac{1}{2}\log \left| \Sigma \right| $$

$\boldsymbol{y}: n \times1 \text{ column vector}$

$\boldsymbol{X}: n \times p \text{ matrix}$

$w, \, \boldsymbol{\mu}: p \times 1 \text{ column vectors}$

$S, \, \Sigma: p \times p \text{ positive semi-definite matrices}$

$\exp: \text{element-wise exponentiation}$

$\boldsymbol{1}_{n}: n \times 1 \text{ column vector with only ones}$

$\operatorname{diag}: \text{the column vector formed from the diagonal elements of its matrix argument}$
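For reference, a direct NumPy translation of $f$ (just a sketch; the function and variable names are arbitrary):

```python
import numpy as np

def objective(S, X, y, w, mu, Sigma):
    # f(S) exactly as written above; X is n x p, S and Sigma are p x p
    Xw = X @ w
    m = Xw + 0.5 * np.diag(X @ S @ X.T)          # argument of the element-wise exp
    Sigma_inv = np.linalg.inv(Sigma)
    _, logdet_S = np.linalg.slogdet(S)
    _, logdet_Sigma = np.linalg.slogdet(Sigma)
    return (y @ Xw
            - np.sum(np.exp(m))                  # 1_n^T exp{...}
            - 0.5 * (w - mu) @ Sigma_inv @ (w - mu)
            - 0.5 * np.trace(Sigma_inv @ S)
            + 0.5 * logdet_S
            - 0.5 * logdet_Sigma)
```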

The Newton-Raphson method covers optimisation of a scalar function with respect to a vector: compute the gradient and Hessian, then iteratively update the estimate, typically with a line search. Here, however, the variable is a matrix rather than a vector. Can Newton-Raphson be applied in the same way, or is there something I should modify? (Other methods are welcome too; it doesn't necessarily have to be Newton-Raphson.)


To save some typing, define
$$\eqalign{ E &= {\rm Diag}\Big(\exp\big(Xw+\tfrac{1}{2}\,{\rm diag}(XSX^T)\big)\Big) \cr M &= \Sigma^{-1} \cr }$$
Next, collect all of the pieces of the objective function which do not depend on $S$ into a constant term $K$, and write a simplified version
$$f = K + \frac{1}{2}\Big({\rm tr}(\log(S)) - M:S\Big) - 1:E$$
Per your previous question, you already know the gradient of the final $E$-term, and the gradients of the other two terms are simple. Thus the gradient of the overall function is
$$G = \frac{\partial f}{\partial S} = \frac{1}{2}\Big(S^{-1}-M-X^TEX\Big)$$
Any method based on the Hessian (Newton-Raphson, Quasi-Newton, Levenberg-Marquardt) is going to be impractical, since the Hessian will be a $(p\times p\times p\times p)$ fourth-order tensor, or a $(p^2\times p^2)$ matrix if you apply vectorization tricks.

Also, finding a closed-form expression for the Hessian will be quite challenging. So that leaves you with gradient-based methods, which is okay.
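For concreteness, the gradient above is cheap to form numerically. A NumPy sketch (the function name is arbitrary, and the element-wise product avoids building the $n\times n$ matrix $E$ explicitly):

```python
import numpy as np

def gradient(S, X, w, Sigma):
    # G = df/dS = 0.5 * (S^{-1} - Sigma^{-1} - X^T E X),
    # with E = Diag(exp(Xw + 0.5 * diag(X S X^T)))
    e = np.exp(X @ w + 0.5 * np.diag(X @ S @ X.T))   # diagonal of E, as an n-vector
    XtEX = X.T @ (e[:, None] * X)                    # X^T E X without forming E
    return 0.5 * (np.linalg.inv(S) - np.linalg.inv(Sigma) - XtEX)
```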

Barzilai-Borwein is particularly easy to adapt to the situation where the working variables are matrices instead of vectors. The iteration is $$\eqalign{ dS_{k} &= S_{k} - S_{k-1} \cr dG_{k} &= G_{k} - G_{k-1} \cr \lambda_{k} &= \frac{dS_{k}:dG_{k}}{dG_{k}:dG_{k}} \cr S_{k+1} &= S_{k} - \lambda_{k} G_{k} \cr }$$ In place of the scalar product of 2 vectors, e.g. $(ds^Tdg),\,$ in the typical vector-based method, the above iteration substitutes the Frobenius product of the corresponding matrices, $(dS:dG)$.
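A minimal NumPy sketch of that matrix-variable iteration (names are arbitrary; no safeguarding or line search is included):

```python
import numpy as np

def frob(A, B):
    # Frobenius inner product A:B
    return np.sum(A * B)

def barzilai_borwein(S0, grad, n_iter=500, lam0=1e-4, tol=1e-10):
    # Plain Barzilai-Borwein iteration on a matrix variable, as written above;
    # `grad` maps a p x p matrix S to G = df/dS.
    S_prev = S0
    G_prev = grad(S_prev)
    S = S_prev - lam0 * G_prev            # small initial step to get the first (dS, dG) pair
    for _ in range(n_iter):
        G = grad(S)
        dS, dG = S - S_prev, G - G_prev
        denom = frob(dG, dG)
        if denom < tol:                   # gradients have stopped changing
            break
        lam = frob(dS, dG) / denom        # BB step length via Frobenius products
        S_prev, G_prev = S, G
        S = S - lam * G
    return S
```

In practice you would clip $\lambda_k$ or combine this with the non-monotone line search mentioned below.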

The Barzilai-Borwein method works just fine for most problems, but may require an unconventional, non-monotone line search for "hard" problems. Google for "Raydan Barzilai" for details.