Armijo's rule line search


I have read a paper (http://www.seas.upenn.edu/~taskar/pubs/aistats09.pdf) which describes a way to solve an optimization problem using Armijo's rule (see p. 363, eq. 13).

The variable is $\beta$ which is a square matrix. If $f$ is the objective function, the paper states that Armijo's rule is the following:

$f(\beta^{new})-f(\beta^{old}) \leq \eta(\beta^{new}-\beta^{old})^T \nabla_{\beta} f$

where $\nabla_{\beta} f$ is the vectorization of the gradient of $f$.

I am having trouble with this: the right-hand side of the inequality does not seem to make sense because of a dimension mismatch ($\beta^{new}-\beta^{old}$ is a square matrix, while $\nabla_{\beta} f$ is a vector). I have been unable to find an analogous statement of the rule elsewhere. Can anyone help me figure out what the right-hand side is supposed to mean? The left-hand side is a scalar, so the right-hand side should be one too.

1 Answer


In general (i.e., for vector-valued $x$), Armijo's rule states $$f(x^{new}) - f(x^{old}) \le \eta \, (x^{new}-x^{old})^\top \nabla f(x^{old}).$$ Hence, on the right-hand side of the matrix version you need the vectorization of $\beta^{new}-\beta^{old}$, paired with the vectorized gradient.
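A minimal sketch of a backtracking line search that enforces this Armijo condition with a matrix variable, assuming a steepest-descent direction and NumPy; the function name and parameter defaults here are illustrative, not taken from the paper:

```python
import numpy as np

def armijo_backtracking(f, grad_f, beta, eta=1e-4, shrink=0.5, t0=1.0):
    """Shrink the step size t until the Armijo condition
        f(beta_new) - f(beta) <= eta * vec(beta_new - beta)^T vec(grad)
    holds, where beta may be a matrix and vec(.) is flattening."""
    g = grad_f(beta)       # gradient, same shape as beta
    d = -g                 # steepest-descent direction (matrix)
    f0 = f(beta)
    # vec(d)^T vec(g) computed as the Frobenius inner product <d, g>;
    # it is negative for a descent direction
    slope = np.sum(d * g)
    t = t0
    while f(beta + t * d) - f0 > eta * t * slope:
        t *= shrink
    return beta + t * d

# Example: f(B) = 0.5 * ||B||_F^2 with gradient B, started at the identity
beta_new = armijo_backtracking(lambda B: 0.5 * np.sum(B * B),
                               lambda B: B,
                               np.eye(2))
```

For this quadratic the full step $t=1$ already satisfies the condition, so the search returns the minimizer $\beta = 0$ without shrinking.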

(Alternatively, you could interpret $\nabla_\beta f$ as the gradient w.r.t. the Frobenius inner product and write $(\beta^{new}-\beta^{old}) : \nabla_\beta f$, where $:$ denotes the Frobenius inner product.)
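The two formulations agree because the Frobenius inner product of two matrices equals the ordinary inner product of their vectorizations, $A : B = \operatorname{tr}(A^\top B) = \operatorname{vec}(A)^\top \operatorname{vec}(B)$. A quick NumPy check of this identity (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

# Frobenius inner product: A : B = trace(A^T B)
frob = np.trace(A.T @ B)
# Inner product of the vectorizations: vec(A)^T vec(B)
vec_dot = A.ravel() @ B.ravel()

assert np.isclose(frob, vec_dot)
```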