Does this iterative optimization method have a name? (problem and method described here)


I will describe an iterative optimization problem and method, and I am interested in whether this method has a specific name.

Problem: Let $f: \mathbb{R}^2 \rightarrow \mathbb{R}^3$. (In general, $f:\mathbb{R}^m \rightarrow \mathbb{R}^n$ with $m < n$.) Let the loss function be $L(x) = \frac{1}{2}||f(x)||^2.$ Find the $x$ which minimizes the loss. We assume that $f$ has first derivatives, along with a few other regularity assumptions that I think we can take for granted for this discussion.

Partials notation: Let $\partial_1, \partial_2$ be the partial with respect to coordinates 1 and 2 of the domain.

Method (hereafter named the "Mystery iterative method"): Let $x_0$ be the initial guess.

We compute $\partial_1 f(x_0), \partial_2 f(x_0) \in \mathbb{R}^3$.

We seek reals $\beta_1,\beta_2$ such that $f(x_0) + \beta_1 \partial_1 f(x_0) + \beta_2 \partial_2 f(x_0)$ has minimal $L_2$ norm. This is simply a least squares problem with 3 data points and 2 variables. Let us assume that this problem is not singular, so $\beta_1, \beta_2$ are well defined.

Then the next guess is $x_1 = x_0 + (\beta_1, \beta_2)$ (adding this vector). The next iteration only depends on $x_1,$ and we repeat until a suitable stopping condition, which is not specified by this method.
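As a sketch, the iteration above can be written in a few lines of NumPy. The map `f` below (chosen so that $f(1,-2)=0$) and the finite-difference Jacobian are illustrative assumptions of mine, not part of the method as stated.

```python
import numpy as np

def f(x):
    # A made-up smooth map from R^2 to R^3, chosen so that f(1, -2) = 0.
    return np.array([x[0] - 1.0, x[1] + 2.0, (x[0] - 1.0) * (x[1] + 2.0)])

def jacobian(x, eps=1e-6):
    # Forward-difference approximation of the 3x2 matrix whose columns
    # are the partials d1 f(x) and d2 f(x).
    J = np.empty((3, 2))
    fx = f(x)
    for j in range(2):
        step = np.zeros(2)
        step[j] = eps
        J[:, j] = (f(x + step) - fx) / eps
    return J

def mystery_step(x):
    # Choose beta minimizing || f(x) + beta1 d1 f(x) + beta2 d2 f(x) ||_2,
    # then take the full step x + beta.
    J = jacobian(x)
    beta, *_ = np.linalg.lstsq(J, -f(x), rcond=None)
    return x + beta

x = np.array([0.0, 0.0])  # initial guess x_0
for _ in range(20):
    x = mystery_step(x)
```

For this particular `f` the iterates converge to $(1, -2)$, where the residual is exactly zero.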

This is definitely not gradient descent. Rather than proving that abstractly (which would require cleverness I can't see), I prefer to show it with an example.

Suppose that $f(x_0) = (1, 2, 2), \; \; \partial_1 f (x_0) = (1, 1, 1),$ and $ \; \partial_2 f(x_0) = (1, 1, 2).$ $x_0$ is arbitrary.

In the Mystery iterative method, we compute $\beta_1, \beta_2$ minimizing $(1, 2, 2) + \beta_1 (1, 1, 1) + \beta_2 (1, 1, 2).$ The answer is $(-1.0, -0.5)$, which can be verified because the residual is $(-0.5, 0.5, 0),$ which is orthogonal to both $(1, 1, 1)$ and $(1, 1, 2).$ Thus, the step is $(-1.0, -0.5),$ and the next guess is $x_0 + (-1.0, -0.5).$
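This computation can be checked numerically; `np.linalg.lstsq` solves exactly this least-squares problem:

```python
import numpy as np

# Columns are d1 f(x0) and d2 f(x0); the right-hand side is -f(x0).
A = np.array([[1.0, 1.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = -np.array([1.0, 2.0, 2.0])

beta, *_ = np.linalg.lstsq(A, b, rcond=None)  # beta = (-1.0, -0.5)
residual = -b + A @ beta  # f(x0) + beta1 d1 f(x0) + beta2 d2 f(x0)
```

The residual comes out to $(-0.5, 0.5, 0)$ and is orthogonal to both columns of `A`, as claimed.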

In gradient descent, we compute $\partial_1 L(x_0)$ and $\partial_2 L(x_0).$

$\partial_1 L(x_0) = f(x_0) \cdot \partial_1 f(x_0) = (1, 2, 2) \cdot (1, 1, 1) = 5.$

$\partial_2 L(x_0) = f(x_0) \cdot \partial_2 f(x_0) = (1, 2, 2) \cdot (1, 1, 2) = 7.$

In gradient descent, the step is a positive multiple of $(-5, -7)$ and thus cannot possibly equal the step in the Mystery iterative method, which is $(-1.0, -0.5).$
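A quick numerical check that the two directions are not even parallel (the variable names here are mine):

```python
import numpy as np

f0 = np.array([1.0, 2.0, 2.0])
d1 = np.array([1.0, 1.0, 1.0])
d2 = np.array([1.0, 1.0, 2.0])

grad = np.array([f0 @ d1, f0 @ d2])  # gradient of L at x0, i.e. (5, 7)
step = np.array([-1.0, -0.5])        # the Mystery iterative method's step

# The 2-D cross product is zero iff the two directions are parallel.
parallel = np.isclose(grad[0] * step[1] - grad[1] * step[0], 0.0)
```

Here the cross product is $5 \cdot (-0.5) - 7 \cdot (-1) = 4.5 \neq 0$, so no step size can reconcile the two methods at this point.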

Moreover, unlike gradient descent, the Mystery iterative method has no tunable step size: the step is fully determined by the least-squares solution.

Misc: an example use case (with lots of approximations). Let $I_1$ and $I_2$ be grayscale images, i.e. real matrices, of dimension $(480, 720).$ It is standard in CV to say that this corresponds to the rectangle $[0, 720] \times [0, 480]$ in $\mathbb{R}^2.$

Let $T$ be an affine transformation on $\mathbb{R}^2,$ and assume that the image of the rectangle $R_0 := [0, 720] \times [0, 480]$ under $T$ contains the inner 60% by 60% of this rectangle, namely $R_1 := [144, 576] \times [96, 384].$

We can then consider the pixel-wise difference between (1) the affine transformation of $I_1$ by $T$, restricted to the rectangle $R_1$, and (2) the image $I_2$ restricted to $R_1.$ As usual, (1) really involves certain re-sampling steps. Define the function $D(T)$ (whose domain is the set of affine transformations with the "rectangle-covering property" above) so that $D(T)$ equals the difference of the arrays in (2) and (1), which is a $(288, 432)$ array.

The goal is to find a transform $T^*$ such that $D(T^*)$ is minimal in the $L_2$ norm. (We unwind the $(288, 432)$ array into a vector of dimension $288 \cdot 432.$) The minimal $D$ is unlikely to be the zero vector because of resampling error. Also suppose that, for various reasons, we can't get a great initial guess $T_0.$ I am aware that this problem is often approached with CV keypoints, but I would rather not get into that here.

This use case then corresponds to a function $f: \mathbb{R}^6 \rightarrow \mathbb{R}^{288 \times 432}$: the six parameters of the affine transform map to the (flattened) difference array.
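As a rough sketch of what $D$ might look like in code: the nearest-neighbour warp below is a crude stand-in for proper resampling, and the encoding of $T$ by six reals `(a, b, c, d, tx, ty)` is my own hypothetical choice.

```python
import numpy as np

def warp_nearest(img, M, t):
    # Crude nearest-neighbour affine resampling (a stand-in for proper
    # interpolation). Output pixel (x, y) samples the input at M @ (x, y) + t.
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = M[0, 0] * xs + M[0, 1] * ys + t[0]
    src_y = M[1, 0] * xs + M[1, 1] * ys + t[1]
    sx = np.clip(np.rint(src_x).astype(int), 0, w - 1)
    sy = np.clip(np.rint(src_y).astype(int), 0, h - 1)
    return img[sy, sx]

def D(params, I1, I2):
    # params = (a, b, c, d, tx, ty): a hypothetical 6-real encoding of T.
    a, b, c, d, tx, ty = params
    warped = warp_nearest(I1, np.array([[a, b], [c, d]]), np.array([tx, ty]))
    # Restrict to R1 = [144, 576] x [96, 384] (rows 96:384, cols 144:576).
    diff = (I2 - warped)[96:384, 144:576]
    return diff.ravel()  # a vector of length 288 * 432

# With T = identity and I1 == I2, the residual is exactly zero.
rng = np.random.default_rng(0)
I1 = rng.random((480, 720))
r = D((1.0, 0.0, 0.0, 1.0, 0.0, 0.0), I1, I1)
```

Feeding `D` (with its Jacobian estimated by finite differences) into the iteration described above would be the intended use.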

Accepted answer:
This is the Gauss–Newton algorithm: at each iterate it linearizes $f$ and solves the resulting linear least-squares problem for the full step, exactly as described above. (See especially the Notes section.)