A list of techniques to find an optimal point of an optimization problem algorithmically?


I am interested in collecting and analyzing a list of techniques for finding an optimal point of an optimization problem algorithmically, as follows:

Problem: Find a point at which the gradient has a non-negative inner product with every feasible direction at that point.

Methods:

  1. Gradient descent: It is best suited to unconstrained convex problems, more specifically unconstrained strongly convex problems. It is very sensitive to the condition number of the Hessian at the current iterate and to changes of coordinates (see the toy comparison after this list).

  2. Newton's method: It is best suited to unconstrained convex problems and equality-constrained convex problems, more specifically unconstrained problems with self-concordant objectives.

  3. Interior point method (barrier method and central path): It is best suited to inequality-constrained convex problems.

  4. Primal-dual interior point method: Closely related to the barrier method but not quite the same, since it introduces a surrogate duality gap for its stopping criterion. It is best suited to semidefinite programs.
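
As a quick illustration of points 1 and 2, here is a minimal toy sketch (the quadratic, step size, and iteration count are my own illustrative choices) showing how gradient descent stalls on an ill-conditioned problem while the Newton step, which rescales by the inverse Hessian, does not:

```python
# Toy comparison: gradient descent vs. Newton's method on an ill-conditioned
# quadratic f(x) = 0.5 * x^T H x.  All choices below are purely illustrative.
import numpy as np

H = np.diag([1.0, 1e4])            # Hessian with condition number 1e4
grad = lambda x: H @ x             # gradient of f(x) = 0.5 * x^T H x

x_gd = np.array([1.0, 1.0])
x_nt = np.array([1.0, 1.0])
step = 1.0 / 1e4                   # gradient descent needs step <= 1/L, L = largest eigenvalue

for _ in range(100):
    x_gd = x_gd - step * grad(x_gd)               # crawls along the low-curvature direction
    x_nt = x_nt - np.linalg.solve(H, grad(x_nt))  # Newton: exact in one step for a quadratic

print("gradient descent:", x_gd)   # first coordinate is still far from the optimum 0
print("Newton's method: ", x_nt)   # essentially at the optimum
```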

Can we further analyze and discuss these techniques in order to summarize the process of finding optimal points algorithmically?


There is 1 best solution below.


List of optimization techniques

Unconstrained optimization:

(i) Least squares computed exactly

(ii) Conjugate gradient least squares

(iii) Gradient descent

(iv) Momentum gradient descent

(v) Newton's method

(vi) BFGS

(vii) Proximal gradient method, Majorization minimization (non-smooth)

(viii) ISTA, FISTA (non-smooth)

Constrained optimization:

(ix) ADMM

(x) Dual ascent

Examples

(1) $\min_x \|y-Ax\|^2$ with $x$ having $100,000$ variables, $y$ consisting of $200,000$ observations, and $A$ sparse.

Conjugate gradient least squares should be used: because $A$ is sparse, products with $A$ and its adjoint $A^*$ are cheap to compute, and the method never needs to form or factor $A^*A$.
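
A minimal conjugate-gradient least-squares (CGLS) sketch along these lines; the sizes are scaled down from $200,000 \times 100,000$ purely so the example runs quickly, and the algorithm touches $A$ only through sparse matrix-vector products:

```python
import numpy as np
import scipy.sparse as sp

def cgls(A, y, iters=200, tol=1e-10):
    """Minimize ||y - A x||^2 by conjugate gradient on the normal equations,
    without ever forming A^T A explicitly (only products with A and A^T)."""
    x = np.zeros(A.shape[1])
    r = y - A @ x                  # residual in the data space
    s = A.T @ r                    # gradient direction A^T r
    p = s.copy()
    gamma = s @ s
    for _ in range(iters):
        q = A @ p
        alpha = gamma / (q @ q)
        x += alpha * p
        r -= alpha * q
        s = A.T @ r
        gamma_new = s @ s
        if np.sqrt(gamma_new) < tol:
            break
        p = s + (gamma_new / gamma) * p
        gamma = gamma_new
    return x

A = sp.random(2000, 1000, density=0.01, format="csr", random_state=0)  # stand-in for the large sparse A
x_true = np.random.default_rng(0).standard_normal(1000)
y = A @ x_true
x_hat = cgls(A, y)
print(np.linalg.norm(A @ x_hat - y))   # small residual after a few hundred iterations
```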

(2) Minimize a function $f(x_1, x_2)$. The function consists only of exponentials and polynomials in $x_1, x_2$. The exact solution should be reached in a small number of steps.

Newton's method should be used: since the function consists only of polynomials and exponentials, the gradient and Hessian can be computed analytically, and Newton's quadratic convergence reaches the solution to high accuracy in few iterations.
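
A minimal damped-Newton sketch along these lines; the particular function (a sum of exponentials) is only an illustrative stand-in, with the gradient and Hessian coded analytically:

```python
import numpy as np

# Illustrative objective: f(x1, x2) = e^{x1 + 3 x2 - 0.1} + e^{x1 - 3 x2 - 0.1} + e^{-x1 - 0.1}
def f(x):
    return (np.exp(x[0] + 3*x[1] - 0.1) + np.exp(x[0] - 3*x[1] - 0.1)
            + np.exp(-x[0] - 0.1))

def grad_hess(x):
    a = np.exp(x[0] + 3*x[1] - 0.1)
    b = np.exp(x[0] - 3*x[1] - 0.1)
    c = np.exp(-x[0] - 0.1)
    g = np.array([a + b - c, 3*a - 3*b])             # analytic gradient
    H = np.array([[a + b + c, 3*a - 3*b],
                  [3*a - 3*b, 9*a + 9*b]])           # analytic Hessian
    return g, H

x = np.array([1.0, 1.0])
for _ in range(20):
    g, H = grad_hess(x)
    d = np.linalg.solve(H, g)                        # Newton direction
    t = 1.0
    while f(x - t*d) > f(x) - 0.25 * t * (g @ d):    # backtracking line search (damped Newton)
        t *= 0.5
    x = x - t*d

print(x, np.linalg.norm(grad_hess(x)[0]))            # gradient norm is tiny after a handful of steps
```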

(3) Minimize a smooth function $f(x)$ with $x$ having $10,000$ variables.

Momentum gradient descent should be used. Newton's method is not practical here, since solving a $10,000 \times 10,000$ linear system at every iteration is too expensive. Momentum gradient descent has a better convergence rate for ill-conditioned problems than plain gradient descent.
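
A minimal heavy-ball (momentum) gradient descent sketch; the diagonal quadratic, step size, and momentum parameter below are illustrative choices only:

```python
import numpy as np

def momentum_gd(grad, x0, step, beta=0.9, iters=500):
    """Heavy-ball iteration: only gradient evaluations are needed, so the per-iteration
    cost stays linear in the number of variables (no Hessian, no linear solves)."""
    x = x0.copy()
    v = np.zeros_like(x0)
    for _ in range(iters):
        v = beta * v - step * grad(x)
        x = x + v
    return x

# Illustrative quadratic with 10,000 variables: f(x) = 0.5 * x^T diag(d) x, minimizer x* = 0.
rng = np.random.default_rng(0)
d = rng.uniform(1.0, 100.0, size=10_000)     # Hessian eigenvalues
grad = lambda x: d * x
x0 = rng.standard_normal(10_000)

x_hat = momentum_gd(grad, x0, step=1.0 / d.max())
print(np.linalg.norm(x_hat))                 # close to the minimizer x* = 0
```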

(4) Minimize a function $f(x)$ with $100$ variables, where the eigenvalues of the Hessian range from $10^{-3}$ to $10^6$.

Momentum gradient descent or Newton's method should be used. With only $100$ variables, Newton's method is affordable and is insensitive to the poor conditioning. Momentum gradient descent is also acceptable, since it handles ill-conditioning better than plain gradient descent.
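
Under the classical heavy-ball analysis for quadratics (an assumption here, since $f$ is only specified through its Hessian spectrum), the bounds $\mu = 10^{-3}$ and $L = 10^6$ give the step size and momentum parameter directly, and also show why plain gradient descent is hopeless at this condition number:

```python
import numpy as np

mu, L = 1e-3, 1e6
kappa = L / mu                                          # condition number = 1e9
step = 4.0 / (np.sqrt(L) + np.sqrt(mu))**2              # Polyak heavy-ball step size
beta = ((np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu)))**2  # momentum parameter

rate_gd = (kappa - 1) / (kappa + 1)                     # per-step contraction, optimal plain gradient descent
rate_hb = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)   # per-step contraction, heavy ball
print(step, beta)
print(rate_gd, rate_hb)   # ~0.999999998 vs. ~0.99994: heavy ball is dramatically faster per step
```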

(5) Minimize $f(x)$ where $f(x)$ is smooth with an ill-conditioned Hessian, but a small number of variables.

Newton's method should be used: with a small number of variables the Newton step is cheap to compute, and it overcomes the ill-conditioning of the Hessian.

(6) Minimize $f(x)$ where $f(x)$ is smooth with an ill-conditioned Hessian and a large number of variables.

Momentum gradient descent should be used: when the number of variables is too large for Newton's method to be affordable, it handles ill-conditioning better than plain gradient descent, though not as well as Newton's method would.

(7) Minimize $\| Y-A(X) \|^2_F + \lambda \|X\|_F^2$ where $X$ is a matrix representing a large image and $A(X)$ is a linear blurring operator. Each pixel output of $A(X)$ depends only on a small number of pixels of $X$.

(8) Minimize $\|y-x\|_2^2+\lambda \|Ax\|_1$ where $A$ is some matrix.
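
The answer does not pair (8) with a method; a plausible match from the constrained list is ADMM, splitting the problem as minimizing $\|y-x\|_2^2 + \lambda\|z\|_1$ subject to $Ax = z$. A minimal sketch of that (assumed) approach, using a first-difference matrix $A$ as an illustrative instance (which makes this 1-D total-variation denoising):

```python
import numpy as np

def soft_threshold(v, kappa):
    # proximal operator of kappa * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def admm_generalized_lasso(y, A, lam, rho=1.0, iters=500):
    """ADMM for  min_x ||y - x||_2^2 + lam * ||A x||_1,
    written as  min ||y - x||_2^2 + lam * ||z||_1  subject to  A x = z  (scaled dual form)."""
    m, n = A.shape
    x, z, u = np.zeros(n), np.zeros(m), np.zeros(m)
    M = 2.0 * np.eye(n) + rho * A.T @ A              # x-update system matrix, fixed across iterations
    for _ in range(iters):
        x = np.linalg.solve(M, 2.0 * y + rho * A.T @ (z - u))   # x-update: ridge-type linear solve
        z = soft_threshold(A @ x + u, lam / rho)                 # z-update: soft-thresholding
        u = u + A @ x - z                                        # scaled dual-variable update
    return x

# Illustrative instance: noisy piecewise-constant signal, A = first differences.
n = 50
A = np.eye(n, k=1)[:-1] - np.eye(n)[:-1]             # (n-1) x n first-difference matrix
rng = np.random.default_rng(0)
y = np.concatenate([np.zeros(25), np.ones(25)]) + 0.1 * rng.standard_normal(n)
x_hat = admm_generalized_lasso(y, A, lam=1.0)
print(np.round(x_hat, 2))                            # approximately piecewise constant
```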

(9) Minimize $\sum_{i=1}^K f_i(x_i)$ subject to $\sum_{i=1}^K c_i^Tx_i=0$ for some functions $f_i(x_i)$ and vectors $c_i$.
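
Example (9) has exactly the separable structure that dual ascent (dual decomposition) exploits: the Lagrangian $\sum_i f_i(x_i) + \nu \sum_i c_i^T x_i$ splits into $K$ independent minimizations, followed by a gradient-ascent step on the scalar dual variable $\nu$. A minimal sketch, taking $f_i(x_i) = \tfrac12\|x_i - a_i\|^2$ purely so the inner minimizations have closed forms (the $f_i$ are left generic in the problem statement):

```python
import numpy as np

rng = np.random.default_rng(0)
K, n = 5, 3
a = [rng.standard_normal(n) for _ in range(K)]   # data defining the illustrative f_i(x_i) = 0.5 * ||x_i - a_i||^2
c = [rng.standard_normal(n) for _ in range(K)]   # coupling vectors c_i

nu = 0.0                                                      # scalar dual variable for the single constraint
step = 1.0 / sum(c[i] @ c[i] for i in range(K))               # safe step: the dual here is a quadratic in nu
for _ in range(200):
    # The x-minimization splits across blocks: argmin_{x_i} 0.5*||x_i - a_i||^2 + nu * c_i^T x_i = a_i - nu * c_i
    x = [a[i] - nu * c[i] for i in range(K)]
    residual = sum(c[i] @ x[i] for i in range(K))             # gradient of the dual function
    nu = nu + step * residual                                 # dual ascent step

print(residual)   # coupling constraint sum_i c_i^T x_i is (approximately) satisfied at convergence
```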