Maximization of a squared exponential function


I want to maximize a squared-exponential-based function $f(\mathbf{z})$ given as $$f(\mathbf{z})=\sum^{N}_{i=1} \exp\bigg\{-\frac{||\mathbf{x}_i-\mathbf{z}||^2_2}{2l^2}\bigg\}\,\beta_i$$

where $\mathbf{x}_i \in \mathbb{R}^d$ is a constant vector for each $i=1,2,\dots,N$, $\mathbf{z}$ is the variable of the same dimension, and the $\beta_i$ are real constants. Here $||\mathbf{x}_i-\mathbf{z}||_2$ denotes the L2 (Euclidean) norm.
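For concreteness, the objective can be evaluated with a few lines of NumPy (a minimal sketch on made-up data; `f`, `X`, `beta`, and `l` are illustrative names, not from the question):

```python
import numpy as np

def f(z, X, beta, l):
    """Evaluate f(z) = sum_i exp(-||x_i - z||^2 / (2 l^2)) * beta_i.

    X is an (N, d) array stacking the fixed points x_i, beta an (N,) array,
    l the length scale, z a (d,) array.
    """
    sq_dists = np.sum((X - z) ** 2, axis=1)      # ||x_i - z||_2^2 for each i
    return np.sum(np.exp(-sq_dists / (2 * l ** 2)) * beta)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))        # N = 5 points in R^3
beta = np.ones(5)
print(f(X[0], X, beta, l=1.0))     # at z = x_1 the first term contributes exactly beta_1
```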

As the function is continuously differentiable, I understand that the gradient and Hessian can be used efficiently via gradient-based or Newton-type descent methods.
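Indeed, the gradient is available in closed form, $\nabla f(\mathbf{z}) = \frac{1}{l^2}\sum_i \beta_i \exp\big\{-\frac{||\mathbf{x}_i-\mathbf{z}||^2_2}{2l^2}\big\}(\mathbf{x}_i-\mathbf{z})$. A short sketch with a finite-difference check (made-up data, illustrative names):

```python
import numpy as np

def grad_f(z, X, beta, l):
    """Gradient: (1/l^2) * sum_i beta_i * exp(-||x_i - z||^2 / (2 l^2)) * (x_i - z)."""
    diffs = X - z                                             # (N, d)
    w = beta * np.exp(-np.sum(diffs ** 2, axis=1) / (2 * l ** 2))
    return diffs.T @ w / l ** 2

rng = np.random.default_rng(1)
X, beta, l = rng.normal(size=(4, 2)), rng.normal(size=4), 0.7
z = rng.normal(size=2)

def f(z):
    return np.sum(beta * np.exp(-np.sum((X - z) ** 2, axis=1) / (2 * l ** 2)))

# central finite differences should agree with the analytic gradient
eps = 1e-6
num = np.array([(f(z + eps * e) - f(z - eps * e)) / (2 * eps) for e in np.eye(2)])
print(np.allclose(num, grad_f(z, X, beta, l), atol=1e-5))
```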

But I am not sure which optimization method to choose, since $f(\mathbf{z})$ can have multiple local optima, nor under what conditions I can find or bound the global optimum.

Also, I believe the analytical approach (finding the critical points and applying the second-order derivative test for a maximum) is not suitable, as the number of critical points explodes as the dimension of the variable grows.

1 Answer

If $\beta_i > 0$ for all $i$, then $f(\mathbf{z}) > 0$ for every $\mathbf{z}$, and the problem therefore cannot be "solved" by letting $\mathbf{z}$ go to infinity (where the function approaches $0$). So the problem must be solved by some $\mathbf{z}^*$ with $\lVert \mathbf{z}^* \rVert < \infty$. In other words, the problem is guaranteed to admit a solution.

On the other hand, if $\beta_i < 0$ for all $i$, then the problem is not coercive and is "solved" by letting $\lVert \mathbf{z} \rVert \rightarrow \infty$. This can also happen if the $\beta_i$ are negative for some $i$ and positive for others.

In all cases you can bound the function value at the optimum as $$ f(\mathbf{z}^*) \leq \sum_{i} \max(0, \beta_i), $$ since each exponential factor lies in $(0, 1]$, but I'm not sure how helpful this is.
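A quick numerical sanity check of this bound on random data (a sketch; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(6, 2))
beta = rng.normal(size=6)   # mixed signs
l = 1.0

def f(z):
    return np.sum(beta * np.exp(-np.sum((X - z) ** 2, axis=1) / (2 * l ** 2)))

bound = np.sum(np.maximum(0.0, beta))

# evaluate f on a coarse grid; every value must stay below the bound,
# because each exp factor is in (0, 1] and negative terms only decrease f
grid = np.linspace(-3, 3, 50)
vals = [f(np.array([a, b])) for a in grid for b in grid]
print(max(vals) <= bound)
```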

I don't agree that the nonconvexity of the problem makes gradient-based methods unsuitable. You do need to run the algorithm from multiple initial points and keep the largest value found. Even then you cannot be certain that you have really found the optimal solution, but such is simply the nature of nonconvex problems.
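A multi-start local search along these lines can be sketched with SciPy (the problem data here are made up; BFGS is one reasonable choice of local method, applied to $-f$ so that minimization maximizes $f$):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
X = rng.normal(size=(8, 2))
beta = rng.normal(size=8)
l = 0.8

def neg_f(z):
    # minimize -f(z) to maximize f(z)
    return -np.sum(beta * np.exp(-np.sum((X - z) ** 2, axis=1) / (2 * l ** 2)))

# multi-start: run a local optimizer from several random initial points
# and keep the best (smallest neg_f, i.e. largest f) result
starts = rng.normal(size=(20, 2))
results = [minimize(neg_f, z0, method="BFGS") for z0 in starts]
best = min(results, key=lambda r: r.fun)
print("best f(z*) found:", -best.fun)
```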

You can also try something like simulated annealing or stochastic gradient descent, which may "jump out" of local minima. However, it is also possible that these will "jump out of" the global minimum.
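For the annealing route, SciPy's `dual_annealing` (a simulated-annealing variant with local refinement) can be tried on $-f$ within a box; again a sketch with made-up data:

```python
import numpy as np
from scipy.optimize import dual_annealing

rng = np.random.default_rng(4)
X = rng.normal(size=(8, 2))
beta = np.abs(rng.normal(size=8))   # all positive, so a finite maximizer exists
l = 0.8

def neg_f(z):
    return -np.sum(beta * np.exp(-np.sum((X - z) ** 2, axis=1) / (2 * l ** 2)))

# global annealing-style search over a box assumed to contain the maximizer
res = dual_annealing(neg_f, bounds=[(-5.0, 5.0)] * 2, seed=0)
print("f(z*) found:", -res.fun)
```

The box bounds are an extra assumption of this method; a safe choice is any box containing all the $\mathbf{x}_i$, since with positive $\beta_i$ the maximizer cannot lie far from the data points.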

As for the question of a large number of critical points: this is going to affect all algorithms equally.

PS: Throughout, I write as if it were a minimization problem for the negative of the function, which is why I speak of "minima" above.