Analysis of optimization procedure for two equivalent optimization problems


I came up with this question. Consider the optimization problem: $$\arg\min_w\|\hat{y}(w)-y\|^2$$ Now suppose there is an overcomplete filter bank for representing $\hat{y}$ and $y$. To me, the problem above is then equivalent to: $$\arg\min_w\sum_{n}\|F_n*(\hat{y}(w)-y)\|^2$$ in which $F_n$ is the $n$-th filter from the filter bank and $*$ denotes convolution. My question is: given that the optimum value of these two problems is the same, but we solve them with iterative algorithms such as Newton's method or gradient descent, how do the optimization procedures differ?
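The difference is easiest to see for a linear model. Below is a minimal sketch (assuming $\hat{y}(w)=Xw$ with random data, a purely illustrative choice not from the question) showing two regimes: for a tight-frame filter bank the filtered objective is a constant multiple of the original, so gradient descent behaves identically up to a rescaled step size; a generic filter bank instead *reweights* the least-squares problem and can move the minimiser.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative linear model y_hat(w) = X @ w (the question leaves y_hat abstract).
m, d = 40, 5
X = rng.standard_normal((m, d))
y = rng.standard_normal(m)

def obj_plain(w):
    r = X @ w - y
    return r @ r

def obj_filtered(w, filters):
    r = X @ w - y
    return sum(np.convolve(F, r) @ np.convolve(F, r) for F in filters)

# Haar-like pair: the filter autocorrelations sum to 2*delta (a tight frame),
# so obj_filtered == 2 * obj_plain everywhere -> same minimiser; gradient
# descent on it is gradient descent on the original with a doubled step.
haar = [np.array([1.0, 1.0]) / np.sqrt(2.0),
        np.array([1.0, -1.0]) / np.sqrt(2.0)]
w = rng.standard_normal(d)
print(np.isclose(obj_filtered(w, haar), 2.0 * obj_plain(w)))  # True

# A generic bank is not a tight frame: the filtered problem is a reweighted
# least-squares problem, and its minimiser can differ from the original one.
generic = [np.array([1.0, -1.0]), np.array([0.25, 0.5, 0.25])]

def conv_cols(F, A):
    # convolve each column of A with the filter F (full convolution)
    return np.column_stack([np.convolve(F, A[:, j]) for j in range(A.shape[1])])

Xf = np.vstack([conv_cols(F, X) for F in generic])
yf = np.concatenate([np.convolve(F, y) for F in generic])
w_plain = np.linalg.lstsq(X, y, rcond=None)[0]
w_filt = np.linalg.lstsq(Xf, yf, rcond=None)[0]
print(np.allclose(w_plain, w_filt))  # False for this data: minimisers differ
```

So the questioner's premise (same optimum) really corresponds to the tight-frame case; outside it, the iterates of gradient descent follow genuinely different trajectories because the filtered Hessian is preconditioned by $\sum_n F_n^\top F_n$.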




The filtered objective function will differ from the original objective function. Some things to look out for in objective functions are the following:

Is it differentiable (smooth)? Differentiability is required by almost all deterministic procedures; stochastic or derivative-free optimisation can tackle non-differentiable objective functions.

Is it convex? Strictly convex? (There are various notions of convexity.) Convexity usually guarantees that the routine converges to the global optimum rather than a local one.
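In the linear case $\hat{y}(w)=Xw$, for instance, both objectives are quadratics and the filtered one stays convex: its Hessian is $2X^\top\left(\sum_n F_n^\top F_n\right)X$, a sum of Gram matrices and hence positive semidefinite. A small numerical check, with made-up data and filters:

```python
import numpy as np

rng = np.random.default_rng(1)
m, d = 30, 4
X = rng.standard_normal((m, d))  # illustrative design matrix

def conv_matrix(F, m):
    # matrix C such that C @ r == np.convolve(F, r) (full convolution)
    k = len(F)
    C = np.zeros((m + k - 1, m))
    for j in range(m):
        C[j:j + k, j] = F
    return C

filters = [np.array([1.0, -1.0]), np.array([0.25, 0.5, 0.25])]

# Hessian of sum_n ||F_n * (Xw - y)||^2 w.r.t. w (y drops out of the Hessian):
M = sum(conv_matrix(F, m).T @ conv_matrix(F, m) for F in filters)
H = 2.0 * X.T @ M @ X

eigs = np.linalg.eigvalsh(H)
print(eigs.min() >= -1e-10)  # True: H is positive semidefinite -> convex
```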

Can we prove from the form of the objective function that the optimum lies in a certain region? For example, you could prove monotonicity of the objective function on a certain domain.

Another big practical difference: with the filtered objective you have to evaluate a sum over all the filters at every iteration, which can make each step much more expensive.
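That per-iteration cost can often be avoided, though: since the filtered objective is quadratic in the residual $r=\hat{y}(w)-y$, it equals $r^\top M r$ with $M=\sum_n F_n^\top F_n$, and $M$ can be precomputed once before the iterations start. A sketch (the filter bank and sizes are made up):

```python
import numpy as np

rng = np.random.default_rng(2)
m = 50
filters = [rng.standard_normal(k) for k in (2, 3, 5, 7)]  # arbitrary bank

def conv_matrix(F, m):
    # matrix form of full convolution with F on length-m signals
    k = len(F)
    C = np.zeros((m + k - 1, m))
    for j in range(m):
        C[j:j + k, j] = F
    return C

# Precompute M = sum_n C_n^T C_n once, outside the optimisation loop.
M = sum(conv_matrix(F, m).T @ conv_matrix(F, m) for F in filters)

r = rng.standard_normal(m)  # stand-in for the residual y_hat(w) - y

slow = sum(np.convolve(F, r) @ np.convolve(F, r) for F in filters)  # N convolutions
fast = r @ M @ r                                                    # one matvec
print(np.isclose(slow, fast))  # True
```

With $M$ precomputed, each iteration costs one matrix-vector product regardless of how many filters are in the bank.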