Very basic question. I am quite sure that people do this; I am just not aware whether it has a standard name.
Suppose we have a cheap, differentiable projection operator $\operatorname{proj}: \mathbf R^n \to \mathcal X$ (e.g. projection onto the unit sphere or onto the set of stochastic matrices).
Can we solve
$$ \arg\min_{x\in \mathbf R^n} f(\operatorname{proj}(x)) + \|x - \operatorname{proj}(x)\|^2 $$
instead of
$$ \arg\min_{x\in \mathcal X} f(x) $$ via, let's say, gradient descent? It seems reasonable: at least if the preimage $\operatorname{proj}^{-1}(x)$ is convex for every $x \in \mathcal X$ (which seems to often be the case), it should be easy to prove that the minimizers match (if you are not in $\mathcal X$, you can improve the objective by moving towards it).
I am interested in the case when $f$ itself is not convex and we are actually searching for a local optimum.
It differs from adding $L(d(x, \mathcal X))$ to the objective because here the projection is applied to $x$ before it is passed to $f$.
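As a concrete sanity check, here is a minimal sketch of this scheme, taking $\mathcal X$ to be the unit sphere and a toy quadratic $f$; the specific $f$, starting point, step size, and finite-difference gradients are all illustrative choices, not part of the question:

```python
import numpy as np

def proj(x):
    return x / np.linalg.norm(x)          # projection onto the unit sphere

c = np.array([2.0, 1.0, 0.0])             # arbitrary target (an assumption)
f = lambda x: np.sum((x - c) ** 2)        # toy smooth objective

def g(x):
    # the surrogate: f(proj(x)) + ||x - proj(x)||^2
    p = proj(x)
    return f(p) + np.sum((x - p) ** 2)

def num_grad(fun, x, h=1e-6):
    # central finite differences, to keep the sketch autodiff-free
    grad = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (fun(x + e) - fun(x - e)) / (2 * h)
    return grad

x = np.array([0.5, -1.0, 0.3])            # infeasible starting point
for _ in range(2000):
    x = x - 0.05 * num_grad(g, x)

# the constrained minimizer of f over the sphere is c / ||c||
print(x, c / np.linalg.norm(c))
```

In this toy run the iterate converges to the constrained minimizer and lands on the sphere itself, since the penalty term vanishes only there.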
Thank you.
I haven't seen your specific formulation before, but it feels like a combination of two existing methods:
Projected gradient descent uses the original objective, but applies a projection step after each gradient descent iteration: $$x^{k+1}=\operatorname{proj}(x^k - \alpha\nabla f(x^k)).$$ One can use a projected line search to choose the step size $\alpha$, that is, by solving $\min_\alpha f(\operatorname{proj}(x^k - \alpha\nabla f(x^k)))$.
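A minimal sketch of projected gradient descent, again taking $\mathcal X$ to be the unit sphere and $f(x)=\|x-c\|^2$ as a toy objective, with a fixed step size instead of a line search (all assumptions for illustration):

```python
import numpy as np

c = np.array([2.0, 1.0, 0.0])
grad_f = lambda x: 2 * (x - c)            # gradient of f(x) = ||x - c||^2

x = np.array([0.0, 1.0, 0.0])             # feasible starting point
alpha = 0.1                                # fixed step size for simplicity
for _ in range(100):
    x = x - alpha * grad_f(x)              # plain gradient step
    x = x / np.linalg.norm(x)              # then project back onto the sphere

print(x)  # converges to c / ||c||
```

Note that every iterate is feasible by construction, which is the main practical difference from penalty-style formulations.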
Penalty methods construct an unconstrained problem $\min_{x\in\mathbb R^n} g(x)$ whose solution is close to the optimum of the original constrained problem, and then solve it using unconstrained optimization schemes like gradient descent or Newton's method. A penalty term is added to discourage the iterates from moving far from the feasible set, for example $$g(x) = f(x) + \frac12\mu\|x-\operatorname{proj}(x)\|^2.$$ Of course the optimum here may not lie exactly in $\mathcal X$, so one typically solves a series of problems with increasing $\mu$.
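For instance, a sketch of the quadratic-penalty loop with an increasing schedule for $\mu$, on the same toy sphere problem; the schedule, step sizes, and iteration counts are illustrative guesses, not a tuned implementation:

```python
import numpy as np

c = np.array([2.0, 1.0, 0.0])
f_grad = lambda x: 2 * (x - c)            # gradient of f(x) = ||x - c||^2

def penalty_grad(x, mu):
    # gradient of g(x) = f(x) + (mu/2) ||x - proj(x)||^2 on the sphere,
    # where (1/2)||x - proj(x)||^2 = (1/2)(||x|| - 1)^2
    return f_grad(x) + mu * (1 - 1 / np.linalg.norm(x)) * x

x = np.array([1.0, 0.0, 0.0])
for mu in [1.0, 10.0, 100.0, 1000.0]:     # solve a series of problems
    for _ in range(5000):
        # shrink the step as mu grows, since the Hessian scale grows with mu
        x = x - (0.1 / (1 + mu)) * penalty_grad(x, mu)

# x approaches the constrained optimum c / ||c|| as mu increases,
# but sits slightly outside the sphere for any finite mu
print(np.linalg.norm(x), x)
```

The final iterate has norm slightly above $1$: the optimum of each penalized subproblem lies a little outside $\mathcal X$, which is exactly why the series of increasing $\mu$ values is needed.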
The difficulty with penalty methods is that as $\mu$ grows, the conditioning of the problem gets worse and worse. You won't have that problem since you don't need to increase $\mu$, but you may still need to consider which $\mu$ gives the best conditioning. For example, the objectives $f(x) = 1000\|x\|^2$ and $f(x) = \frac1{1000}\|x\|^2$ are both perfectly well-conditioned, but if you modify either of them to $g(x) = f(\operatorname{proj}(x)) + \|x-\operatorname{proj}(x)\|^2$, the Hessian of the modified objective has condition number $1000$. Newton's method will still work fine, but gradient descent may struggle. That said, if you have prior knowledge of the range of eigenvalues of $f$'s Hessian, you can pick $\mu$ in that range and the penalty won't worsen the conditioning.
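To make the conditioning point concrete: since the feasible set isn't specified in that example, suppose $\operatorname{proj}$ is the orthogonal projection onto a line (a linear subspace), where the Hessian of $g$ splits cleanly into a block coming from $f$ and a block coming from the penalty:

```python
import numpy as np

n = 2
P = np.diag([1.0, 0.0])                    # orthogonal projection onto the x1-axis
f_hess = 2 * 1000 * np.eye(n)              # Hessian of f(x) = 1000 ||x||^2

# g(x) = f(Px) + ||x - Px||^2 has Hessian  P^T f''(.) P + 2 (I - P):
# eigenvalue 2000 along the subspace (from f), 2 across it (from the penalty)
g_hess = P.T @ f_hess @ P + 2 * (np.eye(n) - P)
eigs = np.linalg.eigvalsh(g_hess)          # ascending order
print(eigs, eigs[-1] / eigs[0])            # condition number = 1000
```

Scaling the penalty by a $\mu$ chosen inside the eigenvalue range of $f$'s Hessian (here, $\mu = 1000$) would bring both blocks to the same scale, which is the fix suggested above.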