I want to project a vector $\tilde{x}$ onto a hyperplane, which leads to the following optimization problem:
$\min_x \frac{1}{2} ||x - \tilde{x}|| \quad \text{s.t.} \quad w^{T} x + b=0$
Using a Lagrange multiplier $\lambda$, I can write:
$\mathcal{L}(x,\lambda) =\frac{1}{2} ||x - \tilde{x}|| + \lambda (w^{T} x + b)$
So to solve the problem, I have to calculate the derivative of $\mathcal{L}(x,\lambda)$ with respect to $x$.
However, I'm not sure how to correctly differentiate $||x - \tilde{x}||$ with respect to $x$.
Is it valid to rewrite the problem as follows without changing the result? Does this make the differentiation easier?
$\mathcal{L}(x,\lambda) =\frac{1}{2} ||x - \tilde{x}||^2 + \lambda (w^{T} x + b)$
In some places I saw the identity:
$||x - \tilde{x}||^2 = ||x||^2 + ||y||^2 + 2xy $
But I'm not sure whether this is correct, nor is it obvious to me why it should be.
I'm especially confused because no $p$ is given for the norm. Is there a convention to just assume $p=2$, or some other value?
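In general, when projecting, we use the 2-norm (the Euclidean norm), so when no $p$ is given you can assume $p=2$. You're basically looking at a particular proximal operator: the projection onto a convex set $C$ is the proximal operator of the indicator function of $C$. It is fine to use the square of the norm rather than the norm itself: since $t \mapsto t^2$ is strictly increasing for $t \ge 0$, both objectives have the same minimizer, so the projection does not change. The squared norm is also differentiable everywhere, whereas $\|x - \tilde{x}\|$ is not differentiable at $x = \tilde{x}$. You can derive the gradient of the squared 2-norm explicitly, component by component.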
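A sketch of that component-wise computation, using the squared objective with the factor $\frac{1}{2}$: writing $\|x - \tilde{x}\|_2^2 = \sum_i (x_i - \tilde{x}_i)^2$ gives $\frac{\partial}{\partial x_j} \frac{1}{2}\|x - \tilde{x}\|_2^2 = x_j - \tilde{x}_j$, i.e. $\nabla_x \frac{1}{2}\|x - \tilde{x}\|_2^2 = x - \tilde{x}$. Equivalently, expanding gives $\|x - \tilde{x}\|_2^2 = \|x\|_2^2 - 2\tilde{x}^T x + \|\tilde{x}\|_2^2$ (note the minus sign on the cross term), whose gradient is $2x - 2\tilde{x}$. Setting $\nabla_x \mathcal{L} = x - \tilde{x} + \lambda w = 0$ gives $x = \tilde{x} - \lambda w$, and substituting this into $w^T x + b = 0$ gives $\lambda = (w^T \tilde{x} + b)/\|w\|_2^2$, so

$$x^\star = \tilde{x} - \frac{w^T \tilde{x} + b}{\|w\|_2^2}\, w.$$

The Python snippet below is a minimal numerical check of this closed form, assuming $w \neq 0$; the function names `dot` and `project_onto_hyperplane` and the example values are illustrative only.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean inner product u^T v."""
    return sum(ui * vi for ui, vi in zip(u, v))


def project_onto_hyperplane(
    x_tilde: Sequence[float], w: Sequence[float], b: float
) -> List[float]:
    """Euclidean projection of x_tilde onto {x : w^T x + b = 0}.

    Setting the gradient of L(x, lam) = 0.5*||x - x_tilde||^2 + lam*(w^T x + b)
    to zero gives x = x_tilde - lam*w; plugging that into the constraint
    gives lam = (w^T x_tilde + b) / ||w||^2.
    """
    ww = dot(w, w)
    if ww == 0:
        raise ValueError("w must be nonzero to define a hyperplane")
    lam = (dot(w, x_tilde) + b) / ww
    return [xi - lam * wi for xi, wi in zip(x_tilde, w)]


if __name__ == "__main__":
    # Hypothetical example: project (3, 4) onto the line x1 + x2 - 1 = 0.
    x_star = project_onto_hyperplane([3.0, 4.0], [1.0, 1.0], -1.0)
    print(x_star)
```

Running it prints `[0.0, 1.0]`, the point on the line $x_1 + x_2 = 1$ closest to $(3, 4)$; the displacement $\tilde{x} - x^\star = (3, 3)$ is parallel to $w$, as the stationarity condition $x - \tilde{x} + \lambda w = 0$ requires.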