Derivative of norm of difference between two vectors

795 Views Asked by At

I want to project a vector $\tilde{x}$ onto a hyperplane, which leads to the following optimization problem:

$\min_x \frac{1}{2} ||x - \tilde{x}|| \quad s.t. \quad w^{T} x + b=0$

Using Lagrangians I can write:

$\mathcal{L}(x,\lambda) =\frac{1}{2} ||x - \tilde{x}|| + \lambda (w^{T} x + b)$

So to solve the problem I have to calculate the derivative of $\mathcal{L}(x,\lambda)$ w.r.t. to x.

However, I'm not quite sure how to correctly derive $||x - \tilde{x}||$ w.r.t. $x$.

Is it valid to rewrite the problem as following, without changing the result, does this make the derivation easier?

$\mathcal{L}(x,\lambda) =\frac{1}{2} ||x - \tilde{x}||^2 + \lambda (w^{T} x + b)$

At some places I saw the equivalience:

$||x - \tilde{x}||^2 = ||x||^2 + ||y||^2 + 2xy $

But I'm not sure if this is correct, neither it's obvious to me why it should be.

I'm especially confused because there is no p for the norm given, is there any convention to just assume $p=2$ or any other arbitrary number?

2

There are 2 best solutions below

0
On

In general, when projecting, we use the 2 norm. You're basically looking at a particular proximal operator. It is fine to use the square of the norm rather than just the norm, it doesn't make a difference as far as the projection is concerned. You can derive the formula for taking the gradient of the squared 2-norm explicitly component wise.

1
On

Define the variable $z=(x-\bar{x})$ and write the objective function as $$\eqalign{ \phi &= \|x-\bar{x}\| = \|z\| \cr }$$ Now write the general solution to the linear constraint as the least-squares solution plus an arbitrary contribution from the null space $$\eqalign{ w^Tx &= -b \cr x &= (I-ww^+)y - (w^+)^Tb \cr }$$ where $w^+$ denotes the pseudoinverse and $y$ is an arbitrary vector.

Note that $P=(I-ww^+)$ is an orthoprojector (i.e. $P^2=P=P^T$) into the nullspace of $w$. Therefore $$\eqalign{ P(w^+)^T &= 0 \cr Px &= P^2y - P(w^+)^Tb \,\,= Py \cr }$$

Substitute this into the objective function to obtain an unconstrained problem with respect to $y$. $$\eqalign{ \phi^2 &= \|z\|^2 = z:z \cr \phi\,d\phi &= z:dz \cr d\phi&= \phi^{-1}z:dx \cr &= \phi^{-1}z:P\,dy \cr &= \phi^{-1}Pz:dy \cr \frac{\partial\phi}{\partial y} &= \phi^{-1}Pz \cr }$$ Set the gradient to zero and solve $$\eqalign{ Pz &= 0 \implies Px &= P\bar{x} \implies Py &= P\bar{x} \cr }$$ Substitute this result into the parametric expression for $x$ $$\eqalign{ x &= Py - (w^+)^Tb \cr &= P\bar{x} - (w^+)^Tb \cr &= (I-ww^+)\bar{x} - (w^+)^Tb \cr\cr }$$ For vectors, there is a closed-form expression for the pseudoinverse $$w^+ = \frac{w^T}{w^Tw}$$ In some of the intermediate steps above, a colon was used to denote the trace/Frobenius product $$A:B = {\rm tr}(A^TB)$$