How to take the derivative of minimum of a norm?

134 Views Asked by At

Suppose $f: \mathbb{R}^n \rightarrow \mathbb{R}$ where $f$ is the following:

$$ f(z) = \begin{cases} 0 \,\,\,\, z \in C \\ \min_{x\in C} \frac{1}{2} \|x-z\|_2^2 \,\,\,\ z\notin C \end{cases} $$

where $C$ is a closed convex $C$ in $\mathbb{R}^n$ and $x$ is a point in $C$.

How can we find derivative of $f$.

I know the answer is

$$ f'(z) = \begin{cases} 0 \,\,\,\, z \in C \\ z-x \,\,\,\ z\notin C \end{cases} $$ but how would we find this derivative using the definition?

Is there any way to get that using:

$$ f'(z;d) = \lim_{d \rightarrow 0^{+}} \frac{\min_{x\in C} \frac{1}{2} \|x-(z+td)\|_2^2 - \min_{x\in C} \frac{1}{2} \|x-z\|_2^2}{t} $$

1

There are 1 best solutions below

2
On BEST ANSWER

Actually the answer is $$f^\prime(z) = z - x,$$ and not $x - z$. Here $x$ is a point of $C$ nearest to $z$.

First of all, let us denote by $x(u)$ a point of $C$ nearest to $z + u$. We will show that $\|x(u) - x\| \to 0$ as $u\to 0$ (i.e., function mapping $z$ to its nearest point is continuous). Indeed, assume for contradiction that there is a limit point of $x(u)$ which is different from $x$, i.e., there is a sequence $u_m\in\mathbb{R}^n$ such that $u_m\to 0$ and $x(u_m) \to x^\prime \neq x$. Then $x^\prime\in C$ since $C$ is closed. On the other hand, $x^\prime$ is more far from $z$ than $x$ (this is because the nearest point is unique). So assume that $\|x^\prime - z\| - \|x - z\| \ge \varepsilon > 0$. Let $m$ be such that $\| u_m\| \le \varepsilon/10$ and $\|x(u_m) - x^\prime\| \le \varepsilon/10$. Then $z + u_m$ is closer to $x$ than to $x(u_m)$. Indeed, by using the triangle inequality multiple times we get: \begin{align*} \| z + u_m - x\| &\le \|z - x\| + \|u_m\| \le \|z - x\| + \varepsilon/10\\ &\le \|z - x^\prime\| - \varepsilon + \varepsilon/10 = \|z - x^\prime\| - 9\varepsilon/10 \\ &\le \|z - x(u_m)\| + \|x(u_m) - x^\prime\| - 9\varepsilon/10 \le \|z - x(u_m)\| - 8\varepsilon/10 \\ &\le \|z + u_m - x(u_m)\| + \|-u_m\| - 8\varepsilon/10 \le \|z + u_m - x(u_m)\| - 7\varepsilon/10, \end{align*} and this contradicts the definition of $x(u_m)$.

By definition we have to show that $$f(z + u) - f(z) - \langle u, z - x\rangle = o(\|u\|)$$ for every $z$ when $u\to 0$. Let us write it down: \begin{align*} f(z + u) - f(z) - \langle u, z - x\rangle = 1/2\langle z + u - x(u), z + u - x(u)\rangle - 1/2 \langle z - x, z - x\rangle - \langle u, z - x\rangle\end{align*} Now, imagine that we had $x$ instead of $x(u)$ in the first term of the last expression. Then this expression would be equal to just $\langle u, u\rangle = o(\|u\|)$, as required. However, we have not $x$ but $x(u)$. So to finish the argument we have to show that the difference \begin{align*} \langle z + u - x(u), z + u - x(u)\rangle - \langle z + u - x, z + u - x\rangle \end{align*} is $o(\|u\|)$ as $u\to 0$. First of all, by definition $z + u$ i closer to $x(u)$ than to $x$, hence: $$\langle z + u - x(u), z + u - x(u)\rangle \le \langle z + u - x, z + u - x\rangle.$$ On the other hand: \begin{align*} \langle z + u - x, z + u - x\rangle &= \langle z - x, z - x\rangle + 2 \langle u, z - x\rangle + \langle u, u\rangle \\ &\le \langle z - x(u), z - x(u)\rangle + 2 \langle u, z - x\rangle + \langle u, u\rangle \end{align*} The latter is because $z$ is closer to $x$ than to $x(u)$. Once again, imagine that we replace $x$ by $x(u_m)$ in the last expression. Then we obtain exactly $\langle z + u - x(u), z + u - x(u)\rangle$. However, the cost of that is $\langle u, z - x\rangle - \langle u, z - x(u)\rangle = \langle u, x(u) - x\rangle$. Fortunately, the absolute value of the last expression is at most $\|u\| \cdot \|x(u) - x\| = o(\|u\|)$ and the latter is due to the fact that $x(u) \to x$ as $u\to 0$, as we have proved.