Under which condition is $d(y)=\min_x f(x,y)$ smooth ($C^1$) (similar to Danskin's theorem)?


Under which condition is $d(y)=\min_x f(x,y)$ smooth (differentiable, $C^1$)?

  1. To determine the smoothness of $d(y)$, I think the following conditions are required:

    Differentiability: The function $f(x, y)$ is smooth with respect to both $x$ and $y$ in the feasible set. This ensures that the partial derivatives of $f(x, y)$ exist and can be used to find the minimum.

  2. Then, is this condition enough to guarantee that $d$ is smooth?

    For example, let $x\in\mathbb{R}_+$, $y\in\mathbb{R}^n$ and $$ f(x,y)=x + \frac{\|y\|_2^2}{x}.$$ Then $d(y) = 2\|y\|_2$, which is not smooth at the origin.

  3. Which conditions should we add next? What if we add Lipschitz continuity?

    A similar result for the convex case can be found here.

  4. I would rather consider the special case $$ d(y) = \min_x f(x,y) + g(x),$$ where $g$ is a strongly convex function. Which conditions should we add?

    If the solution to $$\min_x f(x,y)+ g(x)$$ is unique, can we use implicit function theorem to obtain the smoothness of $d$?

1 Answer
Interesting question. I just noticed that my answer is a bit long. To save time, the interesting results are written in bold; the rest is their proofs.

First of all, differentiability of $f$ is not absolutely necessary. Take for example $f(x,y) = g(x)$ with $g$ an ugly function that admits a minimum (like $\textbf{1}_{\mathbb{Q}}$): then $d$ is constant, equal to $\min g$, hence smooth. But to avoid these weird cases, we shall simply assume that $f$ is smooth (or at least continuous). Four problems can occur now.

$*$ $d$ is not necessarily well defined everywhere, even when $f$ is bounded from below. You gave an example of it with $f(x,y) = x + y^2/x$ with $x > 0$ and $y \in \mathbb{R}$: the minimum is not attained when $y = 0$ (because $x > 0$), so $d(0)$ doesn't even exist.
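This failure can be checked numerically. A minimal sketch in the scalar case $y \in \mathbb{R}$ (the grid search is a crude stand-in for the true minimization over $x > 0$):

```python
# For f(x, y) = x + y^2 / x on x > 0: when y != 0 the minimum 2|y| is
# attained at x = |y| (by AM-GM); when y = 0 the infimum is 0 but it is
# not attained, since f(x, 0) = x > 0 for every admissible x.

def f(x, y):
    return x + y * y / x

def approx_inf(y, x_max=10.0, n=20000):
    """Crude grid approximation of inf over 0 < x <= x_max of f(x, y)."""
    return min(f(x_max * i / n, y) for i in range(1, n + 1))

for y in [0.5, 1.0, 2.0]:
    assert abs(approx_inf(y) - 2 * abs(y)) < 1e-3   # d(y) = 2|y|
    assert abs(f(abs(y), y) - 2 * abs(y)) < 1e-12   # attained at x = |y|

# at y = 0 the infimum 0 is approached as x -> 0 but never attained
assert approx_inf(0.0) > 0.0
```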

$*$ $d$ might not even be continuous, even when $f$ is smooth. Take for example $f(x,y) = 0$ if $x > 1/y^2 + 1$, $f(x,y) = 1$ if $x < 1/y^2 - 1$, and $0 \leqslant f \leqslant 1$ when $|x - 1/y^2| \leqslant 1$. It is always possible to complete $f$ into a smooth function on this strip using bump functions. In particular, $f(x,0) = 1$ for all $x$ (with the convention $1/0 = +\infty$), so $d(0) = 1$ but $d(y) = 0$ if $y \neq 0$. You can check that $f$ is smooth even at the points $(x,0)$. This happens because the points $x$ where $f(x,y)$ reaches its minimum tend toward infinity when $y$ approaches $0$.
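A concrete polynomial variant of this phenomenon (my own example, not the bump construction above): $f(x,y) = (x^2y^2 - 1)^2$ is smooth, with $d(y) = 0$ for $y \neq 0$ (attained at $x = 1/|y|$) but $d(0) = 1$, and the minimizers escape to infinity as $y \to 0$:

```python
# f(x, y) = (x^2 y^2 - 1)^2 is a polynomial, hence smooth everywhere.
def f(x, y):
    return (x * x * y * y - 1.0) ** 2

# for y != 0, the minimum value 0 is attained at x = 1/|y| -> infinity
for y in [1.0, 0.1, 0.01]:
    x_star = 1.0 / abs(y)
    assert f(x_star, y) < 1e-10

# for y = 0 the function is constant equal to 1, so d(0) = 1 > 0 = lim d(y)
for x in [0.0, 5.0, 1000.0]:
    assert f(x, 0.0) == 1.0
```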

$*$ The third problem that can occur is when the point where the minimum is attained changes discontinuously. For example, take $f(x,y) = \cos(x) + y$ if $x < -\pi/2$, $f(x,y) = 0$ when $x > \pi/2$, and complete it into a smooth function on the strip $|x| \leqslant \pi/2$ with $f \geqslant \min\{\cos(-\pi/2) + y,\, 0\} = \min\{y,0\}$ on this strip.

In particular, if $y$ is close to $1$, $f$ is non-negative on this strip. When $y \leqslant 1$, $d(y) = \min_{x}f(x,y) = y - 1$ is attained, for example, at $x = -\pi$, and when $y \geqslant 1$, $d(y) = \min_{x}f(x,y) = 0$ is attained, for example, at $x = 2$. $d$ is non-smooth at $y = 1$ because the point where the minimum is attained jumps over the strip when $y$ crosses $1$.

$*$ Even when the minimum is attained at points that don't escape toward infinity, $d$ might not be smooth. For example, take $f(x,y) = x^4 - 4xy$. Then $\partial_xf(x,y) = 4x^3 - 4y$ vanishes at $x = \sqrt[3]{y}$, where the minimum of $f$ is attained, with $d(y) = f(\sqrt[3]{y},y) = -3y^{4/3}$. $d$ is $\mathcal{C}^1$ but not $\mathcal{C}^2$. (I couldn't find an example where $f$ is $\mathcal{C}^1$ but $d$ is not.)
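A quick numerical check of this last example (finite differences, so the tolerances are loose):

```python
import math

def cbrt(t):
    # real cube root, valid for negative t as well
    return math.copysign(abs(t) ** (1.0 / 3.0), t)

def d(y):
    x = cbrt(y)                # unique critical point of x -> x^4 - 4 x y
    return x ** 4 - 4 * x * y  # = -3 |y|^(4/3)

def d1(y, h=1e-6):
    return (d(y + h) - d(y - h)) / (2 * h)   # central difference for d'

for y in [-1.0, 0.5, 2.0]:
    assert abs(d(y) + 3 * abs(y) ** (4.0 / 3.0)) < 1e-12
    assert abs(d1(y) + 4 * cbrt(y)) < 1e-4   # d'(y) = -4 y^(1/3), continuous

# d'' = -(4/3) y^(-2/3) blows up near 0: the slope of d' grows without bound
assert abs(d1(1e-6) - d1(-1e-6)) / 2e-6 > 1e2
```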

When $d$ is well defined at some $y_0$ and $f$ is such that the points $x$ where $f(x,y)$ is small don't go toward infinity as $y$ approaches $y_0$, then $d$ is well defined around $y_0$ and continuous at $y_0$. In other words,

**If $f : A \times B \rightarrow \mathbb{R}$ is a continuous function with $A$ and $B$ connected open subsets of finite-dimensional vector spaces, if $y_0 \in B$ is such that $x \mapsto f(x,y_0)$ admits a minimum, and if for every sequence $(x_n,y_n)$ of $A \times B$ such that $\limsup f(x_n,y_n) \leqslant d(y_0)$ and $y_n \rightarrow y_0$ we have $\{x_n\} \subset\!\subset A$ (i.e. all the $x_n$ belong to some compact subset of $A$), then $d$ is well defined in a neighborhood of $y_0$ and continuous at $y_0$.**

Let us prove it. Assume that $d$ is not well defined on any neighborhood of $y_0$. It implies that there exists a sequence $y_n \rightarrow y_0$ such that for all $n \geqslant 1$, $x \mapsto f(x,y_n)$ admits no minimum. Let $(K_n)$ be an increasing sequence of compact subsets of $A$ such that $\bigcup_{n \geqslant 1} \overset{\circ}{K_n} = A$. Such a sequence always exists: take for example $K_n = \{x \in A : |x| \leqslant n,\ \mathrm{dist}(x, A^c) \geqslant 1/n\}$. Let $x_0$ be a point of $A$ such that $d(y_0) = f(x_0,y_0)$. Up to removing a finite number of the $K_n$, we may assume that $x_0 \in K_1$.

The function $x \mapsto f(x,y_n)$ is continuous, so it admits a minimum on the compact set $K_n$, but by hypothesis it admits no global minimum. Hence we can find $x_n$ such that $f(x_n,y_n) < \min_{x \in K_n} f(x,y_n) \leqslant f(x_0,y_n) \rightarrow f(x_0,y_0) = d(y_0)$. Notice that $f(x_n,y_n) < \min_{x \in K_n} f(x,y_n)$ implies that $x_n \notin K_n$, so $x_n \notin K_m$ whenever $m \leqslant n$.

These inequalities imply that $\limsup f(x_n,y_n) \leqslant f(x_0,y_0) = d(y_0)$, so by our hypothesis $x_n$ converges in $A$ up to extraction. Let $x$ be its limit. We have $x \in \overset{\circ}{K_{n_0}}$ for some $n_0$, hence $x_n \in \overset{\circ}{K_{n_0}}$ for $n$ large enough, which contradicts the fact that $x_n \notin K_{n_0}$ when $n \geqslant n_0$. By contradiction, we deduce that $d$ is well defined in a neighborhood of $y_0$.

For the continuity, let $y_n \rightarrow y_0$. We want to show that $d(y_n) \rightarrow d(y_0)$. For each $n$, let $x_n \in A$ be such that $d(y_n) = f(x_n,y_n)$. We have, for all $n$, $d(y_n) = f(x_n,y_n) \leqslant f(x_0,y_n) \rightarrow f(x_0,y_0) = d(y_0)$, so by our hypothesis there exists a compact $K \subset A$ such that for all $n$, $x_n \in K$. Up to extraction, $x_n \rightarrow x \in A$, so $d(y_n) = f(x_n,y_n) \rightarrow f(x,y_0) \geqslant d(y_0)$; on the other hand, $f(x_n,y_n) \leqslant f(x_0,y_n)$ for all $n$, so $f(x,y_0) \leqslant d(y_0)$. It implies that $d(y_n) \rightarrow d(y_0)$.

This is only up to extraction: we proved that $d(y_0)$ is an accumulation point of $(d(y_n))$. If $l$ is another accumulation point, we have $d(y_n) \rightarrow l$ up to extraction, and up to another extraction $d(y_n) \rightarrow d(y_0)$ by the same reasoning as previously. We deduce that $l = d(y_0)$ is the only accumulation point of $(d(y_n))$. Therefore $d(y_n) \rightarrow d(y_0)$, so $d$ is continuous at $y_0$.

The hypotheses of this proposition don't seem easy to verify, but we can give a few sufficient conditions for them. For example,

**If for every compact $K \subset B$, $f_{|A \times K}$ is proper and bounded from below, then the hypotheses of the previous proposition are verified for all $y_0 \in B$.**

Indeed, if $y_0 \in B$, let $a = \inf_x f(x,y_0)$, which exists by boundedness from below of $f$, and let $I = [a, a + 1]$, which is compact. $f_{|A \times \{y_0\}}^{-1}(I)$ is therefore compact too, by properness of $f_{|A \times \{y_0\}}$. Let $(x_n)$ be a sequence of elements of $A$ such that $f(x_n,y_0) \rightarrow a$; then $(x_n,y_0) \in f_{|A \times \{y_0\}}^{-1}(I)$ for $n$ large enough. Up to extraction, $(x_n,y_0)$ converges toward some $(x_0,y_0) \in f_{|A \times \{y_0\}}^{-1}(I)$ and we have $f(x_0,y_0) = a$, so $a$ is a minimum: $d(y_0)$ exists and equals $a$.

Now, if $(x_n,y_n)$ is a sequence such that $y_n \rightarrow y_0$ and $\limsup f(x_n,y_n) \leqslant d(y_0)$, let $K = \{y_n,\ n \geqslant 1\} \cup \{y_0\}$, which is compact. By hypothesis, $f_{|A \times K}$ is bounded from below, so $(f(x_n,y_n))$ is bounded from below too. Let $b = \inf_n f(x_n,y_n)$ and $J = [b, d(y_0) + 1]$. As $\limsup f(x_n,y_n) \leqslant d(y_0)$, for all $n$ large enough $(x_n,y_n) \in f_{|A \times K}^{-1}(J)$, which is compact by properness of $f_{|A \times K}$. We deduce that for $n$ large enough, $x_n \in p_1(f_{|A \times K}^{-1}(J))$, where $p_1 : A \times B \rightarrow A$ is the projection onto the first variable. This set is compact, which proves the proposition.

Moreover, in the case that interests you,

**If $A$ is the whole space, $g : A \rightarrow \mathbb{R}$ is a $\mathcal{C}^1$ strongly convex function, $h : A \times B \rightarrow \mathbb{R}$ is continuous and bounded from below on every $A \times K$ for $K \subset B$ compact, and $f : (x,y) \mapsto g(x) + h(x,y)$, then the hypotheses of the second proposition are verified.**

Indeed, if $K \subset B$ is compact, then on $A \times K$, $f \geqslant \min g + \inf_{A \times K} h > -\infty$.

For the properness, let $I$ be a compact subset of $\mathbb{R}$ and $Q = f_{|A \times K}^{-1}(I) = (A \times K) \cap f^{-1}(I)$. Let $(x_n,y_n)$ be a sequence of elements of $Q$. $y_n$ converges up to extraction because $K$ is compact. Moreover, $g(x_n) = f(x_n,y_n) - h(x_n,y_n)$ is a bounded sequence, because $h$ is bounded from below on $A \times K$ and $f(x_n,y_n)$ stays in $I$, which is bounded. As strong convexity implies properness, $x_n$ stays in a compact subset of $A$, hence it converges up to extraction. It proves that, up to extraction, $(x_n,y_n)$ converges, and the limit is obviously in $Q$ because $Q$ is closed.

Notice that we could replace "$g$ $\mathcal{C}^1$ strongly convex" by "$g$ bounded from below and proper"; the proof would be the same. Now, let us focus on the smoothness of $d$. It is pretty hard to imagine a necessary and sufficient condition for $d$ to be smooth in the neighborhood of a point, but we can at least give sufficient conditions, especially in the case where the minimum of $f$ is attained at a unique point.

**If $f$ is $\mathcal{C}^k$, with $2 \leqslant k \in \mathbb{N} \cup \{\infty,\omega\}$ (where $\mathcal{C}^\omega$ means real analytic), and verifies the hypotheses of the first proposition at some $y_0 \in B$, if the point $x_0$ such that $f(x_0,y_0) = d(y_0)$ is unique, and if $\partial_x^2f(x_0,y_0)$ is non-degenerate, then $d$ is $\mathcal{C}^k$ in a neighborhood of $y_0$. Moreover, for $y$ in a neighborhood of $y_0$, the minimum of $f(\cdot,y)$ is uniquely attained at some $\varphi(y) \in A$, and $\varphi$ is $\mathcal{C}^{k - 1}$.**

Let $E$ be the total space in which $A$ is included. $\partial_x^2f(x_0,y_0)$ can be seen as a non-degenerate symmetric bilinear form on $E^2$ (hence it is positive definite, because $f(x_0,y_0)$ is a local minimum) or, equivalently, as an invertible map $\partial_x^2f(x_0,y_0) : E \rightarrow E^*$, where $E^*$ is the dual space of $E$. As $x_0$ is where $x \mapsto f(x,y_0)$ reaches its minimum, we have $\partial_xf(x_0,y_0) = 0$, so by the implicit function theorem there is a neighborhood $U \times V$ of $(x_0,y_0)$ such that for all $(x,y) \in U \times V$, $\partial_xf(x,y) = 0 \Leftrightarrow x = \varphi(y)$ for some $\varphi \in \mathcal{C}^{k - 1}(V,U)$ (because $\partial_xf : A \times B \rightarrow E^*$ is $\mathcal{C}^{k - 1}$). In particular, we have $\varphi(y_0) = x_0$.

Up to reducing $U \times V$, we can assume that $\partial_x^2f(x,y)$ is always symmetric positive definite when $(x,y) \in U \times V$. It implies that each $f(\varphi(y),y)$ is a local minimum of $x \mapsto f(x,y)$. Let us show, by contradiction, that it is global and uniquely attained, at least in a small enough neighborhood of $y_0$. Assume that there exists a sequence $y_n \rightarrow y_0$ such that for all $n$ there exists an $x_n$ verifying $f(x_n,y_n) \leqslant f(\varphi(y_n),y_n)$ and $x_n \neq \varphi(y_n)$. For $n$ large enough, $y_n \in V$, so $x_n \notin U$ (or else we would have $x_n = \varphi(y_n)$).

$f(\varphi(y_n),y_n) \rightarrow f(\varphi(y_0),y_0) = d(y_0)$, so $\limsup f(x_n,y_n) \leqslant d(y_0)$. By hypothesis, $x_n$ converges toward some $x \in A$ up to extraction. Then $f(x_n,y_n) \rightarrow f(x,y_0) \geqslant f(x_0,y_0)$ and $f(x_n,y_n) \leqslant f(\varphi(y_n),y_n) \rightarrow f(x_0,y_0)$, thus $f(x,y_0) = f(x_0,y_0)$, and by uniqueness of $x_0$ we deduce that $x = x_0$, which contradicts the fact that $x_n \notin U$ for all $n$ large enough.

This proves by contradiction that for $y$ close enough to $y_0$, $d(y) = f(\varphi(y),y) \neq f(x,y)$ if $x \neq \varphi(y)$. In particular, $d$ is $\mathcal{C}^1$ and, since $\partial_xf(\varphi(y),y) = 0$, $$ \partial_yd(y) = \partial_xf(\varphi(y),y)\,\partial_y\varphi(y) + \partial_yf(\varphi(y),y) = \partial_yf(\varphi(y),y). $$ $\varphi$ is $\mathcal{C}^{k - 1}$, hence so is $y \mapsto (\varphi(y),y)$, and $\partial_yf$ is also $\mathcal{C}^{k - 1}$; thus by composition, $\partial_yd : y \mapsto \partial_yf(\varphi(y),y)$ is $\mathcal{C}^{k - 1}$. It proves that $d$ is $\mathcal{C}^k$ in a neighborhood of $y_0$.
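A finite-difference sanity check of this envelope formula, on the hypothetical toy example $f(x,y) = x^2 + xy$ (so $\partial_x^2 f = 2$ is non-degenerate, $\varphi(y) = -y/2$ and $d(y) = -y^2/4$):

```python
def phi(y):
    return -y / 2.0        # unique minimizer of x -> x^2 + x y

def d(y):
    x = phi(y)
    return x * x + x * y   # = -y^2 / 4

def dy_f(x, y):
    return x               # partial derivative of f in y, at fixed x

# check d'(y) = dy_f(phi(y), y): the d_x f term drops, as d_x f(phi(y), y) = 0
for y in [-2.0, 0.3, 1.7]:
    h = 1e-6
    fd = (d(y + h) - d(y - h)) / (2 * h)
    assert abs(fd - dy_f(phi(y), y)) < 1e-9
```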

And for the case that interests you, we have :

**If $f : (x,y) \mapsto g(x) + h(x,y)$ as previously is such that $g$ is $\mathcal{C}^k$ and strongly convex with parameter $m > 0$, and $h$ is $\mathcal{C}^k$ and such that for every compact $K \subset B$ there exists a constant $m_K < m$ such that for all $(x,y) \in A \times K$, all the eigenvalues of $\partial_x^2h(x,y)$ are greater than $-m_K$, then $f$ verifies the hypotheses of the previous proposition at every $y_0 \in B$. In particular, $d$ is well defined and $\mathcal{C}^k$ on the whole of $B$.**

All that is left to verify is that for all $y_0$, $x \mapsto f(x,y_0)$ reaches its minimum at only one point $x_0$ and that $\partial_x^2f(x_0,y_0)$ is non-degenerate. In fact, the hypothesis on $h$ clearly implies that $x \mapsto f(x,y_0)$ is strongly convex on $A$, which implies the desired result.
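A hypothetical instance of this setting (my own choice of $g$ and $h$, just to illustrate the eigenvalue condition): $g(x) = x^2$ is strongly convex with $m = 2$, and $h(x,y) = \cos(x+y)$ has $\partial_x^2 h = -\cos(x+y) \geqslant -1 > -m$, so $x \mapsto f(x,y)$ is strongly convex with parameter at least $1$ and its minimizer is unique:

```python
import math

def f_xx(x, y):
    # second x-derivative of f(x, y) = x^2 + cos(x + y)
    return 2.0 - math.cos(x + y)

# uniform strong convexity in x: f_xx >= 2 - 1 = 1 everywhere
for x in [-3.0, 0.0, 1.0, 4.0]:
    for y in [-2.0, 0.5, 3.0]:
        assert f_xx(x, y) >= 1.0

def f(x, y):
    return x * x + math.cos(x + y)

def argmin_grid(y, lo=-5.0, hi=5.0, n=100000):
    """Index of the grid point minimizing x -> f(x, y) on [lo, hi]."""
    return min(((f(lo + (hi - lo) * i / n, y), i) for i in range(n + 1)))[1]

# the grid minimizer is interior, not stuck at the boundary of the window
i0 = argmin_grid(0.0)
assert 0 < i0 < 100000
```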

The only remaining case is when $f$ is only $\mathcal{C}^1$: do we then have $d$ $\mathcal{C}^1$? I think yes, and the formula $\partial_yd(y) = \partial_yf(\varphi(y),y)$ suggests that $f$ only needs to be continuous with respect to the first variable and $\mathcal{C}^1$ with respect to the second one (plus all the other technical hypotheses).