Hessian matrix of a function defined via the implicit function theorem


Let $x=(x_1,\dots,x_n) \in \mathbb{R}^n$, $y\in \mathbb{R}$, and let $F(x,y)=F(x_1,\dots,x_n,y) \in C^2(\mathbb{R}^{n+1})$. Suppose all the hypotheses for the existence of the function $y = f(x)$ implicitly defined by $F$ through the equation $F(x,y)=0$ are satisfied.

The first partial derivatives are characterized by $\dfrac{\partial f}{\partial x_i}(x)=- \dfrac{F_{x_i}(x,f(x))}{F_y(x,f(x))}$.

Can the Hessian $D^2f$ be characterized in a similar way?

P.S. I am not asking about a single second derivative $D_{ij}f$, but about the whole Hessian matrix!


For everything which follows, my go-to references are Loomis and Sternberg's book on Advanced Calculus and Henri Cartan's book on Differential Calculus. First let me establish some basic notational conventions/definitions/basic properties so that we're on the same page.

Throughout, we shall let $U,V,W$, etc. denote (say finite-dimensional) normed vector spaces over $\Bbb{R}$. All maps in question shall be assumed to be $C^{\infty}$ for simplicity, because I don't feel like keeping track of the exact smoothness requirements ($C^2$ or twice Frechet-differentiable will probably suffice, though).

  • Given a map $f: U \to V$, its (Frechet) derivative at a point $x \in U$ is a linear map $Df_x: U \to V$ whose defining property is that \begin{align} \lim_{h \to 0} \dfrac{\lVert f(x+h) - f(x) - Df_x(h)\rVert}{\lVert h \rVert} = 0. \end{align}

  • Note that for any $h \in U$, $Df_x(h)$ is an element of $V$. One can prove (a simple exercise using the definitions and chain rule) that $Df_x(h) = \dfrac{d}{dt}\bigg|_{t=0}f(x+th)$. This latter is what we might call the directional derivative of $f$ at the point $x$ in the direction of $h$. The notation I use for this is $(D_hf)(x)$. So, it's kind of like a role-reversal: $Df_x(h) = (D_hf)(x)$. In particular, if $U = \Bbb{R}^n$ and $V = \Bbb{R}$, then by letting $e_i = (0, \dots 1, \dots 0)$, with $1$ in the $i^{th}$ slot, we have that \begin{align} Df_x(e_i) = \dfrac{d}{dt}\bigg|_{t=0}f(x+te_i) = (\partial_if)(x) \end{align} is the usual $i^{th}$ partial derivative of $f$ at the point $x$.

  • Now, we define the second derivative as follows: note that for each $x \in U$, $Df_x \in \text{Hom}(U,V)$. In other words, we have a map $Df: U \to \text{Hom}(U,V)$, and notice that both $U$ and $\text{Hom}(U,V)$ are normed vector spaces (the latter can be given the operator norm). So, we are able to talk about the derivative of $Df$ at a point $x$, as usual, and we denote $D(Df) := D^2f$. Now, strictly speaking, if we follow the definitions, we have that for $x \in U$, $D^2f_x \equiv D(Df)_x$ is a linear map from $U$ into $\text{Hom}(U,V)$; i.e. $D^2f_x \in \text{Hom} \left( U, \text{Hom}(U,V)\right)$. But basic linear algebra gives us a canonical isomorphism $\text{Hom} \left( U, \text{Hom}(U,V)\right) \to \text{Hom}^2(U; V)$, where the latter is the space of bilinear maps $U \times U \to V$. This isomorphism is simply given by: \begin{align} T \mapsto \bigg( (\xi, \eta) \in U \times U \mapsto \left(T(\xi) \right) [\eta] \in V \bigg) \tag{$*$} \end{align} Hence, we regard $D^2f_x$ as a bilinear map $U \times U \to V$.

  • Next, we need the notion of a partial derivative. This is defined very similarly to the case we know. Given a map $F: U_1 \times U_2 \to V$ (in your case $U_1 = \Bbb{R}^n$, the space of $x$'s, $U_2 =\Bbb{R}$, the space of $y$'s, and $V = \Bbb{R}$), and a point $(x_0,y_0) \in U_1 \times U_2$, we define $\dfrac{\partial F}{\partial x}\bigg|_{(x_0,y_0)}$ to be the derivative of the map $x\mapsto F(x,y_0)$ at the point $x_0$. I.e. if we fix $y_0$, we get a function $F(\cdot, y_0)$ which maps $U_1 \to V$; it is this function which we are differentiating at the point $x_0$. If you want to think in terms of matrices, consider the special case $U_1 = \Bbb{R}^n$, $U_2 = V = \Bbb{R}$. Then, $DF_{(x,y)}$ is a linear map $\Bbb{R}^n \times \Bbb{R} \to\Bbb{R}$, so its matrix representation has size $1 \times (n+1)$. In this context, the matrix representation of $\dfrac{\partial F}{\partial x}\bigg|_{(x,y)} : \Bbb{R}^n \to \Bbb{R}$ will be the first $1 \times n$ block, and the matrix representation of $\dfrac{\partial F}{\partial y}\bigg|_{(x,y)}: \Bbb{R} \to \Bbb{R}$ will be the last $1 \times 1$ block of $DF_{(x,y)}$.

  • More generally, given normed vector spaces $U_1, \dots, U_n, V$, and a map $F: U_1 \times \dots \times U_n \to V$, we define $\dfrac{\partial F}{\partial x^i}\bigg|_{(a_1, \dots, a_n)}$ to be the derivative of the function $x \mapsto F(a_1, \dots, a_{i-1}, x, a_{i+1}, \dots, a_n)$, which maps $U_i \to V$, at the point $a_i$. So, notice this is identical to how we learn the usual partial derivatives: we keep all but one variable fixed, and then differentiate that function of one variable (it's just that in this case, our variables come from a certain normed vector space).

  • Finally, we need the chain rule, which says that $D(\phi \circ \psi)_x = D\phi_{\psi(x)} \circ D\psi_x$.
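To make the role-reversal $Df_x(h) = (D_hf)(x)$ concrete, here is a quick numerical sanity check (my own toy example, not part of the question): for a map $f:\Bbb{R}^2 \to \Bbb{R}^2$, the matrix of $Df_x$ is the Jacobian, and applying it to $h$ should agree with the directional derivative $\frac{d}{dt}\big|_{t=0} f(x+th)$.

```python
import numpy as np

# Toy example: f(x) = (x1^2 + x2, x1*x2).  Its Frechet derivative at x is
# represented by the Jacobian matrix, and Df_x(h) should match the
# directional derivative d/dt|_{t=0} f(x + t h).

def f(x):
    return np.array([x[0]**2 + x[1], x[0] * x[1]])

def jacobian(x):
    # Hand-computed Jacobian of f at x
    return np.array([[2 * x[0], 1.0],
                     [x[1],     x[0]]])

x = np.array([1.0, 2.0])
h = np.array([0.3, -0.7])

# Df_x(h): apply the linear map (matrix) to the vector h
lin = jacobian(x) @ h

# Directional derivative via a central difference in t
t = 1e-6
directional = (f(x + t * h) - f(x - t * h)) / (2 * t)

print(np.allclose(lin, directional, atol=1e-6))  # True
```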


Now, we can start addressing your question. Let $U_1, U_2, V$ be given normed vector spaces. Let $F: U_1 \times U_2 \to V$ and $f: U_1 \to U_2$ be smooth maps, and define a new map $g: U_1 \to V$ by \begin{align} g(x) := F(x, f(x)). \end{align} In the situation of the implicit function theorem, we will have that $g=0$ identically.

Let's calculate using the chain rule what $Dg_x$ will be: \begin{align} Dg_x &= \dfrac{\partial F}{\partial x} \bigg|_{(x,f(x))} + \dfrac{\partial F}{\partial y} \bigg|_{(x,f(x))} \circ Df_x. \end{align} As a sanity check, notice that this is an equality of linear operators in the space $\text{Hom}(U_1, V)$. $\dfrac{\partial F}{\partial x} \bigg|_{(x,f(x))} \in \text{Hom}(U_1, V)$ and $\dfrac{\partial F}{\partial y} \bigg|_{(x,f(x))} \in \text{Hom}(U_2, V)$ while $Df_x \in \text{Hom}(U_1, U_2)$, so the composition makes sense, and the addition also makes sense. Notice that $g = 0$, so $Dg_x = 0$. This is why we get \begin{align} Df_x &= - \left(\dfrac{\partial F}{\partial y} \bigg|_{(x,f(x))} \right)^{-1} \circ \dfrac{\partial F}{\partial x} \bigg|_{(x,f(x))} \end{align} (compare this with the equality you've written down, and convince yourself they're the same thing).
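As a sanity check on this formula (my own example, not from the question): take $F(x,y) = x_1^2 + x_2^2 + y^2 - 1$, which on the upper hemisphere implicitly defines $f(x) = \sqrt{1 - x_1^2 - x_2^2}$. The blocks $\frac{\partial F}{\partial x}$ and $\frac{\partial F}{\partial y}$ are easy to write down, and the resulting $Df_x$ can be compared against finite differences of $f$:

```python
import numpy as np

# F(x, y) = x1^2 + x2^2 + y^2 - 1 implicitly defines y = f(x) = sqrt(1 - |x|^2)
# on the upper hemisphere, where dF/dy = 2y is invertible.

def f(x):
    return np.sqrt(1.0 - x[0]**2 - x[1]**2)

x = np.array([0.3, 0.4])
y = f(x)

# Partial-derivative blocks of DF at (x, f(x)):
dF_dx = np.array([2 * x[0], 2 * x[1]])   # the 1 x n block
dF_dy = 2 * y                            # the 1 x 1 block (invertible, y > 0)

# Implicit-function-theorem formula: Df_x = -(dF/dy)^{-1} o (dF/dx)
Df_formula = -dF_dx / dF_dy

# Compare against central finite differences of f
eps = 1e-6
Df_numeric = np.array([
    (f(x + eps * np.eye(2)[i]) - f(x - eps * np.eye(2)[i])) / (2 * eps)
    for i in range(2)
])

print(np.allclose(Df_formula, Df_numeric, atol=1e-6))  # True
```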

Next, let's compute $D^2g_x[h]$, for $h \in U_1$, so that $D^2g_x[h] \in \text{Hom}(U_1, V)$. Note that $Dg_x$ is a sum of two terms, and in the second term, there is a composition $\circ$ of functions of $x$. However, composition is a bilinear operation, which means it's like a "generalized product" of functions; therefore the product rule can be applied. Now, it's going to get slightly confusing, because we have to keep track of the point where we're taking derivatives, and keep track of where everything lives.

Ok, so for $h \in U_1$, we have (all these equalities are in the space $\text{Hom}(U_1, V)$)

\begin{align} D^2g_x[h] &= \left(\dfrac{\partial^2 F}{\partial x^2} \bigg|_{(x,f(x))}[h] + \dfrac{\partial^2 F}{\partial y\partial x}\bigg|_{(x,f(x))}[Df_x(h)]\right) \\ &+ \left( \dfrac{\partial^2 F}{\partial x\partial y}\bigg|_{(x,f(x))}[h] + \dfrac{\partial^2 F}{\partial y^2}\bigg|_{(x,f(x))}[Df_x(h)] \right) \circ Df_x \\ &+\dfrac{\partial F}{\partial y} \bigg|_{(x,f(x))} \circ \left( D^2f_x[h] \right) \tag{$\ddot{\smile}$} \end{align} Here the first line is what you get by differentiating $\dfrac{\partial F}{\partial x}\bigg|_{(x,f(x))}$, and the second and third lines are what you get by applying the product and chain rules to $\dfrac{\partial F}{\partial y} \bigg|_{(x,f(x))} \circ Df_x$. As I mentioned above, the equality in $(\ddot{\smile})$ is as elements of $\text{Hom}(U_1, V)$. This means we can feed it another vector of $U_1$ to get an element of $V$. I'll now also make use of the canonical isomorphism described in $(*)$.

So, for all $\xi, \eta \in U_1$, we have (as equality of elements in $V$): \begin{align} D^2g_x[\xi, \eta] &= \dfrac{\partial^2 F}{\partial x^2} \bigg|_{(x,f(x))}[\xi, \eta] + \dfrac{\partial^2 F}{\partial y\partial x} \bigg|_{(x,f(x))}[Df_x(\xi), \eta] \\ &+ \dfrac{\partial^2 F}{\partial x\partial y} \bigg|_{(x,f(x))}[\xi, Df_x(\eta)] + \dfrac{\partial^2 F}{\partial y^2} \bigg|_{(x,f(x))}[Df_x(\xi), Df_x(\eta)] \\ &+ \dfrac{\partial F}{\partial y} \bigg|_{(x,f(x))}\left(D^2f_x[\xi, \eta]\right). \end{align}

Now, if you recall that $D^2g_x = 0$, and that by hypothesis $\dfrac{\partial F}{\partial y} \bigg|_{(x,f(x))}$ is invertible, then by rearranging, we find that \begin{align} D^2f_x[\xi, \eta] &= - \left( \dfrac{\partial F}{\partial y}\bigg|_{(x,f(x))}\right)^{-1} \left(\dfrac{\partial^2 F}{\partial x^2} \bigg|_{(x,f(x))}[\xi, \eta] + \dfrac{\partial^2 F}{\partial y\partial x} \bigg|_{(x,f(x))}[Df_x(\xi), \eta]\right) \\ &- \left( \dfrac{\partial F}{\partial y}\bigg|_{(x,f(x))}\right)^{-1} \left(\dfrac{\partial^2 F}{\partial x\partial y} \bigg|_{(x,f(x))}[\xi, Df_x(\eta)] + \dfrac{\partial^2 F}{\partial y^2} \bigg|_{(x,f(x))}[Df_x(\xi), Df_x(\eta)]\right) \end{align}

Of course, we could factor out the $\left( \dfrac{\partial F}{\partial y}\bigg|_{(x,f(x))}\right)^{-1}$ and write everything in a single pair of parentheses, but there's simply no space to typeset such a long formula, so I'll leave it as is. If you want to think in terms of matrices, then you have to multiply the various sub-blocks together, take transposes appropriately, etc.
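To see the final formula in action, here is a numerical check on the hemisphere example $F(x,y) = x_1^2 + x_2^2 + y^2 - 1$, $f(x) = \sqrt{1-\lVert x\rVert^2}$ (my own choice, not from the question). All mixed second partials of $F$ vanish, $\frac{\partial^2 F}{\partial x^2} = 2I$, $\frac{\partial^2 F}{\partial y^2} = 2$, $\frac{\partial F}{\partial y} = 2y$, and $Df_x(e_i) = -x_i/y$, so the formula collapses to $D^2f_x[e_i, e_j] = -\frac{1}{y}\left(\delta_{ij} + \frac{x_i x_j}{y^2}\right)$, which we can compare against a finite-difference Hessian of $f$:

```python
import numpy as np

# Check the implicit Hessian formula on F(x, y) = x1^2 + x2^2 + y^2 - 1,
# i.e. f(x) = sqrt(1 - |x|^2).  Plugging the derivatives of F into the
# formula gives  D^2f_x[e_i, e_j] = -(delta_ij + x_i*x_j/y^2) / y.

def f(x):
    return np.sqrt(1.0 - x @ x)

x = np.array([0.3, 0.4])
y = f(x)

# Hessian predicted by the implicit-function formula
H_formula = -(np.eye(2) + np.outer(x, x) / y**2) / y

# Hessian by second-order central differences of f
eps = 1e-5
H_numeric = np.empty((2, 2))
for i in range(2):
    for j in range(2):
        ei, ej = np.eye(2)[i], np.eye(2)[j]
        H_numeric[i, j] = (f(x + eps*ei + eps*ej) - f(x + eps*ei - eps*ej)
                           - f(x - eps*ei + eps*ej) + f(x - eps*ei - eps*ej)) / (4 * eps**2)

print(np.allclose(H_formula, H_numeric, atol=1e-5))  # True
```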

I suggest you compare this computation with the standard one you would perform to calculate $(\partial_{ij}f)(x)$, so that you can see what's going on.