Finding an expression for $\partial_j \partial_i g$ from implicit function theorem


The existence of such a function $g$ is guaranteed by the well-known

Theorem. (implicit function) Let $\Omega$ be an open subset of $\mathbb R^{n+1}$, and let $f : \Omega \to \mathbb R$ have $\mathcal C^2$ regularity. Let the $n$-dimensional variable be denoted with $x$, and let the leftover variable be denoted with $y$. Let then $(x_0, y_0)$ be a point in $\Omega$ such that

  1. $f$ vanishes at $(x_0,y_0)$;

  2. $\partial_y f(x_0,y_0) \neq 0$.

Then there exist two open sets $\Omega' \ni x_0$ and $\Omega'' \ni y_0$ such that

  1. First of all, $\Omega' \times \Omega'' \subseteq \Omega$;

  2. For all $x \in \Omega'$ there exists one and only one $y \in \Omega''$ such that $f(x,y) = 0$;

  3. The function $g : \Omega' \to \Omega''$ that assigns to every $x$ such $y$ is also $\mathcal C^2$.

First derivatives. Given the level of regularity that $g$ is proved to achieve, it is natural to ask how the first partials of $g$ would be expressed in terms of the partials of $f$. That can be done by noticing that, for all $x \in \Omega'$, by construction, $$f(x,g(x)) = 0 \tag{1}$$ Thus, if we define the function $\mathbf s : \Omega' \to \Omega$ such that $\mathbf s(x) = (x,g(x))$, we can use the chain rule to obtain $$D(f \circ \mathbf s)(x) = Df(\mathbf s(x)) D\mathbf s(x) = \mathbf 0 \tag{2}$$ where $D\phi$ indicates the Jacobian matrix of a function $\phi$ and $\mathbf 0$ represents the $1 \times n$ identically $0$ matrix. Let us call $D_x f$ the restriction of the Jacobian of $f$ (which is a row vector) to the first $n$ variables. We also see trivially that $$D\mathbf s(x) = \begin{bmatrix} & & \\ & \mathrm{Id}_n & \\ & & \\ \hline \partial_{1}g(x) & \cdots & \partial_{n} g(x) \end{bmatrix} $$ (notice we abbreviate the first $n$ partial derivative symbols to $\partial_1, \dots, \partial_n$) so that equation $(2)$ becomes $$D_x f(\mathbf s(x)) + \partial_y f(\mathbf s(x))Dg(x) = \mathbf 0 $$ We can solve for $Dg$ to find $$Dg(x) = - \frac{D_x f(\mathbf s(x))}{\partial_y f(\mathbf s(x))} = - \frac{D_x f(x,g(x))}{\partial_y f(x,g(x))} \tag{3a}$$ which makes sense due to the second hypothesis in the theorem statement and the continuity of $\partial_y f$. In terms of single partial derivatives, we have $$\partial_i g (x) = - \frac{\partial_i f(x,g(x))}{\partial_y f(x,g(x))} \tag{3b}$$
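As a quick sanity check (not part of the original argument), formula $(3b)$ can be verified symbolically with SymPy on an example of my choosing, the unit sphere $f(x_1,x_2,y) = x_1^2+x_2^2+y^2-1$, where the implicit function near the "north pole" is known explicitly:

```python
import sympy as sp

x1, x2, y = sp.symbols('x1 x2 y')
f = x1**2 + x2**2 + y**2 - 1      # assumed example: the unit sphere
g = sp.sqrt(1 - x1**2 - x2**2)    # explicit solution branch with y > 0

# formula (3b): d_i g(x) = - d_i f(x, g(x)) / d_y f(x, g(x))
for xi in (x1, x2):
    lhs = sp.diff(g, xi)
    rhs = (-sp.diff(f, xi) / sp.diff(f, y)).subs(y, g)
    assert sp.simplify(lhs - rhs) == 0
```

Here the substitution `y -> g` plays the role of evaluating the partials of $f$ at $\mathbf s(x) = (x, g(x))$.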


Second derivatives. The truly hard problem, though, is obtaining an expression for the second partial derivatives of $g$ (both mixed and pure). I've defined the function $\psi : \Omega' \times \Omega'' \to \mathbb R$ such that $$\psi(x,y) \doteq -\frac{1}{\partial_y f(x,y)} $$ so that $$D\psi = \frac{D\left(\partial_y f\right)}{\left(\partial_y f\right)^2} = \frac{1}{\left(\partial_y f\right)^2} \begin{bmatrix} \partial_1 \partial_y f & \cdots & \partial_n \partial_y f & \partial_y^2 f\end{bmatrix}$$ (note that the minus sign in $\psi$ cancels the one produced by differentiating the reciprocal).

This way, I can rewrite $Dg$ as $$Dg : \Omega' \to \mathbb R^{1 \times n} \qquad Dg = (\psi D_x f) \circ \mathbf s $$ so that, by the chain rule and the product rule, $$D^2 g(x) = D(\psi D_x f)(\mathbf s(x)) D\mathbf s(x) = [\psi D(D_x f) + (D_x f)^\top D\psi ](\mathbf s(x))D\mathbf s(x) $$

This expression "simplifies" to $$\begin{split} D^2g(x) &= \left(- \frac{1}{(\partial_y f)^2}\right)\Bigg(\partial_y f \begin{bmatrix} \partial_1^2f & \cdots & \partial_1\partial_n f \\ \vdots & \ddots & \vdots \\ \partial_n \partial_1 f & \cdots & \partial_n^2 f\end{bmatrix} + \partial_y f \begin{bmatrix} \partial_1 g \partial_1 \partial_y f & \cdots & \partial_n g \partial_1 \partial_y f \\ \vdots & \ddots &\vdots \\ \partial_1 g \partial_n \partial_y f & \cdots & \partial_n g \partial_n \partial_y f \end{bmatrix} \\ &\quad- \begin{bmatrix} \partial_1 f \partial_1 \partial_y f & \cdots & \partial_1 f \partial_n \partial_y f \\ \vdots & \ddots &\vdots \\ \partial_n f \partial_1 \partial_y f & \cdots & \partial_n f \partial_n \partial_y f \end{bmatrix} - \partial_y^2 f \begin{bmatrix} \partial_1 g\partial_1 f & \cdots & \partial_n g\partial_1 f \\ \vdots & \ddots & \vdots \\ \partial_1 g\partial_n f & \cdots & \partial_n g \partial_n f \end{bmatrix} \Bigg) \end{split}$$ where the whole right-hand side is evaluated at $\mathbf s(x)$.

Is this expression correct? If so, can it be further simplified?


Accepted answer.

It looks fine to me. Here is a slightly simpler way to do it. Starting from $f(x,g(x))=0$, differentiate with respect to $x_i$ as you did and get $$\partial_i f(x,g(x))+\partial_y f(x,g(x)) \partial_i g(x)=0.$$ Now differentiate with respect to $x_j$ to get $$\partial_j\partial_i f(x,g(x))+\partial_y\partial_i f(x,g(x))\partial_jg(x)+\partial_j\partial_y f(x,g(x)) \partial_i g(x)+\partial_y^2 f(x,g(x)) \partial_i g(x)\partial_j g(x)+\partial_y f(x,g(x)) \partial_j\partial_i g(x)=0.$$ Hence,$$-\frac1{\partial_y f(x,g(x))}\left[\partial_j\partial_i f(x,g(x))+\partial_y\partial_i f(x,g(x))\partial_jg(x)+\partial_j\partial_y f(x,g(x)) \partial_i g(x)+\partial_y^2 f(x,g(x)) \partial_i g(x)\partial_j g(x)\right]= \partial_j\partial_i g(x).$$ It's simpler to use the product rule than the quotient rule.
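This second-derivative identity can also be checked symbolically. The sketch below (my own example, the unit sphere with its explicit solution branch) verifies it with SymPy for all four index pairs $(i,j)$:

```python
import sympy as sp

x1, x2, y = sp.symbols('x1 x2 y')
f = x1**2 + x2**2 + y**2 - 1      # assumed example: the unit sphere
g = sp.sqrt(1 - x1**2 - x2**2)    # explicit solution branch with y > 0

fy = sp.diff(f, y)
for xi in (x1, x2):
    for xj in (x1, x2):
        # the bracketed expression from the answer, partials of f in (x, y)
        bracket = (sp.diff(f, xj, xi)
                   + sp.diff(f, y, xi) * sp.diff(g, xj)
                   + sp.diff(f, xj, y) * sp.diff(g, xi)
                   + sp.diff(f, y, y) * sp.diff(g, xi) * sp.diff(g, xj))
        rhs = (-bracket / fy).subs(y, g)      # evaluate at y = g(x)
        lhs = sp.diff(g, xj, xi)              # direct second derivative of g
        assert sp.simplify(lhs - rhs) == 0
```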

Second answer.

With a little bit of work, following Gio67's input, I managed to answer my own question in a much more general setting. In the end, some nice symmetries arise from the symbolic mess, possibly even hinting toward a formula for higher-order derivatives. The level of generality required switching to the Leibniz notation for derivatives, $$\partial_j = \frac{\partial}{\partial x^j} $$


The theorem. I'll restate the theorem in its full generality:

Theorem (implicit function, generalized). Let $\Omega$ be an open subset of $\mathbb R^n \times \mathbb R^m$, and let $\mathbf f : \Omega \to \mathbb R^m$ be of class $\mathcal C^k$, with $k \geq 1$. Calling $\mathbf x$ the variable in $\mathbb R^n$ and $\mathbf y$ the variable in $\mathbb R^m$, let $(x_0, y_0)$ be a point in $\Omega$ such that

  1. The function $\mathbf f$ vanishes at $(x_0,y_0)$;
  2. The Jacobian matrix of $\mathbf f$ with respect to the variable $\mathbf y$, namely $D_\mathbf y \mathbf f$, is non-singular at $(x_0,y_0)$.

Then there exist an open neighborhood $\Omega'$ of $x_0$ and an open neighborhood $\Omega''$ of $y_0$ such that

  1. First of all, $\Omega' \times \Omega'' \subseteq \Omega$;
  2. For all $\mathbf x \in \Omega'$ there exists one and only one $\mathbf y \in \Omega''$ such that $\mathbf f(\mathbf x,\mathbf y) = \mathbf 0$;
  3. The function $\mathbf g : \Omega' \to \Omega''$ that maps every $\mathbf x$ to such $\mathbf y$ is also of class $\mathcal C^k$.

First derivatives. The quest is to find the partial derivative of $\mathbf g$ with respect to an arbitrary component $x^i$ of variable $\mathbf x$, where of course $1 \leq i \leq n$. Let us then pick such an $i$.

By construction, we have in $\Omega' \times \Omega''$ the following identity: $$ \mathbf f(\mathbf x,\mathbf g(\mathbf x)) = \mathbf 0 \tag{1} $$ Now call $\mathbf s : \Omega' \to \mathbb R^n \times \mathbb R^m$ the function that associates to each $\mathbf x$ the pair $(\mathbf x,\mathbf g(\mathbf x))$ and rewrite $(1)$ as $$ (\mathbf f \circ \mathbf s)(\mathbf x) = \mathbf 0 $$ Let us differentiate both sides with respect to $x^i$ and expand the left-hand side through the multivariable chain rule (which we can apply, since all the functions involved, one by assumption and the other as a consequence of the theorem, are at least $\mathcal C^1$ on an open set, and as such differentiable). We obtain $$ \begin{split} \frac{\partial(\mathbf f \circ \mathbf s)}{\partial x^i}(\mathbf x) &= \frac{\partial \mathbf 0}{\partial x^i} = \mathbf 0\\ \frac{\partial \mathbf f}{\partial x^i}(\mathbf x, \mathbf g(\mathbf x)) + \sum_{k=1}^m\frac{\partial \mathbf f}{\partial y^k}(\mathbf x,\mathbf g(\mathbf x))\frac{\partial g^k}{\partial x^i}(\mathbf x) &= \mathbf 0 \\ D_\mathbf y \mathbf f (\mathbf x,\mathbf g(\mathbf x)) \frac{\partial \mathbf g}{\partial x^i}(\mathbf x) &= - \frac{\partial \mathbf f}{\partial x^i}(\mathbf x,\mathbf g(\mathbf x)) \end{split} \tag{2} $$ Multiplying both sides from the left by the inverse matrix of $D_\mathbf y \mathbf f$ (which exists because the matrix is non-singular at $(x_0,y_0)$ by assumption, and hence, by continuity of its entries, in a whole neighborhood of that point), we have the desired formula. From it one can naturally derive the Jacobian matrix of $\mathbf g$.
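Formula $(2)$ can be exercised symbolically on a toy vector case. In the sketch below (my own construction, not from the original argument) I take $\mathbf f(\mathbf x,\mathbf y) = A(\mathbf y - \mathbf G(\mathbf x))$ with a constant invertible $A$, so that the implicitly defined function is exactly $\mathbf g = \mathbf G$ and the formula can be compared against the direct derivative:

```python
import sympy as sp

x1, x2, y1, y2 = sp.symbols('x1 x2 y1 y2')

# assumed example: f(x, y) = A (y - G(x)) with constant invertible A,
# so the implicitly defined function is exactly g = G
A = sp.Matrix([[1, 2], [3, 5]])
G = sp.Matrix([x1 * x2, x1 + x2**2])
yv = sp.Matrix([y1, y2])
f = A * (yv - G)

Dyf = f.jacobian(yv)              # D_y f (here constant and equal to A)
for xi in (x1, x2):
    # formula from (2): dg/dx^i = -(D_y f)^{-1} df/dx^i at (x, g(x))
    dg_formula = -Dyf.inv() * f.diff(xi)
    dg_direct = G.diff(xi)
    assert sp.simplify(dg_formula - dg_direct) == sp.zeros(2, 1)
```

In this example $D_\mathbf y \mathbf f$ is constant, so no substitution $\mathbf y \to \mathbf g(\mathbf x)$ is needed; in general the partials of $\mathbf f$ would have to be evaluated at $(\mathbf x, \mathbf g(\mathbf x))$ first.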


Second derivatives. In order to compute higher-order derivatives of $\mathbf g$, say of order $k\geq 1$, it is necessary to assume that $\mathbf f$ is at least $\mathcal C^k$ so that the essential theorems apply. Let us then assume $\mathbf f \in \mathcal C^2(\Omega,\mathbb R^m)$ and derive the formula for an arbitrary second partial derivative of $\mathbf g$, which we'll see as the first partial derivative of the previously computed $\dfrac{\partial \mathbf g}{\partial x^i}$ taken along an arbitrary direction $x^j$, with $1 \leq j \leq n$.

To do so, it is convenient to revert to the summation form of $(2)$ instead of its matrix form. First of all, observe the validity of the formula $$ \begin{split} \frac{\partial(\mathbf F \circ \mathbf s)}{\partial x^j}(\mathbf x) &= \sum_{p=1}^{n+m} \frac{\partial \mathbf F}{\partial z^p}(\mathbf s(\mathbf x))\frac{\partial s^p}{\partial x^j}(\mathbf x) \\ &=\frac{\partial \mathbf F}{\partial x^j}(\mathbf x, \mathbf g(\mathbf x)) + \sum_{p=1}^m \frac{\partial \mathbf F}{\partial y^p}(\mathbf x,\mathbf g(\mathbf x)) \frac{\partial g^p}{\partial x^j}(\mathbf x)\\ \end{split} \tag{3} $$ for any function $\mathbf F$ satisfying the same conditions as $\mathbf f$, and any index $j$ satisfying the same conditions as $i$. The second step is made possible by the definition of $\mathbf s$, which implies $$ \frac{\partial s^p}{\partial x^j}(\mathbf x) = \begin{cases} \delta_{jp} & 1 \leq p \leq n; \\ \dfrac{\partial g^{(p-n)}}{\partial x^j}(\mathbf x) & n+1 \leq p \leq n+m\end{cases} $$ Let us now pick an arbitrary index $j$. Rewriting $(2)$ as $$ \sum_{k=1}^m\frac{\partial \mathbf f}{\partial y^k}(\mathbf x,\mathbf g(\mathbf x))\frac{\partial g^k}{\partial x^i}(\mathbf x) = - \frac{\partial \mathbf f}{\partial x^i}(\mathbf x, \mathbf g(\mathbf x)) $$ and differentiating both sides with respect to $x^j$, we get, by the linearity of the derivative, $$ \sum_{k=1}^m \frac{\partial}{\partial x^j}\left[\frac{\partial \mathbf f}{\partial y^k}(\mathbf x,\mathbf g(\mathbf x))\frac{\partial g^k}{\partial x^i}(\mathbf x)\right] = - \frac{\partial}{\partial x^j} \left[\frac{\partial \mathbf f}{\partial x^i}(\mathbf x, \mathbf g(\mathbf x)) \right] $$ It is now necessary to apply the chain rule to the right-hand side, and the same rule together with the product rule to the left-hand side.
Thus we obtain $$ \begin{split} \sum_{k=1}^m \left[\frac{\partial}{\partial x^j} \left(\frac{\partial \mathbf f}{\partial y^k}(\mathbf x,\mathbf g(\mathbf x)) \right) \frac{\partial g^k}{\partial x^i}(\mathbf x) + \frac{\partial \mathbf f}{\partial y^k}(\mathbf x,\mathbf g(\mathbf x)) \frac{\partial^2 g^k}{\partial x^j \partial x^i}(\mathbf x)\right] =- \frac{\partial}{\partial x^j} \left[\frac{\partial \mathbf f}{\partial x^i}(\mathbf x, \mathbf g(\mathbf x)) \right] \end{split} $$ We can now employ $(3)$ first on the left-hand side, choosing $\mathbf F \equiv \dfrac{\partial \mathbf f}{\partial y^k}$, and then on the right-hand side, adopting instead $\mathbf F \equiv \dfrac{\partial \mathbf f}{\partial x^i} $: $$ \begin{split} \sum_{k=1}^m \left[\left(\frac{\partial^2 \mathbf f}{\partial x^j \partial y^k}(\mathbf x,\mathbf g(\mathbf x)) + \sum_{p=1}^m \frac{\partial^2 \mathbf f}{\partial y^p \partial y^k}(\mathbf x, \mathbf g(\mathbf x)) \frac{\partial g^p}{\partial x^j}(\mathbf x)\right) \frac{\partial g^k}{\partial x^i}(\mathbf x) \right]\ + \\ +\ \sum_{k=1}^m \frac{\partial \mathbf f}{\partial y^k}(\mathbf x, \mathbf g(\mathbf x)) \frac{\partial^2 g^k}{\partial x^j \partial x^i}(\mathbf x) = - \frac{\partial^2 \mathbf f}{\partial x^j \partial x^i}(\mathbf x,\mathbf g(\mathbf x)) - \sum_{q=1}^m \frac{\partial^2 \mathbf f}{\partial y^q \partial x^i}(\mathbf x,\mathbf g(\mathbf x)) \frac{\partial g^q}{\partial x^j}(\mathbf x) \end{split} $$ Simplifying, we isolate the second derivatives of $\mathbf g$ to find $$ \begin{split} &\sum_{k=1}^m \frac{\partial \mathbf f}{\partial y^k}(\mathbf x, \mathbf g(\mathbf x)) \frac{\partial^2 g^k}{\partial x^j \partial x^i}(\mathbf x) = - \frac{\partial^2 \mathbf f}{\partial x^j \partial x^i}(\mathbf x,\mathbf g(\mathbf x)) - \sum_{k=1}^m\frac{\partial^2 \mathbf f}{\partial x^j \partial y^k}(\mathbf x,\mathbf g(\mathbf x))\frac{\partial g^k}{\partial x^i}(\mathbf x) \\ &- \sum_{q=1}^m \frac{\partial^2 \mathbf f}{\partial y^q 
\partial x^i}(\mathbf x,\mathbf g(\mathbf x)) \frac{\partial g^q}{\partial x^j}(\mathbf x) - \sum_{k=1}^m \sum_{p=1}^m \frac{\partial^2 \mathbf f}{\partial y^p \partial y^k}(\mathbf x, \mathbf g(\mathbf x)) \frac{\partial g^p}{\partial x^j}(\mathbf x)\frac{\partial g^k}{\partial x^i}(\mathbf x) \end{split} \tag{4} $$ Notice that $(4)$ can be written in matrix form as $$ \begin{split} D_\mathbf y \mathbf f(\mathbf x,\mathbf g(\mathbf x)) \frac{\partial^2 \mathbf g}{\partial x^j \partial x^i}(\mathbf x) = &- \frac{\partial^2 \mathbf f}{\partial x^j \partial x^i}(\mathbf x,\mathbf g(\mathbf x)) \\ &- D_\mathbf y \left(\frac{\partial \mathbf f}{\partial x^j}\right)(\mathbf x,\mathbf g(\mathbf x)) \frac{\partial \mathbf g}{\partial x^i} (\mathbf x) \\ &- D_\mathbf y \left(\frac{\partial \mathbf f}{\partial x^i}\right)(\mathbf x,\mathbf g(\mathbf x)) \frac{\partial \mathbf g}{\partial x^j} (\mathbf x) \\ &- \left[\frac{\partial \mathbf g}{\partial x^j} (\mathbf x)\right]^\top D_\mathbf y^2\mathbf f(\mathbf x, \mathbf g(\mathbf x)) \left[\frac{\partial \mathbf g}{\partial x^i} (\mathbf x)\right] \end{split} \tag{5} $$ So we have found a formula for an arbitrary second derivative of $\mathbf g$ (be it mixed or pure), from which one can of course derive the hessian of the function.
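Identity $(4)$ can be verified symbolically as well. In the sketch below (my own construction) I pick $\mathbf g$ explicitly and then build an $\mathbf f$ that vanishes on the graph of $\mathbf g$, is nonlinear in $\mathbf y$, and has nonzero mixed $x$-$y$ partials, so that every sum in $(4)$ contributes:

```python
import sympy as sp

x1, x2, y1, y2 = sp.symbols('x1 x2 y1 y2')
xs, ys = [x1, x2], [y1, y2]

# assumed example: choose g explicitly, then build an f that vanishes
# on the graph of g by construction (nonlinear in y, coupled in x and y)
g = sp.Matrix([x1 + x2**2, x1 * x2])
yv = sp.Matrix([y1, y2])
H = lambda v: sp.Matrix([v[0] + v[1]**2, v[1] + v[0]**2])
M = sp.Matrix([[x1, 0], [0, x2]])
f = H(yv) - H(g) + M * (yv - g)

at_graph = {y1: g[0], y2: g[1]}     # evaluate partials of f at (x, g(x))

for xi in xs:
    for xj in xs:
        for c in range(2):          # check (4) for each component f^c
            fc = f[c]
            lhs = sum(sp.diff(fc, yk).subs(at_graph) * sp.diff(g[k], xj, xi)
                      for k, yk in enumerate(ys))
            rhs = (-sp.diff(fc, xj, xi)
                   - sum(sp.diff(fc, xj, yk) * sp.diff(g[k], xi)
                         for k, yk in enumerate(ys))
                   - sum(sp.diff(fc, yq, xi) * sp.diff(g[q], xj)
                         for q, yq in enumerate(ys))
                   - sum(sp.diff(fc, yp, yk) * sp.diff(g[p], xj) * sp.diff(g[k], xi)
                         for p, yp in enumerate(ys)
                         for k, yk in enumerate(ys)))
            assert sp.expand(lhs - rhs.subs(at_graph)) == 0
```

Since $\mathbf f(\mathbf x, \mathbf g(\mathbf x)) = \mathbf 0$ holds identically here, differentiating that identity twice is exactly what $(4)$ encodes, so the check must come out to zero for every component and every index pair.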

Notice the nice symmetry that arises in $(5)$: it is probably a hint toward a more general formula for the $k$-th derivative of $\mathbf g$, which, given the level of symbolic complexity already reached, I'd expect to be very complicated.