Problem regarding application of chain rule in differentiation


I was reading "From Calculus to Cohomology" after seeing nice things said about the book in Stack Exchange posts. However, I have been stumped from the beginning, so to speak, my first halt being the applications of the chain rule and product rule shown in the image below:

I may be being more than a tad slow here, but would somebody kindly explain the two steps in the picture?

[image: the two steps from the proof, applying the chain rule and product rule]

EDIT: As @giobrach advised below, here is the condition being used: the partial derivative of $f_1$ w.r.t. $x_2$ equals the partial derivative of $f_2$ w.r.t. $x_1$.

BEST ANSWER

Warning: this answer is very long, as it explains every single step in nauseating detail. Be prepared.

That portion of the proof needs five (and a half) major results from analysis: (a) the rule for differentiating parameter-dependent integrals with respect to a parameter, (b) the linearity of differential operators, (c) the product rule, (d) the chain rule together with an important regularity result (the "half"), and finally (e) the fundamental theorem of calculus.


Parameter-dependent integrals. The first rule can be stated as follows:

Theorem 1. Let $K = [a,b]$ be a compact Riemann-measurable subset of $\mathbb R$ (i.e., a closed interval) and let $\Omega$ be an open subset of $\mathbb R^n$; call $t$ the variable in $[a,b]$ and $\mathbf x = (x_1, \dots, x_n)$ the variables in $\Omega$. Let then $G$ be a continuous real function defined on $[a,b] \times \Omega$, such that its partial derivatives with respect to the $x$ variables exist and are continuous everywhere in $\Omega$ (i.e., it is of class $C^1$ with respect to the $\mathbf x$ variables). Then the function $g : \Omega \to \mathbb R$ given by $$g(\mathbf x)= \int_a^b G(t,\mathbf x)\ dt, \qquad \mathbf x \in \Omega$$ is of class $C^1$, and its partial derivatives are given by the formula $$\frac{\partial g}{\partial x_j}(\mathbf x) = \int_a^b \frac{\partial G}{\partial x_j}(t,\mathbf x)\ dt, \qquad \mathbf x \in \Omega.$$

In the situation described by the proof, we have that $\Omega = U$ is open, and it is a subset of $\mathbb R^2$, so that $\mathbf x = (x_1,x_2)$; furthermore, $[a,b] = [0,1]$ (the domain of the integral that defines $F$), so that $g$ is $F$ while $G$ is given by the formula $$G(t,x_1,x_2) = x_1 f_1(tx_1,tx_2) + x_2 f_2(tx_1,tx_2); \tag 1$$ now if $G$ satisfies the conditions imposed by the statement of Theorem 1, you may surely conclude that $F$ is $C^1$ and $$\frac{\partial F}{\partial x_1}(x_1,x_2) = \int_0^1 \frac{\partial G}{\partial x_1}(t,x_1,x_2)\ dt = \int_0^1 \frac{\partial}{\partial x_1}[x_1 f_1(tx_1,tx_2) + x_2 f_2(tx_1,tx_2) ] \ dt. \tag 2$$ But does $G$ satisfy these properties? This ultimately comes down to whether $f_1$ and $f_2$ do, so it depends on what properties they were defined to have by the author of your book before stating Theorem 1.4. Let's see why this is so.
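As a quick sanity check of Theorem 1 (not something from the book), one can verify symbolically on a toy integrand that integrating first and then differentiating gives the same result as differentiating under the integral sign; the choice $G(t,x) = x^2 \sin t$ below is mine, purely for illustration.

```python
# Sanity check of Theorem 1 (differentiation under the integral sign).
# The integrand G(t, x) = x**2 * sin(t) is an arbitrary illustration,
# not the G that appears in the proof.
import sympy as sp

t, x = sp.symbols('t x', real=True)
G = x**2 * sp.sin(t)

# Integrate first, then differentiate: g(x) = ∫_0^1 G(t, x) dt, then dg/dx.
g = sp.integrate(G, (t, 0, 1))
lhs = sp.diff(g, x)

# Differentiate under the integral sign: ∫_0^1 ∂G/∂x dt.
rhs = sp.integrate(sp.diff(G, x), (t, 0, 1))

assert sp.simplify(lhs - rhs) == 0  # both equal 2*x*(1 - cos(1))
```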


The linearity of differential operators. This one is simple and intuitive:

Theorem 2. Let $v, w : \Omega \to \mathbb R$ be differentiable functions defined over the open subset $\Omega$ of $\mathbb R^n$, and let $a,b$ be real numbers. Then the function $c : \Omega \to \mathbb R$ defined by $c(\mathbf x) = a\cdot v(\mathbf x) + b \cdot w(\mathbf x)$ for all $\mathbf x \in \Omega$ is differentiable and its partial derivatives satisfy $$\frac{\partial c}{\partial x_j}(\mathbf x) = a \cdot \frac{\partial v}{\partial x_j}(\mathbf x) + b \cdot \frac{\partial w}{\partial x_j}(\mathbf x).$$

If we define two functions $v, w : [0,1] \times U \to \mathbb R$, with $$\begin{split} v(t,x_1,x_2) &= x_1 f_1(tx_1,tx_2), \\ w(t,x_1,x_2) &= x_2 f_2(tx_1,tx_2), \end{split}$$ for all $t \in [0,1]$ and $(x_1,x_2) \in U$, then we may write $G = 1\cdot v + 1 \cdot w$. If we also know that $v$ and $w$ are differentiable on $[0,1]\times U$, then Theorem 2 applies: $G$ is differentiable too and it satisfies

$$\begin{split} \frac{\partial G}{\partial x_1}(t,x_1,x_2) &= 1 \cdot \frac{\partial v}{\partial x_1}(t,x_1,x_2) + 1\cdot \frac{\partial w}{\partial x_1}(t,x_1,x_2) \\ &=\frac{\partial}{\partial x_1}(x_1 f_1(tx_1, tx_2) ) + \frac{\partial}{\partial x_1}(x_2 f_2(tx_1,tx_2)). \end{split}\tag 3$$


The product rule. Now, as you can see, the functions $v$ and $w$ that we find in equation $(3)$ each contain the product of two simpler functions. There is, indeed, a theorem that helps us in this situation.

Theorem 3. Let $q,r : \Omega \to \mathbb R$ be two differentiable functions defined on an open subset $\Omega$ of $\mathbb R^n$. Then the function $p = q\cdot r : \Omega \to \mathbb R$ is also differentiable and its partial derivatives obey the formula $$\frac{\partial p}{\partial x_j}(\mathbf x) = \frac{\partial q}{\partial x_j}(\mathbf x)\ r(\mathbf x) + q(\mathbf x)\ \frac{\partial r}{\partial x_j}(\mathbf x), \qquad \forall \mathbf x \in \Omega. $$

In this case we may define four functions $q_1,r_1,q_2,r_2 : [0,1]\times U \to \mathbb R$ as $$\begin{split} q_1(t,x_1,x_2) &= x_1, \\ r_1(t,x_1,x_2) &= f_1(tx_1,tx_2), \\ q_2(t,x_1,x_2) &= x_2, \\ r_2(t,x_1,x_2) &= f_2(tx_1,tx_2), \end{split}$$ so that $v = q_1 \cdot r_1$ and $w = q_2 \cdot r_2$. If we know that the $q_i,r_i$ are differentiable, then Theorem 3 holds, so $v$ and $w$ are differentiable too and they satisfy $$\frac{\partial v}{\partial x_1} = \frac{\partial q_1}{\partial x_1} \cdot r_1 + q_1 \cdot \frac{\partial r_1}{\partial x_1} \tag{4a}$$ and $$\frac{\partial w}{\partial x_1} = \frac{\partial q_2}{\partial x_1} \cdot r_2 + q_2 \cdot \frac{\partial r_2}{\partial x_1}. \tag{4b}$$ Surely $q_1$ and $q_2$ are differentiable, and we also know their partial derivatives w.r.t. $x_1$: they are, respectively, $1$ and $0$ everywhere. The question is whether $r_1$ and $r_2$ are differentiable: to see this we need


The chain rule. This one corresponds to the following result:

Theorem 4. Let $\Omega$ be an open subset of $\mathbb R^n$ and $\Omega'$ be an open subset of $\mathbb R^k$. Let then $\mathbf s : \Omega \to \Omega'$ and $u : \Omega' \to \mathbb R$, and $h : \Omega \to \mathbb R$ be the composite of $\mathbf s$ and $u$, that is, $h = u \circ \mathbf s$. If $\mathbf s$ is differentiable in $\Omega$ and $u$ is differentiable in $\Omega'$, then $h$ is differentiable and it satisfies $$ \frac{\partial h}{\partial x_j}(\mathbf x) = \sum_{i=1}^k \frac{\partial u}{\partial y_i}(\mathbf s(\mathbf x)) \cdot \frac{\partial s_i}{\partial x_j}(\mathbf x), \qquad \forall \mathbf x \in \Omega.$$

Let us examine only the case of $r_1$, since the function $r_2$ behaves identically. We may define $\mathbf s : [0,1] \times U \to \mathbb R^2$ by $$\mathbf s(t,x_1,x_2) = (tx_1,tx_2),$$ so that indeed $r_1 = f_1(\mathbf s(t,x_1,x_2))$; in this manner, $f_1$ plays the role of the function $u$ from the statement, and $r_1$ that of $h$. If we can show that $\mathbf s$ is differentiable and we know that $f_1$ is too, then Theorem 4 guarantees that $r_1$ is differentiable and that $$\begin{split} \frac{\partial r_1}{\partial x_1}(t,x_1,x_2) &= \frac{\partial f_1}{\partial y_1}(\mathbf s(t,x_1,x_2)) \frac{\partial s_1}{\partial x_1}(t,x_1,x_2) + \frac{\partial f_1}{\partial y_2}(\mathbf s(t,x_1,x_2)) \frac{\partial s_2}{\partial x_1}(t,x_1,x_2) \\ &= \frac{\partial f_1}{\partial y_1}(tx_1,tx_2) \frac{\partial s_1}{\partial x_1}(t,x_1,x_2) + \frac{\partial f_1}{\partial y_2}(tx_1,tx_2) \frac{\partial s_2}{\partial x_1}(t,x_1,x_2). \end{split}\tag 5$$ But we know that $\mathbf s$ is differentiable: indeed $\mathbf s(t,x_1,x_2) = (tx_1,tx_2) = t(x_1,x_2)$, and so it satisfies $$\begin{split} \frac{\partial s_1}{\partial x_1}(t,x_1,x_2) &= \frac{\partial}{\partial x_1}(tx_1) = t, \\ \frac{\partial s_2}{\partial x_1}(t,x_1,x_2) &= \frac{\partial}{\partial x_1}(tx_2) = 0. \end{split} \tag {6}$$ The differentiability of $f_1$ instead depends on how it was defined in Question 1.1 in your book. If this definition implies the differentiability of $f_1$ then you are good to go. Note: it does, because in Question 1.1 it is stated that $f_1,f_2$ are equal to the partial derivatives of a smooth function, which are smooth, so they are $C^1$, and so they are also differentiable, because of this other very important result:
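To make the chain-rule computation $(5)$–$(6)$ concrete, here is a small symbolic check with an arbitrary smooth choice $f_1(y_1,y_2) = y_1^2 y_2$ (my example, not from the book):

```python
# Check of (5)-(6): ∂/∂x1 [f1(t*x1, t*x2)] = t * (∂f1/∂y1)(t*x1, t*x2),
# for the arbitrary smooth choice f1(y1, y2) = y1**2 * y2.
import sympy as sp

t, x1, x2, y1, y2 = sp.symbols('t x1 x2 y1 y2', real=True)
f1 = y1**2 * y2

# r1(t, x1, x2) = f1(s(t, x1, x2)) with s(t, x1, x2) = (t*x1, t*x2).
r1 = f1.subs({y1: t*x1, y2: t*x2})
lhs = sp.diff(r1, x1)

# Chain rule with ∂s1/∂x1 = t and ∂s2/∂x1 = 0.
rhs = t * sp.diff(f1, y1).subs({y1: t*x1, y2: t*x2})

assert sp.simplify(lhs - rhs) == 0
```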

Lemma. Let $\Omega$ be an open subset of $\mathbb R^n$ and let $\chi : \Omega \to \mathbb R^m$. If $\chi$ is of class $C^1$ in $\Omega$, then it is differentiable in $\Omega$.


Explaining the second equation. Now let's piece everything together. Plugging $(6)$ into $(5)$ you get $$\frac{\partial r_1}{\partial x_1}(t,x_1,x_2) = \frac{\partial f_1}{\partial y_1}(tx_1,tx_2) \cdot t + \frac{\partial f_1}{\partial y_2}(tx_1,tx_2) \cdot 0 = t \frac{\partial f_1}{\partial y_1}(tx_1,tx_2) \tag {7a}$$ The same, of course, applies to $r_2$: $$\frac{\partial r_2}{\partial x_1}(t,x_1,x_2) = \frac{\partial f_2}{\partial y_1}(tx_1,tx_2) \cdot t + \frac{\partial f_2}{\partial y_2}(tx_1,tx_2) \cdot 0 = t\frac{\partial f_2}{\partial y_1}(tx_1,tx_2) \tag {7b}$$ Going back to $(4a)$ and $(4b)$, we find $$\frac{\partial v}{\partial x_1}(t,x_1,x_2) = 1 \cdot f_1(tx_1,tx_2) + x_1 \cdot \left( t\frac{\partial f_1}{\partial y_1}(tx_1,tx_2) \right) \tag{8a} $$ and $$\frac{\partial w}{\partial x_1}(t,x_1,x_2) = 0 \cdot f_2(tx_1,tx_2) + x_2 \cdot \left( t\frac{\partial f_2}{\partial y_1}(tx_1,tx_2) \right), \tag{8b} $$ so that, finally, $$\begin{split}\frac{\partial G}{\partial x_1}(t,x_1,x_2) &= \frac{\partial v}{\partial x_1}(t,x_1,x_2) + \frac{\partial w}{\partial x_1}(t,x_1,x_2) \\ &= f_1(tx_1,tx_2) + tx_1\frac{\partial f_1}{\partial y_1}(tx_1,tx_2) + tx_2 \frac{\partial f_2}{\partial y_1}(tx_1,tx_2) \end{split} \tag 9$$ But this is exactly what we find within the integral sign in the second equation of your proof! (Ignore the fact that your author is not using $y$'s where I am: he's just being notationally lazy – or I'm being pedantic.)
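Equation $(9)$ can likewise be checked symbolically for concrete choices of $f_1$ and $f_2$; the two functions below are mine, picked only for illustration:

```python
# Check of (9): ∂G/∂x1 = f1(t*x1, t*x2) + t*x1*(∂f1/∂y1)(t*x1, t*x2)
#                        + t*x2*(∂f2/∂y1)(t*x1, t*x2),
# with the arbitrary choices f1 = y1**2*y2 and f2 = sin(y1) + y2.
import sympy as sp

t, x1, x2, y1, y2 = sp.symbols('t x1 x2 y1 y2', real=True)
f1 = y1**2 * y2
f2 = sp.sin(y1) + y2
sub = {y1: t*x1, y2: t*x2}          # plug in (y1, y2) = (t*x1, t*x2)

G = x1 * f1.subs(sub) + x2 * f2.subs(sub)
lhs = sp.diff(G, x1)

rhs = (f1.subs(sub)
       + t*x1*sp.diff(f1, y1).subs(sub)
       + t*x2*sp.diff(f2, y1).subs(sub))

assert sp.simplify(lhs - rhs) == 0
```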


The third equation. Let us come to the third equation from your proof. There, we can find the total derivative of a function expanded by means of the very same Theorems 2, 3, and 4 that we have used above. Suppose that we want to construct a function $\varphi_{(x_1,x_2)} : \mathbb R \to \mathbb R$ of the variable $t$, which depends on the fixed parameters $x_1$ and $x_2$, chosen such that $(x_1,x_2) \in U$. One way to do so is to define $$\varphi_{(x_1,x_2)}(t) = t \cdot f_1(x_1t,x_2t), \qquad \forall t \in \mathbb R. \tag{10}$$ Now, since $\mathbf s$ is differentiable, the function $\boldsymbol \sigma_{(x_1,x_2)} : t \mapsto \boldsymbol \sigma_{(x_1,x_2)}(t) = \mathbf s(t,x_1,x_2)$ is also differentiable w.r.t. $t$ (remember: $x_1$ and $x_2$ are now fixed numbers!): its derivative is the constant vector $$\frac{d\boldsymbol \sigma_{(x_1,x_2)}}{dt}(t) = (x_1,x_2). \tag {11} $$ Notice that we may rewrite $\varphi_{(x_1,x_2)}(t)$ as $$\varphi_{(x_1,x_2)}(t) = t\cdot f_1(\boldsymbol \sigma_{(x_1,x_2)}(t)), \qquad \forall t \in \mathbb R; $$ since we have established that $\boldsymbol\sigma_{(x_1,x_2)}$ is differentiable, and since we know $f_1$ is as well, then $f_1 \circ \boldsymbol \sigma_{(x_1,x_2)}$ is differentiable by Theorem 4, and multiplying it by $t$ still produces a differentiable function by Theorem 3.
Then $\varphi_{(x_1,x_2)}$ is also differentiable, and it satisfies $$\begin{split} \frac{d\varphi_{(x_1,x_2)}}{dt}(t) &= \frac{dt}{dt}\cdot (f_1\circ\boldsymbol\sigma_{(x_1,x_2)})(t) + t \cdot \frac{d(f_1 \circ \boldsymbol\sigma_{(x_1,x_2)})}{dt}(t) \\ &= f_1(\boldsymbol\sigma_{(x_1,x_2)}(t)) + t\left(\frac{\partial f_1}{\partial y_1}(\boldsymbol\sigma_{(x_1,x_2)}(t)) \cdot \frac{d \sigma_{(x_1,x_2);1}}{dt}(t) + \frac{\partial f_1}{\partial y_2}(\boldsymbol\sigma_{(x_1,x_2)}(t)) \cdot \frac{d \sigma_{(x_1,x_2);2}}{dt}(t)\right) \\ &= f_1(\boldsymbol\sigma_{(x_1,x_2)}(t)) + t\left(\frac{\partial f_1}{\partial y_1}(x_1t,x_2t) \cdot x_1 + \frac{\partial f_1}{\partial y_2}(x_1t,x_2t) \cdot x_2\right) \\ &= f_1(x_1t,x_2t) + tx_1 \frac{\partial f_1}{\partial y_1}(x_1t,x_2t) + tx_2 \frac{\partial f_1}{\partial y_2}(x_1t,x_2t) \end{split}\tag {12}$$ where we have used the product rule at the first equal sign, the chain rule at the second (with the notation that $\sigma_{(x_1,x_2);i}$ indicates the $i$-th component function of the vector function $\boldsymbol\sigma_{(x_1,x_2)}$), equation $(11)$ at the third, and just some algebra together with the definition of $\boldsymbol \sigma_{(x_1,x_2)}$ at the fourth. You can see that this is exactly the content of the third equation in your proof!
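The same toy $f_1$ from before lets one confirm equation $(12)$, the expansion of $d\varphi_{(x_1,x_2)}/dt$ (again my own example, not the book's):

```python
# Check of (12): d/dt [t * f1(x1*t, x2*t)]
#   = f1(x1*t, x2*t) + t*x1*(∂f1/∂y1)(x1*t, x2*t)
#                    + t*x2*(∂f1/∂y2)(x1*t, x2*t),
# for the arbitrary smooth choice f1(y1, y2) = y1**2 * y2.
import sympy as sp

t, x1, x2, y1, y2 = sp.symbols('t x1 x2 y1 y2', real=True)
f1 = y1**2 * y2
sub = {y1: x1*t, y2: x2*t}

phi = t * f1.subs(sub)               # phi(t) = t * f1(x1*t, x2*t)
lhs = sp.diff(phi, t)

rhs = (f1.subs(sub)
       + t*x1*sp.diff(f1, y1).subs(sub)
       + t*x2*sp.diff(f1, y2).subs(sub))

assert sp.simplify(lhs - rhs) == 0
```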


Toward the fourth equation. Now we can see that $\partial G/\partial x_1$ (equation $(9)$) and $d\varphi_{(x_1,x_2)}/dt$ (equation $(12)$) are very similar in shape: their first two terms are identical, but the third term of $\partial G/\partial x_1$ contains the partial of $f_2$ w.r.t. $y_1$, while the third term of $d\varphi_{(x_1,x_2)}/dt$ contains the partial of $f_1$ w.r.t. $y_2$ instead. So we may say that $\partial G/\partial x_1$ is equal to $d\varphi_{(x_1,x_2)}/dt$ on the condition that we subtract the wrong term and add the correct one: $$\begin{split} \frac{\partial G}{\partial x_1}(t,x_1,x_2) &= \frac{d \varphi_{(x_1,x_2)}}{dt}(t) - tx_2 \frac{\partial f_1}{\partial y_2}(tx_1,tx_2) + tx_2 \frac{\partial f_2}{\partial y_1}(tx_1,tx_2) \\ &= \frac{d \varphi_{(x_1,x_2)}}{dt}(t) + tx_2 \left(\frac{\partial f_2}{\partial y_1}(tx_1,tx_2) - \frac{\partial f_1}{\partial y_2}(tx_1,tx_2)\right). \end{split} \tag{13}$$ We may now substitute this formula into equation $(2)$ to obtain the first equal sign of the last equation in your proof. If you go back to hypothesis $(2)$ from your book (the condition $\partial f_1/\partial x_2 = \partial f_2/\partial x_1$ mentioned in the edit), it makes the two partial derivatives in $(13)$ cancel out, leaving just the total derivative w.r.t. $t$: $$\frac{\partial F}{\partial x_1}(x_1,x_2) = \int_0^1 \left[\frac{d \varphi_{(x_1,x_2)}}{dt}(t)\right]\ dt . \tag{14}$$ In order to proceed, we need one last piece:
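When the closedness condition actually holds, the correction term in $(13)$ vanishes and $\partial G/\partial x_1$ coincides with $d\varphi_{(x_1,x_2)}/dt$ exactly. Here is a symbolic check with a closed pair of my own choosing, $f_1 = 2y_1y_2$ and $f_2 = y_1^2$ (note $\partial f_1/\partial y_2 = 2y_1 = \partial f_2/\partial y_1$):

```python
# Check of (13): for a *closed* pair, ∂G/∂x1 equals dφ/dt on the nose.
# Closed example (mine): f1 = 2*y1*y2, f2 = y1**2, so ∂f1/∂y2 = ∂f2/∂y1.
import sympy as sp

t, x1, x2, y1, y2 = sp.symbols('t x1 x2 y1 y2', real=True)
f1 = 2*y1*y2
f2 = y1**2
sub = {y1: t*x1, y2: t*x2}

G = x1 * f1.subs(sub) + x2 * f2.subs(sub)
dG_dx1 = sp.diff(G, x1)

phi = t * f1.subs(sub)               # phi(t) = t * f1(t*x1, t*x2)
dphi_dt = sp.diff(phi, t)

# The correction term t*x2*(∂f2/∂y1 - ∂f1/∂y2)(t*x1, t*x2) is zero here.
assert sp.simplify(dG_dx1 - dphi_dt) == 0
```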


The (second) fundamental theorem of calculus. It states:

Theorem 5. Let $\psi : [a,b] \to \mathbb R$ be of class $C^1$. Then $$\int_{t_0}^{t_1} \psi'(t)\ dt = \psi(t_1) - \psi(t_0), \qquad \forall t_0, t_1 \in [a,b], $$ and in particular $$\int_a^b \psi'(t)\ dt = \psi(b) - \psi(a). $$

In our situation we have a function, $ \varphi_{(x_1,x_2)}$, that is surely $C^1$ (as we have discussed earlier), so the theorem applies with $\psi = \varphi_{(x_1,x_2)}$. Going back to $(14)$, we get: $$\frac{\partial F}{\partial x_1}(x_1,x_2) = \int_0^1 \left[\frac{d \varphi_{(x_1,x_2)}}{dt}(t)\right]\ dt = \varphi_{(x_1,x_2)}(t)\Big|_0^1. \tag{15}$$

The home run. By using the definition of $\varphi_{(x_1,x_2)}$ that we gave at $(10)$, we obtain from $(15)$ $$\frac{\partial F}{\partial x_1}(x_1,x_2) = [t f_1(tx_1,tx_2)]\Big|_0^1 = 1 \cdot f_1(1 \cdot x_1, 1 \cdot x_2) - 0 \cdot f_1(0\cdot x_1,0\cdot x_2) = f_1(x_1,x_2). $$
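Everything above can be verified end to end on a concrete closed 1-form; the pair below is my own toy example (with potential $f = x_1^2 x_2$), not one from the book:

```python
# End-to-end check of the whole argument: define F by the integral formula
# and verify ∂F/∂x1 = f1 (and, symmetrically, ∂F/∂x2 = f2).
# Closed toy pair (mine): f1 = 2*y1*y2, f2 = y1**2, potential f = x1**2*x2.
import sympy as sp

t, x1, x2, y1, y2 = sp.symbols('t x1 x2 y1 y2', real=True)
f1 = 2*y1*y2
f2 = y1**2
sub = {y1: t*x1, y2: t*x2}

# F(x1, x2) = ∫_0^1 [x1*f1(t*x1, t*x2) + x2*f2(t*x1, t*x2)] dt
F = sp.integrate(x1*f1.subs(sub) + x2*f2.subs(sub), (t, 0, 1))

assert sp.simplify(sp.diff(F, x1) - f1.subs({y1: x1, y2: x2})) == 0
assert sp.simplify(sp.diff(F, x2) - f2.subs({y1: x1, y2: x2})) == 0
```

Here the integral evaluates to $F = x_1^2 x_2$, recovering exactly the potential whose partial derivatives are $f_1$ and $f_2$.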