Decomposition of a function and chain rule.

This question is about the basic chain rule. It occurred to me while reading about the calculus of variations, as used to define distance on a manifold via the usual Riemannian metric. It is related to another (temporarily deleted) post, https://math.stackexchange.com/q/3769640/577710, which I cite here for my own reference as a reminder of the original question.

The context of the question is as follows: it seems a Riemannian metric is defined as a kind of inner product, or 2-tensor, so that we can define the inner product and norm of tangent vectors, in particular those along a curve segment with fixed endpoints $p, q$. The length of such curves is then used to define the distance between any two points $p, q$ in $M$.

When we calculate the length of the shortest curve $\gamma$ between $p, q$ in $\mathbb{R}^2$, say $\gamma(t)=(t, f(t))$, using the usual metric, $L_\gamma=\int \sqrt{\gamma_1'(t)^2+\gamma_2'(t)^2}\,dt =\int \sqrt{1+(f'(t))^2}\,dt$, we may define $F(t, f(t), f'(t))=1+(f'(t))^2$, so that the integrand is $\sqrt{F}$.


My question is,

  1. In my eyes, the three 'independent' variables of $F$ are obviously not independent, so why do we define $F$ this way instead of defining it with fewer variables? Is it, for example, just for convenience of calculation?
  2. Even if the 'independent' variables are not independent, can we still use the chain rule to calculate $dF/dt$, i.e. $$\frac{dF}{dt}=\frac{\partial F}{\partial t}+\frac{\partial F}{\partial f}\frac{df}{dt}+\frac{\partial F}{\partial f'}\frac{d(f')}{dt}?$$

If we think further, the second question can be broken down into two more fundamental aspects.

2-1. This practice seems common when we decompose a function into a composition of functions. For example, if $r=1$ is the radius of the unit circle, we can decompose $r$ as $r=\sqrt{x^2+y^2}$ with $x=\cos \theta, y =\sin \theta$, where $r(x,y)$ is a function of two 'dependent' variables. Using the chain rule we get $$\frac{dr}{d\theta}=\frac{\partial r}{\partial x}\frac{dx}{d\theta}+\frac{\partial r}{\partial y}\frac{dy}{d\theta}=-\cos \theta\sin\theta+\cos \theta\sin\theta=0.$$ So one aspect of the second question may be restated as follows: can we always decompose a function into the composition of a function of 'dependent' variables with some other functions and still use the chain rule?
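The unit-circle computation above can be checked numerically. The following is only a minimal sketch: the helper names are made up for illustration, and the derivative of the composite is estimated with a central finite difference rather than computed exactly.

```python
import math

def r(x, y):
    # r as a function of two slots (x, y), defined on the whole plane
    return math.sqrt(x * x + y * y)

def dr_dtheta_chain(theta):
    # Chain rule: (dr/dx) * dx/dtheta + (dr/dy) * dy/dtheta,
    # evaluated at x = cos(theta), y = sin(theta)
    x, y = math.cos(theta), math.sin(theta)
    dr_dx = x / r(x, y)
    dr_dy = y / r(x, y)
    return dr_dx * (-math.sin(theta)) + dr_dy * math.cos(theta)

def dr_dtheta_direct(theta, eps=1e-6):
    # Central difference of the composite theta -> r(cos theta, sin theta)
    def composite(s):
        return r(math.cos(s), math.sin(s))
    return (composite(theta + eps) - composite(theta - eps)) / (2 * eps)

for theta in (0.3, 1.0, 2.5):
    print(theta, dr_dtheta_chain(theta), dr_dtheta_direct(theta))
```

Both columns should be zero up to rounding error, even though $x$ and $y$ are tied together through $\theta$.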

2-2. We notice that $F$ here is decomposed into $f$ and $f'$, which are obviously more 'dependent' than ordinary 'dependent' variables like the $x$ and $y$ above. This causes some complications. I will use an example to illustrate the point.

Take $h=x^2+2x$, $u=x^2$, $v=2x$, so that $u'=v$. Then there is obviously more than one way to write $h$ as a function of $u$ and $v$ (similarly, there can be more than one way to write $F$ as a function of $t, f(t), f'(t)$): (1) as algebraic expressions in $u, v$, or (2) as differential or integral expressions in $u, v$, e.g. $$h=u+v,\quad h=v^2/4+v,\quad h=\left(\int v\right)+v,\quad h=u+u',\quad h=v^2/4+u'.$$

Such non-uniqueness of the decomposition makes me wonder: can we still use the chain rule and get the same result? And given $h, u, v$, how do we know how to write $h$ as a function of $u, v$? Will case (2) cause more complicated issues than case (1)? And could anyone name specific fields that deal with these issues, if there are any?
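For the two purely algebraic decompositions in case (1), a direct check is possible: treat $u$ and $v$ as independent slots, differentiate, and only then substitute $u=x^2$, $v=2x$. This only illustrates case (1) and says nothing about case (2).

$$
\begin{aligned}
h=u+v:&\quad \frac{dh}{dx}=1\cdot u'+1\cdot v'=2x+2,\\
h=\frac{v^2}{4}+v:&\quad \frac{dh}{dx}=\left(\frac{v}{2}+1\right)v'=(x+1)\cdot 2=2x+2.
\end{aligned}
$$

Both agree with differentiating $h=x^2+2x$ directly.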

There is 1 best solution below

Let me go to your first example, but I'm going to rewrite it:

Define $$ F: \Bbb R^3 \to \Bbb R : (u, v, w) \mapsto 1 + w^2. $$ While it's conventional to denote the partial derivatives of $F$ with symbols like $$ \frac{\partial F}{\partial u}, $$ etc., this can lead to considerable confusion, especially when we let $G(u,v,w) = F(v, w, u)$, for instance. I propose for now to write the derivatives of $F$ with respect to the "slots" in which arguments appear, so that the thing written above is now written $$ D_1 F, $$ i.e., $D_1 F$ denotes the derivative of $F$ with respect to its first argument, regardless of the temporary variable used to name that first argument when $F$ was defined. Clear?

When we do this, the chain rule is no longer quite as pretty. But at least in one case, it retains some of its niceness. If $g_1, g_2, g_3 : \Bbb R \to \Bbb R$, and we define $$ H(t) = F(g_1(t), g_2(t), g_3(t)), $$ then the chain rule becomes $$ H'(t) = D_1 F(g_1(t), g_2(t), g_3(t)) g_1'(t) + D_2 F(g_1(t), g_2(t), g_3(t)) g_2'(t) + D_3 F(g_1(t), g_2(t), g_3(t)) g_3'(t). $$
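The slot form of the chain rule can be tried out numerically. This is a minimal sketch under stated assumptions: the test function `F(u, v, w) = u*v + w*w` and the choices `g_1 = sin`, `g_2 = cos`, `g_3 = exp` are hypothetical examples picked so that every slot contributes, and `D` is an illustrative helper that estimates a slot derivative by central differences.

```python
import math

def D(F, slot, point, eps=1e-6):
    # Central-difference estimate of the derivative of F with respect to
    # the argument in position `slot` (0-based), at the given point.
    up = list(point)
    down = list(point)
    up[slot] += eps
    down[slot] -= eps
    return (F(*up) - F(*down)) / (2 * eps)

def F(u, v, w):
    # Hypothetical three-slot function, not the F of the question
    return u * v + w * w

# Hypothetical inner functions g_1, g_2, g_3 and their exact derivatives
g = (math.sin, math.cos, math.exp)
dg = (math.cos, lambda t: -math.sin(t), math.exp)

def H(t):
    # H(t) = F(g_1(t), g_2(t), g_3(t))
    return F(*(gi(t) for gi in g))

def H_prime_chain(t):
    # Sum over slots k of D_k F(g(t)) * g_k'(t)
    point = [gi(t) for gi in g]
    return sum(D(F, k, point) * dg[k](t) for k in range(3))

def H_prime_direct(t, eps=1e-6):
    # Central difference of the one-variable function H
    return (H(t + eps) - H(t - eps)) / (2 * eps)

print(H_prime_chain(0.7), H_prime_direct(0.7))
```

Note that `D` only ever refers to argument positions, never to variable names, which is the point of the $D_k$ notation.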

Now in the particular case you're looking at, we have the function $F$; it's a function defined on all of 3-space, and has nothing to do with the function $f$. Let's go ahead and compute its derivatives: \begin{align} D_1 F(u,v,w) &= 0\\ D_2 F(u,v,w) &= 0\\ D_3 F(u, v, w) &= 2w. \end{align} Not so bad, right?

If we define $$ H(t) = F(1, f(t), f'(t)) $$

(notice that I'm using a new name here, because $H$ is a function of a single variable, while $F$ is a function of three variables), then we can use the chain rule to compute \begin{align} H'(t) &= D_1 F(1, f(t), f'(t)) 1'(t) +D_2 F(1, f(t), f'(t)) f'(t) +D_3 F(1, f(t), f'(t)) (f')'(t)\\ &= D_1 F(1, f(t), f'(t)) 0 +0~f'(t) +2(f'(t)) (f')'(t)\\ &= 2f'(t) f''(t) \end{align}
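The result $H'(t) = 2f'(t)f''(t)$ can be sanity-checked for a concrete curve. This sketch assumes the hypothetical choice $f = \sin$ (so $f' = \cos$ and $f'' = -\sin$) and compares the formula against a central finite difference of $H$.

```python
import math

def F(u, v, w):
    # The three-slot function from the answer: only the third slot matters
    return 1 + w * w

# Hypothetical choice of f, with its first and second derivatives
f = math.sin
f1 = math.cos
def f2(t):
    return -math.sin(t)

def H(t):
    # H(t) = F(1, f(t), f'(t)), a function of the single variable t
    return F(1, f(t), f1(t))

def H_prime_formula(t):
    # Result of the chain-rule computation: 2 f'(t) f''(t)
    return 2 * f1(t) * f2(t)

def H_prime_direct(t, eps=1e-6):
    # Central difference of H itself
    return (H(t + eps) - H(t - eps)) / (2 * eps)

for t in (0.2, 1.1, 2.0):
    print(t, H_prime_formula(t), H_prime_direct(t))
```

The two columns should agree up to finite-difference error.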

Now if you compare this simple computation to the confusion you describe in the "My question is" section, you'll see a couple of things.

  1. You've used the letter $F$ to denote two different things: a function of three variables, and a function of one variable. Sadly, this is very common, and eventually with practice you get used to it. But for beginners, it's just a nightmare. So when I encounter things like this, I rewrite them more clearly, even if it involves more writing.

  2. The author may have chosen to write the function $F$ with three arguments because later in the exposition there will be a need to make parallel constructions: things involving some other function of three variables where each of the three variables enters into the formula for $F$, not just the third one. If I'm guessing correctly, you're looking at a Calculus of Variations explanation, and the author is explaining how to minimize arclength. But what if the thing you wanted to minimize was something involving not only the derivative of $f$, but $f$ itself? Then your formula for $F$ would involve both $v$ and $w$.
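As an illustration of that last case, here is the same slot computation for a made-up function (not one taken from your text) that uses both the second and third slots, say $F(u,v,w) = v^2 + w^2$, with $H(t) = F(t, f(t), f'(t))$:

$$
\begin{aligned}
D_1 F(u,v,w) &= 0, \qquad D_2 F(u,v,w) = 2v, \qquad D_3 F(u,v,w) = 2w,\\
H'(t) &= 0\cdot 1 + 2f(t)\,f'(t) + 2f'(t)\,f''(t).
\end{aligned}
$$

Here the second slot contributes a term, so the three-argument form is doing real work.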

I don't believe I've answered all your questions, but perhaps I've helped you to get onto the right track.