Why is the total differential a sum of partial derivatives?


In every definition of the total differential I see a sum of partial derivatives multiplied by the corresponding differentials, but nowhere is there a clear explanation of why this is.



The differential of a function $f:\mathbb R^m\to\mathbb R^n$ is the linear map that is the “best” approximation to the change of $f$ near some point $\mathbf p=(p^1,\dots,p^m)$, i.e., $f(\mathbf p+\mathbf h)=f(\mathbf p)+\operatorname{d}f_{\mathbf p}[\mathbf h]+o(\|\mathbf h\|)$. Restricting ourselves to a scalar-valued function $f:\mathbb R^n\to\mathbb R$, it’s fairly straightforward to show that ${\partial f\over\partial x^k}(\mathbf p)=\operatorname{d}f_{\mathbf p}[\mathbf e^k]$, where $\mathbf e^k$ is the basis vector corresponding to the $x^k$ coordinate. Since a linear map is determined by its action on the basis vectors, in this coordinate system we can write $\operatorname{d}f_{\mathbf p}$ as the row vector $\left({\partial f\over\partial x^1}(\mathbf p),\dots,{\partial f\over\partial x^n}(\mathbf p)\right)$, so that $\operatorname{d}f_{\mathbf p}[\mathbf h]$ becomes simple matrix multiplication (or, if you prefer, a dot product).
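As a numeric sanity check of these two facts, here is a small Python sketch using a made-up function $f(x,y)=xy+\sin x$ (my example, not anything from the question): finite differences recover $\operatorname{d}f_{\mathbf p}[\mathbf e^k]$ as the partial derivatives, and $\operatorname{d}f_{\mathbf p}[\mathbf h]$ is just the dot product with the row vector of partials.

```python
import math

# Hypothetical example function f : R^2 -> R (chosen for illustration).
def f(x, y):
    return x * y + math.sin(x)

# Its exact partial derivatives at a point p, collected as the row vector df_p.
def df(p):
    x, y = p
    return (y + math.cos(x), x)  # (df/dx, df/dy)

p = (1.0, 2.0)
row = df(p)

# df_p[e^k] should match the finite-difference quotient in direction e^k.
eps = 1e-6
fd_x = (f(p[0] + eps, p[1]) - f(*p)) / eps
fd_y = (f(p[0], p[1] + eps) - f(*p)) / eps
print(fd_x, row[0])  # approximately equal
print(fd_y, row[1])  # approximately equal

# Applying df_p to a displacement h is just the dot product row . h.
h = (0.3, -0.2)
df_h = row[0] * h[0] + row[1] * h[1]
print(df_h)
```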

Now, the affine coordinate function $x^i$ assigns to a point its $i$th coordinate; since it is linear, its differential $dx^i$ at any point is the linear map that picks out the $i$th component of a displacement: $dx^i[\mathbf h]=h^i$. In the matrix formulation above, this means that $dx^1=(1,0,\dots,0)$, $dx^2=(0,1,0,\dots,0)$, and so on. So we can write $\operatorname{d}f_{\mathbf p}$ as $${\partial f\over\partial x^1}(\mathbf p)(1,0,\dots,0)+\cdots+{\partial f\over\partial x^n}(\mathbf p)(0,0,\dots,1)$$ or $${\partial f\over\partial x^1}dx^1+\cdots+{\partial f\over\partial x^n}dx^n$$ (with the partial derivatives evaluated at $\mathbf p$).

It might help to look at this geometrically. For a scalar-valued function $f$, the linear approximation amounts to approximating the $n$-dimensional hypersurface (in $\mathbb R^{n+1}$) $y=f(\mathbf x)$ near the point $\mathbf p$ by its tangent hyperplane at that point. Just as the derivative of $f$ gives the slope of the tangent line to the curve $y=f(x)$ in the one-dimensional case $f:\mathbb R\to\mathbb R$, in the multidimensional case each partial derivative ${\partial f\over\partial x^i}$ gives the slope of the tangent hyperplane in the $x^i$ direction. The equation of the tangent hyperplane at $\mathbf p$ is thus $$y=f(\mathbf p)+{\partial f\over\partial x^1}(x^1-p^1)+\cdots+{\partial f\over\partial x^n}(x^n-p^n)=f(\mathbf p)+\left({\partial f\over\partial x^1},\cdots,{\partial f\over\partial x^n}\right)(\mathbf x-\mathbf p),$$ with the partial derivatives evaluated at $\mathbf p$. Comparing this to the definition of $\operatorname{d}f_{\mathbf p}$ at the top, we again find that it can be represented as a row vector of partial derivatives, and we proceed as before.
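The tangent-hyperplane picture can also be checked numerically. This sketch uses a made-up surface $y=e^{x^1}x^2$ (my choice, not from the question) and verifies that the error of the tangent-plane approximation is $o(\|\mathbf h\|)$, i.e., the ratio error$/\|\mathbf h\|$ shrinks toward $0$ as $\mathbf h$ does:

```python
import math

# Hypothetical surface y = f(x1, x2); the tangent hyperplane at p
# has the partial derivatives as its slopes in each coordinate direction.
def f(x1, x2):
    return math.exp(x1) * x2

p = (0.5, 1.5)
fp = f(*p)
grad = (math.exp(p[0]) * p[1], math.exp(p[0]))  # (df/dx1, df/dx2) at p

def tangent(x1, x2):
    return fp + grad[0] * (x1 - p[0]) + grad[1] * (x2 - p[1])

# The approximation error is o(||h||): err / ||h|| should go to 0 with h.
ratios = []
for t in (1e-1, 1e-2, 1e-3):
    h = (0.3 * t, -0.4 * t)  # so ||h|| = 0.5 * t
    err = abs(f(p[0] + h[0], p[1] + h[1]) - tangent(p[0] + h[0], p[1] + h[1]))
    ratios.append(err / (0.5 * t))
    print(t, ratios[-1])
```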