Where Does the Jacobian Matrix Come from (Why Does it Work)?


Why does the Jacobian matrix

$$ J = \begin{pmatrix} \frac{\partial f_1}{\partial x_1} & \dots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial x_1} & \dots & \frac{\partial f_n}{\partial x_n} \end{pmatrix} $$

work, and where does it come from?

I came across this matrix in a multivariable calculus context where it was used to do multivariate substitution. It seems so arbitrary to me and I don't understand where it comes from. Can anyone give some insight or intuition about this?

There are 4 answers below.

Best answer

For functions of one variable, the derivative is the slope of the tangent line to the graph of $f(x)$.

The tangent line to the curve $y=f(x)$ at $x=a$ is given by

$h(x)=f(a)+f'(a)(x-a).$

As $x \rightarrow a$, $f(x)$ approaches the tangent line $h(x)$. On a computer algebra system like Mathematica, if you zoom in at the point $x=a$, $f(x)$ looks more and more like the tangent line $h(x)$, whose slope is $f'(a)$.

In fact, a function is said to be differentiable at $a$ precisely when

$$\lim_{x \rightarrow a} \frac{f(x)-h(x)}{x-a}=0$$
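This condition is easy to check numerically. Here's a quick Python sketch; the choices $f=\sin$ and $a=0.5$ are my own illustrative example, not anything from the question:

```python
import math

# Illustrative choices: f(x) = sin(x), tangent line at a = 0.5.
f = math.sin
a = 0.5
fprime_a = math.cos(a)  # exact derivative of sin at a

def h(x):
    """Tangent line to f at a."""
    return f(a) + fprime_a * (x - a)

# (f(x) - h(x)) / (x - a) should shrink toward 0 as x -> a.
for dx in [1e-1, 1e-2, 1e-3, 1e-4]:
    x = a + dx
    ratio = (f(x) - h(x)) / (x - a)
    print(dx, ratio)
```

The printed ratios shrink roughly in proportion to $dx$, which is exactly the "zooming in makes $f$ look like its tangent line" picture.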

The Jacobian matrix plays the role of the derivative of a vector-valued function $\mathbf{f}$,

$$\mathbf{f}=(f_1(x_1,\ldots,x_n),f_2(x_1,\ldots,x_n),\ldots,f_m(x_1,\ldots,x_n))$$

of $n$ input variables and $m$ output variables.

For concreteness, assume $m=1$ and $n=2$, i.e. a scalar function of two variables.

Analogous to the single-variable case, the tangent plane to a surface $z=f(x,y)$ at the point $(a,b)$ is given by

$h(x,y)=f(a,b)+Df(a,b)\cdot (x-a,y-b)$

where $Df(a,b)=\left[\frac{\partial f}{\partial x}(a,b),\frac{\partial f}{\partial y}(a,b)\right]$ is the ($1\times 2$) Jacobian matrix.

As $(x,y) \rightarrow (a,b)$, the surface $f(x,y)$ approaches the tangent plane $h(x,y)$. In Mathematica, if you zoom in at the point $(a,b)$, $f(x,y)$ looks more and more like the tangent plane. $\partial f/\partial x$ is the approximate increase in the function value per small bump $\Delta x$ in the $x$-direction, and $\partial f/\partial y$ the approximate increase per small bump $\Delta y$ in the $y$-direction.

Along the same lines, a vector-valued function is said to be differentiable at $\mathbf{a}$ if, with $\mathbf{h}(\mathbf{x})=\mathbf{f}(\mathbf{a})+J\mathbf{f}(\mathbf{a})\,(\mathbf{x}-\mathbf{a})$,

$$\lim_{\mathbf{x} \rightarrow \mathbf{a}} \frac{\lVert\mathbf{f}(\mathbf{x})-\mathbf{h}(\mathbf{x})\rVert}{\lVert\mathbf{x}-\mathbf{a}\rVert}=0$$
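The same numerical check works in two variables. Here is a sketch for the tangent-plane case $m=1$, $n=2$; the function $f(x,y)=x^2y$ and the base point are hypothetical choices of mine:

```python
import math

# Illustrative choices: f(x, y) = x**2 * y, base point (a, b) = (1.0, 2.0).
a, b = 1.0, 2.0

def f(x, y):
    return x**2 * y

# Partial derivatives at (a, b), computed by hand: f_x = 2ab, f_y = a**2.
fx, fy = 2 * a * b, a**2

def h(x, y):
    """Tangent plane to the surface z = f(x, y) at (a, b)."""
    return f(a, b) + fx * (x - a) + fy * (y - b)

# |f - h| / ||(x, y) - (a, b)|| should tend to 0 as (x, y) -> (a, b).
for t in [1e-1, 1e-2, 1e-3]:
    x, y = a + t, b + t
    dist = math.hypot(x - a, y - b)
    ratio = abs(f(x, y) - h(x, y)) / dist
    print(t, ratio)
```

The error divided by the distance to the base point goes to zero, even though the error itself is only "small": that division is exactly what distinguishes the tangent plane from every other plane through the point.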

---

The Jacobian matrix is a listing of all the function's derivatives relative to the standard basis. It tells you how fast the function changes in each of its various dimensions, as the input coordinates change.

It plays the same role that the derivative does in single variable calculus.

---

Let $f\colon\mathbb{R}^m\rightarrow\mathbb{R}^n$ be a function, $x\in\mathbb{R}^m$ a point and pick a vector $v\in\mathbb{R}^m$. The difference quotient $[f(x+tv)-f(x)]/t$ measures the rate of change in the value of $f$ as the input changes by travelling $t$ units in the $v$ direction. If the function $f$ is continuously differentiable, then the limit of this quotient exists as $t\rightarrow0$ and is denoted $\partial_vf(x)$. It is the "instantaneous rate of change of $f$ in the direction $v$" (this generalizes the interpretation of the classical derivative for functions of one variable). If $Jf(x)$ is the Jacobian of $f$ at $x$, then $Jf(x)\cdot v=\partial_vf(x)$, so the Jacobian matrix contains all the information about the instantaneous rates of change of $f$ in all possible directions.
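The identity $Jf(x)\cdot v=\partial_vf(x)$ can be sketched numerically. The map $f(x,y)=(xy,\;x+y^2)$, the point, and the direction below are my own illustrative choices:

```python
# Illustrative map f : R^2 -> R^2, f(x, y) = (x*y, x + y**2).
def f(p):
    x, y = p
    return (x * y, x + y**2)

def jacobian(p):
    """Jacobian of f at p, with the partial derivatives computed by hand."""
    x, y = p
    return [[y, x],
            [1.0, 2 * y]]

def matvec(J, v):
    return tuple(sum(J[i][j] * v[j] for j in range(2)) for i in range(2))

p = (1.0, 2.0)
v = (3.0, -1.0)

# Directional derivative via the Jacobian: Jf(p) . v
exact = matvec(jacobian(p), v)

# Difference quotient [f(p + t v) - f(p)] / t for small t
t = 1e-6
quotient = tuple((f((p[0] + t * v[0], p[1] + t * v[1]))[i] - f(p)[i]) / t
                 for i in range(2))
print(exact, quotient)  # the two should agree closely
```

So one matrix-vector product recovers the rate of change in any direction $v$, which is the sense in which the Jacobian "contains all the information."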

---

The Jacobian matrix by itself is not the fundamental concept. The matrix by itself is simply a useful computational tool (actually sometimes it's useful, sometimes it completely obscures the "big picture"). What is important is the notion of differentiability; see this answer for some additional heuristic and motivating remarks.

In general, the definition of differentiability is as follows:

Let $V,W$ be normed vector spaces over a field (either $\Bbb{R}$ or $\Bbb{C}$), and let $A\subseteq V$ be a non-empty open set. Let $f:A\to W$ be a function, and let $\alpha\in A$ be a point. We say $f$ is differentiable at $\alpha$ if there exists a (continuous) linear transformation $T:V\to W$ such that \begin{align} \lim_{h\to 0}\dfrac{\lVert f(\alpha+h) - f(\alpha) - T(h)\rVert_W}{\lVert h \rVert_V} &= 0 \tag{$*$} \end{align} In this case, we can prove $T$ is unique, and so we denote it using any of the symbols $Df(\alpha), Df_{\alpha}, df(\alpha), df_{\alpha}$ or really any other notation which reminds you of a derivative (depends on the author).

Now, the derivative $Df_{\alpha}$ (which is by definition a linear transformation $V\to W$) is the fundamental object. Recall from basic linear algebra that if $V$ and $W$ are finite-dimensional with $\dim V = n$ and $\dim W = m$, then given any linear transformation $T:V\to W$ and a choice of basis $\beta$ on $V$ and basis $\gamma$ on $W$, we obtain a certain $m\times n$ matrix $[T]_{\beta}^{\gamma}$.

Partial derivatives come into the picture as a calculational tool when $V=\Bbb{R}^n$ and $W=\Bbb{R}^m$. Here, we choose the standard ordered bases $\sigma_n = \{e_1, \dots, e_n\}$ on $V= \Bbb{R}^n$ and $\sigma_m = \{e_1, \dots, e_m\}$ on $W=\Bbb{R}^m$. With these bases, the matrix representation $[Df_{\alpha}]_{\sigma_n}^{\sigma_m}$ is a certain $m\times n$ matrix, called the Jacobian matrix, usually denoted $f'(\alpha)$ or $Df_{\alpha}$ (people sometimes blur the distinction between a linear transformation and its matrix representation), or even $Jf_{\alpha}$ or something like $J_{f}(\alpha)$ (I like the $f'(\alpha)$ notation because it agrees with the single-variable case $V=W=\Bbb{R}$, i.e. $n=m=1$).

It turns out that the Jacobian matrix $f'(\alpha)$ is exactly the matrix of partial derivatives: \begin{align} f'(\alpha):= [Df_{\alpha}]_{\sigma_n}^{\sigma_m} &= \begin{pmatrix} \partial_1 f_1(\alpha) & \dots & \partial_n f_1(\alpha) \\ \vdots & \ddots & \vdots \\ \partial_1f_m(\alpha) & \dots & \partial_nf_m(\alpha) \end{pmatrix} \end{align}
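For a concrete feel for that matrix of partials, here is a rough numerical sketch that approximates each entry $\partial_j f_i(\alpha)$ by a forward difference; the map $f:\Bbb{R}^2\to\Bbb{R}^3$ below is a hypothetical example of mine:

```python
# Illustrative map f(x1, x2) = (x1 + x2**2, x1 * x2, x2) from R^2 to R^3.
def f(x):
    x1, x2 = x
    return (x1 + x2**2, x1 * x2, x2)

def jacobian_fd(f, alpha, m, n, eps=1e-6):
    """m x n matrix of partials d f_i / d x_j at alpha (forward differences)."""
    J = [[0.0] * n for _ in range(m)]
    fa = f(alpha)
    for j in range(n):
        bumped = list(alpha)
        bumped[j] += eps  # bump only the j-th input coordinate
        fb = f(tuple(bumped))
        for i in range(m):
            J[i][j] = (fb[i] - fa[i]) / eps
    return J

alpha = (2.0, 3.0)
J = jacobian_fd(f, alpha, m=3, n=2)
# Exact partials at (2, 3), computed by hand: [[1, 6], [3, 2], [0, 1]]
for row in J:
    print(row)
```

Note that column $j$ is obtained by bumping only input $x_j$, which is precisely how the partial derivatives in the $j$-th column of the displayed matrix are defined.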

So, if we want to calculate the evaluation of the derivative on a vector, such as $Df_{\alpha}(h) \in \Bbb{R}^m$, all we have to do is basic linear algebra: \begin{align} Df_{\alpha}(h) &= Df_{\alpha}\left(\sum_{j=1}^n h_j e_j\right) =\sum_{j=1}^n h_jDf_{\alpha}(e_j) = \sum_{j=1}^n \sum_{i=1}^m h_j \partial_j f_i(\alpha) \, e_i \end{align}


So far we've been talking about differential calculus. In your question you mentioned substitution, so I guess you mean in the context of integration? Well, the determinant of the derivative of your change of variables (Jacobian determinant for short) comes up as the "fudge factor" which takes into account how volumes of regions get distorted when you change from one set of coordinates to another. In this answer I briefly outline the heuristics of why this works.
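A quick numerical illustration of that fudge factor, using the familiar polar-coordinate map (the specific point and box sizes are my own choices):

```python
# The polar map (r, theta) -> (r cos theta, r sin theta) has Jacobian
# determinant r, so a tiny dr x dtheta rectangle in the (r, theta) plane
# should map to a region of area ~ r * dr * dtheta in the (x, y) plane.

r0, theta0 = 2.0, 0.7
dr, dtheta = 1e-3, 1e-3

# Exact area of the image region (an annular sector):
exact_area = 0.5 * ((r0 + dr)**2 - r0**2) * dtheta

# Jacobian-determinant prediction:
det_J = r0  # |det J| at (r0, theta0)
predicted = det_J * dr * dtheta

print(exact_area, predicted)  # nearly equal for small dr, dtheta
```

The two areas agree to leading order, which is exactly why $dx\,dy$ becomes $r\,dr\,d\theta$ under this substitution.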