Could anyone explain in simple words (and maybe with an example) what the difference between the gradient and the Jacobian is?
The gradient is a vector with the partial derivatives, right?
The gradient vector of a scalar function $f(\mathbf{x})$ that maps $\mathbb{R}^n\to\mathbb{R}$, where $\mathbf{x}=\langle x_1,x_2,\ldots,x_n\rangle$, is written as $$\nabla f(\mathbf{x})=\frac{\partial f(\mathbf{x})}{\partial x_1}\hat{x}_1+\frac{\partial f(\mathbf{x})}{\partial x_2}\hat{x}_2+\ldots+\frac{\partial f(\mathbf{x})}{\partial x_n}\hat{x}_n$$
Whereas the Jacobian is taken of a vector function $\mathbf{f}(\mathbf{x})$ that maps $\mathbb{R}^n\to\mathbb{R}^m$, where $\mathbf{f}=\langle f_1,f_2,\ldots,f_m\rangle$ and $\mathbf{x}=\langle x_1,x_2,\ldots,x_n\rangle$. The Jacobian is written as
$$J_\mathbf{f} = \frac{\partial (f_1,\ldots,f_m)}{\partial(x_1,\ldots,x_n)} = \left[ \begin{matrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n} \end{matrix} \right]$$
Note that when $m=1$ the Jacobian is the same as the gradient (laid out as a $1\times n$ row vector), since the Jacobian is a generalization of the gradient.
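For a concrete illustration (the function here is just a made-up example), take $f(x,y)=x^2y$, so $n=2$ and $m=1$:
$$\nabla f(x,y)=2xy\,\hat{x}_1+x^2\,\hat{x}_2,\qquad J_f=\begin{bmatrix}2xy & x^2\end{bmatrix},$$
and the $1\times 2$ Jacobian is exactly the gradient written as a row.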
The Jacobian determinant can be used for changes of variables because it can be viewed as the ratio of an infinitesimal change in the variables of one coordinate system to another. This requires that the function $\mathbf{f}(\mathbf{x})$ maps $\mathbb{R}^n\to\mathbb{R}^n$, which produces an $n\times n$ square matrix for the Jacobian. For example
$$\iiint_R f(x,y,z) \,dx\,dy\,dz = \iiint_S f(x(u,v,w),y(u,v,w),z(u,v,w))\left|\frac{\partial (x,y,z)}{\partial(u,v,w)}\right|\,du\,dv\,dw$$
where the Jacobian $J_\mathbf{g}$ is taken of the function
$$\mathbf{g}(u,v,w)=x(u,v,w)\hat{\imath}+y(u,v,w)\hat{\jmath}+z(u,v,w)\hat{k}$$
and the regions $R$ and $S$ correspond to each other.
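For instance, the classic spherical-coordinate change of variables can be checked symbolically; a minimal sketch with sympy (the symbols $r,\theta,\phi$ play the role of $u,v,w$ here):

```python
import sympy as sp

r, theta, phi = sp.symbols('r theta phi', positive=True)
# g(r, theta, phi) = (x, y, z) in spherical coordinates (theta = polar angle)
g = sp.Matrix([r*sp.sin(theta)*sp.cos(phi),
               r*sp.sin(theta)*sp.sin(phi),
               r*sp.cos(theta)])
J = g.jacobian([r, theta, phi])  # the 3x3 Jacobian of the transformation
print(sp.simplify(J.det()))      # r**2*sin(theta): dx dy dz = r^2 sin(theta) dr dtheta dphi
```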
The gradient and the Jacobian are both disguised names for what is really "the derivative" of, respectively, a real-valued function of several real variables and a vector-valued function. Put better: the gradient and the Jacobian are exactly to functions of type $\mathbb{R}^n \rightarrow \mathbb{R}$ and $\mathbb{R}^n \rightarrow \mathbb{R}^m$ what the ordinary derivative $\frac{d}{dx}$ is to functions of type $\mathbb{R} \rightarrow \mathbb{R}$, as opposed to partial and directional derivatives and other such derivative-like concepts. They are the "straight-up", honest-to-goodness derivatives - something all the calc texts I've seen seem to just quietly omit, despite that it is incredibly important to developing an intuitive grasp of these concepts!
The derivative of an ordinary function at some point gives you, intuitively, the following things:
1) the slope of the tangent line to the function's graph at that point, and
2) the best linear approximation to the function near that point, i.e. the linear map $h \mapsto ah$, with $a = f'(x_0)$, for which $f(x_0 + h) \approx f(x_0) + ah$.
When you go to a real-valued function of $n$ variables, the derivative becomes something of a different kind than the original map. (*) This can be seen by noting that if, say, we're considering a real-valued function of two variables, the direct generalization of 1) above is the "slope of the tangent plane" to the surface formed by the function when graphed as a surface plot in 3D. But a plane doesn't just have a steepness; it also has a direction in which it slopes upward, a direction it "points" (along the horizontal plane) with its upward climb. Thus to specify its slope we need both a magnitude and a direction - that is, a vector - and so it should be no surprise that the derivative of our function of two variables, $\mathrm{grad}[f]$, is just such a vector. The corresponding linear map is a linear functional (a linear map that outputs a number), and it acts on an input vector through the dot product. Equivalently, the linear map is the covector $(\mathrm{grad}[f])^T$, where the little $T$ is chosen purposefully to coincide with, as it actually is, the linear-algebraic transpose.
When you get to a function from several variables to several variables, you are now talking about a general linear map between spaces, and the derivative must be described by a matrix at each point. This matrix is just the Jacobian matrix. It is the exact equivalent of the number $a$ above, or the slope $m$ of the tangent line - only now it's the "slope" of some weird hyperdimensional tangenty-thing on another such hyperdimensional surfacey-thing: "things" we'd more properly call, respectively, the tangent space and the manifold formed by the hyperdimensional "graph" of the function (which lives in $\mathbb{R}^{n+m}$) - concepts we distinguish because there is a way to describe them on their own terms without needing to reference them as embedded in any kind of hyperdimensional space to begin with.
(*) Actually, technically so is the ordinary derivative, at least under the second interpretation above. A linear map is not a number, and actually if we want to get real picky (in mathese: stop identifying by isomorphisms) the "first" notion of derivative is, for consistency, a 1-vector, not a scalar.
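To make "the Jacobian is the matrix of the derivative-as-linear-map" concrete, here is a small numerical sketch (the map and the point are invented for illustration): near a point $x$, $f(x+h)\approx f(x)+J(x)\,h$, with error quadratic in $|h|$.

```python
import numpy as np

# A made-up map f: R^2 -> R^2 and its Jacobian, written out by hand.
def f(v):
    x, y = v
    return np.array([x**2 + y, np.sin(x * y)])

def jacobian(v):
    x, y = v
    return np.array([[2 * x,             1.0],
                     [y * np.cos(x * y), x * np.cos(x * y)]])

x0 = np.array([1.0, 2.0])
h = np.array([1e-5, -2e-5])
exact = f(x0 + h)
linear = f(x0) + jacobian(x0) @ h      # first-order (tangent) approximation
print(np.max(np.abs(exact - linear)))  # ~1e-10: the error shrinks like |h|^2
```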
The gradient is the vector formed by the partial derivatives of a scalar function.
The Jacobian matrix is the matrix formed by the partial derivatives of a vector function. Its rows are the (transposed) gradients of the respective components of the function.
E.g., with some argument omissions,
$$\nabla f(x,y)=\begin{pmatrix}f'_x\\f'_y\end{pmatrix}$$
$$J \begin{pmatrix}f(x,y),g(x,y)\end{pmatrix}=\begin{pmatrix}f'_x&f'_y\\g'_x&g'_y\end{pmatrix}=\begin{pmatrix}(\nabla f)^T\\(\nabla g)^T\end{pmatrix}.$$
If you want, the Jacobian is a generalization of the gradient to vector functions.
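A quick symbolic check of this structure (the component functions are hypothetical; sympy's Matrix.jacobian does the bookkeeping):

```python
import sympy as sp

x, y = sp.symbols('x y')
f, g = x*y + y**2, sp.exp(x)            # two made-up component functions
J = sp.Matrix([f, g]).jacobian([x, y])  # stack the components, then differentiate
print(J)         # Matrix([[y, x + 2*y], [exp(x), 0]])
print(J.row(0))  # Matrix([[y, x + 2*y]]): the transposed gradient of f
```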
Addendum:
The first derivative of a scalar multivariate function, or gradient, is a vector,
$$\nabla f(x,y)=\begin{pmatrix}f'_x\\f'_y\end{pmatrix}.$$
Thus the second derivative, which is the Jacobian of the gradient, is a matrix, called the Hessian:
$$H(f)=\begin{pmatrix}f''_{xx}&f''_{xy}\\f''_{yx}&f''_{yy}\end{pmatrix}.$$
Higher derivatives and vector functions require tensor notation.
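The "Hessian = Jacobian of the gradient" identity is easy to verify symbolically; a minimal sketch with a made-up function:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2*y + sp.sin(y)                    # a hypothetical scalar function
grad = sp.Matrix([f.diff(x), f.diff(y)])  # gradient as a column vector
H = grad.jacobian([x, y])                 # Jacobian of the gradient...
print(H)                                  # Matrix([[2*y, 2*x], [2*x, -sin(y)]])
print(H == sp.hessian(f, (x, y)))         # ...equals the Hessian: True
```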
The gradient in a general coordinate system depends on the metric tensor, but the Jacobian matrix consists only of the partial derivatives.
The gradient of a vector field is given by:
$\nabla\mathbf{f}=g^{jk}\frac{\partial f^{i}}{\partial x^{j}}\,\mathbf{e}_{i}\otimes\mathbf{e}_{k}$,
where Einstein summation is implied and the $g^{jk}$ are the (inverse) metric tensor elements, evaluated from the Jacobian matrix of the coordinate transformation from the Cartesian coordinate system. In a Cartesian coordinate system this is exactly the transpose of the Jacobian matrix, which, regardless of the metric tensor, is given by
$\left\{ J\mathbf{f}\right\} _{i,j}=\frac{\partial f^{i}}{\partial x^{j}}$.
For example, for the spherical coordinate system with coordinates $\mathbf z$, we have
$x_0=z_0\sin z_1\sin z_2,\ x_1=z_0 \cos z_1 \sin z_2,\ x_2=z_0\cos z_2$, where $\mathbf x$ denotes the coordinates in the Cartesian coordinate system.
Given a vector function $\mathbf f$, each column of the gradient is given by $\left\{ \nabla\mathbf{f}\right\} _{:,i}=\frac{\partial f^{i}}{\partial z_{0}}\hat{\mathbf{z}}_{0}+\frac{1}{z_{0}\sin z_{2}}\frac{\partial f^{i}}{\partial z_{1}}\hat{\mathbf{z}}_{1}+\frac{1}{z_{0}}\frac{\partial f^{i}}{\partial z_{2}}\hat{\mathbf{z}}_{2}$
or
$\begin{bmatrix}\frac{\partial f^{i}}{\partial z_{0}} & \frac{1}{z_{0}\sin z_{2}}\frac{\partial f^{i}}{\partial z_{1}} & \frac{1}{z_{0}}\frac{\partial f^{i}}{\partial z_{2}}\end{bmatrix}^\mathrm{T}$
but each row of the Jacobian is still given by
${\left\{ J\mathbf{f}\right\} _{i,:}}=\begin{bmatrix}\frac{\partial f^{i}}{\partial z_{0}} & \frac{\partial f^{i}}{\partial z_{1}} & \frac{\partial f^{i}}{\partial z_{2}}\end{bmatrix}$
The reason for this is that the Jacobian matrix is used to solve integrals by substitution, where the determinant of the Jacobian matrix is needed. It is also used to transform partial derivatives into the partial derivatives of another coordinate system. Another application is evaluating the metric tensor, as mentioned before.
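As an illustration of that last application, the metric tensor can be computed from the Jacobian of the coordinate transformation as $J^{\mathrm T}J$; a minimal sympy sketch using the spherical transformation above:

```python
import sympy as sp

z0, z1, z2 = sp.symbols('z0 z1 z2', positive=True)
# The transformation from above: z1 is the azimuthal angle, z2 the polar angle.
xvec = sp.Matrix([z0*sp.sin(z1)*sp.sin(z2),
                  z0*sp.cos(z1)*sp.sin(z2),
                  z0*sp.cos(z2)])
J = xvec.jacobian([z0, z1, z2])  # Jacobian of the coordinate transformation
g = sp.simplify(J.T * J)         # metric: g_jk = sum_i (dx^i/dz^j)(dx^i/dz^k)
print(g)  # diag(1, z0**2*sin(z2)**2, z0**2); its inverse gives the 1/h^2 factors above
```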
These are two particular forms of matrix representation of the derivative of a differentiable function $f$, used in two cases: the gradient, for $f:\mathbb{R}^n\to\mathbb{R}$, and the Jacobian, for $f:\mathbb{R}^n\to\mathbb{R}^m$.
For example, with $f:\mathbb{R}^2\to\mathbb{R}$ such as $f(x,y)=x^2+y$ you get $\mathrm{grad}_{(x,y)}(f)=[2x \,\,\,1]$ (or $\nabla f(x,y)=(2x,1)$), and for $f:\mathbb{R}^2\to\mathbb{R}^2$ such as $f(x,y)=(x^2+y,y^3)$ you get $\mathrm{Jac}_{(x,y)}(f)=\begin{bmatrix}2x&1\\0&3y^2\end{bmatrix}.$
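The same example, checked symbolically (just a verification sketch with sympy):

```python
import sympy as sp

x, y = sp.symbols('x y')
print(sp.Matrix([x**2 + y]).jacobian([x, y]))        # Matrix([[2*x, 1]]), the gradient as a row
print(sp.Matrix([x**2 + y, y**3]).jacobian([x, y]))  # Matrix([[2*x, 1], [0, 3*y**2]])
```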