Can someone explain how to construct a meaningful interpretation of the Jacobian (and total derivative) of a matrix-valued mapping?


Many are familiar with $R^n \rightarrow R^m$ mappings, i.e., mappings from n-dimensional coordinates to m-dimensional coordinates. Although I still don't entirely understand what it means to take the "derivative" of such a map, I can follow the recipe for computing the Jacobian of any given $R^n \rightarrow R^m$ function, which yields an $m \times n$ matrix of first partial derivatives.

However, I'm considering a slightly different and less conventional scenario: maps $R^n \rightarrow R^{m \times k}$, that is, maps from n-dimensional coordinates to matrices.

In particular, let's consider $R^2 \rightarrow R^{2 \times 2}$ maps, i.e., maps from 2D coordinates to $2 \times 2$ matrices. An example of this might be the function

$f(x,y) = \begin{pmatrix} xy & x+y \\ x^2 & y^2-x \end{pmatrix}$.

This raises a question: how do we construct a meaningful analog of the Jacobian for these types of mappings, and thus a meaningful notion of a "derivative" for multivariate matrix-valued functions, rather than strictly multivariate vector-valued ones?

I suspect the answer involves a rank-3 tensor, which I'd rather not deal with unless the math works out nicely. So I might use a trick I've seen mentioned in another Stack Exchange thread, though never explained or proven there in any fashion: rewriting a $2 \times 2$ matrix as a $4 \times 1$ vector.

However, this then results in a $4 \times 2$ Jacobian, and I'm not sure how to work with it, whether I'm interpreting it correctly, or what the formalization is. What does it even mean to calculate the "total derivative" of this map? Is this the correct way to construct a Jacobian of a matrix-valued function of several variables? And once I've calculated it, how do I use it to construct a meaningful notion of the derivative of the original map in a particular direction in the domain space?


The derivative of a map $f: \mathbb{R}^2 \to \mathbb{R}^{2 \times 2}$ at a point $x \in \mathbb{R}^2$ is defined to be the linear map $f^\prime (x) : \mathbb{R}^2 \to \mathbb{R}^{2 \times 2}$ that satisfies $$ f(x+h ) = f(x) + f^\prime (x) h + o(h)$$ as $h \to 0$, using little-o notation.

Let $e_i$ be the vector with a 1 in the $i$-th position and zeros elsewhere. From the above we can compute $$f^\prime (x) e_i = \lim_{t \to 0} \frac{f(x+e_it) -f(x)}{t} = \frac{\partial f}{\partial x_i}(x),$$ which is just the matrix containing the partial derivatives of the entries of $f$ with respect to the $i$-th variable. So in this case (using the notation $x = (x_1, x_2)$) $$\frac{\partial f}{\partial x_1}(x) = \begin{pmatrix} x_2 & 1 \\ 2x_1 & -1 \end{pmatrix} $$ and $$\frac{\partial f}{\partial x_2}(x) =\begin{pmatrix} x_1 & 1 \\ 0 & 2x_2 \end{pmatrix}. $$
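As a sanity check, these two partial-derivative matrices can be approximated numerically by applying a central finite difference entrywise. This is a minimal NumPy sketch, not part of the formal argument; the names `f` and `partial` are just illustrative:

```python
import numpy as np

def f(x):
    """The example map R^2 -> R^{2x2} from the question."""
    x1, x2 = x
    return np.array([[x1 * x2, x1 + x2],
                     [x1**2,   x2**2 - x1]])

def partial(f, x, i, eps=1e-6):
    """Central-difference approximation of the matrix df/dx_i at x."""
    e = np.zeros_like(x)
    e[i] = eps
    return (f(x + e) - f(x - e)) / (2 * eps)

x = np.array([1.0, 2.0])
print(partial(f, x, 0))  # ~ [[x2, 1], [2 x1, -1]] = [[2, 1], [2, -1]]
print(partial(f, x, 1))  # ~ [[x1, 1], [0, 2 x2]] = [[1, 1], [0, 4]]
```

Since the entries of $f$ are polynomials of degree at most 2, the central difference here agrees with the exact partials up to floating-point error.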

So in total the Jacobi "matrix" $M_{f^\prime} $ of $f^\prime$ is given by $$M_{f^\prime} (x)= \begin{pmatrix} \begin{pmatrix} x_2 & 1 \\ 2x_1 & -1 \end{pmatrix} & \begin{pmatrix} x_1 & 1 \\ 0 & 2x_2 \end{pmatrix} \end{pmatrix}. $$
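If you do want the rank-3 point of view mentioned in the question, this block "matrix" can be stored as a $2 \times 2 \times 2$ array whose first index selects which partial-derivative matrix you get; applying $f^\prime(x)$ to a direction $h$ is then a contraction over that index. A hedged NumPy sketch (the name `jacobian` is illustrative):

```python
import numpy as np

def jacobian(x):
    """Rank-3 array J with J[i] = df/dx_i at x, for the example f."""
    x1, x2 = x
    return np.array([[[x2, 1.0], [2 * x1, -1.0]],   # df/dx1
                     [[x1, 1.0], [0.0, 2 * x2]]])   # df/dx2

x = np.array([1.0, 2.0])
h = np.array([3.0, -1.0])
# f'(x) h = h1 * df/dx1 + h2 * df/dx2: contract h against the first index.
fprime_h = np.tensordot(h, jacobian(x), axes=1)
print(fprime_h)   # a 2x2 matrix, the derivative of f at x in direction h
```

So the rank-3 tensor is nothing mysterious: it is just the two partial-derivative matrices stacked, and the directional derivative is their linear combination with weights $h_1, h_2$.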

The same result can be achieved by noting that the map $\mathrm{vec}: \mathbb{R}^{2 \times 2} \to \mathbb{R}^4$ that stacks the columns on top of each other is linear. Define the map $\tilde{f}: \mathbb{R}^2 \to \mathbb{R}^4$ by $\tilde{f}(x) = \mathrm{vec}(f(x))$. We can then compute the usual Jacobi matrix of $\tilde{f}$.

With your example $f$ we have $$\tilde{f}(x_1,x_2) = \begin{pmatrix} x_1 x_2 \\ x_1^2 \\ x_1 +x_2\\ x_2^2-x_1 \end{pmatrix} $$ and so the Jacobi matrix $M_{\tilde{f}^\prime}$ of $\tilde{f}$ is
$$ M_{\tilde{f}^\prime}(x_1,x_2) = \begin{pmatrix} x_2 & x_1 \\ 2x_1 & 0 \\ 1 & 1 \\ -1 & 2x_2 \end{pmatrix}. $$ Because $\mathrm{vec}$ is linear, we obtain $f^\prime(x) = \mathrm{vec}^{-1} \circ \tilde{f}^\prime(x)$. For the Jacobi matrices this means $ f^\prime(x) e_i = \mathrm{vec}^{-1}\big(\tilde{f}^\prime (x) e_i\big)$, where $\tilde{f}^\prime (x) e_i$ is just the $i$-th column of the Jacobi matrix of $\tilde{f}$ at $x$. Therefore $$ f^\prime(x_1,x_2) e_1 = \begin{pmatrix} x_2 &1 \\ 2 x_1 & -1 \end{pmatrix} $$ and $$ f^\prime(x_1,x_2) e_2 = \begin{pmatrix} x_1 &1 \\ 0& 2 x_2 \end{pmatrix}. $$ Since $M_{f^\prime} (x_1,x_2) = (f^\prime(x_1,x_2) e_1 , f^\prime(x_1,x_2) e_2) $, this calculation is entirely equivalent to the one above.
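The round trip through $\mathrm{vec}$ is easy to see concretely: each column of the $4 \times 2$ Jacobi matrix, reshaped column-major (which undoes the column-stacking $\mathrm{vec}$), recovers one partial-derivative matrix. A minimal NumPy sketch of this, with illustrative names:

```python
import numpy as np

def jacobian_vec(x):
    """4x2 Jacobi matrix of vec(f) at x, in closed form (from above)."""
    x1, x2 = x
    return np.array([[x2,    x1],
                     [2*x1,  0.0],
                     [1.0,   1.0],
                     [-1.0,  2*x2]])

x = np.array([1.0, 2.0])
J = jacobian_vec(x)
# order='F' reshapes column-major, i.e. applies vec^{-1} to each column:
d1 = J[:, 0].reshape(2, 2, order='F')   # recovers df/dx1 at x
d2 = J[:, 1].reshape(2, 2, order='F')   # recovers df/dx2 at x
print(d1)   # [[2, 1], [2, -1]]
print(d2)   # [[1, 1], [0, 4]]
```

The only bookkeeping to get right is that a column-stacking $\mathrm{vec}$ corresponds to column-major (Fortran-order) reshaping; a row-stacking convention would use the default row-major order instead.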