Why do we need the scalar-by-matrix derivative?

We all know that there are the following types of derivatives:

[Image not reproduced]

The scalar-by-matrix derivative $\frac{\partial y}{\partial \mathbf{X}}$ is defined as follows:

$$\begin{pmatrix}
\partial y / \partial x_{11} & \partial y / \partial x_{21} & \ldots & \partial y / \partial x_{p1}\\
\partial y / \partial x_{12} & \partial y / \partial x_{22} & \ldots & \partial y / \partial x_{p2}\\
\vdots & \vdots & \ddots & \vdots \\
\partial y / \partial x_{1q} & \partial y / \partial x_{2q} & \ldots & \partial y / \partial x_{pq}
\end{pmatrix}$$

So why do we need this, if we can just put all the variables of $y$ into a vector and take the scalar-by-vector derivative?

I have probably missed the main idea behind the scalar-by-matrix derivative. Can you give me a numerical example? I can't find a concrete example in any textbook.

There are 2 best solutions below

You may as well ask why we need matrices at all, when we can just put all the variables in one long vector.

What is often interesting is how our objects transform under a change of basis. For example, if we rotate our coordinate system by $90^\circ$, the vector $(1,0)$ is transformed into the vector $(0,1)$. This is because the matrix $$ \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix},$$ the rotation matrix for $90^\circ$, acts on it. Similarly, if we want to rotate a matrix $A$ by $90^\circ$, we act on it like this: $$\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}^{-1} A \: \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$$ This yields a different result than writing out the matrix $A$ as a vector and rotating that, as one easily sees (one wouldn't even know in which direction to rotate that $4$-dimensional vector). What we learn from this is the following: a matrix is a matrix because it transforms like a matrix under coordinate transformations, not because it has $n \times n$ components. By this reasoning, it is clear why we should keep the matrix form explicit when differentiating with respect to the entries of a matrix.
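To make the transformation argument concrete, here is a minimal sketch in plain Python. It assumes the standard counterclockwise $90^\circ$ rotation matrix `R = [[0, -1], [1, 0]]`; the example matrix `A` and the helper names `matmul` and `matvec` are illustrative choices, not part of the original answer.

```python
# Rotation by 90 degrees counterclockwise (assumed convention).
R = [[0, -1],
     [1,  0]]
# For a rotation, the inverse equals the transpose.
R_inv = [[0, 1],
         [-1, 0]]

def matmul(P, Q):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]

def matvec(P, v):
    """Apply a matrix (list of rows) to a vector (list)."""
    return [sum(P[i][k] * v[k] for k in range(len(v)))
            for i in range(len(P))]

A = [[1, 2],
     [3, 4]]  # arbitrary example matrix

# A vector transforms with one factor of R.
rotated_vector = matvec(R, [1, 0])

# A matrix transforms by conjugation, R^{-1} A R.
rotated_matrix = matmul(matmul(R_inv, A), R)

# Flattening A gives a length-4 list; the 2x2 matrix R cannot act on it.
flattened = [x for row in A for x in row]

print(rotated_vector)   # [0, 1]
print(rotated_matrix)   # [[4, -3], [-2, 1]]
print(flattened)        # [1, 2, 3, 4]
```

The conjugated matrix keeps the trace ($5$) and determinant ($-2$) of `A`, which is the kind of structure that is easy to lose sight of once the entries are written out as one long vector.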

However, to be honest, I can't think of many situations where one actually needs to differentiate with respect to a matrix, except in the theory of Lie groups and Lie algebras, if one doesn't want to take a more abstract approach.
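As for the numerical example the question asks for, one possible sketch is the scalar function $y = \mathbf{a}^T \mathbf{X} \mathbf{b}$, for which $\partial y / \partial x_{ij} = a_i b_j$. The code below estimates each partial derivative by central differences and arranges the results in the layout given in the question (a $q \times p$ matrix whose entry in row $j$, column $i$ is $\partial y / \partial x_{ij}$). The function names `y` and `scalar_by_matrix` and the values of `a`, `b`, and `X` are hypothetical, chosen only for illustration.

```python
def y(X, a, b):
    """Scalar function y = a^T X b, with X stored as a list of rows."""
    return sum(a[i] * X[i][j] * b[j]
               for i in range(len(a)) for j in range(len(b)))

def scalar_by_matrix(f, X, h=0.5):
    """Central-difference estimate of dy/dX in the question's layout:
    the result is q x p, with entry (j, i) equal to dy/dx_ij."""
    p, q = len(X), len(X[0])
    D = [[0.0] * p for _ in range(q)]
    for i in range(p):
        for j in range(q):
            plus = [row[:] for row in X]
            minus = [row[:] for row in X]
            plus[i][j] += h
            minus[i][j] -= h
            D[j][i] = (f(plus) - f(minus)) / (2 * h)
    return D

a = [1, 2]            # length p = 2
b = [3, 4, 5]         # length q = 3
X = [[1, 0, 2],
     [0, 1, 3]]       # p x q = 2 x 3

numeric = scalar_by_matrix(lambda M: y(M, a, b), X)

# Closed form in the same layout: the q x p matrix b a^T.
closed_form = [[b[j] * a[i] for i in range(len(a))]
               for j in range(len(b))]

print(numeric)
print(closed_form)    # [[3, 6], [4, 8], [5, 10]]
```

Because $y$ is linear in $\mathbf{X}$, the central difference is exact here up to floating-point rounding. The six numbers are the same ones a scalar-by-vector derivative of the flattened $\mathbf{X}$ would give; what the matrix layout adds is the compact expression $\mathbf{b}\mathbf{a}^T$, which stays in the language of matrix products.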