Vector/Matrix Calculus and Chain Rule Question


Here's the problem I'm trying to solve.

Let $f:\mathbb{R}^3\to\mathbb{R}$ be a differentiable function, and let $M$ be a $3\times3$ matrix. Let $f':\mathbb{R}^3\to\mathbb{R}$ be defined by $f'(x,y,z)=f\left((M\cdot(x,y,z)^T)^T\right)$. How can I express the gradient of $f'$ in terms of the gradient of $f$?

I assume this requires use of the vector chain rule, which I'm having trouble wrapping my head around, and some matrix calculus, which is something I know almost nothing about. My end goal is to show that if the curve defined as the roots of $f$ has a singular point, then the one defined as the roots of $f'$ has a corresponding singular point, but to do that I need to be able to determine where $\nabla f'=0$. I don't expect that this is relevant to my question, but in case it is, you can assume that $f$ is a homogeneous polynomial and $M$ is invertible.

Thank you for your help!


There are 2 best solutions below

0
On

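Write $\vec x=(x,y,z)^\top$, so that $f'(\vec x)=f(M\vec x)$. The chain rule tells us that $$Df'(\vec x) = Df(M\vec x)\,M.$$ (Here $Df$ is the derivative written as a $1\times 3$ row vector, so the right-hand side is the product of a $1\times 3$ matrix with a $3\times 3$ matrix, which is again $1\times 3$.) Transposing to get the gradient, $$\nabla f'(\vec x) = M^\top \,\nabla f(M\vec x).$$ Note that the gradient of $f$ is evaluated at $M\vec x$, not at $\vec x$.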
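As a sanity check, the identity above can be tested numerically. The following is only a sketch: the cubic `f`, the matrix `M`, and the test points are arbitrary illustrative choices (none of them come from the question), and `f_new` plays the role of $f'$.

```python
# Finite-difference check of  grad f'(x) = M^T grad f(Mx).
# f, M, x and x0 are illustrative assumptions, not data from the question.

def matvec(A, v):
    """Matrix-vector product A v for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def f(p):
    """Example homogeneous cubic f(u, v, w) = v^2 w - u^3."""
    u, v, w = p
    return v * v * w - u ** 3

def grad_f(p):
    """Hand-computed gradient of the example f."""
    u, v, w = p
    return [-3 * u * u, 2 * v * w, v * v]

# An invertible 3x3 matrix (its determinant is -5).
M = [[1.0, 2.0, 0.0],
     [0.0, 1.0, -1.0],
     [3.0, 0.0, 1.0]]

def f_new(x):
    """The composed function f'(x) = f(Mx)."""
    return f(matvec(M, x))

def numerical_grad(func, x, h=1e-6):
    """Central-difference approximation of the gradient of func at x."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((func(xp) - func(xm)) / (2 * h))
    return g

x = [0.5, -1.0, 2.0]
formula = matvec(transpose(M), grad_f(matvec(M, x)))   # M^T grad f(Mx)
numeric = numerical_grad(f_new, x)
max_error = max(abs(a - b) for a, b in zip(formula, numeric))
print("largest discrepancy:", max_error)

# The example f has zero gradient at (0, 0, 1). Choosing x0 with M x0 = (0, 0, 1)
# (up to rounding), the formula gives a zero gradient for f_new at x0.
x0 = [0.4, -0.2, -0.2]
grad_at_x0 = matvec(transpose(M), grad_f(matvec(M, x0)))
print("gradient of f_new at x0:", grad_at_x0)
```

The last lines touch on the stated end goal: when $M$ is invertible, so is $M^\top$, hence $\nabla f'(\vec x)=0$ exactly when $\nabla f(M\vec x)=0$, and $f'(\vec x)=0$ exactly when $f(M\vec x)=0$. So $\vec x$ should be a singular point of the curve of $f'$ precisely when $M\vec x$ is a singular point of the curve of $f$.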

0
On

Let's say that you have worked really hard and calculated the gradient of a particular function $$f=f(y),\quad g=\left(\frac{\partial f}{\partial y}\right)$$ Afterwards you are informed that what you had assumed was the independent variable is actually a function of a more fundamental variable, i.e. $\;y=Mx$.

Is there any way to leverage the gradient wrt $y$ when calculating the gradient wrt $x$?

Yes, there is! Use the old gradient to write down the differential of the function, perform the change of variables $y\to x$ (since $M$ is constant, $dy = M\,dx$), then recover the new gradient. $$\eqalign{ df &= g^Tdy \\ &= g^TM\,dx \\ &= (M^Tg)^Tdx \\ \left(\frac{\partial f}{\partial x}\right) &= M^Tg \;=\; M^T\left(\frac{\partial f}{\partial y}\right) }$$ Here $g$ is evaluated at $y=Mx$.


NB: The matrix $M$ does not need to be invertible for this analysis to hold. In fact, it is often helpful to assume that $M$ is rectangular in order to ensure the correct ordering of the terms and transposes based on dimensional considerations.
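To illustrate the dimensional bookkeeping, here is a small sketch with a rectangular $M$. The $2\times 3$ matrix and the quadratic `f` are made-up examples: $y=Mx$ maps $\mathbb{R}^3\to\mathbb{R}^2$, so $g$ has two entries and $M^Tg$ has three, matching $x$. The other ordering, $gM^T$ or $Mg$, would not even have compatible shapes.

```python
# Rectangular example: M is 2x3, x has 3 entries, y = M x has 2 entries.
# M, f and x are illustrative assumptions chosen so the arithmetic is exact.

def matvec(A, v):
    """Matrix-vector product A v for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

M = [[1.0, 0.0, 2.0],
     [-1.0, 3.0, 0.5]]

def f(y):
    """Example function of y in R^2: f(y) = y1^2 + 3 y1 y2."""
    return y[0] ** 2 + 3.0 * y[0] * y[1]

def grad_f(y):
    """g = df/dy for the example f (2 entries)."""
    return [2.0 * y[0] + 3.0 * y[1], 3.0 * y[0]]

x = [1.0, 2.0, -1.0]
y = matvec(M, x)                    # length 2
g = grad_f(y)                       # length 2
M_T = [list(col) for col in zip(*M)]
grad_x = matvec(M_T, g)             # M^T g, length 3
print(grad_x)                       # [14.5, -9.0, 21.5]

# Cross-check against central differences of x -> f(Mx).
h = 1e-5
fd = []
for i in range(len(x)):
    xp, xm = list(x), list(x)
    xp[i] += h
    xm[i] -= h
    fd.append((f(matvec(M, xp)) - f(matvec(M, xm))) / (2 * h))
fd_error = max(abs(a - b) for a, b in zip(grad_x, fd))
```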

As you learn about matrix calculus, you will find that the chain rule can be difficult to apply because it requires the calculation of intermediate quantities that are often third- and fourth-order tensors.

The differential approach is simpler because the differentials of vectors and matrices follow the usual rules of matrix algebra.

An alternative is to vectorize all of your matrices (refer to *Matrix Differential Calculus with Applications in Statistics and Econometrics* by Magnus and Neudecker).

The most powerful approach is to learn index notation, which was explicitly created to handle calculations involving higher-order tensors (refer to any good physics textbook).
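For comparison, here is how the same result might look in index notation, written as a sketch with explicit sums and with $g_i = \partial f/\partial y_i$ as before. Since $y_i=\sum_k M_{ik}x_k$, we have $\partial y_i/\partial x_j = M_{ij}$, and so $$\eqalign{ \frac{\partial f}{\partial x_j} &= \sum_i \frac{\partial f}{\partial y_i}\,\frac{\partial y_i}{\partial x_j} \\ &= \sum_i g_i\,M_{ij} \\ &= \sum_i (M^T)_{ji}\,g_i \;=\; (M^Tg)_j }$$ which agrees with the result of the differential approach, $\partial f/\partial x = M^Tg$.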