How to do chain rule in matrix calculus?


Let $$f(x,y) = \begin{bmatrix} x^2 + y^2 \\ xy \end{bmatrix}$$ and $$g(u,v)=\begin{bmatrix} g_1 \\ g_2 \end{bmatrix}= \begin{bmatrix} 2u -v \\ v-u \end{bmatrix}.$$ Find $[f \circ g]'$.

I got the derivative matrix of $f$:

$$ Df= \begin{bmatrix} 2x & 2y \\ y & x \end{bmatrix}$$

However, I cannot figure out the correct derivative matrix of $g$. Should it be $ \begin{bmatrix} \frac{\partial g_1}{\partial u} & \frac{\partial g_2}{\partial u} \\ \frac{\partial g_2}{\partial v} & \frac{\partial g_1}{\partial v}\end{bmatrix}$ or $ \begin{bmatrix} \frac{\partial g_1}{\partial v} & \frac{\partial g_2}{\partial v} \\ \frac{\partial g_1}{\partial u} & \frac{\partial g_2}{\partial u}\end{bmatrix}$ in order to apply the chain rule correctly? Which one is correct, and why?

Thanks in advance.


There are 3 best solutions below

2
On BEST ANSWER

Here is a somewhat detailed derivation of the chain rule using matrix calculus. We have \begin{align*} &f:\mathbb{R}^2\to\mathbb{R}^2\\ &f(x,y)=\begin{bmatrix}f_1(x,y)\\f_2(x,y)\end{bmatrix} =\begin{bmatrix}x^2+y^2\\xy\end{bmatrix}\\ \\ &g:\mathbb{R}^2\to\mathbb{R}^2\qquad\qquad\qquad\qquad\ \\ &g(u,v)=\begin{bmatrix}g_1(u,v)\\g_2(u,v)\end{bmatrix} =\begin{bmatrix}2u-v\\v-u\end{bmatrix} \end{align*}

We obtain \begin{align*} \color{blue}{\frac{\partial (f\circ g)}{\partial(u,v)}} &=\frac{\partial f}{\partial\left(g_1,g_2\right)}\,\frac{\partial g}{\partial(u,v)}\\ &=\begin{bmatrix} \frac{\partial f_1}{\partial g_1}&\frac{\partial f_1}{\partial g_2}\\ \frac{\partial f_2}{\partial g_1}&\frac{\partial f_2}{\partial g_2} \end{bmatrix} \begin{bmatrix} \frac{\partial g_1}{\partial u}&\frac{\partial g_1}{\partial v}\\ \frac{\partial g_2}{\partial u}&\frac{\partial g_2}{\partial v} \end{bmatrix}\\ &=\begin{bmatrix} \frac{\partial }{\partial g_1}\left(g_1^2+g_2^2\right)&\frac{\partial }{\partial g_2}\left(g_1^2+g_2^2\right)\\ \frac{\partial}{\partial g_1}\left(g_1g_2\right)&\frac{\partial }{\partial g_2}\left(g_1g_2\right) \end{bmatrix} \begin{bmatrix} \frac{\partial g_1}{\partial u}&\frac{\partial g_1}{\partial v}\\ \frac{\partial g_2}{\partial u}&\frac{\partial g_2}{\partial v} \end{bmatrix}\\ &=\begin{bmatrix} 2g_1&2g_2\\g_2&g_1 \end{bmatrix} \begin{bmatrix} 2&-1\\-1&1 \end{bmatrix}\\ &=\begin{bmatrix} 2(2u-v)&2(v-u)\\v-u&2u-v \end{bmatrix} \begin{bmatrix} 2&-1\\-1&1 \end{bmatrix}\\ &\,\,\color{blue}{=\begin{bmatrix} 10u-6v&-6u+4v\\ -4u+3v&3u-2v \end{bmatrix}} \end{align*}
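The matrix product above can be checked numerically. The following is a minimal sketch in plain Python, not part of the original answer; the helper names (`jacobian_f`, `jacobian_g`, `matmul2`, `chain_rule_jacobian`, `closed_form`) are made up for illustration, and the two Jacobians are hard-coded from the derivation.

```python
def g(u, v):
    # Inner function from the question: g(u, v) = (2u - v, v - u).
    return (2 * u - v, v - u)

def jacobian_f(x, y):
    # Rows are the components of f, columns are the variables (x, y).
    return [[2 * x, 2 * y],
            [y, x]]

def jacobian_g(u, v):
    # g is linear, so its Jacobian does not depend on (u, v).
    return [[2, -1],
            [-1, 1]]

def matmul2(a, b):
    # Product of two 2x2 matrices stored as nested lists.
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def chain_rule_jacobian(u, v):
    # Evaluate Df at the point g(u, v), then multiply by Dg.
    x, y = g(u, v)
    return matmul2(jacobian_f(x, y), jacobian_g(u, v))

def closed_form(u, v):
    # Final matrix from the derivation above.
    return [[10 * u - 6 * v, -6 * u + 4 * v],
            [-4 * u + 3 * v, 3 * u - 2 * v]]

print(chain_rule_jacobian(1, 2))
print(closed_form(1, 2))
```

With integer inputs the arithmetic is exact, so the two printed matrices can be compared directly.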

A crosscheck using @PierreCarre's result. We have \begin{align*} (f\circ g)(u,v)=\begin{bmatrix} (2u-v)^2+(v-u)^2\\ (2u-v)(v-u) \end{bmatrix} =\begin{bmatrix} 5u^2-6uv+2v^2\\ -2u^2+3uv-v^2 \end{bmatrix} \end{align*}

It follows \begin{align*} \color{blue}{\frac{\partial (f\circ g)}{\partial (u,v)}} &=\frac{\partial}{\partial (u,v)}\begin{bmatrix} 5u^2-6uv+2v^2\\ -2u^2+3uv-v^2 \end{bmatrix}\\ &\,\,\color{blue}{=\begin{bmatrix} 10u-6v&-6u+4v\\ -4u+3v&3u-2v \end{bmatrix}} \end{align*} in accordance with the result above.

0
On

Instead of telling you which is the correct formula for the Jacobian of $g$, let me just tell you that $$ (f\circ g)(u,v)= f(g(u,v))=\begin{bmatrix} (2u-v)^2+(v-u)^2\\(2u-v)(v-u) \end{bmatrix} $$

So, you can compute the derivatives of $f\circ g$ with respect to $u$ and $v$ directly and compare them with what you would get from the chain rule using each of the versions of the Jacobian of $g$ that you mention.
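The comparison suggested here could be carried out numerically along the following lines. This is only a sketch, assuming central finite differences as a stand-in for differentiating $f\circ g$ by hand; the names `numeric_jacobian`, `compare_layouts` and the layout labels are hypothetical, and the third layout (rows indexed by components, columns by variables) is included alongside the two candidates from the question.

```python
def f(x, y):
    return (x * x + y * y, x * y)

def g(u, v):
    return (2 * u - v, v - u)

def compose(u, v):
    # (f o g)(u, v) = f(g(u, v))
    return f(*g(u, v))

def numeric_jacobian(func, u, v, h=1e-5):
    # Central differences; rows are output components, columns are (u, v).
    du = [(a - b) / (2 * h) for a, b in zip(func(u + h, v), func(u - h, v))]
    dv = [(a - b) / (2 * h) for a, b in zip(func(u, v + h), func(u, v - h))]
    return [[du[0], dv[0]], [du[1], dv[1]]]

def matmul2(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def close(a, b, tol=1e-6):
    return all(abs(a[i][j] - b[i][j]) < tol
               for i in range(2) for j in range(2))

def compare_layouts(u, v):
    x, y = g(u, v)
    df = [[2 * x, 2 * y], [y, x]]          # Df evaluated at g(u, v)
    g1u, g1v, g2u, g2v = 2, -1, -1, 1      # partial derivatives of g
    layouts = {
        "first candidate": [[g1u, g2u], [g2v, g1v]],
        "second candidate": [[g1v, g2v], [g1u, g2u]],
        "rows = components, columns = variables": [[g1u, g1v], [g2u, g2v]],
    }
    reference = numeric_jacobian(compose, u, v)
    # True where Df(g(u, v)) times the layout matches the numeric derivative.
    return {name: close(matmul2(df, m), reference)
            for name, m in layouts.items()}

print(compare_layouts(1.0, 2.0))
```

At the sample point $(1, 2)$, only the last layout agrees with the finite-difference estimate; both candidates from the question differ from it.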

0
On

Expanding on J.G.'s comment, write $(u,v) = (\nu^1,\nu^2)$. Then, in tensor notation, we have:

$$ \frac{\partial (f_i \circ g)}{\partial \nu^j} = \sum_k \frac{ \partial f_i}{\partial g^k} \frac{\partial g^k}{ \partial \nu^j}$$

Setting $ \frac{\partial f_i}{\partial g^k} = a_k^i $ and $ \frac{\partial g^k}{\partial \nu^j} = b_j^k$, the sum over $k$ is exactly a matrix product, so the matrix of partial derivatives $D(f \circ g)$ takes the following form:

$$ D (f \circ g) = \begin{bmatrix} \frac{\partial f_1}{\partial g^1} & \frac{\partial f_1}{\partial g^2} \\ \frac{\partial f_2}{\partial g^1} & \frac{\partial f_2}{\partial g^2} \end{bmatrix} \begin{bmatrix} \frac{\partial g^1}{\partial \nu^1} & \frac{\partial g^1} {\partial \nu^2} \\ \frac{\partial g^2}{\partial \nu^1} & \frac{\partial g^2}{\partial \nu^2} \end{bmatrix} $$
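As an illustration with the functions from the question (so $f_1 = (g^1)^2 + (g^2)^2$, $g^1 = 2u - v$, $g^2 = v - u$, where the superscripts on $g$ are indices and not powers), the entry with $i = 1$, $j = 1$ would expand as:

$$ \frac{\partial (f_1 \circ g)}{\partial \nu^1} = \frac{\partial f_1}{\partial g^1}\frac{\partial g^1}{\partial \nu^1} + \frac{\partial f_1}{\partial g^2}\frac{\partial g^2}{\partial \nu^1} = 2(2u-v)\cdot 2 + 2(v-u)\cdot(-1) = 10u - 6v $$

The other three entries follow the same pattern: row $i$ of the first matrix is multiplied by column $j$ of the second.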