Verification of matrix chain rule


I wrote down a simple example of function composition for multivariate and vector-valued functions to see if I can apply the matrix chain rule. I would appreciate it if someone could verify that this is a correct application of the matrix chain rule.

Unfortunately, all the examples I can find online either use partial derivatives (not the total derivative matrix) or involve scalar-valued functions, which is not what I'm looking for.



There are 2 best solutions below


The chain rule says that $D(f\circ g)(\mathbf x) = Df(g(\mathbf x))\circ Dg(\mathbf x)$. Expanded in terms of coordinates, the right-hand side becomes the product of the Jacobian matrices of $f$ and $g$, each evaluated at the appropriate point. You computed the Jacobians and multiplied them, but you evaluated $Df$ at the point $\mathbf x$ instead of at $g(\mathbf x)$. I recommend renaming the variables in the definition of $f$ to help prevent this.
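Written out entrywise, the coordinate form might look like the following. This is only a sketch: the dimensions $g:\mathbb R^2\to\mathbb R^3$ and $f:\mathbb R^3\to\mathbb R^4$ are an assumption read off from the shapes of the Jacobians in this answer, and $y_k$ names the inputs of $f$.

$$\bigl[D(f\circ g)(\mathbf x)\bigr]_{ij} = \sum_{k=1}^{3} \frac{\partial f_i}{\partial y_k}\bigl(g(\mathbf x)\bigr)\,\frac{\partial g_k}{\partial x_j}(\mathbf x), \qquad i=1,\dots,4,\quad j=1,2.$$

The partial derivatives of $f$ are taken with respect to its own inputs $y_k$ and only afterwards evaluated at $\mathbf y = g(\mathbf x)$.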

So, using $y_k$ instead of $x_k$ in the definition of $f$, we have $$Df = \begin{bmatrix}1&y_3&y_2\\2y_1&0&0\\y_2&y_1&0\\0&0&1\end{bmatrix}, \qquad Dg = \begin{bmatrix}x_2&x_1\\2x_1x_2&x_1^2\\0&1\end{bmatrix},$$ and so $$Df(g(\mathbf x))Dg(\mathbf x) = \begin{bmatrix}1&x_2&x_1^2x_2\\2x_1x_2&0&0\\x_1^2x_2&x_1x_2&0\\0&0&1\end{bmatrix} \begin{bmatrix}x_2&x_1\\2x_1x_2&x_1^2\\0&1\end{bmatrix} = \begin{bmatrix}x_2+2x_1x_2^2&x_1+2x_1^2x_2\\2x_1x_2^2&2x_1^2x_2\\3x_1^2x_2^2&2x_1^3x_2\\0&1\end{bmatrix}.$$ The two individual matrices agree with your updated answer, but it looks like you omitted the last row of the product.

To check this, we compute $D(f\circ g)(\mathbf x)$ directly. We have $$f\circ g: (x_1,x_2)\mapsto \left(x_1^2x_2^2+x_1x_2,x_1^2x_2^2,x_1^3x_2^2,x_2\right),$$ so $$D(f\circ g)(\mathbf x) = \begin{bmatrix}2x_1x_2^2+x_2&2x_1^2x_2+x_1\\2x_1x_2^2&2x_1^2x_2\\3x_1^2x_2^2&2x_1^3x_2\\0&1\end{bmatrix},$$ which agrees with the other calculation.
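The same check can be done numerically. The sketch below is only an illustration in plain Python: the definitions of `f` and `g` are an assumption, reconstructed from the Jacobians and the composition shown in this answer (the original definitions were in an image), and the helper names `matmul`, `chain_rule_jacobian`, `direct_jacobian`, and `numeric_jacobian` are made up for this example.

```python
# Assumed definitions, inferred from Df, Dg and f∘g as written in the answer.
def g(x1, x2):
    return (x1 * x2, x1**2 * x2, x2)

def f(y1, y2, y3):
    return (y1 + y2 * y3, y1**2, y1 * y2, y3)

def Df(y1, y2, y3):
    # 4x3 Jacobian of f with respect to its own inputs y1, y2, y3.
    return [[1, y3, y2],
            [2 * y1, 0, 0],
            [y2, y1, 0],
            [0, 0, 1]]

def Dg(x1, x2):
    # 3x2 Jacobian of g with respect to x1, x2.
    return [[x2, x1],
            [2 * x1 * x2, x1**2],
            [0, 1]]

def matmul(A, B):
    # Plain nested-list matrix product.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def chain_rule_jacobian(x1, x2):
    # Key step: Df is evaluated at g(x), not at x.
    return matmul(Df(*g(x1, x2)), Dg(x1, x2))

def direct_jacobian(x1, x2):
    # Jacobian obtained by differentiating f∘g directly.
    return [[2 * x1 * x2**2 + x2, 2 * x1**2 * x2 + x1],
            [2 * x1 * x2**2, 2 * x1**2 * x2],
            [3 * x1**2 * x2**2, 2 * x1**3 * x2],
            [0, 1]]

def numeric_jacobian(x1, x2, h=1e-6):
    # Central-difference approximation of the Jacobian of f∘g.
    def comp(a, b):
        return f(*g(a, b))
    col1 = [(p - m) / (2 * h)
            for p, m in zip(comp(x1 + h, x2), comp(x1 - h, x2))]
    col2 = [(p - m) / (2 * h)
            for p, m in zip(comp(x1, x2 + h), comp(x1, x2 - h))]
    return [[c1, c2] for c1, c2 in zip(col1, col2)]

if __name__ == "__main__":
    for point in [(2, 3), (-1, 4), (0.5, -2.0)]:
        print(point, chain_rule_jacobian(*point) == direct_jacobian(*point))
```

At integer points the two symbolic results can be compared exactly; the finite-difference version is an independent check that only agrees up to a small tolerance.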



As IvoTerek's comment mentions, I had calculated df/dx and multiplied it by dg/dx, when I should have calculated df/dg. In this updated answer, I first calculate the composition f(g(x)), then take the total derivative with respect to g(x), and then multiply that by the total derivative of g(x) with respect to x.

EDIT:

Edited after a comment by @amd: first calculate df/dx and evaluate it at g, which I denote by df(g)/dg, and then multiply by dg/dx. I have verified this answer against the direct calculation of df(g(x))/dx.