Why is $\nabla_{\mathbf{x}}z \not= \nabla_{\mathbf{y}}z \times \frac{\partial\mathbf{y}}{\partial\mathbf{x}} $?

I'm learning about the chain rule for computing the gradient, with respect to a subset of its variables, of a function that is a composition of vector fields. Let $\mathbf{x}\in \mathbb{R}^m, \mathbf{y}\in \mathbb{R}^n, \mathbf{g}:\mathbb{R}^m\to\mathbb{R}^n, f:\mathbb{R}^n\to \mathbb{R}$.

The result is

"If $\mathbf{y}=\mathbf{g}(\mathbf{x}), z=f(\mathbf{y})=f(\mathbf{g}(\mathbf{x}))$, then

$\nabla_{\mathbf{x}}z=\left(\frac{\partial\mathbf{y} }{\partial\mathbf{x}}\right) ^T \times \nabla_{\mathbf{y}}z$"

But how is this derived? My initial attempt to derive this is as follows. Please tell me where it has gone astray. $$z=f(\mathbf{g}(\mathbf{x}))\implies \nabla_{\mathbf{x}}z = \frac{\partial z}{\partial \mathbf{x}} = \frac{\partial z}{\partial \mathbf{y}} \times \frac{\partial \mathbf{y}}{\partial \mathbf{x}} =\nabla_{\mathbf{y}}z \times \frac{\partial \mathbf{y}}{\partial \mathbf{x}}$$

Here I have used the chain rule to obtain the second equality and the definition of the gradient to obtain the third. This can't be correct because the dimensions don't agree: the final product is an $n \times 1$ matrix times an $n \times m$ matrix.

Where have I misapplied the chain rule and/or a definition? Thanks :)

On BEST ANSWER

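The usual notational convention is to put partial derivatives into columns of the derivative matrix, so the derivative of a scalar-valued function of $n$ variables is a $1 \times n$ row. What you're writing as a gradient vector should therefore be a row matrix, i.e., the transpose of the gradient. The chain rule (in "Newtonian" matrix form as opposed to Leibniz form) reads $$ \underbrace{D(f \circ g)(x)}_{1 \times m} = \underbrace{Df\bigl(g(x)\bigr)}_{1 \times n}\, \underbrace{Dg(x)}_{n \times m}. $$ Your gradient version is obtained by taking the transpose: $$ \nabla_{\mathbf{x}}z = \Bigl(Df\bigl(g(x)\bigr)\,Dg(x)\Bigr)^T = Dg(x)^T\, Df\bigl(g(x)\bigr)^T = \left(\frac{\partial\mathbf{y}}{\partial\mathbf{x}}\right)^T \nabla_{\mathbf{y}}z. $$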
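If it helps to see the shapes work out, here is a minimal numerical sketch of the transposed chain rule. The particular maps `g` and `f` (with $m = 2$, $n = 3$), the helper names, and the test point `x0` are all made up for illustration and are not part of the question. The sketch computes $\left(\frac{\partial\mathbf{y}}{\partial\mathbf{x}}\right)^T \nabla_{\mathbf{y}}z$ by hand and compares it with a central finite-difference estimate of $\nabla_{\mathbf{x}}z$.

```python
import math

# Hypothetical example maps with m = 2 and n = 3, chosen only for illustration.
def g(x):
    """g: R^2 -> R^3."""
    return [x[0] * x[1], math.sin(x[0]), x[1] ** 2]

def f(y):
    """f: R^3 -> R."""
    return y[0] ** 2 + y[1] * y[2]

def jacobian_g(x):
    """n x m Jacobian dy/dx: entry [i][j] is the partial of g_i w.r.t. x_j."""
    return [
        [x[1], x[0]],
        [math.cos(x[0]), 0.0],
        [0.0, 2.0 * x[1]],
    ]

def grad_f(y):
    """Gradient of f with respect to y, as a list of n partial derivatives."""
    return [2.0 * y[0], y[2], y[1]]

def chain_rule_grad(x):
    """Compute (dy/dx)^T times grad_y z, which has m components."""
    J = jacobian_g(x)
    gy = grad_f(g(x))
    n, m = len(J), len(J[0])
    # Component j sums over the n intermediate variables y_i.
    return [sum(J[i][j] * gy[i] for i in range(n)) for j in range(m)]

def finite_difference_grad(x, eps=1e-6):
    """Central-difference approximation of the gradient of z = f(g(x))."""
    grad = []
    for j in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        grad.append((f(g(x_plus)) - f(g(x_minus))) / (2 * eps))
    return grad

x0 = [0.7, -1.3]
analytic = chain_rule_grad(x0)
numeric = finite_difference_grad(x0)
print("chain rule:        ", analytic)
print("finite differences:", numeric)
```

The sum in `chain_rule_grad` runs over the index $i$ of $\mathbf{y}$, which is exactly what multiplying the $m \times n$ transpose by the $n \times 1$ gradient does. The untransposed product $\nabla_{\mathbf{y}}z \times \frac{\partial\mathbf{y}}{\partial\mathbf{x}}$ would pair an $n \times 1$ column with an $n \times m$ matrix, which has no compatible inner dimension.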