1. Suppose $f: \mathbb{R}^{n} \rightarrow \mathbb{R}$. Let $g(z) = f(Xz)$, where $z \in \mathbb{R}^{n}$ and $X \in \mathbb{R}^{n,n}$. I am interested in $\nabla_{z} g(z)$.
$$\nabla_{z} g(z) = \nabla f_{Xz}(Xz) * D_{z}(Xz)$$ $$\nabla_{z} g(z) = \nabla f_{Xz}(Xz) * X$$
Now, $X \in \mathbb{R}^{n,n}$ and $\nabla f_{Xz}(Xz) \in \mathbb{R}^{n,1}$, which causes a dimension mismatch. Of course, everything works out if we swap the two factors, but what about the noncommutativity of matrix multiplication?
2. Consider Newton's method for functions with vector input: $$x_{n+1} = x_{n} - \frac{f(x_{n})}{\nabla f(x_{n})}$$
The numerator is a real number and the denominator is a vector. How should I think about this? This isn't the additive inverse, is it?
I think the notation is somewhat misleading. The notation $\nabla_z f$ should mean the gradient vector $\nabla f(z)$ of $f$ at the point $z$, but what you are interested in is $\nabla_z g=\nabla g(z)$, not $\nabla_z f$.

As for the inconsistency, it comes from regarding the gradient $\nabla f$ as a vector in $\mathbb{R}^{n,1}$, that is, an $n$-dimensional column vector. To be consistent with the chain rule, we should instead write $$\nabla f(p) = \left(\frac{\partial f}{\partial x_1}(p),\frac{\partial f}{\partial x_2}(p),\ldots, \frac{\partial f}{\partial x_n}(p)\right)$$ as a row vector. This gives $$ \nabla g(z) = \nabla_z [f(Xz)]=\nabla f(Xz)\cdot \frac{\partial (Xz)}{\partial z}=\nabla f(Xz)\cdot X. $$ Note that we can right-multiply by $X$ precisely because $\nabla f(Xz)$ is written as an $n$-dimensional row vector.
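As a quick numerical sanity check of the row-vector formula, here is a minimal sketch in plain Python. The function `f`, the matrix `X`, the point `z`, and all helper names are hypothetical choices for illustration. It compares $\nabla f(Xz)\cdot X$ with a central finite-difference gradient of $g$, and also computes the swapped product $X\,\nabla f(Xz)$; with a non-symmetric `X`, only the first should agree with the finite differences.

```python
import math

# Hypothetical example: f(x) = x_1^2 + 3 x_1 x_2 + sin(x_2) on R^2,
# and a non-symmetric 2 x 2 matrix X. Vectors are plain Python lists.

def matvec(X, v):
    """Matrix times column vector: returns X v."""
    return [sum(X[i][k] * v[k] for k in range(len(v))) for i in range(len(X))]

def rowvec_mat(r, X):
    """Row vector times matrix: returns r X."""
    return [sum(r[i] * X[i][j] for i in range(len(r))) for j in range(len(X[0]))]

def f(x):
    return x[0] ** 2 + 3 * x[0] * x[1] + math.sin(x[1])

def grad_f(x):
    # Analytic partial derivatives of f, read as a row vector
    return [2 * x[0] + 3 * x[1], 3 * x[0] + math.cos(x[1])]

def g(z, X):
    return f(matvec(X, z))

def grad_g_chain_rule(z, X):
    # Row-vector convention: grad g(z) = grad f(Xz) . X
    return rowvec_mat(grad_f(matvec(X, z)), X)

def grad_g_naive(z, X):
    # The "swapped" product X . grad f(Xz), for comparison
    return matvec(X, grad_f(matvec(X, z)))

def grad_g_finite_diff(z, X, h=1e-6):
    # Central differences, one coordinate of z at a time
    grad = []
    for j in range(len(z)):
        zp, zm = list(z), list(z)
        zp[j] += h
        zm[j] -= h
        grad.append((g(zp, X) - g(zm, X)) / (2 * h))
    return grad

X = [[1.0, 2.0], [0.5, -1.0]]
z = [0.3, -0.7]
print("chain rule:       ", grad_g_chain_rule(z, X))
print("finite difference:", grad_g_finite_diff(z, X))
print("swapped X * grad: ", grad_g_naive(z, X))
```

With a symmetric `X` the two products coincide, which can hide the distinction; that is why the example uses a non-symmetric matrix.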
EDIT: Following @Rahul's comment, the way I learned to write $\nabla f$ is probably different from that of most people. So another way to make the notation consistent with the chain rule is to write $$ Dg(z) = Df(Xz)\cdot X $$ and let $Df = (\nabla f)^{\mathsf T}$, the transpose of $\nabla f$.
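Writing the chain rule out component by component makes both conventions explicit. Here $(Xz)_i=\sum_{k} X_{ik}z_k$, so $\partial (Xz)_i/\partial z_j = X_{ij}$, and
$$\frac{\partial g}{\partial z_j}(z) = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}(Xz)\,\frac{\partial (Xz)_i}{\partial z_j} = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}(Xz)\,X_{ij}.$$
This is the $j$-th entry of the row vector $Df(Xz)\,X$, and equally the $j$-th entry of the column vector $X^{\mathsf T}\nabla f(Xz)$:
$$Dg(z) = Df(Xz)\,X, \qquad \nabla g(z) = \big(Dg(z)\big)^{\mathsf T} = X^{\mathsf T}\,\nabla f(Xz).$$
So with $\nabla f$ as a column vector, the fix for the dimension mismatch in the question is a transpose rather than a plain swap: $X^{\mathsf T}\nabla f(Xz)$ coincides with the swapped product $X\,\nabla f(Xz)$ when $X$ is symmetric, but not in general.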