I am studying the affine invariance of the Newton step for unconstrained optimization and I came across the following:
Suppose $\textbf{T}\in \Re ^{n\times n}$ is nonsingular and define $\tilde{f}(\textbf{y})=f(\textbf{T}\textbf{y})$. Then we have the following statements: $$\nabla \tilde{f}(\textbf{y})=\textbf{T}^T\nabla f(\textbf{x})$$ and $$\nabla ^2 \tilde{f}(\textbf{y})=\textbf{T}^T\nabla ^2 f(\textbf{x})\textbf{T}$$ where $\textbf{x}=\textbf{Ty}$.
I don't understand why the above are correct. Is there any property I have to apply?
It helps to look at the Taylor expansions of $f$ and $\tilde{f}$:
$$f(z+\Delta z) \approx f(z) + \langle \nabla f(z), \Delta z \rangle + \tfrac{1}{2}\langle \nabla^2 f(z) \Delta z, \Delta z \rangle$$ $$\tilde{f}(y+\Delta y) \approx \tilde{f}(y) + \langle \nabla \tilde{f}(y), \Delta y \rangle + \tfrac{1}{2}\langle \nabla^2 \tilde{f}(y) \Delta y, \Delta y \rangle$$ Substitute $z=T y$ into the first of these:
$$\begin{aligned}f(T(y+\Delta y)) &\approx f(Ty) + \langle \nabla f(Ty), T \Delta y \rangle + \tfrac{1}{2}\langle \nabla^2 f(Ty) T \Delta y, T \Delta y \rangle\\ &= f(Ty) + \langle T^T \nabla f(Ty), \Delta y \rangle + \tfrac{1}{2}\langle T^T \nabla^2 f(Ty) T \Delta y, \Delta y \rangle \end{aligned}$$ The second line utilizes a known property of inner products and linear operators: that $\langle x, A y\rangle = \langle A^T x, y \rangle$.
From here we can match up terms and see that $$\tilde{f}(y) = f(Ty), \quad \nabla \tilde{f}(y) = T^T \nabla f(Ty), \quad \nabla^2 \tilde{f}(y) = T^T \nabla^2 f(Ty) T$$
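If you want to convince yourself numerically, here is a minimal sketch that compares the two formulas against finite differences. The test function $f(x) = \sum_i e^{x_i} + \tfrac{1}{2}\langle Qx, x\rangle$, the matrix `Q`, the step sizes, and the helper names `fd_gradient` and `fd_hessian` are all my own choices for illustration and are not part of the question. I deliberately took $T$ to be a rectangular $4 \times 3$ matrix.

```python
import numpy as np

# Hypothetical test function (my choice, not from the question):
# f(x) = sum(exp(x)) + 0.5 * x^T Q x, with Q symmetric.
def f(x, Q):
    return np.sum(np.exp(x)) + 0.5 * x @ Q @ x

def grad_f(x, Q):
    # Analytic gradient of f (valid because Q is symmetric).
    return np.exp(x) + Q @ x

def hess_f(x, Q):
    # Analytic Hessian of f.
    return np.diag(np.exp(x)) + Q

def fd_gradient(g, y, h=1e-6):
    # Central-difference approximation of the gradient of g at y.
    grad = np.zeros_like(y)
    for i in range(y.size):
        e = np.zeros_like(y)
        e[i] = h
        grad[i] = (g(y + e) - g(y - e)) / (2 * h)
    return grad

def fd_hessian(g, y, h=1e-4):
    # Four-point central-difference approximation of the Hessian of g at y.
    m = y.size
    H = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            ei = np.zeros(m)
            ei[i] = h
            ej = np.zeros(m)
            ej[j] = h
            H[i, j] = (g(y + ei + ej) - g(y + ei - ej)
                       - g(y - ei + ej) + g(y - ei - ej)) / (4 * h * h)
    return H

rng = np.random.default_rng(0)
n, m = 4, 3                       # T maps R^m to R^n (rectangular on purpose)
T = 0.5 * rng.standard_normal((n, m))
B = rng.standard_normal((n, n))
Q = B @ B.T                       # symmetric by construction
y = rng.standard_normal(m)
x = T @ y

def f_tilde(y_):
    # The composed function f_tilde(y) = f(T y).
    return f(T @ y_, Q)

grad_formula = T.T @ grad_f(x, Q)          # T^T grad f(Ty)
hess_formula = T.T @ hess_f(x, Q) @ T      # T^T hess f(Ty) T
grad_numeric = fd_gradient(f_tilde, y)
hess_numeric = fd_hessian(f_tilde, y)

print("max gradient error:", np.max(np.abs(grad_formula - grad_numeric)))
print("max Hessian error: ", np.max(np.abs(hess_formula - hess_numeric)))
```

Both printed errors should be small compared with the size of the entries; they are limited by the finite-difference step sizes, not by the formulas.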
Notice that we did not rely on $T$ being nonsingular. That's because it doesn't have to be; in fact, it doesn't even have to be square: $T$ can be any $n \times m$ matrix, with $y \in \Re^m$. But note also that this is not a rigorous proof; it's just something to help you see it more intuitively.
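If you do want a rigorous argument, the chain rule gives the same formulas componentwise. This is a sketch that assumes $f$ is twice continuously differentiable; the index notation is mine. Writing $x = Ty$, we have $x_k = \sum_j T_{kj} y_j$, so $\partial x_k / \partial y_i = T_{ki}$, and therefore

$$\frac{\partial \tilde{f}}{\partial y_i}(y) = \sum_k \frac{\partial f}{\partial x_k}(Ty)\, T_{ki} = \left( T^T \nabla f(Ty) \right)_i$$

$$\frac{\partial^2 \tilde{f}}{\partial y_i \partial y_j}(y) = \sum_k \sum_l T_{ki}\, \frac{\partial^2 f}{\partial x_k \partial x_l}(Ty)\, T_{lj} = \left( T^T \nabla^2 f(Ty)\, T \right)_{ij}$$

The second line comes from differentiating the first with respect to $y_j$ and applying the chain rule once more to $\partial f / \partial x_k$.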