Gradient with a change of variables?


Let $w\in\mathbb{R}^d$, $X\in\mathbb{R}^{n\times d}$, $y\in\mathbb{R}^n$, and $f(w) = \frac{1}{2}\|y-Xw\|^2_2$. I also have a second function of $z\in\mathbb{R}^n$, namely $g(z) = \frac{1}{2}\|y-z\|^2_2$. The gradients are

\begin{align} \nabla_w f(w) &= X^T(Xw-y)\\ \nabla_zg(z) &= z-y. \end{align} What happens when I evaluate the gradient of $g(z)$ at $z=Xw$?

Solution

My thought is that because $z\in\mathbb{R}^n$, the gradient should produce a value in $\mathbb{R}^n$, meaning \begin{align} \left. \nabla_zg(z)\right|_{z=Xw} = Xw-y. \end{align}

But my confusion comes from the following: if I instead evaluate $\nabla_wg(Xw)$, then I get \begin{align*} \left. \nabla_zg(z)\right|_{z=Xw} = \nabla_wg(Xw) = X^T(Xw-y). \end{align*} But now this value, $X^T(Xw-y)$, lies in $\mathbb{R}^d$, which makes me think it's wrong. I am not exactly sure how to approach this substitution of variables.


There is 1 solution below


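Use $\nabla_zg$ to write the differential of the function $$dg = (z-y) \cdot dz$$ Now use the relationship $\,z=Xw\,$ (so that $dz = X\,dw$) to switch the independent variable from $z\to w$ $$\eqalign{ dg &= (Xw-y) \cdot (X\,dw) \\ &= \big(X^T(Xw-y)\big) \cdot dw \\ }$$ and recover the gradient from that last equation $$\eqalign{ \nabla_wg &= X^T(Xw-y) \\ }$$ This is identical to the other gradient calculation, since $$f(w) = g(Xw)$$ So both of your expressions are correct, but they are gradients with respect to different variables and should not be set equal: $\left.\nabla_zg(z)\right|_{z=Xw} = Xw-y$ lives in $\mathbb{R}^n$, while $\nabla_wg(Xw) = X^T\left.\nabla_zg(z)\right|_{z=Xw} = X^T(Xw-y)$ lives in $\mathbb{R}^d$. The extra factor $X^T$ is exactly the chain rule.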
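If it helps to see the two gradients side by side, here is a small numerical sanity check. It is only a sketch: the sizes `n = 5`, `d = 3`, the random data, and the helper `numerical_gradient` are my own assumptions for illustration, not part of the question. It compares the closed-form gradients against central finite differences and shows that they have different shapes.

```python
import numpy as np

# Illustrative sizes and random data (assumptions, not from the question).
rng = np.random.default_rng(0)
n, d = 5, 3
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)
w = rng.standard_normal(d)


def g(z):
    """g(z) = 1/2 * ||y - z||_2^2, a function of z in R^n."""
    return 0.5 * np.sum((y - z) ** 2)


def f(w):
    """f(w) = g(Xw) = 1/2 * ||y - Xw||_2^2, a function of w in R^d."""
    return g(X @ w)


def numerical_gradient(func, x, eps=1e-6):
    """Hypothetical helper: central finite-difference gradient of func at x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (func(x + step) - func(x - step)) / (2 * eps)
    return grad


z = X @ w
grad_z = z - y          # gradient of g w.r.t. z, evaluated at z = Xw
grad_w = X.T @ grad_z   # chain rule: gradient of g(Xw) w.r.t. w

print("grad_z shape:", grad_z.shape)  # one entry per component of z
print("grad_w shape:", grad_w.shape)  # one entry per component of w
print("grad_z matches finite differences:",
      np.allclose(numerical_gradient(g, z), grad_z, atol=1e-5))
print("grad_w matches finite differences:",
      np.allclose(numerical_gradient(f, w), grad_w, atol=1e-5))
```

The point of the check is that differentiating `g` with respect to its own argument gives a vector of length `n`, while differentiating the composition `f` with respect to `w` gives a vector of length `d`; multiplying by `X.T` is what maps one to the other.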