Gradients of functions involving matrices and vectors, e.g., $\nabla_{w} w^{t}X^{t}y$ and $\nabla_{w} w^t X^tXw$


I have encountered these two gradients, $\nabla_{w}\, w^{t}X^{t}y$ and $\nabla_{w}\, w^t X^tXw$, where $w$ is an $n\times 1$ vector, $X$ is an $m\times n$ matrix, and $y$ is an $m\times 1$ vector.

My approach for $\nabla_{w}\, w^{t}X^{t}y$ was this:

$$w^{t}X^{t}y = y_1\Big(\sum_{i=1}^{n}w_ix_{1i}\Big) + y_2\Big(\sum_{i=1}^{n}w_ix_{2i}\Big) + \dots + y_m\Big(\sum_{i=1}^{n}w_ix_{mi}\Big) = \sum_{j=1}^{m}\sum_{i=1}^{n} y_jw_ix_{ji}$$

And I'm stuck there, not knowing how to convert it to matrix notation. I'm not even sure if it is correct.
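
(As a quick sanity check of the expansion above, the double sum can be compared numerically against the matrix product; a minimal sketch with random NumPy data, with sizes chosen only for illustration:)

```python
import numpy as np

# Small, arbitrary sizes purely for the check
m, n = 4, 3
rng = np.random.default_rng(0)
X = rng.standard_normal((m, n))   # m x n
w = rng.standard_normal(n)        # n-vector
y = rng.standard_normal(m)        # m-vector

# Double sum from the expansion: sum_j sum_i y_j * w_i * x_{ji}
double_sum = sum(y[j] * w[i] * X[j, i] for j in range(m) for i in range(n))

# Matrix form w^T X^T y
matrix_form = w @ X.T @ y

print(np.isclose(double_sum, matrix_form))  # expected: True
```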

How can I get the actual gradient $\nabla_{w}\, w^{t}X^{t}y$ out of that expression? Is there an easier way to get the gradient (maybe using some rules, as in ordinary calculus)? Working with the summations seems tedious, especially when you have to calculate $\nabla_{w}\, w^t X^tXw$.

How do I then work out $\nabla_{w}\, w^t X^tXw$?

There are 3 answers below.

Accepted answer:

Let

$$f (\mathrm x) := \mathrm x^\top \mathrm A \,\mathrm x$$

Hence,

$$f (\mathrm x + h \mathrm v) = (\mathrm x + h \mathrm v)^\top \mathrm A \, (\mathrm x + h \mathrm v) = f (\mathrm x) + h \, \mathrm v^\top \mathrm A \,\mathrm x + h \, \mathrm x^\top \mathrm A \,\mathrm v + h^2 \, \mathrm v^\top \mathrm A \,\mathrm v$$

Thus, the directional derivative of $f$ in the direction of $\rm v$ at $\rm x$ is

$$\lim_{h \to 0} \frac{f (\mathrm x + h \mathrm v) - f (\mathrm x)}{h} = \mathrm v^\top \mathrm A \,\mathrm x + \mathrm x^\top \mathrm A \,\mathrm v = \langle \mathrm v , \mathrm A \,\mathrm x \rangle + \langle \mathrm A^\top \mathrm x , \mathrm v \rangle = \langle \mathrm v , \color{blue}{\left(\mathrm A + \mathrm A^\top\right) \,\mathrm x} \rangle$$

Lastly, the gradient of $f$ with respect to $\rm x$ is

$$\nabla_{\mathrm x} \, f (\mathrm x) = \color{blue}{\left(\mathrm A + \mathrm A^\top\right) \,\mathrm x}$$
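
(Not part of the derivation, but the closed form can be checked numerically against central finite differences; a minimal sketch with random NumPy data:)

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))   # a general (not necessarily symmetric) matrix
x = rng.standard_normal(n)

f = lambda v: v @ A @ v           # f(x) = x^T A x

# Closed form from the derivation above: (A + A^T) x
grad_exact = (A + A.T) @ x

# Independent check via central finite differences
eps = 1e-6
grad_fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(n)])

print(np.allclose(grad_exact, grad_fd, atol=1e-5))  # expected: True
```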

Answer:

By the definition of the gradient vector of the map
$$
\mathbb{R}^{n\times 1}\ni w \mapsto w^tX^ty= \sum_{i=1}^n\sum_{j=1}^m w_{i1}\cdot X_{ji}\cdot y_{j1}\in\mathbb{R},
$$
we have
$$
\nabla_w \big( w^tX^ty \big) = \left( \frac{\partial}{\partial w_{11}} ( w^tX^ty ),\; \frac{\partial}{\partial w_{21}} ( w^tX^ty ),\; \ldots,\; \frac{\partial}{\partial w_{i_01}} ( w^tX^ty ),\; \ldots,\; \frac{\partial}{\partial w_{n1}}( w^tX^ty ) \right).
$$
For $i_0=1,2,\ldots,n$,
\begin{align}
\frac{\partial}{\partial w_{i_01}} ( w^tX^ty ) =& \frac{\partial}{\partial w_{i_01}} \left( \sum_{i=1}^n\sum_{j=1}^m w_{i1}\cdot X_{ji}\cdot y_{j1} \right) \\
=& \sum_{i=1}^n\sum_{j=1}^m \frac{\partial}{\partial w_{i_01}} (w_{i1}\cdot X_{ji}\cdot y_{j1}) \\
=& \sum_{j=1}^m \frac{\partial}{\partial w_{i_01}} (w_{i_01}\cdot X_{ji_0}\cdot y_{j1}) \\
=& \sum_{j=1}^m X_{ji_0}\cdot y_{j1}.
\end{align}
Then
$$
\nabla_w \big( w^tX^ty \big) = \left( \sum_{j=1}^m X_{j1}\cdot y_{j1},\; \sum_{j=1}^m X_{j2}\cdot y_{j1},\; \ldots,\; \sum_{j=1}^m X_{ji_0}\cdot y_{j1},\; \ldots,\; \sum_{j=1}^m X_{jn}\cdot y_{j1} \right),
$$
which in matrix notation is exactly $X^ty$. With similar calculations, we get the gradient vector of the map
$$
\mathbb{R}^{n\times 1}\ni w \mapsto w^tX^tXw= \sum_{k=1}^{n} w_{k1}^2\,(X^tX)_{kk} + 2\sum_{1\leq k<\ell \leq n} w_{k1}\,(X^tX)_{k\ell}\, w_{\ell 1} \in\mathbb{R},
\qquad\text{where } (X^tX)_{k\ell}=\sum_{j=1}^m X_{jk}X_{j\ell}.
$$
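
(A small numerical sketch, with arbitrary random data, confirming that the componentwise sums above assemble into $X^ty$:)

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 3
X = rng.standard_normal((m, n))
y = rng.standard_normal(m)

# Componentwise gradient from the sums above: i0-th entry is sum_j X[j, i0] * y[j]
grad_componentwise = np.array(
    [sum(X[j, i0] * y[j] for j in range(m)) for i0 in range(n)]
)

# Compact matrix form: X^T y
print(np.allclose(grad_componentwise, X.T @ y))  # expected: True
```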

Answer:

Better: since $w^tX^ty$ is a scalar, use $w^tX^ty=(w^tX^ty)^t=y^tXw$; this is a linear form in $w$, so its gradient is simply $X^ty$.
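
(Combining this with the accepted answer's rule $\nabla_{\mathrm x}\, \mathrm x^\top \mathrm A\, \mathrm x = (\mathrm A + \mathrm A^\top)\,\mathrm x$, applied with the symmetric matrix $A = X^tX$, gives $\nabla_w\, w^tX^tXw = 2X^tXw$. A minimal numerical sketch of both identities, using random NumPy data:)

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 3
X = rng.standard_normal((m, n))
w = rng.standard_normal(n)
y = rng.standard_normal(m)

# A scalar equals its own transpose: w^T X^T y == y^T X w
print(np.isclose(w @ X.T @ y, y @ X @ w))                 # expected: True

# Quadratic form: (A + A^T) w with A = X^T X (symmetric) gives 2 X^T X w
f = lambda v: v @ X.T @ X @ v
eps = 1e-6
grad_fd = np.array([(f(w + eps * e) - f(w - eps * e)) / (2 * eps) for e in np.eye(n)])
print(np.allclose(grad_fd, 2 * X.T @ X @ w, atol=1e-5))   # expected: True
```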