Gradients of functions involving matrices and vectors, e.g., $\nabla_{w} w^{t}X^{t}y$ and $\nabla_{w} w^t X^tXw$


I have encountered these two gradients, $\nabla_{w}\, w^{t}X^{t}y$ and $\nabla_{w}\, w^t X^tXw$, where $w$ is an $n\times 1$ vector, $X$ is an $m\times n$ matrix, and $y$ is an $m\times 1$ vector.

My approach for $\nabla_{w}\, w^{t}X^{t}y$ was this:

$$w^{t}X^{t}y = y_1\Big(\sum_{i=1}^{n}w_ix_{1i}\Big) + y_2\Big(\sum_{i=1}^{n}w_ix_{2i}\Big) + \dots + y_m\Big(\sum_{i=1}^{n}w_ix_{mi}\Big) = \sum_{j=1}^{m}\sum_{i=1}^{n} y_jw_ix_{ji}$$

And I'm stuck there, not knowing how to convert it to matrix notation. I'm not even sure if it is correct.
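
(As a quick sanity check of the expansion above, the double sum can be compared numerically against the matrix product; a minimal sketch with random NumPy data, with sizes chosen only for illustration:)

```python
import numpy as np

# Small, arbitrary sizes purely for the check
m, n = 4, 3
rng = np.random.default_rng(0)
X = rng.standard_normal((m, n))   # m x n
w = rng.standard_normal(n)        # n-vector
y = rng.standard_normal(m)        # m-vector

# Double sum from the expansion: sum_j sum_i y_j * w_i * x_{ji}
double_sum = sum(y[j] * w[i] * X[j, i] for j in range(m) for i in range(n))

# Matrix form w^T X^T y
matrix_form = w @ X.T @ y

print(np.isclose(double_sum, matrix_form))  # expected: True
```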

How can I get the actual gradient $\nabla_{w}\, w^{t}X^{t}y$ out of that expression? Is there an easier way to get the gradient (maybe using some rules, as in ordinary calculus)? Working with the summations seems tedious, especially when you have to calculate $\nabla_{w}\, w^t X^tXw$.

How do I then work out $\nabla_{w}\, w^t X^tXw$?

There are 3 answers below.

Accepted answer:

Let

$$f (\mathrm x) := \mathrm x^\top \mathrm A \,\mathrm x$$

Hence,

$$f (\mathrm x + h \mathrm v) = (\mathrm x + h \mathrm v)^\top \mathrm A \, (\mathrm x + h \mathrm v) = f (\mathrm x) + h \, \mathrm v^\top \mathrm A \,\mathrm x + h \, \mathrm x^\top \mathrm A \,\mathrm v + h^2 \, \mathrm v^\top \mathrm A \,\mathrm v$$

Thus, the directional derivative of $f$ in the direction of $\rm v$ at $\rm x$ is

$$\lim_{h \to 0} \frac{f (\mathrm x + h \mathrm v) - f (\mathrm x)}{h} = \mathrm v^\top \mathrm A \,\mathrm x + \mathrm x^\top \mathrm A \,\mathrm v = \langle \mathrm v , \mathrm A \,\mathrm x \rangle + \langle \mathrm A^\top \mathrm x , \mathrm v \rangle = \langle \mathrm v , \color{blue}{\left(\mathrm A + \mathrm A^\top\right) \,\mathrm x} \rangle$$

Lastly, the gradient of $f$ with respect to $\rm x$ is

$$\nabla_{\mathrm x} \, f (\mathrm x) = \color{blue}{\left(\mathrm A + \mathrm A^\top\right) \,\mathrm x}$$
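
(Not part of the derivation, but the closed form can be checked numerically against central finite differences; a minimal sketch with random NumPy data:)

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))   # a general (not necessarily symmetric) matrix
x = rng.standard_normal(n)

f = lambda v: v @ A @ v           # f(x) = x^T A x

# Closed form from the derivation above: (A + A^T) x
grad_exact = (A + A.T) @ x

# Independent check via central finite differences
eps = 1e-6
grad_fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(n)])

print(np.allclose(grad_exact, grad_fd, atol=1e-5))  # expected: True
```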

Answer:

By the definition of the gradient vector of the map
$$
\mathbb{R}^{n\times 1}\ni w \mapsto w^tX^ty= \sum_{i=1}^n\sum_{j=1}^m w_{i1}\cdot X_{ji}\cdot y_{j1}\in\mathbb{R},
$$
we have
$$
\nabla_w \big( w^tX^ty \big) = \left( \frac{\partial}{\partial w_{11}} ( w^tX^ty ),\; \frac{\partial}{\partial w_{21}} ( w^tX^ty ),\; \ldots,\; \frac{\partial}{\partial w_{i_01}} ( w^tX^ty ),\; \ldots,\; \frac{\partial}{\partial w_{n1}}( w^tX^ty ) \right).
$$
For $i_0=1,2,\ldots,n$,
\begin{align}
\frac{\partial}{\partial w_{i_01}} ( w^tX^ty ) =& \frac{\partial}{\partial w_{i_01}} \left( \sum_{i=1}^n\sum_{j=1}^m w_{i1}\cdot X_{ji}\cdot y_{j1} \right) \\
=& \sum_{i=1}^n\sum_{j=1}^m \frac{\partial}{\partial w_{i_01}} (w_{i1}\cdot X_{ji}\cdot y_{j1}) \\
=& \sum_{j=1}^m \frac{\partial}{\partial w_{i_01}} (w_{i_01}\cdot X_{ji_0}\cdot y_{j1}) \\
=& \sum_{j=1}^m X_{ji_0}\cdot y_{j1}.
\end{align}
Then
$$
\nabla_w \big( w^tX^ty \big) = \left( \sum_{j=1}^m X_{j1}\cdot y_{j1},\; \sum_{j=1}^m X_{j2}\cdot y_{j1},\; \ldots,\; \sum_{j=1}^m X_{ji_0}\cdot y_{j1},\; \ldots,\; \sum_{j=1}^m X_{jn}\cdot y_{j1} \right),
$$
which in matrix notation is exactly $X^ty$. With similar calculations, we get the gradient vector of the map
$$
\mathbb{R}^{n\times 1}\ni w \mapsto w^tX^tXw= \sum_{k=1}^{n} w_{k1}^2\,(X^tX)_{kk} + 2\sum_{1\leq k<\ell \leq n} w_{k1}\,(X^tX)_{k\ell}\, w_{\ell 1} \in\mathbb{R},
\qquad\text{where } (X^tX)_{k\ell}=\sum_{j=1}^m X_{jk}X_{j\ell}.
$$
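
(A small numerical sketch, with arbitrary random data, confirming that the componentwise sums above assemble into $X^ty$:)

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 3
X = rng.standard_normal((m, n))
y = rng.standard_normal(m)

# Componentwise gradient from the sums above: i0-th entry is sum_j X[j, i0] * y[j]
grad_componentwise = np.array(
    [sum(X[j, i0] * y[j] for j in range(m)) for i0 in range(n)]
)

# Compact matrix form: X^T y
print(np.allclose(grad_componentwise, X.T @ y))  # expected: True
```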

Answer:

Better: since $w^tX^ty$ is a scalar, use $w^tX^ty=(w^tX^ty)^t=y^tXw$; this is a linear form in $w$, so its gradient is simply $X^ty$.
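
(Combining this with the accepted answer's rule $\nabla_{\mathrm x}\, \mathrm x^\top \mathrm A\, \mathrm x = (\mathrm A + \mathrm A^\top)\,\mathrm x$, applied with the symmetric matrix $A = X^tX$, gives $\nabla_w\, w^tX^tXw = 2X^tXw$. A minimal numerical sketch of both identities, using random NumPy data:)

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 3
X = rng.standard_normal((m, n))
w = rng.standard_normal(n)
y = rng.standard_normal(m)

# A scalar equals its own transpose: w^T X^T y == y^T X w
print(np.isclose(w @ X.T @ y, y @ X @ w))                 # expected: True

# Quadratic form: (A + A^T) w with A = X^T X (symmetric) gives 2 X^T X w
f = lambda v: v @ X.T @ X @ v
eps = 1e-6
grad_fd = np.array([(f(w + eps * e) - f(w - eps * e)) / (2 * eps) for e in np.eye(n)])
print(np.allclose(grad_fd, 2 * X.T @ X @ w, atol=1e-5))   # expected: True
```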