Gradient of a matrix?


I was following Stephen Boyd's convex optimisation course and came across the following slide:

[Slide: the quadratic objective $f(x)=\tfrac12 x^TPx+q^Tx+r$ with $P\in S^n$, and the least-squares objective $f(x)=\|Ax-b\|_2^2$, together with their gradients $\nabla f(x)=Px+q$ and $\nabla f(x)=2A^T(Ax-b)$.]

Can somebody explain to me how the gradient was calculated for the quadratic and least-squares objectives? Is there a general method to find the gradient of such matrix expressions?


3 Answers

Accepted answer:

$f$ is an ordinary real-valued function. If you want, you can write it componentwise as

$$f(x) = {1\over 2}\sum_j\sum_k p_{jk}x_jx_k + \sum_j q_jx_j + r$$

Now the first double sum contains the $x_jx_k$ term twice if $j\ne k$ (once as $p_{jk}x_jx_k$ and once as $p_{kj}x_kx_j$, and $p_{jk}=p_{kj}$ since $P$ is symmetric), and if $j=k$ it becomes an $x_j^2$ term, so the derivative with respect to $x_j$ becomes

$$f'_j(x) = \sum_k p_{jk}x_k + q_j,$$

which in matrix notation becomes

$$\nabla f(x) = Px + q$$
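A quick numerical sanity check of this formula (a minimal sketch in NumPy; the symmetric matrix $P$, the vector $q$, the scalar $r$, and the point $x$ below are made-up random test data): compare $Px+q$ against central finite differences of $f$.

```python
import numpy as np

# Made-up test data: a symmetric P (as the quadratic form assumes), q, r, x.
rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
P = M + M.T
q = rng.standard_normal(n)
r = 1.3
x = rng.standard_normal(n)

def f(x):
    return 0.5 * x @ P @ x + q @ x + r

analytic = P @ x + q

# Central finite differences, one coordinate direction at a time.
h = 1e-6
numeric = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(n)])

print(np.max(np.abs(analytic - numeric)))  # ~1e-9
```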

Second answer:

It is common to define $$ \nabla ^2 f=\nabla\cdot\nabla f=\sum_{k=1}^N\partial_k^2 f = \Delta f, $$ where $\Delta$ is called the Laplacian operator. But that is not the case here.

It seems that here we have $$ \nabla^2f=(\nabla\nabla^T)f=\begin{pmatrix}\partial_1\partial_1f & \partial_1\partial_2f & \cdots &\partial_1\partial_Nf\\\partial_2\partial_1f & \partial_2\partial_2f & \cdots&\partial_2\partial_Nf\\ \vdots & & \ddots & \vdots\\ \partial_N\partial_1f & \cdots & \cdots & \partial_N\partial_Nf \end{pmatrix}=Hess_f $$ where $Hess_f$ is called the Hessian matrix of $f$.

Edit:

It seems that $\nabla^2=\nabla\nabla^T$ is common in optimization, as Surb wrote in the comment below.

Therefore it is best to check how the operator is defined if it isn't obvious from the context. Some books have an explanation of their notation at the end.
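To see the difference between the two conventions concretely, here is a minimal sketch (NumPy, made-up random data) that builds $\nabla\nabla^T f$ for the quadratic $f(x)=\frac12 x^TPx+q^Tx+r$ by finite-differencing its gradient $Px+q$ column by column; the result is the full matrix $P$, whereas the Laplacian would only be its trace.

```python
import numpy as np

# Made-up test data.
rng = np.random.default_rng(1)
n = 4
M = rng.standard_normal((n, n))
P = M + M.T                      # symmetric
q = rng.standard_normal(n)
x = rng.standard_normal(n)

grad = lambda x: P @ x + q       # gradient of f(x) = 0.5 x'Px + q'x + r

# Column j of (nabla nabla^T) f = finite difference of the gradient along e_j.
h = 1e-6
H = np.column_stack([(grad(x + h * e) - grad(x - h * e)) / (2 * h) for e in np.eye(n)])

print(np.max(np.abs(H - P)))  # ~1e-10: the Hessian is the n-by-n matrix P
print(np.trace(H))            # the Laplacian would be this single scalar instead
```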

Third answer:

I would simply use the Gâteaux derivative. It is the natural extension of the 1D derivative $$\frac{d}{dx}f(x) = \lim_{δx→0}\frac{f(x+δx)-f(x)}{δx}$$ to higher dimensions. Since your function maps $f:ℝ^n→ℝ$, we need an arbitrary direction $δx∈ℝ^n$ and a small increment $ε>0$. Using the "$\,\cdot\,\big|_{ε=0}$" formulation, the Gâteaux derivative of your function reads \begin{align*} d(\|Ax-b\|²;[x,δx]) = \Big(\frac{d}{dε}\|A(x+εδx) - b\|²\Big)\Big|_{ε=0}. \end{align*}

First, compute \begin{align*} \frac{d}{dε}\|A(x+εδx) - b\|² =& \frac{d}{dε}[(A(x+εδx) - b, A(x+εδx) - b)] \\ =&\frac{d}{dε}[\{(Ax, Ax)+ (Ax,Aεδx) + (Ax, -b)\} \\ &+ \{(Aεδx, Ax) + (Aεδx, Aεδx) + (Aεδx, -b)\} \\ &+ \{(-b, Ax) + (-b, Aεδx) + (-b, -b)\} ] \\ =¹&\frac{d}{dε}[\{\|Ax\|²+ \|b\|²+ 2(Ax, -b)\} \\ &+ ε\{2(Ax,Aδx) + 2(-b, Aδx)\} \\ &+ ε²\|Aδx\|² ]\\ =& \{2(Ax,Aδx) + 2(-b, Aδx)\} + 2ε\|Aδx\|². \end{align*} ¹Sorting by powers of $ε$ before differentiating.

Setting $ε=0$ yields \begin{align*} \Big(\frac{d}{dε}\|A(x+εδx) - b\|²\Big)\Big|_{ε=0} &= 2(Ax,Aδx) + 2(-b, Aδx) \\ &= 2(Ax-b, Aδx)= (2A^\top[Ax-b], δx). \end{align*}

Hence, the derivative is $2A^\top[Ax-b]$.

That is because $∇f = (∂_{e_1}f, ∂_{e_2}f, …)^T$, so replacing $δx$ with the basis vector $e_i$ gives $$∂_{e_i}f = \big(2A^\top[Ax-b]\big)_i.$$
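This can be checked numerically (a minimal sketch with NumPy; $A$, $b$, $x$, and the direction $δx$ are made-up random data): for small $ε$, the difference quotient $(f(x+εδx)-f(x))/ε$ should match $(2A^\top[Ax-b], δx)$.

```python
import numpy as np

# Made-up test data: A, b, a point x, and an arbitrary direction dx (= delta x).
rng = np.random.default_rng(2)
m, n = 8, 5
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x = rng.standard_normal(n)
dx = rng.standard_normal(n)

f = lambda x: np.sum((A @ x - b) ** 2)   # ||Ax - b||^2
grad = 2 * A.T @ (A @ x - b)             # the claimed derivative

# (f(x + eps*dx) - f(x)) / eps should approach <grad, dx> as eps -> 0.
eps = 1e-6
print((f(x + eps * dx) - f(x)) / eps)
print(grad @ dx)                          # agrees to ~5 digits
```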

Higher derivatives can be calculated in the same way: \begin{align*} \frac{d}{dε}\big(2A^\top[A(x+εδx) - b]\big)\Big|_{ε=0} &= (2A^\top Aδx)\big|_{ε=0} \\ &= 2A^\top Aδx \end{align*} $⇒ ∇^2f(x) = 2A^\top A.$
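And a matching check for the second derivative (same kind of sketch and made-up random data): finite-differencing the gradient of $\|Ax-b\|²$ along each basis vector reproduces $2A^\top A$.

```python
import numpy as np

# Made-up test data.
rng = np.random.default_rng(3)
m, n = 8, 5
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x = rng.standard_normal(n)

grad = lambda x: 2 * A.T @ (A @ x - b)   # gradient of ||Ax - b||^2

# Finite-difference the gradient along each basis vector to build the Hessian.
h = 1e-6
H = np.column_stack([(grad(x + h * e) - grad(x - h * e)) / (2 * h) for e in np.eye(n)])

print(np.max(np.abs(H - 2 * A.T @ A)))   # ~1e-9
```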