Gradient of a matrix?


I was following Stephen Boyd's convex optimisation course and came across the following slide:

[Slide: the quadratic objective $f(x)=\tfrac12 x^TPx+q^Tx+r$ with $P\in S^n$, and the least-squares objective $f(x)=\|Ax-b\|_2^2$, together with their gradients $\nabla f(x)=Px+q$ and $\nabla f(x)=2A^T(Ax-b)$.]

Can somebody explain to me how the gradient was calculated for the quadratic and least-squares objectives? Is there a general method to find the gradient of such matrix expressions?


3 Answers

Accepted answer:

$f$ is an ordinary real-valued function. If you want, you can write it componentwise as

$$f(x) = {1\over 2}\sum_j\sum_k p_{jk}x_jx_k + \sum_j q_jx_j + r$$

Now the first double sum contains the $x_jx_k$ term twice if $j\ne k$ (once as $p_{jk}x_jx_k$ and once as $p_{kj}x_kx_j$, and $p_{jk}=p_{kj}$ since $P$ is symmetric), and if $j=k$ it becomes an $x_j^2$ term, so the derivative with respect to $x_j$ becomes

$$f'_j(x) = \sum_k p_{jk}x_k + q_j,$$

which in matrix notation becomes

$$\nabla f(x) = Px + q$$
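A quick numerical sanity check of this formula (a minimal sketch in NumPy; the symmetric matrix $P$, the vector $q$, the scalar $r$, and the point $x$ below are made-up random test data): compare $Px+q$ against central finite differences of $f$.

```python
import numpy as np

# Made-up test data: a symmetric P (as the quadratic form assumes), q, r, x.
rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
P = M + M.T
q = rng.standard_normal(n)
r = 1.3
x = rng.standard_normal(n)

def f(x):
    return 0.5 * x @ P @ x + q @ x + r

analytic = P @ x + q

# Central finite differences, one coordinate direction at a time.
h = 1e-6
numeric = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(n)])

print(np.max(np.abs(analytic - numeric)))  # ~1e-9
```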

Second answer:

It is common to define $$ \nabla ^2 f=\nabla\cdot\nabla f=\sum_{k=1}^N\partial_k^2 f = \Delta f, $$ where $\Delta$ is called the Laplacian operator. But that is not the case here.

It seems that here we have $$ \nabla^2f=(\nabla\nabla^T)f=\begin{pmatrix}\partial_1\partial_1f & \partial_1\partial_2f & \cdots &\partial_1\partial_Nf\\\partial_2\partial_1f & \partial_2\partial_2f & \cdots&\partial_2\partial_Nf\\ \vdots & & \ddots & \vdots\\ \partial_N\partial_1f & \cdots & \cdots & \partial_N\partial_Nf \end{pmatrix}=Hess_f $$ where $Hess_f$ is called the Hessian matrix of $f$.

Edit:

It seems that $\nabla^2=\nabla\nabla^T$ is common in optimization, as Surb wrote in the comment below.

Therefore it is best to check how the operator is defined if it isn't obvious from the context. Some books have an explanation of their notation at the end.
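To see the difference between the two conventions concretely, here is a minimal sketch (NumPy, made-up random data) that builds $\nabla\nabla^T f$ for the quadratic $f(x)=\frac12 x^TPx+q^Tx+r$ by finite-differencing its gradient $Px+q$ column by column; the result is the full matrix $P$, whereas the Laplacian would only be its trace.

```python
import numpy as np

# Made-up test data.
rng = np.random.default_rng(1)
n = 4
M = rng.standard_normal((n, n))
P = M + M.T                      # symmetric
q = rng.standard_normal(n)
x = rng.standard_normal(n)

grad = lambda x: P @ x + q       # gradient of f(x) = 0.5 x'Px + q'x + r

# Column j of (nabla nabla^T) f = finite difference of the gradient along e_j.
h = 1e-6
H = np.column_stack([(grad(x + h * e) - grad(x - h * e)) / (2 * h) for e in np.eye(n)])

print(np.max(np.abs(H - P)))  # ~1e-10: the Hessian is the n-by-n matrix P
print(np.trace(H))            # the Laplacian would be this single scalar instead
```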

Third answer:

I would simply use the Gâteaux derivative. It is the natural extension of the 1D derivative $$\frac{d}{dx}f(x) = \lim_{δx→0}\frac{f(x+δx)-f(x)}{δx}$$ to higher dimensions. Since your function maps $f:ℝ^n→ℝ$, we need an arbitrary direction $δx∈ℝ^n$ and a small increment $ε>0$. Using the "$\,\cdot\,\big|_{ε=0}$" formulation, the Gâteaux derivative of your function reads \begin{align*} d(\|Ax-b\|²;[x,δx]) = \Big(\frac{d}{dε}\|A(x+εδx) - b\|²\Big)\Big|_{ε=0}. \end{align*}

First, compute \begin{align*} \frac{d}{dε}\|A(x+εδx) - b\|² =& \frac{d}{dε}[(A(x+εδx) - b, A(x+εδx) - b)] \\ =&\frac{d}{dε}[\{(Ax, Ax)+ (Ax,Aεδx) + (Ax, -b)\} \\ &+ \{(Aεδx, Ax) + (Aεδx, Aεδx) + (Aεδx, -b)\} \\ &+ \{(-b, Ax) + (-b, Aεδx) + (-b, -b)\} ] \\ =¹&\frac{d}{dε}[\{\|Ax\|²+ \|b\|²+ 2(Ax, -b)\} \\ &+ ε\{2(Ax,Aδx) + 2(-b, Aδx)\} \\ &+ ε²\|Aδx\|² ]\\ =& \{2(Ax,Aδx) + 2(-b, Aδx)\} + 2ε\|Aδx\|². \end{align*} ¹Sorting by powers of $ε$ before differentiating.

Setting $ε=0$ yields \begin{align*} \Big(\frac{d}{dε}\|A(x+εδx) - b\|²\Big)\Big|_{ε=0} &= 2(Ax,Aδx) + 2(-b, Aδx) \\ &= 2(Ax-b, Aδx)= (2A^\top[Ax-b], δx). \end{align*}

Hence, the derivative is $2A^\top[Ax-b]$.

That is because $∇f = (∂_{e_1}f, ∂_{e_2}f, …)^T$, so replacing $δx$ with the basis vector $e_i$ gives $$∂_{e_i}f = \big(2A^\top[Ax-b]\big)_i.$$
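This can be checked numerically (a minimal sketch with NumPy; $A$, $b$, $x$, and the direction $δx$ are made-up random data): for small $ε$, the difference quotient $(f(x+εδx)-f(x))/ε$ should match $(2A^\top[Ax-b], δx)$.

```python
import numpy as np

# Made-up test data: A, b, a point x, and an arbitrary direction dx (= delta x).
rng = np.random.default_rng(2)
m, n = 8, 5
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x = rng.standard_normal(n)
dx = rng.standard_normal(n)

f = lambda x: np.sum((A @ x - b) ** 2)   # ||Ax - b||^2
grad = 2 * A.T @ (A @ x - b)             # the claimed derivative

# (f(x + eps*dx) - f(x)) / eps should approach <grad, dx> as eps -> 0.
eps = 1e-6
print((f(x + eps * dx) - f(x)) / eps)
print(grad @ dx)                          # agrees to ~5 digits
```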

Higher derivatives can be calculated in the same way: \begin{align*} \frac{d}{dε}\big(2A^\top[A(x+εδx) - b]\big)\Big|_{ε=0} &= (2A^\top Aδx)\big|_{ε=0} \\ &= 2A^\top Aδx \end{align*} $⇒ ∇^2f(x) = 2A^\top A.$
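And a matching check for the second derivative (same kind of sketch and made-up random data): finite-differencing the gradient of $\|Ax-b\|²$ along each basis vector reproduces $2A^\top A$.

```python
import numpy as np

# Made-up test data.
rng = np.random.default_rng(3)
m, n = 8, 5
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x = rng.standard_normal(n)

grad = lambda x: 2 * A.T @ (A @ x - b)   # gradient of ||Ax - b||^2

# Finite-difference the gradient along each basis vector to build the Hessian.
h = 1e-6
H = np.column_stack([(grad(x + h * e) - grad(x - h * e)) / (2 * h) for e in np.eye(n)])

print(np.max(np.abs(H - 2 * A.T @ A)))   # ~1e-9
```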