Gradient of matrix $b^{T}x$


I'm trying to understand my exam solution from the lecturer, but I got confused over one small thing in the solution.


Problem: Consider $f(x)=\frac{1}{2}x^{T}Ax - b^{T}x$

where $A=\begin{bmatrix}2&1\\1&4\end{bmatrix}$, $b=\begin{bmatrix}0\\7\end{bmatrix}$ and $x=\begin{bmatrix}x\\y\end{bmatrix}$.

Calculate $\nabla f(x)$.

Solution: From the theory we know that if $A$ is symmetric, we get

$\nabla f(x)= Ax - b^{T}$

$\nabla f(x)=\begin{bmatrix}2x+y\\x+4y-7\end{bmatrix}$


Is it a typo on the answer, where $Ax-b^{T}$ should've been written as $Ax-b$?

If it should've been written as $b$, then I understand why we have $-7$ in the 2nd row.

Thank you

Best answer:

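The $b^{T}x$ in the definition of $f$ is not a typo: you need to transpose $b$ to compute its scalar product with $x$ (quick check: treat it as a normal matrix-matrix multiplication, and the dimensions would not agree without the transposition!).

But yes, $\nabla (b^Tx) = b$, so the gradient should read $Ax - b$. The same dimension check applies: $Ax$ is a column vector, so you cannot subtract the row vector $b^{T}$ from it.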

In this case, since there are only two components, we can compute everything by hand. You can then understand how this generalizes, once you figure out what happens when applying the gradient.

We have:

$$b^Tx = \begin{bmatrix} 0 & 7\end{bmatrix} \cdot \begin{bmatrix} x \\ y \end{bmatrix} = 0x + 7y$$

and since

$$\nabla = \begin{bmatrix} \frac{\partial}{\partial x} \\ \frac{\partial}{\partial y} \end{bmatrix},$$

you can compute

$$\nabla (b^Tx) = \begin{bmatrix} \frac{\partial}{\partial x} (0x + 7y) \\ \frac{\partial}{\partial y} (0x + 7y) \end{bmatrix} = \begin{bmatrix} 0 \\ 7 \end{bmatrix}.$$
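The same by-hand approach can be applied to the whole function, which may help confirm the vector in the lecturer's solution. This is only a sketch using the $A$ and $b$ given in the question. Expanding $f$ in components:

$$f(x,y) = \frac{1}{2}\left(2x^2 + 2xy + 4y^2\right) - 7y = x^2 + xy + 2y^2 - 7y$$

and differentiating with respect to each variable:

$$\nabla f = \begin{bmatrix} \frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y} \end{bmatrix} = \begin{bmatrix} 2x + y \\ x + 4y - 7 \end{bmatrix} = Ax - b.$$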

Another answer:

Call a column vector $\vec{x}$. Write $x^1$ for the first component, $x^2$ for the second, and so on. The square of a component is written with parentheses: $(x^1)^2$ is the first component (usually the $x$ component) squared.

$x^i$ will be used to represent either the $i$-th component or the whole vector.

The transpose of a vector is represented with a lowered index: $x_i$ is $\vec{x}^T$.

Inner products can only be performed between a column vector and a row vector, so one vector is addressed by lower indices, the other by upper indices. Matching indices imply a sum.

So $b^Tx=b_ix^i=b_1x^1+b_2x^2+\cdots$

The $j$-th component of the gradient is $[\nabla(b^Tx)]_j=\frac{\partial}{\partial x^j}(b_ix^i)$

By the product rule:

$\frac{\partial}{\partial x^j}(b_ix^i)=\frac{\partial b_i}{\partial x^j}x^i+b_i\frac{\partial x^i}{\partial x^j}$

But $\vec{b}$ is a constant, so the derivative of any of its components is 0 and the first term vanishes. $\frac{\partial x^i}{\partial x^j}$ is 0 unless the indices match, in which case the value is 1; that is, it equals the Kronecker delta $\delta^i_j$.

So the second term gives us $b_i\delta^i_j=b_j$, establishing the expected result.
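
If you want a sanity check that does not rely on the algebra, the result can also be compared against a finite-difference approximation. The following is a minimal sketch in Python with NumPy, not part of the original solution: the helper `numerical_gradient`, the step size `h` and the test point `x0` are all assumptions chosen for illustration.

```python
import numpy as np

# Matrix and vector from the question
A = np.array([[2.0, 1.0], [1.0, 4.0]])
b = np.array([0.0, 7.0])

def f(x):
    """f(x) = 1/2 x^T A x - b^T x"""
    return 0.5 * x @ A @ x - b @ x

def numerical_gradient(func, x, h=1e-6):
    """Central-difference approximation of the gradient of func at x
    (hypothetical helper, written only for this check)."""
    grad = np.zeros_like(x)
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = h  # perturb only the j-th component
        grad[j] = (func(x + step) - func(x - step)) / (2 * h)
    return grad

x0 = np.array([1.5, -2.0])  # arbitrary test point (an assumption)

grad_linear = numerical_gradient(lambda x: b @ x, x0)
grad_f = numerical_gradient(f, x0)

print(np.allclose(grad_linear, b, atol=1e-5))       # gradient of b^T x vs. b
print(np.allclose(grad_f, A @ x0 - b, atol=1e-5))   # gradient of f vs. Ax - b
```

Both comparisons should agree up to rounding, which is consistent with $\nabla(b^Tx)=b$ and $\nabla f(x)=Ax-b$ for the symmetric $A$ in the question.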