Matrix Calculus in Least-Square method

12.4k Views Asked by At

In the proof of matrix solution of Least Square Method, I see some matrix calculus, which I have no clue. Can anyone explain to me or recommend me a good link to study this sort of matrix calculus?

In Least-Square method, we want to find such a vector $x$ such that $||Ax-b||$ is minimized.

Assume $r=Ax-b$

$\Rightarrow\|r\|^2=x^TA^TAx-2b^TAx+b^Tb$

$\Rightarrow \nabla_x \|r\|^2=2A^TAx-2A^Tb$

In the end we set the gradient to zero and find the minimized solution. I understand the whole idea, but I just don't know how exactly we did matrix calculus here, or say I don't know how to do the matrix calculus here. For example, can anyone tell me how we got those transpose in $\|r\|^2$(By what rule?) and how we got the gradient?(how do we take the gradient exactly in matrix format)?

I'll really appreciate if you can help me out. Thanks!

2

There are 2 best solutions below

4
On

Well the first step is the definition of $||r||^2$. This is easy \begin{align} ||r||^2 & = \langle r,r \rangle = r^T r \\ &= (Ax-b)^T(Ax-b) = (x^TA^T-b^T)(Ax-b) \\ &= x^TA^TAx -x^TA^Tb-b^TAx +b^Tb \\ &= x^TA^TAx -(b^TAx)^T -b^TAx +b^Tb \end{align} Since $(b^TAx)$ is a scalar it holds $(b^TAx)⁼ (b^TAx)^T$ Thus \begin{align} ||r||^2 & = x^TA^TAx -2b^TAx +b^Tb \end{align}

And for the derivatives, you could take a look here. Another approach would be to write out the matrix-vector expressions in sumation form and calculate the derivative, then no matrices are involved.

2
On

Below are the matrix/vector derivative rules, you will need.

$$\dfrac{d(x^TBx)}{d x_i} = \dfrac{d}{dx_i}\left(\sum_{j,k} x_j B_{jk}x_k\right) = \sum_{j} x_j B_{ji} + \sum_{k}B_{ik} x_k = \sum_{k}\left(B^T + B\right)_{ik}x_k$$ Hence, we have $$\dfrac{d(x^TBx)}{d x} = (B^T+B)x$$ Similarly, we have $$\dfrac{d(c^Tx)}{d x_i} = \dfrac{d}{d x_i}\left(\sum_k c_k x_k\right) = c_i$$ Hence, we have $$\dfrac{d(c^Tx)}{dx} = c$$ Now you should be able to get what you want.