What is the correct way to simplify $(Ax-b)^T(Ax-b)$

5.4k Views Asked by At

Let us simplify $(Ax-b)^T(Ax-b)$ step by step, $A \in R^{m \times n}$ matrix, $x \in R^{n \times 1}$, $b \in R^{m \times 1}$

$(Ax-b)^T(Ax-b)$

$= (x^TA^T - b^T)(Ax - b)$ (property of transpose)

$ = x^TA^TAx - x^TA^Tb - b^TAx + b^Tb$ (distribution property)

Now what is the correct way to combine the inner terms

$-x^TA^Tb - b^TAx $?

or equivalently

$x^TA^Tb + b^TAx $?

I find myself constantly asking the following questions

  • Is it legal or illegal to transpose any of the two terms? Why?
  • Which term ($x^TA^Tb$ or $b^TAx $) should you take the transpose of? Why?

I would really appreciate if someone can resolve this for me

2

There are 2 best solutions below

1
On BEST ANSWER

Since both $x^TA^Tb$ and $b^TAx$ are scalars, you can transpose any of them.

You can transpose $x^TA^Tb$ and get $$x^TA^Tb + b^TAx = b^TAx + b^TAx = 2 b^TAx$$ or you can transpose $b^TAx$ and get $$x^TA^Tb + b^TAx = x^TA^Tb + x^TA^Tb = 2 x^TA^Tb$$ and since both results are scalar, they are equal to their respective transposes, we get: $$ 2 x^TA^Tb = 2 (x^TA^Tb)^T = 2 b^TAx.$$

So to summarize, you can take the transposes of either one of them, and should probably take the transpose of $x^TA^Tb$ since you will reach the simpler result $2b^TAx$ faster (it can be considered simpler since there is no transpose of $A$ in it).

0
On

The matrix $A$ can have linearly independent columns only if $m\ge n$. The typical instance of this problem is where $m\gg n = \operatorname{rank} A$. In that case, the small matrix $A^TA$ is invertiable since it's an $n\times n$ matrix of rank $n$. The matrix $B = A (A^T A)^{-1} A^T$ is an $m\times m$ matrix of small rank, $n$. (If $A$ were a square matrix, then $B$ would be the identity matrix.)

Exercise: Prove that $b\mapsto Bb$ is the orthogonal projection of $b$ onto the column space of $A$.

And now on to simplifying: \begin{align} & (Ax-b)^T(Ax-b) \\[10pt] = {} & (Ax - Bb + Bb - b)^T(Ax - Bb + Bb - b) \\[10pt] = {} & (Ax-Bb)^T(Ax-Bb) + \underbrace{(Ax-Bb)^T(Bb-b) + (Bb-b)^T(Ax-Bb)}_\text{This is $0$.} + (Bb-b)^T(Bb-b) \\[10pt] = {} & (Ax-Bb)^T(Ax-Bb) + (Bb-b)^T(Bb-b). \end{align}

In this last form, notice that $x$ can be so chosen that $Ax= Bb$, since $Bb$ is in the column space of $A$. Indeed, that happens precisely when $x = (A^TA)^{-1}A^T b$. That is the one value of $x$ that makes the first term $0$, and thus it is the one value of $x$ that mimimizes the entire quantity. Hence we see that the second term in the last line is actually the smallest possible value of the whole quantity as $x$ varies. (If $A$ has linearly dependent columns and more rows than columns, then there is more than one value of $x$ that makes the first term in the last line vanish.)

This may be considered species of completing the square.