Idea for solving a linear system with linearly dependent columns


Consider a matrix $M$ with two linearly dependent columns, like $\pmatrix{2&2\\2&2\\2&2}$. I have an idea for finding the least squares solution of $Mx = b$: Is it a good approach to add a very small number $\varepsilon$ to one entry of $M$ and $b$, e.g. $\tilde M=\pmatrix{2&2\\2&2+\varepsilon\\2&2}, \tilde b=\pmatrix{3\\4+\varepsilon\\5}$, so that the columns are no longer linearly dependent, and then solve via $$x=(\tilde M'\tilde M)^{-1}\tilde M'\tilde b?$$ Please comment on this approach specifically, rather than on alternatives like the SVD.

Accepted answer

A Specific Counterexample

No, unfortunately, it's not a good approach. That said, for this very specific $b$, this very specific $M$, and your particular choice of perturbation $\tilde{M}$ (as a function of $\varepsilon$), this calculation will work. If we change $b$, we will find that it does not.

Let $b = (6, 0, 0)^\top$. Then, $$\tilde{M}^\top b = \pmatrix{2&2&2\\2&2+\varepsilon&2}\pmatrix{6\\0\\0} = \pmatrix{12 \\ 12}.$$ We have, $$\tilde{M}^\top \tilde{M} = \pmatrix{2&2&2\\2&2+\varepsilon&2}\pmatrix{2&2\\2&2+\varepsilon\\2&2} = \pmatrix{12&12+2\varepsilon\\12+2\varepsilon&12+4\varepsilon+\varepsilon^2}.$$

The determinant is: $$(\varepsilon^2 + 4\varepsilon + 12) \cdot 12 - (2\varepsilon + 12)^2 = 8\varepsilon^2 > 0,$$ so the inverse is, $$(\tilde{M}^\top \tilde{M})^{-1} = \frac{1}{8\varepsilon^2}\pmatrix{12+4\varepsilon+\varepsilon^2&-12-2\varepsilon\\-12-2\varepsilon&12}.$$ Thus, $$(\tilde{M}^\top \tilde{M})^{-1}\tilde{M}^\top b = \frac{1}{8\varepsilon^2}\pmatrix{12+4\varepsilon+\varepsilon^2&-12-2\varepsilon\\-12-2\varepsilon&12}\pmatrix{12\\12} = \frac{1}{8\varepsilon^2}\pmatrix{24\varepsilon+12\varepsilon^2\\-24\varepsilon}.$$

This is not even convergent! That is, choosing different values of $\varepsilon$ will give you wildly different least squares solutions.
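This divergence is easy to see numerically. A minimal sketch (assuming NumPy; none of this code is from the original post) that solves the perturbed normal equations for two values of $\varepsilon$:

```python
import numpy as np

def perturbed_lss(eps):
    """Least squares via the perturbed normal equations, with b = (6, 0, 0)."""
    M_tilde = np.array([[2.0, 2.0],
                        [2.0, 2.0 + eps],
                        [2.0, 2.0]])
    b = np.array([6.0, 0.0, 0.0])
    return np.linalg.solve(M_tilde.T @ M_tilde, M_tilde.T @ b)

# Closed form: x = (3/eps + 3/2, -3/eps), so shrinking eps tenfold
# inflates the "solution" roughly tenfold instead of converging.
x_large = perturbed_lss(1e-3)   # approx ( 3001.5,  -3000.0)
x_small = perturbed_lss(1e-4)   # approx (30001.5, -30000.0)
```

The ever-larger entries mirror the $\frac{1}{8\varepsilon^2}\pmatrix{24\varepsilon+12\varepsilon^2\\-24\varepsilon}$ computed above.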

If you are interested less in the $x$, and more in the $Mx$ (i.e. the orthogonal projection onto the columnspace of $M$), then I have some good news, but mostly bad news. The good news is, our non-convergence problem goes away. In fact, our projection doesn't depend on $\varepsilon$ at all:

$$\tilde{M}x = \tilde{M}(\tilde{M}^\top \tilde{M})^{-1} \tilde{M}^\top b = \frac{1}{8\varepsilon^2}\pmatrix{2&2\\2&2+\varepsilon\\2&2}\pmatrix{24\varepsilon+12\varepsilon^2\\-24\varepsilon} = \pmatrix{3\\0\\3}.$$

The (really) bad news is that this is not the projection onto the columnspace of $M$. Indeed, it's not difficult to see that this is not even in the columnspace of $M$. (The correct projection is $(2, 2, 2)^\top$, FYI.)
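This gap can be checked numerically as well (a NumPy sketch under the same assumptions as before): compare $\tilde{M}x$ with the projection of $b$ onto $\operatorname{colspace} M = \operatorname{span}\{(1,1,1)^\top\}$, computed directly from the one-dimensional projection formula:

```python
import numpy as np

eps = 1e-4
M_tilde = np.array([[2.0, 2.0],
                    [2.0, 2.0 + eps],
                    [2.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

# The proposed method, mapped back into R^3.
x = np.linalg.solve(M_tilde.T @ M_tilde, M_tilde.T @ b)
proj_perturbed = M_tilde @ x       # lands at (3, 0, 3)

# The actual projection of b onto colspace(M) = span{(1, 1, 1)}.
u = np.array([1.0, 1.0, 1.0])
proj_true = (b @ u) / (u @ u) * u  # (2, 2, 2)
```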

General Discussion

You might notice that I did not perturb $b$, as you did in the question. It wouldn't have helped. For a fixed $\varepsilon$, the linear map $c \mapsto (\tilde{M}^\top \tilde{M})^{-1}\tilde{M}^\top c$, like all linear maps with finite-dimensional domain, is continuous, and in fact Lipschitz. That is, there is some $K$ such that $$\|(\tilde{M}^\top \tilde{M})^{-1}\tilde{M}^\top \tilde{b} - (\tilde{M}^\top \tilde{M})^{-1}\tilde{M}^\top b\| \le K\|\tilde{b} - b\|.$$ So, if we choose $\tilde{b}$ close to $b$, the two resulting least squares solutions will be commensurately close to one another. If we make the perturbation $\tilde{b}$ independent of $\varepsilon$, we inevitably end up with another divergent function of $\varepsilon$, no matter how close $\tilde{b}$ is to $b$. If we tie the perturbation to $\varepsilon$, the function might converge, but its limit will still be erroneous.
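To illustrate (a sketch with an arbitrary, hypothetical nudge of $10^{-6}$ in the second entry of $b$): fixing the perturbation of $b$ while shrinking $\varepsilon$ still produces a divergent family of solutions:

```python
import numpy as np

def perturbed_lss(eps, rhs):
    """Least squares via the normal equations of the perturbed matrix."""
    M_tilde = np.array([[2.0, 2.0],
                        [2.0, 2.0 + eps],
                        [2.0, 2.0]])
    return np.linalg.solve(M_tilde.T @ M_tilde, M_tilde.T @ rhs)

b = np.array([6.0, 0.0, 0.0])
b_tilde = b + np.array([0.0, 1e-6, 0.0])  # fixed nudge, independent of eps

x1 = perturbed_lss(1e-3, b_tilde)
x2 = perturbed_lss(1e-4, b_tilde)
# x2 is roughly ten times x1: no convergence as eps -> 0.
```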

So, intuitively why does this not work? At the heart of least squares is finding orthogonal projections onto columnspaces of matrices. We usually care less about the LSS $x$ than we care about $Mx$, the vector in the columnspace of $M$ that is closest to $b$. When $M$ does not have linearly independent columns, $x$ will not be unique, but $Mx$ is always unique.

If we perturb a matrix $M$ with linearly dependent columns, so that its columns become linearly independent, we are adding dimensions to the columnspace of $M$. In our example, the columnspace of $M$ changed from a line $$L = \operatorname{span}\{(2, 2, 2)^\top\} = \operatorname{span}\{(1, 1, 1)^\top\},$$ to a plane: $$P = \operatorname{span}\{(2, 2, 2)^\top, (2, 2+\varepsilon, 2)^\top\} = \operatorname{span}\{(1, 0, 1)^\top, (0, 1, 0)^\top\}.$$ It's a coincidental feature of your particular perturbation that it always generates this same plane. Changing the perturbation will change the plane, and if you don't stick to linear perturbations in one dependent column, then you might find the plane changes depending on $\varepsilon$.
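One way to check this claim numerically (a NumPy sketch; `matrix_rank` does use an SVD internally, but only as a rank test): if $\operatorname{colspace}\tilde{M}$ lies inside $P$ for every $\varepsilon$, then stacking $\tilde{M}$ next to a basis of $P$ never raises the rank above $2$:

```python
import numpy as np

# Basis of the plane P = span{(1, 0, 1), (0, 1, 0)}.
span_P = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [1.0, 0.0]])

ranks = []
for eps in (1e-2, 1e-5, 1e-8):
    M_tilde = np.array([[2.0, 2.0],
                        [2.0, 2.0 + eps],
                        [2.0, 2.0]])
    # If colspace(M_tilde) is contained in P, stacking adds no rank.
    ranks.append(int(np.linalg.matrix_rank(np.hstack([M_tilde, span_P]))))
```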

By adding a dimension or more to $\operatorname{colspace} M$, you radically change the subspace being projected onto. The size of $\varepsilon$ is irrelevant: we are projecting onto a new subspace (containing the old one, in this case). In this sense, our small $\varepsilon$ does not guarantee that we are in a "nearby" case with a similar solution. Projecting onto this larger subspace will generally produce far better approximations of $b$.

When $b = (3, 4, 5)^\top$, this just so happened not to be the case: the projection onto $P$ and the projection onto $L$ both turn out to be $(4, 4, 4)^\top$. That coincidence is what made the method appear to work for this particular $b$. It fails for $b = (6, 0, 0)^\top$, which made that vector a perfect candidate for a counterexample.
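This coincidence is easy to verify (a sketch, reusing the spanning vectors from above): project $b = (3, 4, 5)^\top$ onto $L$ and onto $P$ and compare:

```python
import numpy as np

b = np.array([3.0, 4.0, 5.0])

# Projection onto the line L = span{(1, 1, 1)}.
u = np.array([1.0, 1.0, 1.0])
proj_L = (b @ u) / (u @ u) * u

# Projection onto the plane P = span{(1, 0, 1), (0, 1, 0)},
# via least squares on this well-conditioned basis.
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 0.0]])
coef = np.linalg.solve(B.T @ B, B.T @ b)
proj_P = B @ coef

# Both projections land at (4, 4, 4).
```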

The divergence of $x$ as $\varepsilon \to 0$ is not uncommon either. Recall that $\tilde{M}x$ is a linear combination of the columns of $\tilde{M}$: $x_1$ times the first column plus $x_2$ times the second. Because the projection $p = (3, 0, 3)^\top$ of $b = (6, 0, 0)^\top$ onto $P$ did not lie on $L$, the projection was not simply a multiple of $(2, 2, 2)^\top$: it needed a contribution from $(2, 2 + \varepsilon, 2)^\top$. But it takes large multiples of the latter to get away from the line $L$, and the smaller the $\varepsilon$, the larger the multiple required. For similar reasons, divergence of the $x$ terms is to be expected in a counterexample.

On the other hand, convergence of $\tilde{M}x$ is to be expected, provided the columnspaces of $\tilde{M}$ "converge" (in a sense I won't elaborate on here). In our case, the columnspace was $P$ regardless of $\varepsilon$, which is why our $\tilde{M}x$ did not depend on $\varepsilon$. But, of course, the result was the projection onto $P$, not $L$, so even this convergent result was erroneous.

Hope that helps!