Least Squares - Why closed-form or LSMR solutions different from true $\beta$?

39 Views Asked by At

I wanted to try using the LSMR algorithm so I generated some data and run least squares. How come the LSMR solution and the closed-form one are different from the true $\beta$ I have used to generate the data?

using Distributions: Normal
using IterativeSolvers: lsmr
# Settings
n = 500
k = 50
# Generate data
X = rand(Normal(0.0, 0.1), (n, k)) + rand(Normal(0.0, 0.2), (n, k))
β = randn(k)
y = (X * β) + rand(Normal(0.0, 0.1), n)
# Solutions
closed_form_solution = (X'X) \ (X'y)
lsmr_solution = lsmr(X, y)
# Check solutions
β ≈ closed_form_solution, β ≈ lsmr_solution  # returns false, false
1

There are 1 best solutions below

0
On

I port your program to scipy and the same phenomenon occurs. As hinted by d.k.o, the 2 solutions are sometimes close to each other, but sometimes they are slightly off. It is probably due to the implementation using some clever tricks to speed things up while sacraficing precision. I don't have any knowledge of this so I cannot provide a full answer.

import logging
from numpy import isclose, linalg, random
from scipy.sparse.linalg import lsmr

logging.basicConfig(level=logging.INFO)

# Settings
n = 500
k = 50

# Generate data
X = random.normal(0, 0.1, (n, k)) + random.normal(0, 0.2, (n, k))
beta = random.randn(k)
y = (X @ beta) + random.normal(0, 0.1, n)

# Solutions
closed_form_solution = linalg.solve(X.T @ X, X.T @ y)
lsmr_solution, *unused = lsmr(X, y)

# Check solutions
assert all(isclose(closed_form_solution, lsmr_solution)), \
    '\n%s\nis NOT CLOSE to\n%s' % (closed_form_solution, lsmr_solution)
logging.info('\n%s\nis CLOSE to\n%s' % (closed_form_solution, lsmr_solution))