I have a problem that I think I can solve using a least-squares approach. In particular, I have a discrete-time system (or map, if you prefer) with state variable $x$. I want to estimate a parameter $\theta \in \mathbb{R}^m$ from a set of observations $x \in \mathbb{R}^n$. The least-squares setup (obtained after some algebra) is the following:
$$ Y(x) = U(x) \theta,$$
where $Y(x) \in \mathbb{R}^k$ and $U(x) \in \mathbb{R}^{k \times m}$.
I noticed that I'm unable to solve this problem due to numerical issues, i.e.
theta = U \ Y
in MATLAB returns some NaNs.
Anyway, if I add a small amount of noise to $x$, then I'm able to solve the problem, with good results in terms of fit.
OK, I was a little frustrated, and then I noticed that $U(x)$ was not of full rank, so I thought: "If I add some noise, I'll make the rows/columns of $U(x)$ more nearly linearly independent"... and it worked!
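For what it's worth, a quick way to confirm this diagnosis is to check the numerical rank of $U(x)$ and try a rank-revealing (SVD-based) solver. A minimal NumPy sketch, with a made-up rank-deficient matrix standing in for the actual $U(x)$:

```python
import numpy as np

# Hypothetical stand-in for U(x): the third column is the sum of the
# first two, so the matrix is rank deficient (rank 2, but 3 columns).
U = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [2.0, 1.0, 3.0]])
Y = np.array([1.0, 2.0, 3.0, 4.0])

print("rank:", np.linalg.matrix_rank(U))   # 2 < 3: not full column rank

# An SVD-based solver copes with the rank deficiency and returns the
# minimum-norm least-squares solution (finite, no NaNs).
theta, _, rank, _ = np.linalg.lstsq(U, Y, rcond=None)
print("theta:", theta)
```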
Is there some theoretical background that justifies this lucky approach?
I'll be glad to provide more details if needed in order to get a good answer!
Generically, when you add a random perturbation $\delta x$ to $x$, $U(x+\delta x)$ will have full rank. This has a regularizing effect on the solution of your least-squares problem, but you have no control over the bias it introduces into the resulting estimate.
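To illustrate the "generically full rank" point, here is a small NumPy sketch (the rank-deficient matrix is made up, standing in for your $U(x)$): the perturbed matrix does have full column rank, but its smallest singular value is on the order of the perturbation, so the problem remains severely ill-conditioned and the solution is dominated by the noise you injected.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up rank-deficient matrix standing in for U(x).
U = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [2.0, 1.0, 3.0]])

dU = 1e-8 * rng.standard_normal(U.shape)   # tiny random perturbation

print(np.linalg.matrix_rank(U))       # 2: rank deficient
print(np.linalg.matrix_rank(U + dU))  # 3: generically full column rank

# ...but the condition number of the perturbed matrix is enormous.
print(np.linalg.cond(U + dU))
```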
You might instead consider conventional Tikhonov regularization for this problem, together with a technique such as generalized cross validation (GCV) to select the regularization parameter.
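A minimal sketch of Tikhonov regularization with GCV-based parameter selection, computed via the SVD; the function name and the synthetic data are mine, purely for illustration:

```python
import numpy as np

def tikhonov_gcv(U, Y, lambdas):
    """Tikhonov-regularized least squares via the SVD, with the
    regularization parameter chosen by generalized cross validation.
    A sketch under simple assumptions, not production code."""
    Us, s, Vt = np.linalg.svd(U, full_matrices=False)
    UtY = Us.T @ Y
    best_gcv, best_lam, best_theta = np.inf, None, None
    for lam in lambdas:
        f = s**2 / (s**2 + lam**2)                 # Tikhonov filter factors
        theta = Vt.T @ ((s / (s**2 + lam**2)) * UtY)
        resid = np.sum((U @ theta - Y)**2)
        # GCV score: residual over squared effective degrees of freedom
        # remaining, trace(I - influence matrix) = len(Y) - sum(f).
        gcv = resid / (len(Y) - f.sum())**2
        if gcv < best_gcv:
            best_gcv, best_lam, best_theta = gcv, lam, theta
    return best_lam, best_theta

# Synthetic test problem (made up for illustration).
rng = np.random.default_rng(1)
theta_true = np.array([1.0, -2.0, 0.5])
U = rng.standard_normal((50, 3))
Y = U @ theta_true + 0.01 * rng.standard_normal(50)

lam, theta = tikhonov_gcv(U, Y, np.logspace(-6, 1, 40))
print("lambda:", lam)
print("theta:", theta)
```

Computing the solution through the SVD (rather than forming the normal equations) keeps the whole procedure well-behaved even when $U$ is exactly rank deficient, since the filter factors go smoothly to zero along the null directions.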