I am working with the inverses of a set of large sparse matrices (in Matlab). A key indicator for my application is the number of non-zero entries in each row, and I recently discovered that I was getting false results due to numerical problems: a value on the order of $10^{-30}$ still counts as non-zero.
To solve this problem, I decided to screen out numerical artifacts by removing entries in the inverted matrix below a threshold value. I figured I'd detect the threshold by counting up the number of matrix entries in each decade. Where T is the inverse (sparse) matrix:
%% matlab
A = abs(T);                          % entry magnitudes
f = full(floor(log10(A(find(A))))); % decade exponent of each non-zero entry
u = unique(f);                       % decades that actually occur
bins = histc(f, u);                  % count of entries in each decade
Now, when I look at bins, I see that my matrices do indeed have bimodal distributions, with the major peak varying in the vicinity of $-5$ (i.e. $10^{-5}$) and another peak around $-35$ (i.e. $10^{-35}$).
But picking a threshold is problematic. With one matrix there's about a 12-decade gap between the peaks (i.e. no entries at all with $10^{-28} < |a| < 10^{-16}$); with another, every decade contains at least one entry (though there are still two humps in the distribution).
How should I choose a threshold algorithmically? Should I pick the interior minimum of the distribution, or a fixed cutoff? Is this even a good approach? Frankly, even picking an interior minimum is annoying and error-prone, because the data can be noisy.
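One simple automatic rule, sketched below in Python/NumPy rather than Matlab (the function name and example magnitudes are hypothetical): bin the non-zero magnitudes by decade, find the widest run of empty decades between the two humps, and place the cutoff in the middle of that run. It gives up (returns `None`) when no decade is empty, which matches the problematic second matrix described above.

```python
import numpy as np

def decade_threshold(values):
    """Place a cutoff mid-way across the widest empty gap between
    populated decades of |values|; return None if there is no gap."""
    decades = np.unique(np.floor(np.log10(np.abs(values))))
    if decades.size < 2:
        return None
    gaps = np.diff(decades)           # spacing between consecutive populated decades
    i = np.argmax(gaps)
    if gaps[i] < 2:                   # every decade is populated: no clear gap
        return None
    # midpoint of the widest empty stretch of decades
    return 10.0 ** ((decades[i] + decades[i + 1]) / 2.0)

# Hypothetical bimodal data: real entries near 1e-5, artifacts near 1e-35
vals = np.concatenate([np.full(100, 1e-5), np.full(40, 1e-35)])
t = decade_threshold(vals)            # cutoff lands near 1e-20
```

This is only a sketch; with noisy histograms you may want to smooth the per-decade counts first, or fall back to a fixed `eps`-based tolerance when `None` is returned.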
Any feedback?
You can test this in Matlab using the `eps` command. `eps` is the difference between 1 and the next largest number Matlab can store. For most systems, `eps` $= 2^{-52} \approx 2.22\times10^{-16}$. This suggests that anything below about $10^{-15}$ should be treated as zero. Per horchler's suggestion, if you know an approximate magnitude `N` of your non-zero solutions, you can use `eps(N)` to get a more relevant number; you could then set, for example, anything less than `10*eps(N)` to zero. If you have lots of small roots, you will have to think about using a different method, or using some fancy techniques to avoid bad floating-point errors.
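The same `10*eps(N)` rule can be sketched in Python/NumPy, where `np.spacing(N)` plays the role of Matlab's `eps(N)` (the magnitude `N` and the sample entries here are assumed for illustration):

```python
import numpy as np

# Assumed typical magnitude of the genuine non-zero entries
N = 1e-5

# np.spacing(N) is the gap to the next representable double above N,
# i.e. the analogue of Matlab's eps(N)
tol = 10 * np.spacing(N)

# Sample entries: two genuine values and two sub-tolerance artifacts
entries = np.array([3e-5, 1e-6, 4e-35, -2e-30, 8e-5])
cleaned = np.where(np.abs(entries) < tol, 0.0, entries)
```

Note that `tol` here scales with `N`, so the rule stays meaningful even when the genuine entries are themselves far below 1.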