I'm dealing with sets of data that have distributions somewhat like this:
[ASCII-art histogram, layout lost in extraction: a roughly bell-shaped distribution with scattered noise counts at the low end]
i.e., an approximately normal distribution, with a large amount of noise among the lower values.
I want to filter out the noise programmatically. That is, I'd like to work out a threshold value that separates the noise from the rest of the data, and then remove every value below it.
The noise doesn't tend to follow any particular distribution, and occasionally the noise values are more frequent than the values I want to keep.
How best can I go about doing this?
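One standard way to choose such a cut-off automatically is Otsu's method: bin the data into a histogram, then pick the threshold that maximizes the between-class variance of the "below" and "above" groups. The following is a minimal pure-Python sketch, assuming one-dimensional numeric data where the noise and the wanted values form two reasonably separated groups. The names `otsu_threshold` and `filter_noise` and the `n_bins=64` default are hypothetical choices for illustration. Otsu's method is known to misplace the threshold when one group is much larger or much more spread out than the other, so with heavy, irregular noise like that described above, the result should be checked against the data rather than trusted blindly.

```python
import random


def otsu_threshold(values, n_bins=64):
    """Return the cut-off that maximizes between-class variance (Otsu's method).

    Assumption: values below the returned threshold are treated as noise.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        return lo
    width = (hi - lo) / n_bins

    # Histogram: counts[i] holds values in [lo + i*width, lo + (i+1)*width);
    # the maximum value is clamped into the last bin.
    counts = [0] * n_bins
    for v in values:
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(n_bins)]

    total = len(values)
    total_sum = sum(c * x for c, x in zip(counts, centers))

    best_t, best_var = lo, -1.0
    w0 = 0      # number of values in the lower ("noise") class so far
    sum0 = 0.0  # sum of bin centres (weighted by count) in the lower class
    for i in range(n_bins - 1):
        w0 += counts[i]
        sum0 += counts[i] * centers[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0 = sum0 / w0
        mu1 = (total_sum - sum0) / w1
        # Between-class variance (up to a constant factor).
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_var = between
            best_t = lo + (i + 1) * width  # upper edge of bin i
    return best_t


def filter_noise(values, n_bins=64):
    """Keep only values at or above the Otsu threshold; return (kept, threshold)."""
    t = otsu_threshold(values, n_bins)
    return [v for v in values if v >= t], t


if __name__ == "__main__":
    # Synthetic demo data (an assumption, not the asker's data):
    # uniform noise in [0, 20] plus a normal hump centred on 60.
    rng = random.Random(0)
    data = [rng.uniform(0, 20) for _ in range(300)] + [rng.gauss(60, 8) for _ in range(1000)]
    kept, t = filter_noise(data)
    print(f"threshold = {t:.2f}, kept {len(kept)} of {len(data)} values")
```

If the noise is heavy enough that the histogram is no longer clearly two-humped, alternatives such as fitting a mixture model or looking for the deepest valley between the noise and the main peak may be more robust than a single Otsu cut.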
A number of options: