Question
I posted this question on StackOverflow before, but someone advised me to ask it here because it looks like a numerical-algorithm problem.
Let us say I have a list of values that are all multiples of a common value greater than 1. For example, take that common value to be 3 and form a collection of its multiples:
harmonicList = [3,6,6,3,3,9,27,3,15,18,9]
Now I add some noise:
from random import random
harmonicList = [v + (random() * 2 - 1) * 0.1 for v in harmonicList]
I'm searching for an algorithm that returns a number near 1.0 when the items in the list are close to being multiples of a common value, and near 0.0 when they are not -- for example, when the list is a collection of prime numbers.
Is there such a measure of "near multiplicity"?
Why I want to solve this problem
I'm currently trying to detect a chessboard in a screenshot using the Hough transform. Sometimes the case is ideal and it works very well:

I would like to detect the cases where there are a lot of aberrations. My idea is to compute the intersections of the detected lines and build a collection of the spacings between them (keeping only horizontal and vertical lines). If the detection was good, I know there will be strong "harmonicity" inside this collection, and I can then apply such an algorithm with a threshold.

You can look at this as a maximum likelihood estimation problem. The general idea is to find parameters which make the model fit the data in the sense that the data have greatest probability under the model for the specified parameters. Maximum likelihood is a very widely used approach for fitting statistical models.
In this case, your model is

p(x) = sum over k = 1, ..., k_max of p_1(x; mean = k*a - b, sd = s)

where a is the spacing between multiples, b is an offset, k_max is the number of multiples, s is something small, and p_1 is the Gaussian density

p_1(x; mean, s) = exp(-(1/2) * ((x - mean)/s)^2) / (s * sqrt(2*pi)).
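Spelled out in code (a sketch; the function names and the normalization remark are mine, not from the answer):

```python
import math

def p1(x, mean, sd):
    """Gaussian density with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def p(x, a, b, sd, k_max):
    """Model density: sum of Gaussians centered at k*a - b for k = 1..k_max.
    (The sum is left unnormalized, as in the text; dividing by k_max would
    make it a proper density and only shifts the log likelihood by a constant.)"""
    return sum(p1(x, k * a - b, sd) for k in range(1, k_max + 1))
```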
Compute the log likelihood, i.e. log(p(x_1) * p(x_2) * ... * p(x_n)) = log(p(x_1)) + ... + log(p(x_n)), where n is the number of data points; this is typically easier to handle numerically than the likelihood itself. Note that the log likelihood is a function of a and b, so maximize it over a and b to find the "best" values.
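As a concrete sketch, the maximization can be done by brute force over a grid (the grid ranges, sd, and k_max below are my own choices, not from the answer):

```python
import math

def log_likelihood(data, a, b, sd, k_max):
    """log p(x_1) + ... + log p(x_n) under the sum-of-Gaussians model."""
    total = 0.0
    for x in data:
        px = sum(
            math.exp(-0.5 * ((x - (k * a - b)) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
            for k in range(1, k_max + 1)
        )
        # guard against log(0) when no center is anywhere near x
        total += math.log(px) if px > 0.0 else -1e9
    return total

harmonicList = [3, 6, 6, 3, 3, 9, 27, 3, 15, 18, 9]
sd = 0.1      # roughly the noise level used when generating the data
k_max = 14    # enough multiples to cover max(data) at the smallest spacing tried

# brute-force grid search over spacing a in [2, 4] and offset b in [0, 3]
best = max(
    ((ai / 10.0, bi / 10.0) for ai in range(20, 41) for bi in range(0, 31)),
    key=lambda ab: log_likelihood(harmonicList, ab[0], ab[1], sd, k_max),
)
```

On the noise-free list this recovers a = 3, with b = 0 or b = 3 (which turn out to be equivalent fits).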
EDIT: Here's an implementation of the formulas above. I'm using Maxima, a computer algebra system for this. See: http://maxima.sourceforge.net
Here is the Maxima program:
Here are the data for the example stated in the question:
These are some additional parameters.
k_max is the maximum number of multiples to look at; it can probably be estimated by looking at the range of the data. sd is the standard deviation of the noise; it should probably be stated as a fraction of a, the distance between multiples.
Here is the resulting contour plot of the log likelihood, given harmonicList as the data:
Note that the greatest values (a little above zero) are in the vicinity of (a = 3, b = 0) and (a = 3, b = 3). These two maxima are equivalent fits: shifting the offset b by one full spacing a just relabels the multiples k.
EDIT 2: Here is my response to the follow-up question of whether the data could be fit by any multiplier at all.
You could approach that question like this. The plausibility of "the data can be fit by some multiplier-offset pair" can be modeled as p(model | data), which is
p(model | data) = p(data | model) p(model) / (p(data | model) p(model) + p(data | no model) p(no model))
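In code this ratio is a one-liner; the input probabilities are placeholders that would have to be estimated as the text goes on to discuss:

```python
def p_model_given_data(p_data_given_model, p_model, p_data_given_nomodel):
    """Posterior plausibility of the model via the ratio above,
    with p(no model) = 1 - p(model)."""
    num = p_data_given_model * p_model
    return num / (num + p_data_given_nomodel * (1.0 - p_model))
```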
This shows what terms one needs to pull together to get a complete answer, namely p(data | model), p(model), p(data | no model), and p(no model).
About p(model) and p(no model): since it's possible that there is no such multiplier and offset, we have p(model) < 1. It might be estimated as the base rate of data sets that can be modeled by a multiplier and offset. As for p(data | no model), this is probably something like a uniform distribution.
As for p(data | model), note that p(data | model) = p(data, model) / p(model). Now p(data, model) = integral(p(data | a', b') p(a', b' | model) over all conceivable values a', b' of multiplier a and offset b). Note that p(data | a', b') is, by definition, the likelihood of model = (a', b') given the data. Remember that likelihood = exp(log likelihood).
Here p(a', b' | model) is one's prior over the parameter values, assuming that the data are accounted for by some multiplier-offset pair. Are all parameters equally probable a priori? If so, p(a', b' | model) is a constant. My guess is that some values are more probable than others due to the constraints of the problem -- since the images are screenshots, there are probably limits on what the multiplier and offset can be, and some values are probably less likely given those limits.
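With a uniform prior on a grid of (a', b') values, the integral above reduces to a simple average of likelihoods. A sketch (the grid, sd, and k_max are my own choices; the likelihood is computed in the direct, underflow-prone way for clarity):

```python
import math

def likelihood(data, a, b, sd=0.1, k_max=14):
    """exp(log likelihood): product of the model densities p(x_i)."""
    prod = 1.0
    for x in data:
        prod *= sum(
            math.exp(-0.5 * ((x - (k * a - b)) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
            for k in range(1, k_max + 1)
        )
    return prod

def marginal_likelihood(data, a_grid, b_grid):
    """p(data | model) under a uniform prior p(a', b' | model) on the grid."""
    vals = [likelihood(data, a, b) for a in a_grid for b in b_grid]
    return sum(vals) / len(vals)
```

For example, comparing the list of multiples of 3 against a list of primes on a grid around a = 3 gives the multiples a far larger marginal likelihood, which is what pushes p(model | data) toward 1 for them.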
EDIT 3: Here is the contour plot for primeList, as given in a comment below.