Question
I posted this question on StackOverflow before, but someone advised me to ask it here because it looks like a numerical-algorithm problem.
Let us say I have a list of values that are all multiples of a common value greater than 1. For example, take that common value to be 3 and form a collection of its multiples:
harmonicList = [3,6,6,3,3,9,27,3,15,18,9]
Now I add some noise:
from random import random
harmonicList = [v + (random() * 2 - 1) * 0.1 for v in harmonicList]
I'm searching for an algorithm that returns a number near 1.0 when the items in the list are close to being multiples of a common value, and near 0.0 when they are not -- for example, when the list is a collection of prime numbers.
Is there such a measure of "near multiplicity"?
Why I want to solve this problem
I'm currently trying to detect a chessboard in a screenshot using the Hough transform. Sometimes the case is ideal and it works very well:

I would like to detect the cases where there are a lot of aberrations. My idea is to compute the intersections of the detected lines and build a collection of the spacings between them (keeping only horizontal and vertical lines). If the detection was good, I know there will be strong "harmonicity" inside this collection, and I can then apply such an algorithm with a threshold.

You can look at this as a maximum likelihood estimation problem. The general idea is to find parameters which make the model fit the data in the sense that the data have greatest probability under the model for the specified parameters. Maximum likelihood is a very widely used approach for fitting statistical models.
In this case, your model is

p(x) = sum over k = 1, ..., k_max of p_1(x; mean = k*a - b, sd = s)

where a is the spacing between multiples, b is an offset, k_max is the number of multiples, s is something small, and p_1 is the Gaussian density

p_1(x; mean, s) = exp(-(1/2) * ((x - mean)/s)^2) / (s * sqrt(2*pi)).
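Spelled out in code (a sketch; the function names and the normalization remark are mine, not from the answer):

```python
import math

def p1(x, mean, sd):
    """Gaussian density with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def p(x, a, b, sd, k_max):
    """Model density: sum of Gaussians centered at k*a - b for k = 1..k_max.
    (The sum is left unnormalized, as in the text; dividing by k_max would
    make it a proper density and only shifts the log likelihood by a constant.)"""
    return sum(p1(x, k * a - b, sd) for k in range(1, k_max + 1))
```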
Compute the log likelihood, i.e. log(p(x_1) * p(x_2) * ... * p(x_n)) = log(p(x_1)) + ... + log(p(x_n)), where n is the number of data points; this is typically easier to handle numerically than the likelihood itself. Note that the log likelihood is a function of a and b, so maximize it over a and b to find the "best" values.
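As a concrete sketch, the maximization can be done by brute force over a grid (the grid ranges, sd, and k_max below are my own choices, not from the answer):

```python
import math

def log_likelihood(data, a, b, sd, k_max):
    """log p(x_1) + ... + log p(x_n) under the sum-of-Gaussians model."""
    total = 0.0
    for x in data:
        px = sum(
            math.exp(-0.5 * ((x - (k * a - b)) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
            for k in range(1, k_max + 1)
        )
        # guard against log(0) when no center is anywhere near x
        total += math.log(px) if px > 0.0 else -1e9
    return total

harmonicList = [3, 6, 6, 3, 3, 9, 27, 3, 15, 18, 9]
sd = 0.1      # roughly the noise level used when generating the data
k_max = 14    # enough multiples to cover max(data) at the smallest spacing tried

# brute-force grid search over spacing a in [2, 4] and offset b in [0, 3]
best = max(
    ((ai / 10.0, bi / 10.0) for ai in range(20, 41) for bi in range(0, 31)),
    key=lambda ab: log_likelihood(harmonicList, ab[0], ab[1], sd, k_max),
)
```

On the noise-free list this recovers a = 3, with b = 0 or b = 3 (which turn out to be equivalent fits).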
EDIT: Here's an implementation of the formulas above. I'm using Maxima, a computer algebra system for this. See: http://maxima.sourceforge.net
Here is the Maxima program:
Here are the data for the example stated in the question:
These are some additional parameters.
k_max is the maximum number of multiples to look at; it can probably be estimated by looking at the range of the data. sd is the standard deviation of the noise; it should probably be stated as a fraction of a, the distance between multiples.
Here is the resulting contour plot of the log likelihood, given harmonicList as the data:
Note that the greatest values (a little above zero) are in the vicinity of (a = 3, b = 0) and (a = 3, b = 3). These two maxima are equivalent fits: shifting the offset b by one full spacing a just relabels the multiples k.
EDIT 2: Here is my response to the follow-up question of whether the data could be fit by any multiplier at all.
You could approach that question like this. The plausibility of "the data can be fit by some multiplier-offset pair" can be modeled as p(model | data), which is
p(model | data) = p(data | model) p(model) / (p(data | model) p(model) + p(data | no model) p(no model))
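In code this ratio is a one-liner; the input probabilities are placeholders that would have to be estimated as the text goes on to discuss:

```python
def p_model_given_data(p_data_given_model, p_model, p_data_given_nomodel):
    """Posterior plausibility of the model via the ratio above,
    with p(no model) = 1 - p(model)."""
    num = p_data_given_model * p_model
    return num / (num + p_data_given_nomodel * (1.0 - p_model))
```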
This shows what terms one needs to pull together to get a complete answer, namely p(data | model), p(model), p(data | no model), and p(no model).
About p(model) and p(no model): since it's possible that there is no such multiplier and offset, we have p(model) < 1. It might be estimated as the base rate of data sets that can be modeled by a multiplier and offset. As for p(data | no model), this is probably something like a uniform distribution.
As for p(data | model), note that p(data | model) = p(data, model) / p(model). Now p(data, model) = integral(p(data | a', b') p(a', b' | model) over all conceivable values a', b' of multiplier a and offset b). Note that p(data | a', b') is, by definition, the likelihood of model = (a', b') given the data. Remember that likelihood = exp(log likelihood).
Here p(a', b' | model) is one's prior over the parameter values, assuming that the data are accounted for by some multiplier-offset pair. Are all parameters equally probable a priori? If so, p(a', b' | model) is a constant. My guess is that some values are more probable than others due to the constraints of the problem -- since the images are screenshots, there are probably limits on what the multiplier and offset can be, and some values are probably less likely given those limits.
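With a uniform prior on a grid of (a', b') values, the integral above reduces to a simple average of likelihoods. A sketch (the grid, sd, and k_max are my own choices; the likelihood is computed in the direct, underflow-prone way for clarity):

```python
import math

def likelihood(data, a, b, sd=0.1, k_max=14):
    """exp(log likelihood): product of the model densities p(x_i)."""
    prod = 1.0
    for x in data:
        prod *= sum(
            math.exp(-0.5 * ((x - (k * a - b)) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
            for k in range(1, k_max + 1)
        )
    return prod

def marginal_likelihood(data, a_grid, b_grid):
    """p(data | model) under a uniform prior p(a', b' | model) on the grid."""
    vals = [likelihood(data, a, b) for a in a_grid for b in b_grid]
    return sum(vals) / len(vals)
```

For example, comparing the list of multiples of 3 against a list of primes on a grid around a = 3 gives the multiples a far larger marginal likelihood, which is what pushes p(model | data) toward 1 for them.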
EDIT 3: Here is the contour plot for primeList, as given in a comment below.