Count of matching lines in compared patterns - distribution and significance?


Imagine a number of lines of varying thickness, perpendicular to, and distributed over, a fixed distance, d (in many ways similar to those of a standard barcode).

Imagine that one such pattern (let's call it Q) is compared side by side, one at a time, with a series of similar, randomly generated patterns.

Before each comparison, Q and the pattern it is compared with are left-aligned, not by their leftmost lines, but such that the leftmost part (the "start") of the interval d is aligned, within some tolerance t.

Each line has two edges (the borders that define the thickness of the line, similar to the two borders between the black fill of a barcode line and the white background).

Suppose we now count the number of matching edges between the two patterns. We do not distinguish between edges that mark the transition from background to line and those that mark the transition from line to background (i.e. from white to black or black to white, in the barcode example). All are regarded as edges and are scored as matching if they are within our tolerance distance t of each other.

1) Suppose the width (w) of the lines is normally distributed and the average w is much less than d (on the order of 500 or so)

2) Suppose the probability of there being a line (or rather the center of a line) at a specific position along d is equal for all positions along d.

3) Suppose the sum of the widths of the lines in a pattern on average takes up 1/3 of d.

4) Suppose t is smaller than the average w (on the order of 10 or so)

5) Lines can overlap, in which case the "nested" edges disappear to us and only the non-nested edges are available for matching.
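To make the setup concrete, here is a minimal simulation sketch of assumptions 1 to 5. All numeric values (d = 100000, mean width 500, standard deviation 100, t = 10) are illustrative assumptions, as are the function names `make_pattern` and `count_matches`. The matching rule used here (greedy one-to-one pairing of sorted edges) is also only one possible reading of "matching edges".

```python
import random

def make_pattern(rng, d=100_000.0, mean_w=500.0, sd_w=100.0, coverage=1/3):
    """Return the sorted visible edges of one random pattern (hypothetical helper)."""
    # Choose the line count so the widths sum to about coverage * d (assumption 3).
    n_lines = round(coverage * d / mean_w)
    intervals = []
    for _ in range(n_lines):
        center = rng.uniform(0.0, d)               # assumption 2: uniform centers
        width = max(rng.gauss(mean_w, sd_w), 1.0)  # assumption 1: normal widths, kept positive
        intervals.append((center - width / 2, center + width / 2))
    intervals.sort()
    # Assumption 5: merge overlapping lines so nested edges disappear.
    merged = []
    for lo, hi in intervals:
        if merged and lo <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], hi)
        else:
            merged.append([lo, hi])
    # Both edge types are treated alike, so flatten into one sorted list.
    return [edge for interval in merged for edge in interval]

def count_matches(a, b, t=10.0):
    """Count edges of sorted lists a and b that pair up within tolerance t."""
    i = j = matches = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= t:
            matches += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return matches

rng = random.Random(0)
q = make_pattern(rng)
scores = [count_matches(q, make_pattern(rng)) for _ in range(200)]
```

The list `scores` then holds one match count per comparison, which is the quantity whose distribution is in question. Edges of lines centered near the ends of d may fall slightly outside the interval in this sketch; clipping them is a design choice left open here.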

And now to my questions:

A) How will the number of scored matching edges for each of the comparisons be distributed?

I started out thinking it would be binomial, and maybe approximately Poisson, but the dependence on the normally distributed w and assumption 5 make it more complicated (I think).
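One rough way to probe the Poisson idea on a set of observed scores is the dispersion index (sample variance divided by mean), which is close to 1 for Poisson counts. The sketch below is only a quick diagnostic, not a formal test, and the name `dispersion_index` and the example scores are made up for illustration.

```python
import statistics

def dispersion_index(scores):
    """Sample variance divided by mean; near 1 suggests Poisson-like counts."""
    mean = statistics.mean(scores)
    if mean == 0:
        raise ValueError("dispersion index is undefined when the mean is zero")
    return statistics.variance(scores) / mean

# Hypothetical match counts from eight comparisons.
example_scores = [2, 4, 4, 4, 5, 5, 7, 9]
ratio = dispersion_index(example_scores)
```

A ratio well above 1 would point to overdispersion (for example from merged lines changing the number of available edges), and a ratio well below 1 to something closer to a binomial with a large success probability.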

B) Do you have a suggestion for a way of testing if a certain score is significantly different (higher) than the scores of the other comparisons?
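One option that avoids assuming any particular distribution is an empirical (Monte Carlo) p-value: compare the score of interest with the scores from the random comparisons and report the fraction that are at least as high. The sketch below assumes the random comparisons are a fair sample from the null case; the function name `empirical_p_value` and the numbers are hypothetical.

```python
def empirical_p_value(observed, null_scores):
    """One-sided empirical p-value for 'observed is unusually high'.

    The +1 in numerator and denominator counts the observed score itself,
    so the estimate can never be exactly zero.
    """
    at_least_as_high = sum(1 for s in null_scores if s >= observed)
    return (1 + at_least_as_high) / (1 + len(null_scores))

# Hypothetical scores from comparisons against random patterns.
null_scores = [1, 2, 3, 10, 12]
p = empirical_p_value(10, null_scores)
```

With only a handful of random comparisons the smallest attainable p-value is 1 / (n + 1), so a strict significance threshold needs a correspondingly large number of random patterns.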

I really hope someone smarter than me can give me some guidance on this, since it has been puzzling me for some time now :-). Please let me know if anything needs to be clarified.

Best Regards

Mads