How to demonstrate the effectiveness of an algorithm?

45 Views Asked by At

I've written an algorithm to solve the following problem:

Goal:

We're given an $m\times n$ matrix of random values picked from a Gaussian distribution with $\mu = 0$ and a known value for $\sigma$.

Iterate simultaneously through all $n$ columns of the matrix until you encounter the first value greater than or equal to some threshold value $t$. Continue iterating in the remaining $n-1$ columns until you've found a second value exceeding threshold $t$ OR you reach some maximum number of rows $r_{max}$ from the first found value.

If you find a second value, continue iterating in the remaining $n-2$ columns until you find a third value exceeding that threshold value $t$ OR you reach the maximum number of rows $r_{max}$ from the first found value. If you don't, continue iterating from this point until you again find some value exceeding threshold $t$, and repeat this process.

If you find a third value, add one to a counter. If you don't find a third value, start over with the new first value picked to be the second found value.

Note that one row can contribute more than one value. So, if the first row has three values exceeding the threshold, we add one to the counter and continue iterating from there until we again find a value exceeding threshold $t$, and we repeat the process. Or, the first row might contribute $2$ values, and the second row $1$. Or one row might contribute $8$ values. That still counts. Each value has to be contributed by a unique column, however.

My Problem:

I've written an algorithm for this (which can be found here, if it should prove useful in answering my question). My problem is, I now need to verify that it works as expected.

Normally, I'd test something like this by producing a small test case, hand-counting the instances, and comparing that to the algorithm's results. With this, however, in order to produce a test case large enough to be useful, there need be so many values that it is impractical to hand-count.

I've tried feeding the algorithm a matrix with some strawman values for $t$ and $r_{max}$, and comparing the count it produces to a count of total the number of values in the matrix exceeding the $t$. The count seemed reasonable to me, but I can't be sure and it isn't sufficient to go off a 'hunch'.


Is there an effective way to test this algorithm? Perhaps the fact that the values are picked from a Gaussian may allow for such a test (I know a lot mpressive things can be done with statistics)? I know programs can be proved using tools like Coq, but my knowledge of that sort of thing is entirely superficial and I'm not sure where I'd begin (and, in any case, that might be overdoing it?).