I've written an algorithm designed to solve the following problem:
An $m \times n$ array, where $m$ and $n$ are known, is filled with random draws from a Gaussian distribution with known values of $\mu$ and $\sigma$.
We now iterate through all n columns of that array simultaneously, until we encounter the first value exceeding some threshold. When we find that value, we then search in all remaining n-1 columns until either 1a) we encounter a second value exceeding the threshold, or 1b) we exceed some maximum number of iterations, max_iter, from the first found value.
In the case of 1a), we will continue iterating in the remaining n-2 columns until either 2a) we encounter a third value exceeding the threshold or 2b) we exceed some maximum number of iterations, max_iter, from the first found value. In the case of 1b), we continue iterating until we find a new first value.
In the case of 2a), we add one to a counter, then jump forward some number $p$ rows and start all over again. In the case of 2b), we start all over, picking the second-found value to be the new first-found value.
In the end, we want a count of how many times we found three values exceeding the threshold within a max_iter window, such that each of the three values comes from a different column. Every time such a triple is found, we jump forward $p$ rows and start all over again. Note that a single row may contribute 2, 3, or more values. Only three values (which ones are chosen is not important) within a window of max_iter rows, starting from the first-found value, may contribute to incrementing the counter.
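To make the description above concrete enough to test on tiny arrays, here is a minimal brute-force sketch of one possible reading of it. The function name `count_triples` is made up for illustration, and several details are assumptions rather than part of the problem statement: a value counts only if it is strictly greater than the threshold, the window covers the first-found row plus the next `max_iter` rows, the jump of `p` rows is measured from the row holding the third value, and `p` is at least 1. It tries every row holding an exceedance as a potential "first" value, which may differ in edge cases from a state-machine implementation.

```python
def count_triples(array, threshold, max_iter, p):
    """Count windows containing exceedances from three distinct columns.

    array     : list of rows, each row a list of floats
    threshold : a value counts only if value > threshold (assumption)
    max_iter  : the window spans rows r .. r + max_iter (assumption)
    p         : rows skipped after a hit, counted from the third value's row
    """
    if p < 1:
        raise ValueError("p must be at least 1 in this sketch")
    count = 0
    n_rows = len(array)
    r = 0
    while r < n_rows:
        # A window can only open on a row that holds an exceedance.
        if not any(v > threshold for v in array[r]):
            r += 1
            continue
        seen_cols = set()
        third_row = None
        last = min(r + max_iter, n_rows - 1)
        for rr in range(r, last + 1):
            seen_cols.update(c for c, v in enumerate(array[rr]) if v > threshold)
            if len(seen_cols) >= 3:
                third_row = rr  # row where the third distinct column appeared
                break
        if third_row is None:
            r += 1              # window failed; try the next row as "first"
        else:
            count += 1
            r = third_row + p   # jump forward and start over
    return count


if __name__ == "__main__":
    staircase = [[2, 0, 0, 0], [0, 2, 0, 0], [0, 0, 2, 0]]
    print(count_triples(staircase, 1.0, max_iter=2, p=1))
    print(count_triples(staircase, 1.0, max_iter=1, p=1))
```

Because the loops are deliberately naive, this is only practical for small hand-built arrays, where the result can be compared against both a by-hand count and the output of the real implementation.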
My Problem:
I've written this algorithm, but I have no way to verify it. Typically, I'd run something like this on a small test array. In practice, however, the array will have tens of columns and hundreds of thousands of rows or more, and I'm struggling to verify the results by hand using smaller examples.
More worrisome, the numbers the algorithm produces are surprising. With the parameters I'm using (see below, if useful), I'm getting just as many "hits" with a $16\sigma$ requirement as with a $1\sigma$ requirement (i.e. the values must exceed $16\sigma$ and $1\sigma$, respectively). It could easily be that my expectations don't match reality, since this is more of a "gut feeling", but I cannot put full faith in the results without more verification.
Is there some way (e.g. a formula) to get a sense of what sort of numbers I should expect to see here? Or is there something else I could do?
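One rough formula-based estimate, sketched here under stated assumptions, is the probability that a fixed block of rows contains exceedances in at least three different columns. If $t$ is the one-sided tail probability of a single value exceeding $\mu + k\sigma$, a column shows at least one exceedance in $w$ rows with probability $q = 1 - (1 - t)^w$, and the number of such columns is binomial with parameters $n$ and $q$. The names `p_window_has_triple` and `window_rows` are hypothetical, and a fixed window is only an approximation of the sliding search described above.

```python
import math


def p_window_has_triple(n_cols, window_rows, k):
    """P(at least 3 of n_cols columns exceed mu + k*sigma within window_rows rows).

    Assumes independent Gaussian entries and a one-sided threshold.
    """
    # Tail probability of one value; independent of mu and sigma.
    t = 0.5 * math.erfc(k / math.sqrt(2.0))
    # q = 1 - (1 - t)**window_rows, computed so that a tiny t is not rounded away.
    q = -math.expm1(window_rows * math.log1p(-t))
    # Sum the binomial probabilities for 3 or more "active" columns.
    return sum(
        math.comb(n_cols, j) * q**j * (1.0 - q) ** (n_cols - j)
        for j in range(3, n_cols + 1)
    )


if __name__ == "__main__":
    # window_rows=100 is an arbitrary illustrative value, not taken from the question.
    for k in (1, 16):
        print(k, p_window_has_triple(n_cols=20, window_rows=100, k=k))
```

Under these assumptions the estimate depends strongly on $k$, so comparing it for two thresholds gives a quick check on whether the observed hit counts are plausible.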
My parameters: $90\,000\,000$ rows $\times$ $20$ columns.
$\mu = 0$, $\sigma = 0.0089$
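As a sanity check that does not depend on the algorithm at all, the expected number of individual values above a $k\sigma$ threshold can be computed directly from the Gaussian tail. The sketch below assumes a one-sided test (value $> \mu + k\sigma$) and independent entries; the function names are made up for illustration. If no value is reused after a jump, the number of counted triples cannot exceed one third of the number of exceedances, so this gives an upper bound on the counter.

```python
import math


def tail_probability(k):
    """P(X > mu + k*sigma) for a Gaussian X; independent of mu and sigma."""
    return 0.5 * math.erfc(k / math.sqrt(2.0))


def expected_exceedances(n_rows, n_cols, k):
    """Expected number of array entries above the k-sigma threshold."""
    return n_rows * n_cols * tail_probability(k)


if __name__ == "__main__":
    n_rows, n_cols = 90_000_000, 20
    for k in (1, 16):
        expected = expected_exceedances(n_rows, n_cols, k)
        # Upper bound on the hit counter: three distinct values per hit.
        print(f"k={k}: tail={tail_probability(k):.3e}, "
              f"expected exceedances={expected:.3e}, "
              f"max triples={expected / 3:.3e}")
```

For reference, the $1\sigma$ tail probability is about $0.159$, while the $16\sigma$ tail probability is below $10^{-50}$, so with $1.8 \times 10^{9}$ entries the expected number of $16\sigma$ exceedances is effectively zero.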