Balls-in-bins type question with a specific practical application

Asked by Bumbble Comm at 27 Mar 2026 - 2:57

I have the problem below, which I hope I've defined clearly enough:

N events are observed, which may be arranged so that they fall in the range 0 < x < 1, where x represents the timing of a given event. Although the N events come from a continuous distribution (of event timings), in reality they are sampled, so the distribution is discrete (?). The values of x are then divided into 20 boxes, each of length 0.05, such that:

0.00 < box01 <= 0.05, 0.05 < box02 <= 0.10, ... 0.95 < box20 <= 1.00

In this way, there is a discrete number of events per box, representing the number of observations timed to fall in that part of the range 0 < x < 1. It's rather like a number of balls dropping into boxes, but that analogy hasn't helped me when googling!
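A minimal Python sketch of the box assignment above, assuming the boxes are right-closed exactly as listed (lower < x <= upper). The names `BOX_EDGES` and `box_index` are hypothetical. Comparing against an explicit list of upper edges keeps the edge convention visible (an event at exactly 0.05 belongs to box01) rather than relying on how floating-point division happens to round at the edges.

```python
from bisect import bisect_left

# Upper edges of the 20 boxes: 0.05, 0.10, ..., 1.00.
# k / 20 is the double closest to each decimal edge, so an input written as
# 0.15 compares equal to its edge.
BOX_EDGES = [k / 20 for k in range(1, 21)]


def box_index(x: float) -> int:
    """Return the 1-based box number for 0 < x <= 1 (right-closed boxes)."""
    if not 0.0 < x <= 1.0:
        raise ValueError(f"x must satisfy 0 < x <= 1, got {x}")
    # bisect_left finds the first upper edge >= x, which is exactly the box
    # satisfying 'lower < x <= upper'.
    return bisect_left(BOX_EDGES, x) + 1


print(box_index(0.47))  # 10, i.e. 0.45 < x <= 0.50
print(box_index(0.05))  # 1, because the upper edge belongs to the box
print(box_index(1.0))   # 20
```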

I want a statistical threshold test of the form m > s, where m is the number of (discrete) events in a given box (box01 to box20) when N events are distributed through the 20 boxes, and s is the number of events that m must exceed for that box to be considered non-random at a given significance level (e.g., 0.01, 0.001, 0.0001).

If it helps, for most purposes I expect N = 163, although in some cases N will be lower.

I am assuming that, if random, the N events would be uniformly distributed across the 20 boxes, i.e., any given event has an equal chance of falling into any given box, with no box favoured over any other.
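Under that assumption, and assuming additionally that events land in boxes independently of one another, the count m in one box chosen in advance follows a binomial distribution with N trials and success probability 1/20 (the "balls in bins" model). The relevant quantities, with values for N = 163, are:

```latex
\begin{aligned}
m &\sim \operatorname{Binomial}\left(N, \tfrac{1}{20}\right) \\
P(m = k) &= \binom{N}{k} \left(\tfrac{1}{20}\right)^{k} \left(\tfrac{19}{20}\right)^{N-k} \\
E[m] &= \frac{N}{20} = 8.15 \quad (N = 163) \\
\operatorname{sd}(m) &= \sqrt{N \cdot \tfrac{1}{20} \cdot \tfrac{19}{20}} = \sqrt{7.7425} \approx 2.78 \quad (N = 163) \\
P(m > s) &= \sum_{k=s+1}^{N} \binom{N}{k} \left(\tfrac{1}{20}\right)^{k} \left(\tfrac{19}{20}\right)^{N-k}
\end{aligned}
```

For a single box, the threshold at level α would then be the smallest integer s with P(m > s) ≤ α.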

The specifics of this practical problem are as follows:

There are N observational 'snapshots' in the Kepler telescope data in which the continuous flux from a target star falls below a cut-off point. Each snapshot records the flux, and an event is defined here as a snapshot in which the flux falls below that threshold. Each of the N events has a discrete time corresponding to the timing of its snapshot, rather like the frame number on a roll of film. That time is then converted to an orbital phase for a test period, where phase = mod(timeofevent,testperiod)/testperiod, so that 0 <= phase < 1. An event could be random, or could be caused by a planet transiting the star and producing a dip in the flux.
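The folding step phase = mod(timeofevent,testperiod)/testperiod can be sketched in Python as below, assuming non-negative event times and a positive test period; the function name `orbital_phase` is hypothetical. The result is exactly 0 when an event time is an exact multiple of the test period, which sits just outside the box01 range 0.00 < x, so some convention is needed for that case.

```python
def orbital_phase(time_of_event: float, test_period: float) -> float:
    """Phase in [0, 1) of an event time folded on a trial period.

    Mirrors the spreadsheet expression MOD(timeofevent, testperiod) / testperiod.
    For non-negative times and a positive period, Python's % returns a value
    in [0, test_period), so the phase lies in [0, 1).
    """
    if test_period <= 0:
        raise ValueError("test_period must be positive")
    return (time_of_event % test_period) / test_period


print(orbital_phase(7.5, 3.0))  # 0.5
print(orbital_phase(6.0, 3.0))  # 0.0, an exact multiple of the period
```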

I then divide the phase-time distribution of events into the twenty boxes as above.

Each event therefore contributes a count of 1 to the box within which it falls and 0 to every other box. For example, if its timing were 0.47, it would add 1 to the count for box10 (0.45 < x <= 0.50), and so on for the remaining N - 1 events. Each box therefore ends up with an integer count of the events that fall in its range.
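The counting step can be sketched as a small histogram over the same right-closed boxes. The function name `count_boxes` is hypothetical, and treating a phase of exactly 0.0 as 1.0 (the same orbital point, so it lands in box20) is an assumption made here, not something stated in the post.

```python
from bisect import bisect_left

BOX_EDGES = [k / 20 for k in range(1, 21)]  # upper edges 0.05, 0.10, ..., 1.00


def count_boxes(phases: list[float]) -> list[int]:
    """Count how many phases fall in each of the 20 right-closed boxes.

    Returns 20 integers; element 0 is box01 (0.00 < x <= 0.05).
    Assumption: a phase of exactly 0.0 is treated as 1.0 (same orbital
    point), so it is counted in box20 rather than rejected.
    """
    counts = [0] * 20
    for x in phases:
        if x == 0.0:
            x = 1.0  # assumed wrap-around, see docstring
        if not 0.0 < x <= 1.0:
            raise ValueError(f"phase out of range: {x}")
        # Each event adds 1 to exactly one box and 0 to all the others.
        counts[bisect_left(BOX_EDGES, x)] += 1
    return counts


counts = count_boxes([0.47, 0.46, 0.05, 0.0, 0.999])
print(counts[9], counts[0], counts[19])  # 2 1 2
```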

I assume that if there is no (detectable) planet present, the events would be uniformly distributed. (I cannot independently test this, but this seems a reasonable assumption.)

However, if a planet is present with a given orbital period, we would expect a higher-than-normal number of events in the box corresponding to the moment of transit. (The exact number of events in that box depends on the depth of the transit; if the period under test is accurate enough, a single box should contain all the events corresponding to the planetary transit.) Correspondingly, we would see fewer events in each of the other boxes.

I need a way to quickly test each box, for each period, at a given significance level (0.01, 0.001 or 0.0001 would be good examples). That is, given N events distributed among the 20 boxes, I want to be able to say that if I observe more than s events in any one box, this is significant at that level, and so (to that level) unlikely to be random chance, and the period should be flagged so that I can return to it and check whether a transit really is observable at that period.
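A minimal Python sketch of how s could be computed under the uniform, independent-events null; the helper names `binom_tail` and `threshold` are hypothetical. With `any_box=False` it gives the threshold for a single box chosen in advance. Because every box is checked, the chance that at least one of the 20 exceeds s by luck is higher, so `any_box=True` applies a Bonferroni correction (each box tested at alpha / 20). That correction is conservative and does not require the box counts to be independent. In a spreadsheet, the single-box s should correspond to `BINOM.INV(N, 1/20, 1 - alpha)`, the corrected one to `BINOM.INV(N, 1/20, 1 - alpha/20)`, and `1 - BINOM.DIST(s, N, 1/20, TRUE)` gives P(m > s).

```python
from math import comb


def binom_tail(n: int, p: float, s: int) -> float:
    """P(m > s) for m ~ Binomial(n, p), summed exactly term by term."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(s + 1, n + 1))


def threshold(n: int, alpha: float, boxes: int = 20, any_box: bool = True) -> int:
    """Smallest s such that a box count m > s is flagged at level alpha.

    any_box=False: one box chosen in advance, P(m > s) <= alpha.
    any_box=True: Bonferroni over all boxes, so that
    P(some box has m > s) <= boxes * P(m > s) <= alpha.
    """
    p = 1 / boxes
    level = alpha / boxes if any_box else alpha
    s = 0
    # The tail shrinks as s grows and is 0 at s = n, so this loop terminates.
    while binom_tail(n, p, s) > level:
        s += 1
    return s


for alpha in (0.01, 0.001, 0.0001):
    print(alpha, threshold(163, alpha, any_box=False), threshold(163, alpha))
```

Testing many trial periods adds a further multiple-comparison effect that this sketch does not correct for.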

Unfortunately, I don't know what type of probability distribution I am dealing with, or how to calculate the value of s that m must exceed, at significance levels of 0.01, 0.001 or 0.0001, for a box to be flagged for follow-up.
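Whatever the exact distribution, one way to check a candidate s is to simulate the null hypothesis directly: drop N events uniformly and independently into 20 boxes many times and count how often the fullest box exceeds s. This sketch assumes that uniform, independent null; the function name `simulated_false_alarm_rate` and the value s = 16 are hypothetical, illustrative choices rather than results. Estimating rates as small as 0.0001 this way would need far more trials than the default used here.

```python
import random


def simulated_false_alarm_rate(n: int, s: int, trials: int = 10_000,
                               boxes: int = 20, seed: int = 1) -> float:
    """Estimate P(at least one box has more than s events) under the uniform null.

    Each trial drops n events independently and uniformly into `boxes` boxes,
    then checks whether the fullest box exceeds s.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    hits = 0
    for _ in range(trials):
        counts = [0] * boxes
        for _ in range(n):
            counts[rng.randrange(boxes)] += 1
        if max(counts) > s:
            hits += 1
    return hits / trials


print(simulated_false_alarm_rate(163, 16))
```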

If you need more information or clarification, please let me know.

Many thanks in advance if you can help. I'm not a mathematician and am used to working with spreadsheet functions, so an answer framed in those terms would be a bonus.
