Finding a threshold between two accumulations


My question:

I have a list of numbers. These numbers belong to two accumulations; each accumulation contains an unknown number of values scattered around its own average, which I don't know.

How can I find a threshold between those two accumulations, so I can say for every number if it's in accumulation $1$ or $2$?

Taking the average of the two values that form the biggest jump would not work; it would be too imprecise.

Almost no numbers are the same, so it's originally not a bimodal distribution.

A computer will do the calculation in the end, so the method can be lengthy.

The data is produced by a human pressing a button for a long or a short time. The computer should detect whether a long or a short press was meant, independently of the absolute duration of the press.

Thanks for your advice.


There are 2 best solutions below


I already have an idea: maybe I could "group" the numbers by reducing their "resolution" and then calculate the threshold of the now bimodal distribution. But this "resolution" has to be right: if it's too small, the result would be too imprecise; if it's too high, the result could be totally wrong. I'm interested in your ideas :)
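One way this grouping idea might look in code is sketched below. It is only an illustration under stated assumptions: the function name `binned_threshold` and the parameter `bin_width` are made up here, durations are assumed to be non-negative, and the rule "take the tallest bin, then the tallest bin at least two bins away, and cut at the emptiest bin between them" is one possible reading of the idea, not an established method.

```python
from collections import Counter

def binned_threshold(values, bin_width):
    """Hypothetical sketch: histogram the values, find two peaks, cut at the valley."""
    # Reduce the "resolution": map every value to an integer bin index.
    counts = Counter(int(v // bin_width) for v in values)
    lo, hi = min(counts), max(counts)
    hist = [counts.get(i, 0) for i in range(lo, hi + 1)]

    # First peak: the most populated bin.
    first = max(range(len(hist)), key=hist.__getitem__)
    # Second peak: the most populated bin at least two bins away from the first.
    candidates = [i for i in range(len(hist)) if abs(i - first) >= 2]
    if not candidates:
        return None  # bins are too coarse to show two separate peaks
    second = max(candidates, key=hist.__getitem__)

    left, right = sorted((first, second))
    centre = (left + right) / 2
    # Valley: the emptiest bin between the peaks; ties go to the most central one.
    valley = min(range(left + 1, right), key=lambda i: (hist[i], abs(i - centre)))
    # Return the centre of the valley bin in the original units.
    return (lo + valley + 0.5) * bin_width

presses_ms = [100, 110, 120, 105, 115, 500, 520, 480, 510, 530]
print(binned_threshold(presses_ms, 50))    # bin width of 50 ms
print(binned_threshold(presses_ms, 1000))  # far too coarse: returns None
```

The second call shows the failure mode described above: with too coarse a resolution, both accumulations land in the same bin and no threshold can be found.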


The general concept you need is called discriminant analysis, pioneered by R. A. Fisher about 80 years ago. You can read about Fisher's original discriminant analysis in the Wikipedia article. But your particular problem is the simplest possible case of discriminating between only two groups, so something like my simplified procedure suggested below might work.

In order for perfect discrimination to be possible the maximum values for 'short' pulses must be less than the minimum values for 'long' ones. Human subjects may initially have a variety of definitions of 'short' and 'long', so without some instruction, discrimination may not be possible.
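The condition for perfect discrimination can be checked directly once some presses have known labels. The sketch below assumes two lists of labelled durations; the function name `separating_gap` is hypothetical.

```python
def separating_gap(short, long):
    """Return (max_short, min_long) if the labelled presses separate perfectly, else None."""
    max_short, min_long = max(short), min(long)
    # Perfect discrimination requires every short press to be shorter than every long one.
    return (max_short, min_long) if max_short < min_long else None

print(separating_gap([100, 120], [400, 500]))  # separable
print(separating_gap([100, 450], [400, 500]))  # overlapping: returns None
```

Any threshold strictly between the two returned values would classify all of the labelled presses correctly.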

You could start each subject's session with a sequence of five or so responses prompted to be 'long' intermixed with five prompted to be 'short'. Then you could see if further familiarization with the procedure is necessary.

A vastly simplified version of Fisher's discriminant analysis would be to take a point halfway between the means of short and long presses, $\bar X_s$ and $\bar X_\ell,$ respectively, and see if that completely separates short from long. Because short pulses may have a smaller standard deviation (SD) $S_s$ than long ones $S_\ell,$ it may work better to see if the value $\bar X_s + cd$ is a suitable value for separation, where $d = \bar X_\ell - \bar X_s$ and $c = \frac{S_s}{S_s + S_\ell}.$
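As a rough sketch of this simplified procedure, the code below computes both candidate thresholds from labelled calibration presses. It assumes the weight $c$ is the short-press SD divided by the sum of the two SDs, so that the threshold sits closer to the tighter group; the names `candidate_thresholds` and `classify` are hypothetical, and sample standard deviations are used.

```python
from statistics import mean, stdev

def candidate_thresholds(short, long):
    """Return (midpoint, sd_weighted) thresholds from labelled calibration presses."""
    mean_s, mean_l = mean(short), mean(long)
    sd_s, sd_l = stdev(short), stdev(long)  # sample SDs; each list needs >= 2 values
    d = mean_l - mean_s

    midpoint = mean_s + 0.5 * d
    total = sd_s + sd_l
    # Assumed weight: share of the total spread that belongs to the short presses.
    c = sd_s / total if total > 0 else 0.5
    return midpoint, mean_s + c * d

def classify(duration, threshold):
    """Label a single press relative to a chosen threshold."""
    return "long" if duration > threshold else "short"

short = [0.10, 0.12, 0.14]  # seconds, prompted as 'short'
long = [0.50, 0.60, 0.70]   # seconds, prompted as 'long'
midpoint, weighted = candidate_thresholds(short, long)
print(midpoint, weighted)
print(classify(0.25, weighted), classify(0.25, midpoint))
```

With these made-up numbers the SD-weighted threshold lies much nearer the short presses than the midpoint does, so a 0.25 s press is labelled differently by the two rules. Which one suits a given subject would still have to be checked against the calibration data.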

However, you have historical data values, $Y_s$'s and $Y_\ell$'s, of short and long pulse lengths, respectively. So, the 'familiarization' period might be shortened by demonstrating an ideal short pulse with a tone of length $\bar Y_s$ and an ideal long one with a tone of length $\bar Y_\ell.$ Then give the subject the opportunity to show a couple of pulses of both lengths. If he/she succeeds, the computer might say "You've got it." And if not, "I can't quite tell the difference, let's try a few more." before launching into the above-mentioned session with five of each type.

Because you haven't said much about the setting in which the long and short pulses are used, my exact suggestions may not be feasible. But the ideas are proven and sound, so I'm sure you can think of a way to modify them to fit your particular needs.