Statistically distinguishing two groups of data


Setup: I have two groups of data points; call them $A$ and $B$. Each data point in $A$ and in $B$ has an $(x,y)$ position on a set of axes. Further, the elements of each group tend to cluster together on these axes. For example, let's say that all elements of $A$ lie in a rectangle bounded by $(x_A, y_A)$ and $(x_A',y_A')$, while all elements of $B$ lie in a similar rectangle bounded by $(x_B, y_B)$ and $(x_B',y_B')$. Since the regions are arbitrary, they may overlap. I made them rectangular for simplicity, but they could just as easily be ovoid or some other shape. Lastly, each data point has some characteristic uncertainty $\sigma$.

Question: Suppose I have another data point; call it $z$, with its own uncertainty $\sigma_z$. I know $z$ belongs to either $A$ or $B$, but I do not know which. How can I quantify my confidence that $z$ is a member of one group or the other?


Answer: Use $\sigma_z$ to define a circle $C_z$ around $z$ whose radius depends on $\sigma_z$, and give each point of the circle a weight that decays with its distance from $z$.
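One concrete choice, which is an assumption rather than something the answer prescribes, is to let the radius of $C_z$ grow without bound and use an isotropic Gaussian with standard deviation $\sigma_z$ as the decaying weight. The weight of any region $R$ is then the integral of that kernel over $R$:

$$ K_z(p) = \frac{1}{2\pi\sigma_z^2}\exp\!\left(-\frac{\lVert p - z\rVert^2}{2\sigma_z^2}\right), \qquad w(R) = \int_R K_z(p)\,dp. $$

With this choice, $w(R)$ is the probability that a point drawn from a Gaussian centred at $z$ with spread $\sigma_z$ lands in $R$, so it always lies between $0$ and $1$.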

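Now compute the weights $w_A$ and $w_B$ of the intersections of $C_z$ with the regions of $A$ and $B$, respectively (that is, integrate the weight function over $C_z \cap A$ and over $C_z \cap B$), and decide in favour of whichever group has the larger weight. For example, your confidence that $z$ belongs to $A$ could be $$ \frac{w_A}{w_A+w_B}, $$ and your confidence in $B$ is the complement, $w_B/(w_A+w_B)$.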
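Below is a minimal Python sketch of this decision rule. It rests on assumptions that are not part of the answer: the regions are the axis-aligned rectangles from the question, the decaying weight is an isotropic Gaussian with standard deviation $\sigma_z$ and no hard cutoff radius, and the names `norm_cdf`, `rect_weight` and `confidence_in_a` are hypothetical. For a rectangle, the Gaussian integral factorises into two one-dimensional normal probabilities, so `math.erf` is all that is needed.

```python
import math


def norm_cdf(t: float) -> float:
    """Standard normal CDF, expressed through the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def rect_weight(z, sigma_z, rect):
    """Mass of an isotropic Gaussian centred at z (std sigma_z) inside rect.

    rect is ((x1, y1), (x2, y2)); the two corners may be given in any order.
    """
    (x1, y1), (x2, y2) = rect
    x_lo, x_hi = sorted((x1, x2))
    y_lo, y_hi = sorted((y1, y2))
    zx, zy = z
    # An isotropic 2-D Gaussian factorises into independent x and y parts,
    # so the rectangle integral is a product of two 1-D interval probabilities.
    px = norm_cdf((x_hi - zx) / sigma_z) - norm_cdf((x_lo - zx) / sigma_z)
    py = norm_cdf((y_hi - zy) / sigma_z) - norm_cdf((y_lo - zy) / sigma_z)
    return px * py


def confidence_in_a(z, sigma_z, rect_a, rect_b):
    """Confidence w_A / (w_A + w_B) that z belongs to group A."""
    w_a = rect_weight(z, sigma_z, rect_a)
    w_b = rect_weight(z, sigma_z, rect_b)
    total = w_a + w_b
    if total == 0.0:
        # z is so far from both regions that both weights underflow to zero.
        raise ValueError("z carries no weight in either region")
    return w_a / total
```

For instance, with $A$ spanning $(0,0)$ to $(2,2)$, $B$ spanning $(1,1)$ to $(3,3)$, $z=(0.5,0.5)$ and $\sigma_z=0.5$, `confidence_in_a` returns about 0.97. For ovoid or irregular regions the factorisation no longer applies, and the integrals would have to be estimated numerically, for example by sampling points around $z$ and counting how many fall in each region.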