Given two overlapping datasets (think Venn Diagram), what is the point/plane called that separates the middle from either side?

Or, if you prefer a typed out version:

Given the following two datasets:

[Category A]: 1,3,4,7,8,9,13 ... 28,30
[Category B]: 29,32,33,37 ... 61,62,63

Plotted like so:

 A-AA--AAA---AA--A--A-AAA-B-ABA-BB-A-B-B-B-BB-B--BBB-B--BBBB-BBB
0---------10--------20--------30--------40--------50--------60--------70--------80--------90--------100

One might say the Vector is the point (or plane or boundary) in between the most As and Bs. Or rather, the statistical point that separates the two datasets (A and B). Like so:

A-AA--AAA---AA--A--A-AAA-B-ABA-BB-A-B-B-B-BB-B--BBB-B--BBBB-BBB
                              ^-- Vector (or "optimum hyperplane", or boundary)

But what is the point called beyond which there are no more As? Or at least the point beyond which it's statistically "unlikely" (in soft terms) that any point belongs to the A category.


 A-AA--AAA---AA--A--A-AAA-B-ABA-BB-A-B-B-B-BB-B--BBB-B--BBBB-BBB
|--Probably A-----------|---????---|--Probably B----------------|
  What's this called? --^     ^    ^-- What's this called?
                              ^-- Vector (or "plane", or "boundary")

In other words:

 A-AA--AAA---AA--A--A-AAA-B-ABA-BB-A-B-B-B-BB-B--BBB-B--BBBB-BBB
|--Probably A---------|-Questionable-|--Probably B--------------|
<-- More likely A ------Maybe A or B----------- More likely B -->

Therefore, given unlabeled (uncategorized) points 1, 2, and 3:

 ---------1?------------------2?------------3?------------------
|--Probably A---------|-Questionable-|--Probably B--------------|
<-- More likely A ------Maybe A or B----------- More likely B -->

From this we can say that 1 is likely A, 2 is unknown since it's so close to the vector (maybe a probability of 50%), and 3 is likely B.

I know the term for the point in the middle (vector). But what are the terms for the other points that separate the questionable area from the clear area?

If the vector is the point between the most As and Bs, what is the point where there are no more As called?

There is 1 answer below.

In convex analysis, there is a separating hyperplane between two disjoint convex bodies. However, it looks like you are trying to do classification. There are techniques in classification that use a similar notion to attempt to separate two or more datasets. These are called maximum margin hyperplanes. See Support Vector Machine.
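To make the maximum-margin idea concrete, here is a minimal sketch for the one-dimensional, cleanly separated case only. The function name `max_margin_1d` and the sample values are made up for illustration; they are not the data from the question, and this is not a general SVM solver. In one dimension with no overlap, the margin edges are simply the last A and the first B (the support vectors), and the boundary sits halfway between them.

```python
def max_margin_1d(a_points, b_points):
    """Return (lower_edge, boundary, upper_edge) for separable 1-D data.

    Assumes every A value lies strictly below every B value.
    """
    lower_edge = max(a_points)  # last A: the support vector on the A side
    upper_edge = min(b_points)  # first B: the support vector on the B side
    if lower_edge >= upper_edge:
        raise ValueError("classes overlap; a hard margin does not exist")
    boundary = (lower_edge + upper_edge) / 2  # midpoint maximizes the margin
    return lower_edge, boundary, upper_edge


# Hypothetical, cleanly separated values (not the question's data).
a = [1, 3, 4, 7, 8, 9, 13]
b = [19, 22, 23, 27]
edges = max_margin_1d(a, b)
print(edges)
```

With overlapping data like yours, the function raises an error, which is exactly the situation where a hard margin no longer applies.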

EDIT: You are also asking about points that lie in the margin (the supposed gap between the sets). When the data sets cannot be cleanly separated, as in your example, the classifier is typically said to use a soft margin, which tolerates some points inside the margin or on the wrong side of the boundary. This is also mentioned in the Wikipedia article I linked to.
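As a rough illustration of how the margin maps onto your "Probably A / Questionable / Probably B" picture, here is a small sketch for a one-dimensional linear classifier. The `WEIGHT` and `BIAS` values are hand-picked assumptions chosen so the boundary lands at 30; a real SVM would learn them from the data, and the function names are hypothetical. The two margin edges are the points where the score equals -1 and +1, and the slack of a labeled point measures how far it intrudes into the margin.

```python
# Hand-picked, hypothetical parameters of a 1-D linear classifier
# score(x) = WEIGHT * x + BIAS; a trained SVM would learn these.
WEIGHT, BIAS = 0.25, -7.5


def score(x):
    return WEIGHT * x + BIAS


def margin_edges():
    """x values where the score is -1 and +1: the two edges of the margin."""
    return (-1 - BIAS) / WEIGHT, (1 - BIAS) / WEIGHT


def zone(x):
    """Name the region an unlabeled point falls in."""
    s = score(x)
    if s <= -1:
        return "probably A"
    if s >= 1:
        return "probably B"
    return "questionable"


def slack(x, label):
    """Hinge slack of a labeled point (label is -1 for A, +1 for B).

    0: on or outside the margin on the correct side.
    Between 0 and 1: inside the margin, still on the correct side.
    Above 1: on the wrong side of the boundary.
    """
    return max(0.0, 1 - label * score(x))


print(margin_edges())
print([zone(x) for x in (10, 30, 50)])
print(slack(28, -1), slack(32, -1))
```

In this sketch the region between the two margin edges plays the role of your "questionable" zone, and a soft margin is what permits labeled points to have nonzero slack instead of requiring a clean gap.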