Finding simplest function to distinguish 2 sets

84 Views Asked by At

I wish to find a function that distinguishes $2$ sets. I have m data values in form of n-tuples out of which k are supposed to be mapped to a value less than $0$ and other m-k are supposed to be mapped to a value greater than or equal to $0$. My main Aim is that The function needs to be simple to compute, so not neccessary polynomial.(anything better than a $(m-1)$ degree polynomial in the worst case).

For example for the data$(m=6,k=3,n=2)$;

$A((1,3), (2,5), (12,67))$

$B((3,4), (14,20),(4,6))$

i.e the latter tuples $(x,y)$ belong to set $B$ and the former $3$ belong to set $A$.

Here, my dream (or at least a very good) function would be $f(y,x)=(y/x) -2$ . Which sends A to positive and B to negative values.

Of course i can have a trivial polynomial fit of degree 5 but that thing gets messy when m is large. Since there is lot of freedom on values and nature of function, I m certain something better is achievable. But I am not sure how to do this. And if not a general solution is available, even for the case of n=2 or 3 variables will be very very much appreciated. Even related links without explanation will be of great help.

Thankyou

2

There are 2 best solutions below

3
On

One naïve approach.

You have $k$ points in $\Bbb R^n$ (the set $A$) on which the fucntion is negative and $m-k$ points in $\Bbb R^n$ (the set $B$) on which the function is positive. We also suppose that $A\cap B=\emptyset$.

Take $$g_A(x) = dist (x,A) = \min_{y\in A} (\|x-y\|)$$ $$g_B(x) = dist (x,B) = \min_{y\in B} (\|x-y\|)$$ As for the norm, you can take whichever you like on $\Bbb R^n$.

Now your $f$ can be taken as $$f(x)=g_A(x)-g_B(x).$$ If I'm not mistaken, the cost of this function should be $\mathcal O(mn)$.

Of course, for a known set of points this function is not optimal, but it always works.

2
On

I have the idea that the function can only be constructed if the sets $A$ and $B$ are well-known. If that is the case you can do it with $1_A-1_B$ where $1_X$ denotes the characteristic function of set $X$.

I would not be surprised if I am overlooking something here. If so then please let me know. Of course I will delete my answer in that case.