I need to identify the vector (dividing plane) between two sets of data.
The goal is to correctly "guess" whether a new piece of data belongs to group A or B based on whether it is greater or less than this plane.
What is the optimum way to calculate this plane?
Here is a sample dataset and accompanying plot with approximated vector (by eye).
GROUP A
[375,505,965,1126,2384,2426,4041,4389,4474,4758,4758,4839,4846,4944,5010,5010,5010,5010,5010,5010,5010,5010,5010,5042,5153,5190,5261,5411,5411,5432,5736,5755,5893,5982,5982,6036,6112,6195,6198,6229,6300,6350,6406,6502,6526,6533,6571,6599,6686,6689,6742,6786,6796,6845,6849,6876,6896,6901,6913,6943,7006,7062,7076,7079,7100,7222,7268,7283,7294,7299,7338,7357,7368,7370,7409,7443,7444,7482,7496,7539,7553,7711,7713,7721,7735,7740,7741,7773,7781,7803,7803,7804,7808,7823,7853,7865,7876,7877,7892,7897,7925,7932,7932,7938,7963,7985,7992,8003,8003,8005,8014,8015,8020,8028,8031,8040,8046,8050,8054,8061,8071,8072,8075,8083,8114,8129,8132,8157,8169,8194,8204,8213,8215,8237,8238,8243,8248,8254,8255,8255,8259,8267,8268,8279,8290,8302,8304,8313,8318,8324,8353,8359,8380,8387,8389,8389,8391,8426,8432,8446,8447,8461,8464,8473,8493,8501,8506,8523,8550,8561,8565,8573,8582,8583,8616,8625,8625,8634,8649,8675,8685,8685,8692,8704,8705,8707,8719,8722,8730,8733,8740,8746,8762,8762,8768,8787,8818,8830,8830,8833,8846,8855,8864,8866,8868,8875,8877,8883,8886,8895,8900,8945,8960,8961,8981,8985,9004,9006,9032,9033,9053,9059,9094,9096,9097,9106,9121,9147,9217,9349,9461]
GROUP B
[12,16,29,32,33,35,39,42,44,44,44,45,45,45,45,45,45,45,45,45,47,51,51,51,57,57,60,61,61,62,71,75,75,75,75,75,75,76,76,76,76,76,76,79,84,84,85,89,93,93,95,96,97,98,100,100,100,100,100,102,102,103,105,108,109,109,109,109,109,109,109,109,109,109,109,109,110,110,112,113,114,114,116,116,118,119,120,121,122,124,125,128,129,130,131,132,133,133,137,138,144,144,146,146,146,148,149,149,150,150,150,151,153,155,157,159,164,164,164,167,169,170,171,171,171,171,173,174,175,176,176,177,178,179,180,181,181,183,184,185,187,191,193,199,203,203,205,205,206,212,213,214,214,219,224,224,224,225,225,226,227,227,228,231,234,234,235,237,240,244,245,245,246,246,246,248,249,250,250,251,255,255,257,264,264,267,270,271,271,281,282,286,286,291,291,292,292,294,295,299,301,302,304,304,304,304,304,306,308,314,318,329,340,344,345,356,359,363,368,368,371,375,379,386,389,390,392,394,408,418,438,440,456,456,458,460,461,467,491,503,505,508,524,557,558,568,591,609,622,656,665,668,687,705,728,817,839,965,1013,1093,1126,1512,1935,2159,2384,2424,2426,2484,2738,2746,2751,3006,3184,3184,3184,3184,3184,4023,5842,5842,6502,7443,7781,8132,8237,8501]
UNKNOWN
[5000,4000,2000,6000,8000]

Based on this plane calculation, to which group does each point in the UNKNOWN set belong?
if UNKNOWN[point] > vector plane then GROUP A
if UNKNOWN[point] < vector plane then GROUP B
How can I calculate this plane?
A quick way would be to construct a histogram of each group to model its probability distribution. At some point in the middle, the left tail of Group A and the right tail of Group B will contain the same fraction of their respective distributions. This is the point at which you're equally confident that you're in the tail of one and part of the "main distribution" of the other, so it makes a reasonable threshold.
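As a rough sketch of the equal-tails idea, the snippet below skips the binning step and works directly on the sorted values (the empirical distribution), which is an assumption on my part and not the only way to do it. The names equal_tail_threshold and classify are made up for illustration, and the two short lists are only a handful of values picked from the question's groups so the example stays readable. Swap in the full lists to get a threshold for the real data.

```python
from bisect import bisect_right


def equal_tail_threshold(group_a, group_b):
    """Return the observed value t where the share of A at or below t
    is closest to the share of B above t (hypothetical helper)."""
    a = sorted(group_a)
    b = sorted(group_b)
    best_t, best_gap = None, float("inf")
    # Only observed values need checking: the tail shares change nowhere else.
    for t in sorted(set(a) | set(b)):
        a_left_tail = bisect_right(a, t) / len(a)        # share of A at or below t
        b_right_tail = 1 - bisect_right(b, t) / len(b)   # share of B above t
        gap = abs(a_left_tail - b_right_tail)
        if gap < best_gap:                               # keep the first best match
            best_t, best_gap = t, gap
    return best_t


def classify(x, threshold):
    """Apply the question's rule: above the threshold is A, otherwise B."""
    return "A" if x > threshold else "B"


# A few values taken from each group in the question, for illustration only.
sample_a = [375, 2384, 5010, 6502, 7443, 8132, 8501, 9461]
sample_b = [12, 45, 109, 304, 965, 2384, 3184, 8501]
unknown = [5000, 4000, 2000, 6000, 8000]

t = equal_tail_threshold(sample_a, sample_b)
for x in unknown:
    print(x, classify(x, t))
```

A point exactly equal to the threshold is assigned to Group B here. The question leaves that case open, so treat it as an arbitrary choice.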
A more rigorous way would be to go back to what you're measuring: see what kind of distribution you should be getting for the measurements in each group, look at what population(s) you're drawing from, form hypotheses, select confidence intervals, etc.