Best (least bad) regression for $(x,y)$ points where $y=0$ or $y=1$.


I have a series of $(x,y)$ points where:

$ 0 < x < 1 $

and

$y=0$ or $y=1$

I want to approximate the $y$ value for a given $x$. I know this is a case where the correlation is going to be low, but what would be the best approach?


I have tried quadratic regression:

[Image: quadratic regression fit]


And cubic regression:

[Image: cubic regression fit]


But I stopped before trying higher-degree polynomials, because I need a curve that does not cross the $y=0$ and $y=1$ limits (over the range of the $x$ points), and I think I may not understand the problem well.

Should I continue fitting higher-degree polynomials, or would it be better to try another type of regression?

Best answer:

This is a classic setup for logistic regression used as a classification model. Namely, you can model the probability as $$ \hat{p}_i \equiv \widehat{\mathbb{P}(Y_i = 1)}=\frac{1}{1 + e^{-\beta_0 - \beta_1 x_i}}. $$ The outputs of the model are numbers strictly between $0$ and $1$ that are interpreted as probabilities, so the fitted curve never crosses the $y=0$ and $y=1$ limits. You can then choose a cut-off $\tau$ (there are methods for choosing the cut-off that minimizes the misclassification rate, for example using the ROC curve) and set $\hat{y}_i = 1$ if $\hat{p}_i \ge \tau$ and $\hat{y}_i = 0$ otherwise.
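To make the model concrete, here is a minimal sketch in plain Python. It assumes a small made-up data set (the `xs` and `ys` values are invented for illustration, not taken from the question) and fits $\beta_0$ and $\beta_1$ by simple gradient ascent on the log-likelihood. The function names `fit_logistic`, `predict_proba` and `predict`, the learning rate and the iteration count are all illustrative choices. In practice you would probably use a statistics package's logistic regression routine instead.

```python
import math


def sigmoid(z):
    # Logistic function, written to avoid overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lr=0.5, n_iter=5000):
    """Fit p = sigmoid(b0 + b1 * x) by gradient ascent on the mean log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iter):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            r = y - sigmoid(b0 + b1 * x)  # residual y_i - p_i
            g0 += r        # gradient with respect to b0
            g1 += r * x    # gradient with respect to b1
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


def predict_proba(x, b0, b1):
    # Estimated P(Y = 1 | x); always strictly between 0 and 1.
    return sigmoid(b0 + b1 * x)


def predict(x, b0, b1, tau=0.5):
    # Classify as 1 when the estimated probability reaches the cut-off tau.
    return 1 if predict_proba(x, b0, b1) >= tau else 0


# Made-up example data: 0 < x < 1, y in {0, 1}, with overlapping classes.
xs = [0.05, 0.10, 0.20, 0.30, 0.35, 0.45, 0.50, 0.60, 0.70, 0.80, 0.90, 0.95]
ys = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(xs, ys)
for x in (0.05, 0.50, 0.95):
    print(x, round(predict_proba(x, b0, b1), 3), predict(x, b0, b1))
```

Unlike a polynomial fit, the fitted curve here is bounded by construction, so increasing the polynomial degree is not needed to keep the predictions inside the $[0, 1]$ range. The cut-off `tau=0.5` is only a default; it can be tuned as described above.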