Evaluate predictions by comparing to actual outcomes - but no categories to use


I have a dataset like the one below (the left column is produced by an estimation algorithm; the right column records what actually happened). [EDIT: Please note that every event is distinct (i.e., we are not repeatedly testing the same event; each time we are making a prediction about some new kind of thing that may or may not happen).]

event probability        actual outcome (whether event occurred)
0.939658077              TRUE
0.705453465              FALSE
0.310251296              TRUE
0.385363009              FALSE
0.660532932              FALSE
0.290306978              TRUE
0.484473665              FALSE
0.01615261               FALSE
0.898152645              TRUE
0.389938993              TRUE
0.032598374              FALSE
0.599836035              FALSE
0.428701779              TRUE
0.7787285                TRUE
0.14356366               FALSE
0.65105148               FALSE
0.418174021              FALSE
0.724846388              TRUE
0.844266775              TRUE
0.437018647              TRUE
...                      ...

How can I evaluate the quality of the prediction algorithm? (Assume data set size is large enough.)

Thanks!!

EDIT: For example, if the estimated probability is 0.5, the model is saying it doesn't know what to predict, so in a sense there is zero error whatever the outcome. And the model could estimate a 0.9 probability of the event occurring, and one time in ten you would still expect it not to occur. However, over the full dataset, if the model keeps saying 0.1 and the event usually occurs, or keeps saying 0.9 and the event usually does not occur, then it is performing poorly.
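For what it's worth, the behaviour described in this edit is exactly what a proper scoring rule measures. A minimal sketch of one such rule, the Brier score (not named in the original post, just offered as an illustration):

```python
# Sketch: the Brier score is the mean squared error between the
# predicted probability and the 0/1 outcome. Predictions near 0.5
# are penalised moderately regardless of outcome, while confident
# wrong predictions are penalised heavily -- matching the intuition
# in the edit above.
def brier_score(probs, outcomes):
    """Mean of (p - y)^2 over all predictions; lower is better."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

# A 0.5 prediction costs 0.25 whichever way the event goes...
print(brier_score([0.5], [1]))  # → 0.25
print(brier_score([0.5], [0]))  # → 0.25
# ...while a confident wrong prediction costs far more.
print(round(brier_score([0.9], [0]), 4))  # → 0.81
```

Averaged over the full dataset, a model that keeps saying 0.9 when events usually don't occur accumulates a large Brier score, while an honest "I don't know" of 0.5 never costs more than 0.25 per event.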

1 Answer (accepted)

A confusion matrix and ROC curves will probably suit you.

Check this out: https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5
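To make this concrete, here is a minimal sketch using scikit-learn on the rows shown in the question (assumes `scikit-learn` and `numpy` are installed):

```python
# Sketch: evaluating probabilistic predictions with scikit-learn,
# using the 20 rows from the question.
import numpy as np
from sklearn.metrics import roc_auc_score, confusion_matrix

probs = np.array([
    0.939658077, 0.705453465, 0.310251296, 0.385363009, 0.660532932,
    0.290306978, 0.484473665, 0.01615261,  0.898152645, 0.389938993,
    0.032598374, 0.599836035, 0.428701779, 0.7787285,   0.14356366,
    0.65105148,  0.418174021, 0.724846388, 0.844266775, 0.437018647,
])
actual = np.array([
    1, 0, 1, 0, 0, 1, 0, 0, 1, 1,
    0, 0, 1, 1, 0, 0, 0, 1, 1, 1,
])

# ROC AUC needs no threshold: it measures how well the probabilities
# rank occurred events above non-occurred ones
# (1.0 = perfect ranking, 0.5 = no better than chance).
auc = roc_auc_score(actual, probs)
print(f"ROC AUC: {auc:.3f}")  # → ROC AUC: 0.700

# A confusion matrix requires picking a threshold, e.g. 0.5.
predicted = (probs >= 0.5).astype(int)
print(confusion_matrix(actual, predicted))  # rows: actual 0/1, cols: predicted 0/1
```

Note that the confusion matrix depends on the chosen threshold, while the ROC curve summarises performance across all thresholds, which is why the two are usually reported together.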