I have a dataset like this (the left column is a probability produced by an estimation algorithm, the right column is what actually happened). [EDIT: Please note that every event is distinct, i.e. we are not repeatedly testing the same event; each row is a prediction about some new kind of thing that may or may not happen.]
event probability actual outcome (whether event occurred)
0.939658077 TRUE
0.705453465 FALSE
0.310251296 TRUE
0.385363009 FALSE
0.660532932 FALSE
0.290306978 TRUE
0.484473665 FALSE
0.01615261 FALSE
0.898152645 TRUE
0.389938993 TRUE
0.032598374 FALSE
0.599836035 FALSE
0.428701779 TRUE
0.7787285 TRUE
0.14356366 FALSE
0.65105148 FALSE
0.418174021 FALSE
0.724846388 TRUE
0.844266775 TRUE
0.437018647 TRUE
... ...
How can I evaluate the quality of the prediction algorithm? (Assume the dataset is large enough.)
Thanks!!
EDIT: So, for example, if the estimated probability is 0.5, the model is saying it doesn't know what to predict, so in a sense there is zero error whatever the outcome. Likewise, if the model estimates a 0.9 probability, you would still expect the event not to occur about one time in ten. However, over the full dataset, if the model keeps saying 0.1 and the event usually occurs, or keeps saying 0.9 and the event usually does not occur, then it is performing poorly.
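The behaviour I'm describing above, penalising the model when its stated probabilities disagree with observed frequencies, could be checked by binning the predictions and comparing each bin's average predicted probability with the observed event frequency. A minimal sketch of that check (the binning scheme, function name, and simulated well-calibrated data are my own illustration, not from my actual dataset):

```python
import random

def calibration_table(probs, outcomes, n_bins=10):
    """Bin predictions by estimated probability and report, per bin,
    the mean prediction, the observed event frequency, and the count."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        i = min(int(p * n_bins), n_bins - 1)  # clamp p == 1.0 into the last bin
        bins[i].append((p, y))
    table = []
    for rows in bins:
        if rows:
            mean_p = sum(p for p, _ in rows) / len(rows)
            freq = sum(y for _, y in rows) / len(rows)
            table.append((mean_p, freq, len(rows)))
    return table

# Simulated well-calibrated predictor: the event occurs with exactly
# the stated probability, so predicted and observed should roughly match.
random.seed(0)
probs = [random.random() for _ in range(10_000)]
outcomes = [random.random() < p for p in probs]
for mean_p, freq, n in calibration_table(probs, outcomes):
    print(f"predicted {mean_p:.2f}  observed {freq:.2f}  (n={n})")
```

A badly calibrated model (saying 0.1 when the event usually occurs) would show large gaps between the two columns.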
A confusion matrix and ROC curves will probably suit you.
Check this out: https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5
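To make both ideas concrete, here is a minimal, dependency-free sketch (the function names and the tiny example data are mine, chosen for illustration; on real data you would typically reach for scikit-learn's `confusion_matrix` and `roc_auc_score` instead):

```python
def roc_auc(probs, labels):
    """ROC AUC via the Mann-Whitney U statistic: the fraction of
    (positive, negative) pairs the model ranks correctly, ties at half."""
    pos = [p for p, y in zip(probs, labels) if y]
    neg = [p for p, y in zip(probs, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def confusion_counts(probs, labels, threshold=0.5):
    """Counts (TP, FP, TN, FN) after thresholding the probabilities."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        pred = p >= threshold
        if pred and y:
            tp += 1
        elif pred and not y:
            fp += 1
        elif not pred and not y:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

probs = [0.1, 0.4, 0.35, 0.8]
labels = [False, False, True, True]
print(roc_auc(probs, labels))           # 0.75
print(confusion_counts(probs, labels))  # (1, 0, 2, 1)
```

One caveat worth noting: AUC and the confusion matrix only measure how well the model *ranks* or *classifies* events, not whether the probabilities themselves are well calibrated, which is what your EDIT is getting at; for that, a proper scoring rule such as the Brier score is the usual complement.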