I am trying to find a confidence threshold that minimizes false positives and false negatives while still giving me a useful set of results.
Here is what I am doing...
I am currently working on a machine learning project. It uses a logistic regression model, and the response is categorical. I have two categories right now; let's call them 'A' and 'B'.
During testing, the model returns a confidence value for each label. Example output: A: 0.7689, B: 0.2311
I trained the model on 4000 data items for each classifier, then predicted labels for 200 test items and got the following results:
Confidence range    Right predictions    Wrong predictions
50%-60%             2                    3
60%-70%             5                    0
70%-80%             7                    2
80%-85%             4                    1
85%-90%             10                   0
90%-95%             49                   4
95%-100%            109                  4
Now I want to choose a threshold (e.g. 87%) and only accept a prediction if the model's confidence is above it; predictions with a confidence below the threshold are rejected. Among the accepted predictions, correct ones should be maximized and wrong ones minimized, while I still return a sufficient number of results.
Currently I am forming such a table and then deciding my threshold percentage.
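To make the manual procedure concrete, here is a rough Python sketch of what I am doing by hand. It uses the bin counts from the table above and treats the lower bound of each bin as a candidate threshold. The names `BINS` and `threshold_summary` are made up for illustration, and "accuracy" here means the share of accepted predictions that were right.

```python
# Bin counts copied from the table above: (lower bound in %, right, wrong).
BINS = [
    (50, 2, 3),
    (60, 5, 0),
    (70, 7, 2),
    (80, 4, 1),
    (85, 10, 0),
    (90, 49, 4),
    (95, 109, 4),
]


def threshold_summary(bins):
    """For each bin lower bound used as a threshold, return a tuple
    (threshold, kept, right, wrong, accuracy, coverage), where 'kept'
    counts the predictions at or above the threshold."""
    total = sum(r + w for _, r, w in bins)
    rows = []
    for i, (lower, _, _) in enumerate(bins):
        # Everything in this bin and all higher bins is accepted.
        right = sum(r for _, r, _ in bins[i:])
        wrong = sum(w for _, _, w in bins[i:])
        kept = right + wrong
        rows.append((lower, kept, right, wrong, right / kept, kept / total))
    return rows


for t, kept, right, wrong, acc, cov in threshold_summary(BINS):
    print(f">= {t}%: kept {kept}, right {right}, wrong {wrong}, "
          f"accuracy {acc:.3f}, coverage {cov:.3f}")
```

Raising the threshold trades coverage (how many predictions I keep) against accuracy among the kept ones, and right now I pick the trade-off by eye from output like this.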
I want to know whether there is a more principled mathematical method for choosing this threshold.
I would appreciate any help!
Regards Savya Saachi