Confidence interval based anomaly detection


My question is related to confidence intervals and outlier detection. Here is the scenario: imagine that my data follow a standard normal distribution (mean 0, scale 1). I can represent this distribution in Python using scipy like this:

from scipy.stats import norm

mean = 0
std = 1  # norm() takes the standard deviation (scale), not the variance; both equal 1 here
dist = norm(loc=mean, scale=std)

Then, for example, I can compute the 95% interval (the central range that contains 95% of the probability mass) like this:

dist.interval(0.95)

where the result is (-1.959963984540054, 1.959963984540054)

These endpoints are the critical values of my distribution at the 95% level. From them, I can compute the cumulative probability (the area under the curve to the left of each point) like this:

dist.cdf(-1.959963984540054), dist.cdf(1.959963984540054)

where cdf is the cumulative distribution function in scipy, and the results for those points are 0.025 and 0.975.
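To double-check my understanding of these two calls, here is a minimal sketch, assuming the standard normal from above. The names lower, upper, and mass are just my own illustrative variables. It shows that interval(0.95) is the same as taking the quantile function (ppf) at 0.025 and 0.975, and that the area between the two endpoints is 0.95:

```python
from scipy.stats import norm

# Standard normal: loc is the mean, scale is the standard deviation.
dist = norm(loc=0, scale=1)

# Central interval holding 95% of the probability mass.
lower, upper = dist.interval(0.95)

# The same endpoints via the quantile function (inverse CDF).
lower_ppf = dist.ppf(0.025)
upper_ppf = dist.ppf(0.975)

# Area under the curve between the two endpoints.
mass = dist.cdf(upper) - dist.cdf(lower)

print(lower, upper)
print(lower_ppf, upper_ppf)
print(mass)
```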

And here is my question: when I want to test whether an arbitrary point (e.g. 7.25) fits my distribution, I can calculate its z-score, (7.25 - mean) / std, where std is the standard deviation (the scale passed to norm).

Then, if the absolute value of the z-score is bigger than some threshold, for example 3.0, I can say that my point (7.25) doesn't fit the given distribution. (I am fully aware that 3.0 is an arbitrary choice, effectively a hyperparameter.)
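As a rough sketch of the z-score check I mean (the function name is_outlier_zscore and its default arguments are my own illustrative choices, not anything from scipy):

```python
def is_outlier_zscore(x, mean=0.0, std=1.0, threshold=3.0):
    """Flag x when it lies more than `threshold` standard deviations from the mean."""
    z = (x - mean) / std
    # Use the absolute value so that points far below the mean are flagged too.
    return abs(z) > threshold


print(is_outlier_zscore(7.25))   # far in the upper tail
print(is_outlier_zscore(-7.25))  # far in the lower tail
print(is_outlier_zscore(1.0))    # well inside the bulk of the distribution
```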

Alternatively, can I calculate the CDF at this point (7.25), which equals 0.9999999999999899, and compare it with the previously calculated values 0.025 and 0.975? That is, because CDF(7.25) ≈ 0.9999 does not lie between 0.025 and 0.975, can I identify my 7.25 point as an outlier?
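To make the idea concrete, here is a minimal sketch of the CDF-based check I have in mind. The helper is_outlier_cdf and its level parameter are hypothetical names of my own. Because the CDF is monotonically increasing, this should flag exactly the points that fall outside dist.interval(level), so for the 95% level it would amount to a z-score threshold of about 1.96 rather than 3.0:

```python
from scipy.stats import norm

dist = norm(loc=0, scale=1)


def is_outlier_cdf(x, dist, level=0.95):
    """Flag x when its CDF value falls in either tail outside the central `level` mass."""
    alpha = 1.0 - level
    p = dist.cdf(x)
    return bool(p < alpha / 2 or p > 1 - alpha / 2)


def is_outlier_interval(x, dist, level=0.95):
    """The same decision made on the x axis, using the interval endpoints directly."""
    lower, upper = dist.interval(level)
    return bool(x < lower or x > upper)


for point in (7.25, -2.5, 0.5, 1.9):
    print(point, is_outlier_cdf(point, dist), is_outlier_interval(point, dist))
```

For points very deep in the upper tail, cdf gets so close to 1 that precision is lost; dist.sf(x), the survival function 1 - cdf, keeps the small tail probability accurate if I need the actual value.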

Thanks a lot