A measure similar to variance that's always between 0 and 1?

Question

Asked by Bumbble Comm on 10 May 2026 - 7:03

Consider the following histogram, obtained from around 1000 distance measurements.

As you can observe, most of the data lie near the mean, around values of 5-10. I also have some isolated samples far away, at values of 100 and 160.

1) Is there any statistical measure I can use to detect when this happens? Sometimes there are no outliers and I'm trying to detect such cases. I was thinking of thresholding variance, but I'm looking for a measure with a value in a fixed interval (e.g. always 0 to 1).

2) I'm trying to get an interval like the one in red that only includes the measurements around the mean. I'm looking for a method that works for different histograms with a similar shape (the number of readings and their values can vary, but the shape is always similar). Could you suggest a method?

Original Q&A

There are 3 best solutions below

user14972 On 26 Jun 2018 - 10:38

To answer the title question: if $|X - X_0| \leq 1$ for some constant $X_0$, then the variance of $X$ is bounded by $1$. So you could apply any real-valued function that collapses the range of $X$ down to an interval of radius 1, and take the variance of the result.
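The bound relies on one step left implicit here: the variance is the smallest mean squared deviation of $X$ from any constant, so measuring deviations from $X_0$ instead of from the mean can only increase it. Written out, for a random variable with $|X - X_0| \leq 1$:

$$ \mathrm{Var}(X) = \min_{c \in \mathbb{R}} \mathbb{E}\left[(X - c)^2\right] \le \mathbb{E}\left[(X - X_0)^2\right] \le 1 $$

The minimum is attained at $c = \mathbb{E}[X]$, and the last inequality holds because $(X - X_0)^2 \le 1$ for every outcome.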

For example, you could measure

$$ \mathrm{Var}\left( \frac{2}{\pi} \arctan(X - X_0) \right) $$
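A minimal R sketch of this measure, under two assumptions the answer does not spell out: $X_0$ is taken to be the sample median (any fixed constant works for the bound), and the variance is computed with denominator $n$. The second point matters: the transformed values lie in $(-1, 1)$, so their population variance is below 1, but R's usual sample variance `var()` divides by $n - 1$ and can exceed 1 for very small samples (for example, `var(c(-0.99, 0.99))` is 1.9602). The function name `bounded_var` is hypothetical.

```r
# Hypothetical helper: population variance of (2/pi) * atan(x - x0), always in [0, 1).
# Assumption: x0 defaults to the sample median.
bounded_var <- function(x, x0 = median(x)) {
  y <- (2 / pi) * atan(x - x0)   # every y lies strictly inside (-1, 1)
  mean((y - mean(y))^2)          # divide by n, not n - 1, so the bound of 1 holds
}

bounded_var(c(5, 6, 7, 6, 5))            # tight cluster: about 0.14
bounded_var(c(5, 6, 7, 6, 5, 100, 160))  # same cluster plus two distant values: about 0.34
```

Because `atan()` is applied to raw differences, the result depends on the units of the data: a point 3 units from $X_0$ already maps to about 0.80, so the measure saturates quickly unless the data are rescaled first. The choice of scale is a modelling decision the answer leaves open.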

(This answer does not attempt to address any of the other contents of the post.)

Bumbble Comm On 27 Jun 2018 - 4:45

One example of such functions is a family of decaying exponentials:

$$f(v) = \exp[-v^k/s^k]$$

You input the variance, which lies in $[0,+\infty)$, and you get out a value in $(0,1]$.

If the variance is $0$, you get $1$ out, and the larger the variance, the closer you get to $0$. The parameters $s > 0$ and $k > 0$ let you steer how fast $f$ shrinks to $0$.

If you want the opposite behaviour, you can just take $1-f(v)$ instead.
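A minimal R sketch of $f$, assuming $s > 0$ and $k > 0$; the name `squash_variance` is hypothetical. One way to read $s$: since $f(s) = e^{-1} \approx 0.368$, it is the variance at which the output drops to about 0.37, and a larger $k$ makes the drop around $v = s$ steeper.

```r
# Hypothetical helper implementing f(v) = exp(-(v/s)^k).
# Assumptions: v is a variance (v >= 0), and s > 0, k > 0.
squash_variance <- function(v, s, k) {
  stopifnot(all(v >= 0), s > 0, k > 0)
  exp(-(v / s)^k)   # 1 when v = 0, decreasing toward 0 as v grows
}

squash_variance(0, s = 25, k = 1)       # 1
squash_variance(25, s = 25, k = 1)      # exp(-1), about 0.368
1 - squash_variance(25, s = 25, k = 1)  # the "opposite" version, about 0.632
```

Because $f$ is strictly decreasing in $v$, a threshold on $f(v)$ is equivalent to a threshold on the variance itself: the transform changes the scale to $(0, 1]$ but not which data sets get flagged.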

**Bumbble Comm** · Accepted Answer

In your case, I think variance is not the right approach (see the Note at the end). Perhaps you could consider using boxplots for 'outlier detection'.

Here is a brief example using exponential data, which tend to have outliers. (The exponential distribution is often used to model waiting times for events or lifetimes of electronic components.) Consider the data below, generated using R statistical software. Twenty observations are rounded to one decimal place and sorted:

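```r
# 20 draws from an exponential distribution with rate 0.01 (mean 100);
# no seed is set, so rerunning gives different values.
x = sort(round(rexp(20, .01), 1));  x
 [1]   0.2   0.7   2.6  14.7  28.3  31.1  39.3  45.0  48.7  56.5
[11]  63.0  77.0  77.7  80.2  81.9  96.8 103.6 110.9 157.2 245.1
```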

Sample statistics are shown below. Roughly speaking, the lower quartile $Q_1 = 30.40$, the median $59.75$, and the upper quartile $Q_3 = 85.625$ (printed as 85.62) divide the sorted data into four 'chunks' of five observations each. The interquartile range, IQR $= Q_3 - Q_1 = 55.225$, is the width of the box in a boxplot and an important measure of variability for detecting outliers.

```r
summary(x);  sd(x);  IQR(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.20   30.40   59.75   68.03   85.62  245.10 
[1] 58.4176  # standard deviation
[1] 55.225   # inter-quartile range
```
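The phrase "roughly speaking" hides a detail: `summary()` and `quantile()` in R compute sample quartiles by linear interpolation between order statistics (the default `type = 7` rule), which is why $Q_1$ is 30.40 rather than one of the observed values. A minimal sketch of that rule as a hypothetical helper `q7`, using the 20 values printed above (hard-coded, since the original call sets no seed):

```r
x <- c(0.2, 0.7, 2.6, 14.7, 28.3, 31.1, 39.3, 45.0, 48.7, 56.5,
       63.0, 77.0, 77.7, 80.2, 81.9, 96.8, 103.6, 110.9, 157.2, 245.1)

# Hypothetical helper reproducing R's default (type 7) sample quantile:
# linear interpolation between the order statistics at floor(h) and ceiling(h).
q7 <- function(x, p) {
  x <- sort(x)
  h <- (length(x) - 1) * p + 1
  lo <- floor(h)
  hi <- ceiling(h)
  x[lo] + (h - lo) * (x[hi] - x[lo])
}

sapply(c(0.25, 0.5, 0.75), function(p) q7(x, p))  # 30.4, 59.75, 85.625
q7(x, 0.75) - q7(x, 0.25)                          # 55.225, the IQR
```

For example, $Q_1$ sits at position $h = 19 \times 0.25 + 1 = 5.75$, so it is $28.3 + 0.75 \times (31.1 - 28.3) = 30.40$.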

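The ends of the box in a boxplot are at (approximately) the quartiles, and the median is marked by a heavy bar inside the box. Strictly, R's `boxplot()` places the box ends at Tukey's hinges, computed by `fivenum()`; for these data the hinges are 29.70 and 89.35 rather than 30.40 and 85.62.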

```r
boxplot(x, horizontal = TRUE, col = "skyblue2")
```

The largest observation, 245.1, is flagged as an outlier and plotted separately in the boxplot, because it is greater than $Q_3 + 1.5(\text{IQR}) = 168.46$. (This is known as the '1.5 IQR criterion'. It is popular, but there are others.)
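A minimal sketch of the 1.5 IQR criterion as a reusable R function, with the 20 values above hard-coded. The name `iqr_fences` and its multiplier argument `k` are hypothetical. The interval between the two fences is also one possible reading of the red bracket in question 2, since it adapts to the spread of each histogram, though how closely it matches that bracket depends on the data.

```r
x <- c(0.2, 0.7, 2.6, 14.7, 28.3, 31.1, 39.3, 45.0, 48.7, 56.5,
       63.0, 77.0, 77.7, 80.2, 81.9, 96.8, 103.6, 110.9, 157.2, 245.1)

# Hypothetical helper: fences Q1 - k*IQR and Q3 + k*IQR,
# with quartiles from quantile()'s default (type 7) rule.
iqr_fences <- function(x, k = 1.5) {
  q <- unname(quantile(x, c(0.25, 0.75)))
  iqr <- q[2] - q[1]
  c(lower = q[1] - k * iqr, upper = q[2] + k * iqr)
}

fences <- iqr_fences(x)
fences                                        # lower -52.4375, upper 168.4625
x[x < fences["lower"] | x > fences["upper"]]  # 245.1 is the only point outside

# boxplot() itself uses Tukey's hinges from fivenum(), so its cutoffs differ
# slightly (upper cutoff 178.825 here), but it flags the same point.
boxplot.stats(x)$out                          # 245.1
```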

Please note that there is nothing "wrong" with observation 245.1. As I said earlier, it is the nature of exponential data to have outliers. (It would probably be best to keep the outlier when doing data analysis.)

For data such as yours, I suppose the straggling observations far above your red bracket would be marked as outliers. (Then you would have to consider for your data what circumstances might have produced these outliers, and how the outliers should be handled in data analysis.)

Most statistics books and many online sites have additional information about boxplots, outliers, and how to regard outliers in data analysis.

Note: Variances (and standard deviations) do not work well for outlier detection. If $X_i$ is an outlier, then the term $(X_i - \bar X)^2$ in the variance can be unusually large. So measuring the distance of an observation from $\bar X$ in terms of standard deviations can be misleading because the outlier itself has a large effect on the variance (and hence, the standard deviation). By contrast, outliers do not have much effect on the size of the interquartile range (IQR). Thus IQR is more effective in outlier detection.

In the example, changing the last observation from 245.1 to 100.0 reduces the standard deviation of the sample from 58.42 to 41.96, but does not change the IQR at all.
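The comparison in the last paragraph can be reproduced in a few lines of R, again with the 20 printed values hard-coded:

```r
x <- c(0.2, 0.7, 2.6, 14.7, 28.3, 31.1, 39.3, 45.0, 48.7, 56.5,
       63.0, 77.0, 77.7, 80.2, 81.9, 96.8, 103.6, 110.9, 157.2, 245.1)
x2 <- x
x2[which.max(x)] <- 100.0   # replace the largest observation, 245.1, by 100.0

c(sd(x), sd(x2))     # about 58.42 and 41.96: the standard deviation shrinks
c(IQR(x), IQR(x2))   # 55.225 both times
```

Only the 5th, 6th, 15th and 16th smallest values enter the type 7 quartiles for $n = 20$, and lowering the maximum to 100.0 (which becomes the 17th smallest) leaves all four unchanged.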
