Consider finite real functions on the interval $x \in [0, 1]$, and call an example function $f(x)$. Such a function is known to be the sum of a non-negative, slowly varying part, which we will call the signal, and additive white (i.i.d.) noise with zero mean. The function is sampled experimentally at $N$ regularly spaced points $x_i$, giving values $f_i$. The goal is to find a metric $\mathcal{S}\{f\}$ that reports whether a function is strongly localized within some part of the interval, and by how much. For example:
- The metric will be low if $f(x)$ has the shape of a narrow Gaussian
- The metric will be intermediate if $f(x)$ is uniformly distributed over the interval
- The metric will be high if $f(x)$ has two peaks, one at each edge of the interval
Naive approach I have tried: define $\mathcal{S}$ as the sample variance
- Normalize the function: $p_i = \frac{f_i}{\sum_j f_j}$
- Calculate the sample mean: $\mu_x = \sum_i p_i x_i$
- Calculate the sample variance: $\mathcal{S} = \sigma^2_x = \sum_i p_i (x_i - \mu_x)^2$
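The three steps above can be sketched as follows. This is only a minimal illustration, assuming NumPy and samples on an evenly spaced grid over $[0, 1]$; the function name `sample_variance_spread` and the test shapes (width 0.01, 1001 points) are hypothetical choices made for the example.

```python
import numpy as np

def sample_variance_spread(f, x=None):
    """Naive spread metric: variance of x under weights p_i = f_i / sum_j f_j."""
    f = np.asarray(f, dtype=float)
    if x is None:
        # Assumption: the N samples sit on an evenly spaced grid covering [0, 1].
        x = np.linspace(0.0, 1.0, f.size)
    if np.any(f < 0) or f.sum() <= 0:
        # The weights p_i are not a valid distribution in this case.
        raise ValueError("f must be non-negative with a positive sum")
    p = f / f.sum()                     # normalize
    mu = np.sum(p * x)                  # weighted mean
    return np.sum(p * (x - mu) ** 2)    # weighted variance

# Noise-free versions of the three example shapes (widths are assumed values).
x = np.linspace(0.0, 1.0, 1001)
narrow = np.exp(-0.5 * ((x - 0.5) / 0.01) ** 2)
uniform = np.ones_like(x)
edges = np.exp(-0.5 * (x / 0.01) ** 2) + np.exp(-0.5 * ((x - 1.0) / 0.01) ** 2)

for name, f in [("narrow", narrow), ("uniform", uniform), ("edges", edges)]:
    print(name, sample_variance_spread(f, x))
```

On these noise-free shapes the metric orders the three cases as intended (narrow lowest, two edge peaks highest).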
Problems with my approach:
- The calculation of the empirical probability $p$ is ill-defined in the presence of noise, as $f(x)$ may be negative. This problem can be solved to an extent by low-pass filtering the data to minimize the effects of white noise and then subtracting the minimum from the result.
- Sample variance is extremely sensitive to outliers that are far from the mean. For example, adding a small constant (e.g. 2% of the maximum) to a Gaussian-shaped signal causes the sample variance to jump by several orders of magnitude. This behaviour is very undesirable. I want a measure that is more robust to outliers. Perhaps the metric should not put as much weight on the tails as variance does.
- Sample variance is biased if $\mu_x$ is not in the middle of the interval. For example, a Gaussian-shaped curve peaked at the middle will have a lower sample variance than one peaked near an edge, because for the latter a longer part of one of the tails will be able to fit into the interval.
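To make the first two problems concrete, here is a small sketch, again assuming NumPy. The helper names `weighted_variance` and `smooth_and_shift`, the moving-average filter, the peak width of 0.01, and the noise level of 0.05 are all assumptions chosen for illustration, not part of the original setup.

```python
import numpy as np

def weighted_variance(f, x):
    """Variance of x under weights p_i = f_i / sum_j f_j (f must be non-negative)."""
    p = f / f.sum()
    mu = np.sum(p * x)
    return np.sum(p * (x - mu) ** 2)

def smooth_and_shift(f, width=21):
    """Moving-average low-pass filter, then subtract the minimum so the result is >= 0."""
    kernel = np.ones(width) / width
    smoothed = np.convolve(f, kernel, mode="same")  # zero-padded at the edges
    return smoothed - smoothed.min()

x = np.linspace(0.0, 1.0, 1001)
gauss = np.exp(-0.5 * ((x - 0.5) / 0.01) ** 2)  # peak height 1, assumed width 0.01

# Problem 1: noisy samples can be negative; the workaround makes them usable as weights.
rng = np.random.default_rng(0)
noisy = gauss + rng.normal(0.0, 0.05, x.size)
cleaned = smooth_and_shift(noisy)
print("min before / after:", noisy.min(), cleaned.min())

# Problem 2: a constant offset of 2% of the maximum inflates the variance.
var_clean = weighted_variance(gauss, x)
var_offset = weighted_variance(gauss + 0.02 * gauss.max(), x)
print("variance ratio with offset:", var_offset / var_clean)
```

With these particular numbers the offset inflates the variance by well over a factor of 100. Note also that subtracting the minimum after smoothing can itself leave a small positive baseline, which is exactly the kind of offset the second problem is about.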
Question: Propose a metric that is similar in nature to sample variance, but is better at representing the spread of a non-negative quantity over a finite interval.