Stable measure of spread of a function over a finite interval


Consider finite real functions on an interval $x \in [0, 1]$. Let's call an example function $f(x)$. It is known that such a function is the sum of a non-negative, slowly varying part, which we will call the signal, and additive white (i.i.d.) noise with zero mean. The function is experimentally sampled at $N$ regularly spaced points $x_i$, giving values $f_i = f(x_i)$. The goal is to find a metric $\mathcal{S}\{f\}$ that reports whether a function is strongly localized within some part of the interval, and by how much. For example:

  • The metric will be low if $f(x)$ has the shape of a narrow Gaussian
  • The metric will be intermediate if $f(x)$ is uniformly distributed over the interval
  • The metric will be high if $f(x)$ has two peaks, one at each edge of the interval
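To make the three cases concrete, here is a minimal sketch that builds them as sampled test signals with additive white noise. The function name `make_test_signals`, the peak width 0.02, the noise level and $N = 1001$ are all illustrative assumptions, not values from my experiment.

```python
import numpy as np

def make_test_signals(n=1001, noise_std=0.01, seed=0):
    """Return the sample points and three noisy test signals on [0, 1].

    All shape parameters here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, n)
    width = 0.02  # assumed peak width
    signals = {
        # Strongly localized: the metric should be low.
        "narrow_gaussian": np.exp(-0.5 * ((x - 0.5) / width) ** 2),
        # Spread evenly: the metric should be intermediate.
        "uniform": np.ones_like(x),
        # Mass at both edges: the metric should be high.
        "edge_peaks": np.exp(-0.5 * (x / width) ** 2)
        + np.exp(-0.5 * ((x - 1.0) / width) ** 2),
    }
    # Additive zero-mean i.i.d. Gaussian noise on every sample.
    noisy = {
        name: s + rng.normal(0.0, noise_std, size=n)
        for name, s in signals.items()
    }
    return x, noisy

x, noisy = make_test_signals()
```

Note that the noisy samples can dip below zero wherever the signal itself is close to zero.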

Naive approach I have tried: define $\mathcal{S}$ as the sample variance

  1. Normalize the function: $p_i = \frac{f_i}{\sum_j f_j}$
  2. Calculate the sample mean: $\mu_x = \sum_i p_i x_i$
  3. Calculate the sample variance: $\mathcal{S} = \sigma^2_x = \sum_i p_i (x_i - \mu_x)^2$
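The three steps above can be sketched in NumPy as follows. The function name `sample_variance_spread` and the noise-free test shapes are my own illustrative choices; the sketch assumes the samples are already non-negative.

```python
import numpy as np

def sample_variance_spread(x, f):
    """Weighted sample variance of x, with weights p_i = f_i / sum_j f_j."""
    x = np.asarray(x, dtype=float)
    f = np.asarray(f, dtype=float)
    total = f.sum()
    if np.any(f < 0) or total <= 0:
        raise ValueError("f must be non-negative with a positive sum")
    p = f / total                       # step 1: normalize
    mu = np.sum(p * x)                  # step 2: sample mean
    return np.sum(p * (x - mu) ** 2)    # step 3: sample variance

# Noise-free versions of the three example shapes (assumed width 0.02).
x = np.linspace(0.0, 1.0, 1001)
narrow = np.exp(-0.5 * ((x - 0.5) / 0.02) ** 2)
uniform = np.ones_like(x)
edges = np.exp(-0.5 * (x / 0.02) ** 2) + np.exp(-0.5 * ((x - 1.0) / 0.02) ** 2)

s_narrow = sample_variance_spread(x, narrow)
s_uniform = sample_variance_spread(x, uniform)
s_edges = sample_variance_spread(x, edges)
```

On clean data this gives the desired ordering, with `s_narrow` below `s_uniform` below `s_edges`.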

Problems with my approach:

  1. The calculation of the empirical probability $p$ is ill-defined in the presence of noise, as $f(x)$ may be negative. This problem can to an extent be solved by low-pass filtering the data to minimize the effects of white noise, and then subtracting the minimum from the result.
  2. Sample variance is extremely sensitive to outliers that are far from the mean. For example, adding a small (e.g. 2% of the maximum) constant to a Gaussian-shaped signal causes the sample variance to jump by several orders of magnitude. This behaviour is very undesirable. I want a measure that is more robust to outliers. Perhaps the metric should not put as much weight on the tails as variance does.
  3. Sample variance is biased if $\mu_x$ is not in the middle of the interval. For example, a Gaussian-shaped curve peaked at the middle will generally have a different sample variance than the same curve peaked near an edge, because the interval truncates its two tails unequally.
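Problems 1 and 2 can be reproduced with a short sketch. The helper names `weighted_variance` and `smooth_and_shift`, the moving-average filter, the window length of 21 samples and the peak width 0.01 are all assumptions made for illustration; any low-pass filter could stand in for the moving average.

```python
import numpy as np

def weighted_variance(x, f):
    """Sample variance of x with weights f / sum(f)."""
    p = f / f.sum()
    mu = np.sum(p * x)
    return np.sum(p * (x - mu) ** 2)

def smooth_and_shift(f, window=21):
    """Moving-average low-pass filter, then subtract the minimum."""
    kernel = np.ones(window) / window
    # mode="same" keeps the length; the ends are implicitly zero-padded.
    smoothed = np.convolve(f, kernel, mode="same")
    return smoothed - smoothed.min()

x = np.linspace(0.0, 1.0, 1001)
clean = np.exp(-0.5 * ((x - 0.5) / 0.01) ** 2)  # assumed width 0.01

# Problem 2: a constant offset of 2% of the maximum inflates the variance.
var_clean = weighted_variance(x, clean)
var_offset = weighted_variance(x, clean + 0.02 * clean.max())
ratio = var_offset / var_clean

# Problem 1: noisy samples go negative; filtering and shifting fixes the sign.
rng = np.random.default_rng(0)
noisy = clean + rng.normal(0.0, 0.01, size=x.size)
prepared = smooth_and_shift(noisy)
```

In this setup `ratio` comes out well above 100, and it grows as the peak gets narrower. The workaround for problem 1 also leaves a small positive pedestal under the whole curve after the minimum is subtracted, which is exactly the kind of offset that triggers problem 2.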

Question: Propose a metric that is similar in nature to sample variance, but is better at representing spread of some non-negative quantity over a finite interval.