How to test if numerical function describes a valid probability distribution?


Suppose I can query a function $f$, but I don't have its closed form. We know the following things about $f$:

  • $f(x) \geq 0$ for all $x$
  • $f$ is continuous
  • Additionally, I can choose whether to assume $f(x) \leq c$ for all $x$ (otherwise it will have some unknown bound)
  • Additionally, I can choose whether to assume the support of $f$ is $[-1,1]$ (otherwise it will be defined on all reals)

If I want to interpret $f$ as a pdf, then I need to find a way to test (or gain some confidence level about) whether its integral equals $1$, which is clearly intractable even if (unless?) I limit the support. Or are there other ways of doing this?

If I want to interpret $f$ as a cdf, then I'd make $f$ be bounded above by $1$ to make my life easier. Then, I'd need to find a way to test (or have some confidence level about) if it is nondecreasing. How would I go about doing this?

Are there other easier-to-deal-with interpretations of the function $f$ which, under some transformation, give me something that can be turned into a probability distribution?

There are 2 answers below.

BEST ANSWER

There is a very natural way of associating a probability density to $f$ if we use all of your assumptions, along with an additional one: $f$ is not the zero function. Then, with non-negativity, continuity, support being $[-1,1]$, and $f(x) \leq c$ for $x\in[-1,1]$, we can conclude that $Z = \int_{-1}^1 f(x)\, dx$ exists, is finite, and is strictly positive (because $f \neq 0$). The function $$p(x) = \frac{1}{Z}f(x)$$ is a probability density associated to $f$. The question is then: how can we generate samples distributed according to $p$ if we don't know $Z$?

For this, we can use the Metropolis-Hastings algorithm, a special case of which I've transcribed here:

Start with some number $X_0$ such that $f(X_0) > 0$. Then, for $t \geq 0$,

  1. Generate an independent proposal $X'$ from the uniform distribution on $[-1,1]$,
  2. Compute the acceptance probability $\alpha = \min\{1, f(X')/f(X_t)\}$,
  3. With probability $\alpha$, set $X_{t+1} = X'$ and otherwise set $X_{t+1} = X_t$.

Then the distribution of $X_t$ converges to $p(x) dx$ as $t\rightarrow\infty$.
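
The three steps above can be sketched in a few lines of Python. The target $f$ here is a hypothetical stand-in (any queryable continuous non-negative function on $[-1,1]$ would do in its place):

```python
import random

# Hypothetical stand-in for the black-box f: any continuous f >= 0
# supported on [-1, 1] works; here f(x) = max(0, 1 - x^2).
def f(x):
    return max(0.0, 1.0 - x * x)

def metropolis_hastings(f, n_steps, seed=0):
    """Independence Metropolis sampler targeting p = f / Z, with Z unknown."""
    rng = random.Random(seed)
    # Draw uniform points until we find a starting state with f(x) > 0.
    x = rng.uniform(-1.0, 1.0)
    while f(x) <= 0.0:
        x = rng.uniform(-1.0, 1.0)
    samples = []
    for _ in range(n_steps):
        x_new = rng.uniform(-1.0, 1.0)        # step 1: uniform proposal on [-1, 1]
        alpha = min(1.0, f(x_new) / f(x))     # step 2: acceptance probability
        if rng.random() < alpha:              # step 3: accept, else stay put
            x = x_new
        samples.append(x)
    return samples

samples = metropolis_hastings(f, 50_000)
```

Note that the chain never moves to a point with $f = 0$ (the acceptance probability there is $0$), so the division by $f(X_t)$ is always safe once the starting point is found.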

One question is: how do you know you will ever select a point $X'$ such that $f(X') > 0$? But if $f$ is continuous, not identically zero, and supported on $[-1,1]$ then there is an open interval $I \subset [-1,1]$ such that $f(x) > 0$ for all $x \in I$. Thus by generating independent random variables $X'$ from $\mathrm{Unif}([-1,1])$, we will eventually draw one in $I$.

As Robert Israel points out, you can't say anything about the integral of $f$ (i.e., the normalizing constant $Z > 0$) -- so what's the catch? The catch is you have no idea how fast or slow $X_t \Rightarrow p(x)dx$.

ANSWER

In the pdf case, if you sample a finite number of points it is always possible that you happened to pick the points where $f = 0$, so of course it is impossible to say anything nontrivial about the integral of $f$. Even if you sample points at random, it is possible that nearly all the area under the curve is contained in some tall, narrow "spike" that is very unlikely to be sampled.
And of course no numerical method is going to be able to tell you whether the integral is exactly $1$. However, if you are given a bounded interval containing the support of $f$ and an upper bound on $f$, you can devise a sampling method that will estimate the integral to within $\epsilon$ with probability $> 1 - \epsilon$.

EDIT: Let's say the support of $f$ is contained in $[a,b]$ and you know $f \le c$. If $X$ has uniform distribution on $[a,b]$, then $f(X)$ has mean $\mu = \int_a^b \frac{f(x)}{b-a}\, dx$ and variance $< c^2$. If you take a random sample $X_1, \ldots, X_n$ of size $n$ from the uniform distribution on $[a,b]$, then $\widehat{\mu} = (f(X_1) + \ldots + f(X_n))/n$ has mean $\mu$ and variance $\sigma^2 \le c^2/n$. Using Chebyshev's inequality, with $k = \epsilon \sqrt{n}/c$ (so that $k\sigma \le \epsilon$),

we find $$ \Pr(|\widehat{\mu} - \mu| > \epsilon) \le \Pr(|\widehat{\mu} - \mu| > k \sigma) \le \frac{1}{k^2} = \frac{c^2}{\epsilon^2 n},$$ and the right side is $< \epsilon$ if $n > c^2/\epsilon^3$.

This method is not meant to be optimal.
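
A quick sketch of this estimator in Python, using a hypothetical stand-in $f(x) = \max(0, 1 - x^2)$ on $[a,b] = [-1,1]$ with $c = 1$ (its true integral is $4/3$). Since $\mu$ is the mean of $f(X)$ rather than the integral itself, the integral is recovered as $(b-a)\widehat{\mu}$:

```python
import random

# Hypothetical stand-in for the black-box f: supported on [-1, 1], bounded by c = 1.
def f(x):
    return max(0.0, 1.0 - x * x)

def estimate_integral(f, a, b, c, eps, seed=0):
    """Estimate int_a^b f(x) dx to within (b - a) * eps of the truth,
    with failure probability < eps, using n > c^2 / eps^3 uniform samples."""
    n = int(c * c / eps ** 3) + 1
    rng = random.Random(seed)
    total = sum(f(rng.uniform(a, b)) for _ in range(n))
    mu_hat = total / n            # estimates mu = E[f(X)], X ~ Unif[a, b]
    return (b - a) * mu_hat       # integral of f over [a, b]

Z = estimate_integral(f, -1.0, 1.0, 1.0, 0.1)
```

With $Z$ in hand, normalizing $p = f/Z$ recovers the density from the accepted answer, up to the estimation error above.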