Continuous mixture of uniform distribution with uniformly distributed lower bound


I thought this was going to be easy, but it seems not to be...

Informally, the problem arises because I need to characterize the situation in which one random variable can have any value between 0 and 1, with equal probability, and a second can also have any value in this range, as long as it is larger than the first. I would like to know the marginal distribution on the second variable.

So,
$u_1 \sim \mathrm{Unif}(0,1)$, and
$u_2 \mid u_1 \sim \mathrm{Unif}(u_1,1)$

Marginalizing over $u_1$ gives the density of $u_2$:

EDITED, with thanks to 'Did': $$\varphi(u_2) = \int_0^1 \varphi(u_2|u_1)\varphi(u_1)\, du_1 = \int_0^{u_2} \frac{1}{1-u_1}\, du_1,$$ where the upper limit becomes $u_2$ because $\varphi(u_2|u_1) = \frac{1}{1-u_1}$ vanishes for $u_1 > u_2$, and $\varphi(u_1) = 1$. The antiderivative of the integrand is standard: $$\int \frac{1}{1-u_1}\, du_1 = -\ln(1-u_1).$$ I found this post on a similar problem: $X|U=u\sim \mathsf{Unif}(0,u)$ and $U\sim\mathsf{Unif}(0,1):$ Are $X, U$ are independent?, in which the answer by Graham Kemp makes me believe the correct answer is $$\varphi(u_2) = -\ln(1-u_2).$$
It is easy enough to simulate the process and generate $u_2$ samples, and the result seems to confirm this. The resulting distribution looks very much like a beta distribution with $\alpha = \frac{3}{\sqrt 2}$ and $\beta = \frac{1}{\sqrt 2}$, with first moment close to $0.75$, which seems reasonable by the law of iterated expectations: $E[u_2] = E\!\left[E[u_2 \mid u_1]\right] = E\!\left[\tfrac{1+u_1}{2}\right] = \tfrac{3}{4}$.
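The simulation described above can be sketched as follows (the seed and sample size are arbitrary choices). Rather than binning a histogram, it compares the empirical CDF of $u_2$ against the CDF obtained by integrating $-\ln(1-t)$, which works out to $F(x) = x + (1-x)\ln(1-x)$:

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed
n = 1_000_000

u1 = rng.uniform(0.0, 1.0, n)  # u1 ~ Unif(0, 1)
u2 = rng.uniform(u1, 1.0)      # u2 | u1 ~ Unif(u1, 1)

def cdf(x):
    """CDF of the candidate density -ln(1 - t): F(x) = x + (1 - x) ln(1 - x)."""
    return x + (1.0 - x) * np.log1p(-x)

print(u2.mean())  # should be near 3/4
for x in (0.25, 0.5, 0.75, 0.9):
    print(x, (u2 <= x).mean(), cdf(x))
```

The empirical mean lands near $3/4$ and the empirical CDF tracks $F$ closely, which supports $\varphi(u_2) = -\ln(1-u_2)$.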

So here are my questions:

  1. Does this particular mixture distribution have a name?
  2. Why is $-\ln(1-x)$ so close to $\mathrm{Beta}\!\left(\frac{3}{\sqrt 2}, \frac{1}{\sqrt 2}\right)$, or, if you prefer, $-\ln(x)$ to $\mathrm{Beta}\!\left(\frac{1}{\sqrt 2}, \frac{3}{\sqrt 2}\right)$? Here is a graph showing the function values of $-\ln(x)$ and the beta density with $\alpha = \sqrt{0.5}$ and $\beta = \sqrt{4.5}$. Can this close agreement be shown by derivation?
  3. The answer might be a beta distribution, and the general context is reminiscent of Dirichlet processes and distributions. Is there a link? Is the joint distribution of $u_1$ and $u_2$ Dirichlet?

An answer to any of these three items would be of great help.

I will just summarize some findings and answers I came across, in case it helps anyone else.

  1. I have looked up the expressions $-\ln(1-x)$ and its symmetric counterpart $-\ln(x)$ in many places, and I did not find a name for either, at least not as a density function. $-\ln(x)$ is the surprisal (in nats) of a probability $x$, but that is quite distinct from a distribution.
  2. There is no particular connection between $-\ln(x)$ and the beta distribution, although I was impressed by how close they get for some values of $\alpha$ and $\beta$. It is clear that they are different distributions because it is impossible to match all the raw moments simultaneously.

    My original $(\alpha,\beta)$ pair, $\left(\frac{1}{\sqrt2},\frac{3}{\sqrt2}\right)$, was based on visual inspection of the central values produced by sampling from the $-\ln(x)$ distribution. Calculating the raw moments of $-\ln(x)$ and finding the beta distribution with the same first two moments gave me the pair $(\alpha,\beta) = \left(\frac{5}{7},\frac{15}{7}\right)$, which also fits the target distribution quite well, at least visually. Upon closer inspection, both approximations overshoot the density significantly near 0, then dip a bit below the target density, especially near 0.04. The approximation is best from 0.20 onwards.

    The ideal way to quantify this approximation would be the KL divergence, but the integral $$\int_{0}^{1} -\ln(x)\,\ln\!\left(\frac{-\ln(x)\,B(\alpha,\beta)}{x^{\alpha-1}(1-x)^{\beta-1}}\right) dx,$$ with $B(\alpha,\beta)$ the beta function, turns into a complete mess and moreover seems to diverge. Even before comparing, the entropy of $-\ln(x)$ leads to the logarithmic integral, inconveniently evaluated at the upper limit $1$, where it diverges. The analytical alternative, simply integrating the absolute difference over the support, requires identifying the roots of that difference, which is equally hard. So I took a numerical approach and calculated, besides the absolute difference, also the relative absolute difference (that is, the distance between the two density functions divided by the target density). Since both density functions integrate to unity on their domain, it is convenient to express these distances as proportions. I got approximately 3.49% and 3.68% for the integrated absolute distances of the moment-matched and the "multiples of $1/\sqrt2$" beta distributions, respectively, and 4.27% and 3.75% for the same beta distributions when measured relative to the target density.

  3. I am still not sure about the relationship to Dirichlet processes, but the joint distribution is not a Dirichlet distribution. For one thing, if it were, the marginal distributions would have to be beta, and the distribution of $u_2$ is not.
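The moment matching and the distance calculations from item 2 can be sketched as follows (a minimal sketch: the grid resolution and endpoint cutoffs are arbitrary, so the exact figures may differ slightly from the percentages quoted above). The moment identity used is $\int_0^1 x^k(-\ln x)\,dx = \frac{1}{(k+1)^2}$, so the target density has $E[X]=\frac14$ and $E[X^2]=\frac19$:

```python
import numpy as np
from fractions import Fraction
from math import gamma

# --- Moment matching: raw moments of -ln(x) are E[X^k] = 1/(k+1)^2. ---
# For Beta(a, b): E[X] = a/(a+b) and E[X^2] = a(a+1)/((a+b)(a+b+1)).
# E[X] = 1/4 forces b = 3a; the second moment then yields a = 5/7.
a0, b0 = Fraction(5, 7), Fraction(15, 7)
s = a0 + b0
assert a0 / s == Fraction(1, 4)
assert a0 * (a0 + 1) / (s * (s + 1)) == Fraction(1, 9)

# --- Integrated absolute and relative distances to the target density. ---
def beta_pdf(x, a, b):
    """Beta density, with B(a, b) computed via the gamma function."""
    B = gamma(a) * gamma(b) / gamma(a + b)
    return x ** (a - 1) * (1 - x) ** (b - 1) / B

def trapezoid(y, x):
    """Trapezoidal rule over a (possibly non-uniform) grid."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

x = np.linspace(1e-6, 1 - 1e-6, 200_001)  # avoid the endpoint singularities
p = -np.log(x)                            # target density on (0, 1)

results = {}
for a, b in ((5 / 7, 15 / 7), (1 / np.sqrt(2), 3 / np.sqrt(2))):
    q = beta_pdf(x, a, b)
    results[(a, b)] = (
        trapezoid(np.abs(p - q), x),      # integrated absolute distance
        trapezoid(np.abs(p - q) / p, x),  # distance relative to the target
    )
    print((a, b), results[(a, b)])
```

Both distances come out as small proportions for either $(\alpha,\beta)$ pair, consistent with the visual impression that the beta densities track $-\ln(x)$ closely away from 0.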