I thought this was going to be easy, but it seems not to be...
Informally, the problem arises because I need to characterize the following situation: one random variable can take any value between 0 and 1 with equal probability, and a second can also take any value in this range, as long as it is larger than the first. I would like to know the marginal distribution of the second variable.
So,
$u_1 \sim \mathsf{Unif}(0,1)$ , and
$u_2 \mid u_1 \sim \mathsf{Unif}(u_1,1)$
Marginalizing over the range of $u_1$ gives the density of $u_2$:
EDITED, with thanks to 'Did':
$$\varphi(u_2) = \int_0^1 \varphi(u_2|u_1)\varphi(u_1) du_1 = \int_0^{u_2} \frac {1}{1-u_1} du_1$$
The antiderivative of the integrand is a standard textbook result: $$\int \frac {1}{1-u_1}\, du_1 = -\ln(1-u_1) + C$$
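Evaluating this antiderivative at the limits of integration then gives the marginal directly:

$$\varphi(u_2) = \Bigl[-\ln(1-u_1)\Bigr]_{u_1=0}^{u_1=u_2} = -\ln(1-u_2), \qquad 0 < u_2 < 1.$$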
I found this post on a similar problem, "$X\mid U=u\sim \mathsf{Unif}(0,u)$ and $U\sim\mathsf{Unif}(0,1)$: are $X, U$ independent?", in which the answer by Graham Kemp leads me to believe the correct answer is $$\varphi(u_2)=-\ln(1-u_2).$$
It is easy enough to simulate the process and generate $u_2$ samples, and the result seems to confirm this. The resulting distribution looks very much like a beta distribution with $\alpha = {3 \over \sqrt 2}$ and $\beta = {1 \over \sqrt 2}$, with first moment close to 0.75, which seems reasonable by the law of iterated expectations: $E[u_2] = E\bigl[E[u_2 \mid u_1]\bigr] = E\bigl[{\textstyle\frac{1+u_1}{2}}\bigr] = {3 \over 4}$.
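For concreteness, here is a minimal simulation sketch (plain Python; the sample size, seed, and check points are my own arbitrary choices) comparing the empirical CDF of $u_2$ with the CDF implied by $\varphi(u_2)=-\ln(1-u_2)$, namely $F(x)=x+(1-x)\ln(1-x)$:

```python
import bisect
import math
import random

random.seed(42)

# Simulate the two-stage process: u1 ~ Unif(0,1), then u2 | u1 ~ Unif(u1, 1)
n = 200_000
samples = []
for _ in range(n):
    u1 = random.random()
    u2 = u1 + (1 - u1) * random.random()   # uniform on (u1, 1)
    samples.append(u2)
samples.sort()

def cdf_theory(x):
    # Integral of -ln(1-t) from 0 to x: x + (1-x) ln(1-x)
    return x + (1 - x) * math.log(1 - x)

for x in (0.25, 0.5, 0.75, 0.9):
    empirical = bisect.bisect_right(samples, x) / n
    print(f"x={x:.2f}  empirical={empirical:.4f}  theory={cdf_theory(x):.4f}")
```

The empirical and theoretical values agree to within sampling noise, which is what the post means by the simulation confirming the derivation.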
So here are my questions:
- Does this particular mixture distribution have a name?
- Why is $-\ln(1-x)$ that close to $Beta({{3} \over {\sqrt 2}},{{1} \over {\sqrt 2}})$, or, if you prefer, $-\ln(x)$ to $Beta({{1} \over {\sqrt 2}},{{3} \over {\sqrt 2}})$? Here is a graph showing the function values of $-\ln(x)$ and the beta density with $\sqrt{0.5}$ and $\sqrt{4.5}$ for $\alpha$ and $\beta$. Can this close relation be shown by derivation?
- The answer might be a beta distribution and the general context is reminiscent of Dirichlet processes and distributions. Is there a link? Is the joint distribution of $u_1$ and $u_2$ Dirichlet?
An answer to any of these three items would be of great help.
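On the second question, the closeness can at least be checked numerically. A small sketch (plain Python; the comparison points are my own choices) that evaluates both densities, computing the beta function from `math.gamma`:

```python
import math

ALPHA, BETA = 1 / math.sqrt(2), 3 / math.sqrt(2)

def beta_pdf(x, a=ALPHA, b=BETA):
    # Beta(a, b) density: x^(a-1) (1-x)^(b-1) / B(a, b)
    B = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return x ** (a - 1) * (1 - x) ** (b - 1) / B

def target_pdf(x):
    # Density of 1 - u2, i.e. -ln(x) on (0,1)
    return -math.log(x)

# Compare on interior points; both densities are unbounded near x = 0,
# where the approximation is known to be worst
for x in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"x={x:.1f}  -ln(x)={target_pdf(x):.4f}  Beta={beta_pdf(x):.4f}")
```

Away from the endpoints the two densities agree to roughly two decimal places, which matches the visual impression from the graph.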
I will just summarize some findings and answers I came across, in case they are of use to anyone else.
My original $(\alpha,\beta)$ pair, $({{1}\over{\sqrt2}},{{3}\over{\sqrt2}})$, was based on visual inspection of the central values produced by sampling from the $-\ln(x)$ distribution, but calculating the raw moments of $-\ln(x)$ and finding the beta distribution with the same moments gave the pair $(\alpha,\beta) = ({5\over7},{15\over7})$, which also fits the target distribution quite well, at least visually. On closer inspection, both approximations overshoot the density significantly near 0, then dip slightly below the target density, especially near 0.04. The approximation is best from 0.20 onwards.
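The moment matching itself can be done exactly: the raw moments of the $-\ln(x)$ density are $\int_0^1 x^k(-\ln x)\,dx = {1\over(k+1)^2}$, and plugging the mean and variance into the standard beta method-of-moments formulas yields the $({5\over7},{15\over7})$ pair. A sketch in exact rational arithmetic:

```python
from fractions import Fraction

# Raw moments of the density -ln(x) on (0,1): E[X^k] = 1/(k+1)^2
m1 = Fraction(1, 4)        # mean, k = 1
m2 = Fraction(1, 9)        # second raw moment, k = 2
var = m2 - m1 ** 2         # 1/9 - 1/16 = 7/144

# Method-of-moments estimators for Beta(alpha, beta)
common = m1 * (1 - m1) / var - 1
alpha = m1 * common
beta = (1 - m1) * common

print(alpha, beta)   # prints: 5/7 15/7
```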
The ideal way to quantify this approximation would be the KL divergence, but the integral $$\int_{0}^{1} -\ln(x)\,\ln\!\left(\frac{-\ln(x)\,B(\alpha,\beta)}{x^{\alpha-1}(1-x)^{\beta-1}}\right) dx$$ turns into a complete mess and moreover seems to diverge. Even before comparing, the entropy of the $-\ln(x)$ distribution leads to the logarithmic integral, inconveniently with 1 at the upper limit. The analytical alternative, which would be to simply calculate the absolute distance between the two densities and integrate it over the support, requires identifying the roots of that difference, which is equally hard. I therefore took a numerical approach and calculated, besides the absolute difference, also the relative absolute difference (that is, the distance between the two density functions divided by the target density). Since both density functions integrate to unity on their domain, it is convenient to express these distances as proportions. I got approximately 3.49% and 3.68% for the integrated absolute distances for the moment-matched and the "multiples of $1/\sqrt2$" beta distributions, and 4.27% and 3.75% for the same beta distributions when calculated relative to the target density.
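The integrated absolute distance can be reproduced numerically along these lines (a sketch; the grid resolution and the endpoint cutoff are my own choices, and since both densities are singular or vanishing at the endpoints, the cutoff affects the third decimal):

```python
import math

def l1_distance(a, b, n=200_000, eps=1e-8):
    """Midpoint-rule integral of |(-ln x) - Beta(a,b) pdf| over (eps, 1-eps)."""
    B = math.gamma(a) * math.gamma(b) / math.gamma(a + b)  # beta function
    h = (1 - 2 * eps) / n
    total = 0.0
    for i in range(n):
        x = eps + (i + 0.5) * h
        q = x ** (a - 1) * (1 - x) ** (b - 1) / B          # beta density
        total += abs(-math.log(x) - q) * h
    return total

d_moments = l1_distance(5 / 7, 15 / 7)                     # moment-matched pair
d_sqrt2 = l1_distance(1 / math.sqrt(2), 3 / math.sqrt(2))  # multiples of 1/sqrt(2)
print(f"moment-matched: {d_moments:.4f}   1/sqrt(2) pair: {d_sqrt2:.4f}")
```

Both distances come out at a few percent, consistent with the figures quoted above.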