Probability distribution of a random variable - Interview

294 Views Asked by At

this question is from a interview a friend had and I was curious how to solve it.
The question is:

I have two random variables, $a$ and $b$.
$a=rand(1,100)$
$b=rand(a,100)$
The rand function generates a number within the range of the numbers in the parentheses, with equal probability for each number.
How is $b$ distributed (what is its probability distribution function), and if I want b to be uniformly distributed between $a$ and $100$, just as a is uniformly distributed between 1 and $100$, what should I do?
My aspiration is for them to be uniformly distributed; it's enough to make $b$ distributed more uniformly than it is now (I'm allowed to use any operator/function, etc.)?

My attempt:

My approach to solving this involves trying to use a new function, $c=rand(1,a)$, so that they might complement each other. I attempted to use modulo operations and an if statement, meaning if the number is not within the range, then generate a new number.

Had a few other attempts, reached:

$a$ distribute uniformly and $b$ not distribute as something "known", using the total of expectation value, but it is not clear.

Will be happy to receive help, thanks.

2

There are 2 best solutions below

9
On BEST ANSWER

Interviewer:

Assume $a\sim U[0,1]\,,$ and conditional on $a\,,$ $$b\sim U[a,1]\,.$$ What is the unconditional distribution of $b\,?$

Candidate (sweating):

\begin{align} &\mathbb P\big\{b\le y\big\}=\int_0^1\mathbb P\big\{b\le y\,\big|\,a=x\big\}\,dx =\int_0^1\frac{y-x}{1-x}1_{\{x\le y\}}\,dx=\int_0^y\frac{y-x}{1-x}\,dx\\[2mm] &=-y\ln(1-x)\Big|_{x=0}^{x=y}+\Big[x+\ln(1-x)\Big]_{x=0}^{x=y}\\[2mm] &=-y\ln(1-y)+y+\ln(1-y)\\[2mm] &=y+(1-y)\ln(1-y)\,. \end{align} This is the unconditional CDF of $b\,.$

The PDF of $b$ is $$ \frac{d}{dy}P\big\{b\le y\big\}=2-\ln(1-y)\,. $$ The plot of it resembles the one in Joseph's answer.

To work out a possibility to turn this $b$ into a random variable that is unconditionally uniform on $[c,1]$ where $c$ is fixed (I deliberately don't denote this by $a$) one can use the well-known method to

  • stick $b$ into its own CDF by which it produces a $U[0,1]$ variable and then

  • stick that into the inverse CDF of the desired distribution.

The CDF of $U[c,1]$ is $$ x\mapsto \frac{x-c}{1-c} $$ which has inverse $$ y\mapsto (1-c)y+c\,. $$ Therefore, $$ (1-c)\Big\{b+(1-b)\ln(1-b)\Big\}+c $$ is uniform on $[c,1]\,.$

2
On

Initially this problem seems a little confusing. I initially mistook $P(B=b|A=a)$ for $P(B=b)$ and thought $P(B=b)$ was already uniform! In such a situation to get a handle on the problem it can be very useful to try out some simple and extreme examples until you see if any patterns show up.

This is especially useful in an interview situation as it allows you to talk through the problem even when you are not sure how to solve it. In this case testing out a few simple and extreme examples can get us pretty much the whole way there. Lets see what we need to happen for $b$ to equal $100$:

$P(B=100) = P(B=100|A=a)P(A=a)$

So we get some value for $a$ with probability $\frac{1}{100}$, then there is a second probability for $b$ given the $a$ we drew. Let's try a case where $a = 100$:

$P(B=100) = 1 \cdot \dfrac{1}{100}$

i.e. we draw randomly from the range 100-100 so we must draw 100.

or when $a = 99$:

$P(B=100) = (0\cdot\dfrac{1}{2} + \frac{1}{2}) \cdot \frac{1}{100}$

In this case $B$ can be either $100$ or $99$ each with a half probability. We only care about the case where $b = 100$.

continuing this pattern on, we see:

$P(B = 100) = \frac{1}{100}\sum_{i=1}^{100} \frac{1}{i}$

i.e. gives the probability via the law of total probability:

$P(B=b) = \sum_{a\in A} P(B=b|A=a)P(A=a)$

Now, what about $P(B=99)$. If $a=100$, then $P(B=99)$ is zero. For $a=99$, the probability is $\frac{1}{2}$ (by the same logic that $P(B=100)$ is $\frac{1}{2}$ when $a = 99$). Indeed, it will follow the same pattern as $P(B=100)$, but starting one higher as the $a=100$ case is now excluded:

$P(B=99) =\frac{1}{100} \sum_{i=2}^{100} \frac{1}{i}$

Let's check the pattern at the other extreme, $P(B=1)$. In this case $a$ must equal $1$ because if it is any higher we cannot draw $1$ for $b$:

$P(B=1) = \frac{1}{100} \sum_{i=100}^{100} \frac{1}{i} = \frac{1}{100}\frac{1}{100}$ as expected.

So we can write the probability distribution for $B$ as:

$P(B=b) = \frac{1}{100} \sum_{i=(101 - b)}^{100} \frac{1}{i}, \ b = 1, 2, ... 100.$

Checking it is a valid probability distribution

Let's check this is a valid probability distribution that sums to 1 over the range $b \in [1, 100]$ (we can define it as $0$ otherwise).

$\sum_{k=1}^{100} P(B = k) = \frac{1}{100} \sum_{k=1}^{100} \sum_{i=(101 - k)}^{100} \frac{1}{i}$

Looking at the sum, it's a bit of a mess and I can't see any nice identities that work in this case. Again, let's expand the summation terms and see exactly what we are dealing with.

We saw earlier that for $P(B=100)$ the terms are $\frac{1}{1} + \frac{1}{2} + \frac{1}{3} + ... + \frac{1}{100}$. For $P(B=99$) we have all of these terms except $\frac{1}{1}$, this pattern continues until $P(B=1)$ has only 1 term, $\frac{1}{100}$. This means we have:

$1\cdot\frac{1}{1} + 2\cdot\frac{1}{2} + ... + 100 \cdot \frac{1}{100} = 100$

With the $\frac{1}{100}$ factor in front of that double summation, our sum ends up as $1$ as desired.

So, now we have the distribution of $B$ well specified.

Next, we want to make a transformation so the distribution of $B$ is uniform. One approach to be to rescale the probabilities of the distribution $P(B=b)$ based on the value of $a$, I will leave this up to you to play around with.

Visualising in python

The distribution of $B$ (with python code) looks like:

distribution image

import matplotlib.pyplot as plt

def dist(b):
    sum_ = 0
    for i in range(101-b, 101):
        print(i)
        sum_ += 1/i
    return sum_ * (1/100)

y = [dist(k) for k in range(1, 101)]
x = range(1, 101)

plt.plot(x, y)
plt.xlabel("b")
plt.ylabel("P(B=b)")
plt.show()