Given the inter-arrival distribution of busses, what is the distribution of time until your bus arrives?


This is one of those "share your knowledge" posts. I have an answer (which I'm going to post shortly), but would greatly appreciate if people could point out any issues with it or provide alternate methods of concluding faster.

We know that the inter-arrival times of busses at a bus-stop follow a distribution with PDF $f_S(s)$. Now, you arrive at the bus-stop at a random time. For the sake of this question, let's say you started the clock when the busses first began operating and drew a uniform random number $J$ over a very long horizon to pick when you would come to the stop. This way, the process is well into its lifetime when you arrive.

What is the distribution of time until you see the first bus?


There are 2 best solutions below


We know that $f_S(s)$ is the PDF of the inter-arrival times of the original process. Consider the figure below. Let's say the time at which you arrive at the bus stop lies between two events. Let's label them #$0$ and #$1$. The time between these two events is $T$. What is the PDF of this $T$? We know that larger intervals are more likely to contain the end point $J$. And since $J$ is uniform, the likelihood increases linearly with the size of the interval. So, the PDF of $T$ will be proportional to $tf_S(t)$. If we include the normalizing term, this PDF becomes:

$$g_T(t) = \frac{tf_S(t)}{E(S)}$$

Where $E(S)$ is given by:

$$E(S) = \int_{s=0}^{\infty} s f_S(s) ds$$

Now, we want the distribution of $X$, the time from the end of $J$ to event #$1$.

[Figure: point process]

Consider:

$$P(x<X<x+\delta x) = \int_{t=x+\delta x}^{\infty} P(x<X<x+\delta x \mid T=t)\, g_T(t)\, dt$$

Given $T=t$, the arrival point is uniform within the interval, so $X$ is uniform on $(0,t)$ and the conditional probability is $\delta x/t$. Substituting $g_T(t) = tf_S(t)/E(S)$:

$$P(x<X<x+\delta x) = \int_{t=x+\delta x}^{\infty} \frac{\delta x}{t} \frac{t f_S(t)}{E(S)}\, dt = \frac{\delta x}{E(S)} P(S>x+\delta x)$$

Taking $\delta x$ to the other side and taking limits:

$$\lim_{\delta x \to 0} \frac{P(x<X<x+\delta x)}{\delta x} = \frac{P(S>x)}{E(S)}$$

$$h_X(x) = \frac{P(S>x)}{E(S)}$$

In other words, the PDF of $X$ is proportional to the survival function of the inter-arrival time $S$.
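As a concrete check of this result, here is a worked case under the assumption (mine, not part of the original question) that the inter-arrival times are uniform on $(0,b)$, with the survival function and mean taken over the inter-arrival time $S$:

$$f_S(s)=\frac{1}{b},\quad 0<s<b, \qquad E(S)=\frac{b}{2}, \qquad P(S>x)=1-\frac{x}{b}$$

$$h_X(x)=\frac{P(S>x)}{E(S)}=\frac{2}{b}\Big(1-\frac{x}{b}\Big),\quad 0\le x\le b, \qquad E(X)=\int_0^b x\,h_X(x)\,dx=\frac{b}{3}$$

So the mean wait is $b/3$, which is longer than the $b/4$ one might guess from "half the mean gap", because you are more likely to land in a long gap.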

Consider the case of the Poisson process. We know here that $X$ must be exponentially distributed. And indeed, for the exponential distribution, the PDF is proportional to the survival function. In fact, since the exponential distribution is the only one that satisfies this property, we see that $X$ will have the same distribution as $S$ only for the Poisson process.
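If you want to sanity-check the survival-function result numerically, a rough Monte Carlo sketch along these lines should do. The function name `simulate_waits`, the horizon, and the choice of uniform gaps on $(0,2)$ are all my own illustrative assumptions; for that choice the formula predicts a mean wait of $2/3$ and $P(X>1)=1/4$.

```python
import bisect
import random


def simulate_waits(draw_gap, horizon, n_samples, seed=0):
    """Sample waits at uniformly random instants on one long renewal path.

    draw_gap(rng) must return one positive inter-arrival time (hypothetical
    callback; any positive distribution can be plugged in).
    """
    rng = random.Random(seed)

    # Build bus arrival times until the last one lies beyond the horizon,
    # so every instant in [0, horizon] has a next bus.
    arrivals = []
    t = 0.0
    while t <= horizon:
        t += draw_gap(rng)
        arrivals.append(t)

    waits = []
    for _ in range(n_samples):
        j = rng.uniform(0.0, horizon)        # your random arrival time J
        k = bisect.bisect_left(arrivals, j)  # index of first bus at or after J
        waits.append(arrivals[k] - j)
    return waits


if __name__ == "__main__":
    # Assumed test case: gaps uniform on (0, 2), so E(S) = 1.
    waits = simulate_waits(lambda rng: rng.uniform(0.0, 2.0), 50000.0, 50000)
    print("empirical mean wait:", sum(waits) / len(waits))
    print("empirical P(X > 1): ", sum(w > 1.0 for w in waits) / len(waits))
```

Swapping in `lambda rng: rng.expovariate(1.0)` for the gap sampler gives the Poisson case, where the waits should again look exponential with the same rate.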


Instead of interpreting $J$ as a random variable, I will treat $J$ as a very large, predetermined positive real number.

For each $n\in \mathbb{N}$ let $T_n$ denote the arrival time of the $n^{\text{th}}$ bus at the station. How can we find the distribution of $T_n$ for a fixed $n$? Here's one way. Set $$\begin{aligned} S_1&=T_1 \\ S_2&=T_2-T_1 \\ &\;\;\vdots \\ S_n&=T_n-T_{n-1}\end{aligned}$$ Since the inter-arrival times are independent with common PDF $f_S$, the random vector $(S_1,S_2,\ldots,S_n)\sim f_{S_1S_2 \dots S_n}$ where $$f_{S_1S_2\dots S_n}(s_1,s_2,\ldots,s_n)=f_{S}(s_1)\times f_{S}(s_2)\times \dots \times f_{S}(s_n)$$ Next consider the change of variable $$(t_1,t_2,\ldots,t_n)=(s_1,s_1+s_2,\ldots,s_1+s_2+\dots +s_n)$$ We get $(T_1,T_2,\ldots,T_n)\sim f_{T_1T_2\ldots T_n}$ where $$f_{T_1T_2\ldots T_n}(t_1,t_2,\ldots,t_n)=f_{S_1S_2\ldots S_n}\big(t_1,t_2-t_1,\ldots,t_n-t_{n-1}\big)\Bigg|\frac{\partial(s_1,s_2,\ldots,s_n)}{\partial(t_1,t_2,\ldots,t_n)}\Bigg|$$ Since $\Bigg|\frac{\partial(s_1,s_2,\ldots,s_n)}{\partial(t_1,t_2,\ldots,t_n)}\Bigg|=1$, $$f_{T_1T_2\ldots T_n}(t_1,t_2,\ldots,t_n)=f_{S_1S_2\ldots S_n}\big(t_1,t_2-t_1,\ldots,t_n-t_{n-1}\big)$$ Note $(T_1,T_2,\ldots,T_n)$ is supported on the set $$\{(t_1,t_2,\ldots,t_n)\in (0,\infty)^n:t_1 \leq t_2 \leq \ldots \leq t_n\}$$ To obtain the distribution of $T_n$ we'll simply "integrate away" the previous $n-1$ variables to get the corresponding marginal: $$f_{T_n}(t_n)=\int_0^{t_n} \int_{t_1}^{t_n} \ldots \int_{t_{n-3}}^{t_n}\int _{t_{n-2}}^{t_n}f_{T_1T_2\ldots T_n}(t_1,t_2,\ldots,t_n)\mathrm{d}t_{n-1}\mathrm{d}t_{n-2}\ldots \mathrm{d}t_2 \mathrm{d}t_1$$
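For instance, with $n=2$ the marginal reduces to a single integral, the convolution of $f_S$ with itself. Assuming exponential inter-arrival times with rate $\lambda$ (an illustrative choice, not part of the original setup):

$$f_{T_2}(t_2)=\int_0^{t_2}f_S(t_1)f_S(t_2-t_1)\mathrm{d}t_1=\int_0^{t_2}\lambda e^{-\lambda t_1}\,\lambda e^{-\lambda (t_2-t_1)}\mathrm{d}t_1=\lambda^2 t_2 e^{-\lambda t_2}$$

which is the Gamma (Erlang) density with shape $2$ and rate $\lambda$, as expected for the sum of two independent exponentials.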

Next, define a random variable $N$ by $$N=\min\{k\geq 1:T_k\geq J\}-1$$ It's easy to see that $N\sim p_N$ counts the number of bus arrivals on the time interval $[0,J)$. Moreover, $$p_N(0)=P(T_1\geq J)=\int_J^{\infty}f_{S}(x)\mathrm{d}x$$ For $n\geq 1$ we get, by the law of total probability, $$\begin{eqnarray*} p_N(n) & = & P(T_n<J,T_{n+1}\geq J) \\ & = & \int_0^J P(T_n<J,T_{n+1}\geq J|T_n=x)f_{T_n}(x)\mathrm{d}x \\ & = & \int_0^J \bigg[\int_{J-x}^{\infty}f_{S}(y)\mathrm{d}y\bigg]f_{T_n}(x)\mathrm{d}x \\ & = & \ \int_0^J \int_{J-x}^{\infty}f_{S}(y)f_{T_n}(x)\mathrm{d}y\mathrm{d}x \end{eqnarray*}$$ Now take $F_{T_N|N}$ as the conditional cdf of the arrival time of the last bus on $[0,J)$ given the number of arrivals, i.e. $$F_{T_N|N}(t|n)=P(T_N\leq t|N=n)$$ The conditional distribution of $T_N$ given $N$ is supported on $[0,J)$. If $n\geq 1$ and $t\in [0,J)$ are fixed, then $$\begin{eqnarray*} F_{T_N|N}(t|n) & = & P(T_N\leq t|N=n) \\& = & P(T_n\leq t|N=n) \\ & = & \frac{P(T_n\leq t,N=n)}{p_N(n)} \\ & = & \frac{P(T_n\leq t,T_{n+1}\geq J)}{p_N(n)} \\ & = & \frac{\int_0^tP(T_n \leq t,T_{n+1}\geq J|T_n=x)f_{T_n}(x)\mathrm{d}x}{p_N(n)} \\ & = & \frac{\int_0^t\bigg[\int_{J-x}^{\infty}f_{S}(y)\mathrm{d}y\bigg]f_{T_n}(x)\mathrm{d}x}{p_N(n)}\\ & = & \frac{\int_0^t \int_{J-x}^{\infty}f_{S}(y)f_{T_n}(x)\mathrm{d}y\mathrm{d}x}{p_N(n)}\end{eqnarray*}$$ Differentiating with respect to $t$ yields our pdf for $T_N|N:$ $$f_{T_N|N}(t|n)=\frac{f_{T_n}(t)\int_{J-t}^{\infty}f_{S}(y)\mathrm{d}y}{p_N(n)}$$

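Take $X$ to be the amount of time you wait for the bus to arrive after showing up at time $J$. Fix $x\geq 0$. For $n\geq 1$, conditioning on $N=n$ and $T_N=t$ already forces $S_{n+1}\geq J-t$, so the conditional probability has to be normalized by that survival probability: $$P(X\leq x|N=n,T_N=t)=\frac{\int_{J-t}^{x+J-t}f_{S}(s)\mathrm{d}s}{\int_{J-t}^{\infty}f_{S}(y)\mathrm{d}y}$$ Then $$\begin{eqnarray*} P(X\leq x) & = & \sum_{n=0}^{\infty}P(X\leq x|N=n)p_N(n) \\ & = & \int_J^{x+J}f_{S}(s)\mathrm{d}s +\sum_{n=1}^{\infty}P(X\leq x|N=n)p_N(n) \\ & = & \int_J^{x+J}f_{S}(s)\mathrm{d}s+\sum_{n=1}^{\infty}\int_0^JP(X\leq x|N=n,T_N=t)f_{T_N|N}(t|n)p_N(n)\mathrm{d}t \\ & = & \int_J^{x+J}f_{S}(s)\mathrm{d}s+\sum_{n=1}^{\infty} \int_0^J \bigg[\int_{J-t}^{x+J-t}f_{S}(s)\mathrm{d}s\bigg]f_{T_n}(t)\mathrm{d}t \\ & = & \int_J^{x+J}f_{S}(s)\mathrm{d}s+\sum_{n=1}^{\infty} \int_0^J \int_{J-t}^{x+J-t}f_{S}(s)f_{T_n}(t)\mathrm{d}s \mathrm{d}t \end{eqnarray*}$$ where the fourth line uses $f_{T_N|N}(t|n)p_N(n)=f_{T_n}(t)\int_{J-t}^{\infty}f_{S}(y)\mathrm{d}y$, which cancels the normalizing denominator. Taking a derivative with respect to $x$ yields the pdf of $X$: $$f_{X}(x)=f_{S}(x+J)+\sum_{n=1}^{\infty}\int_0^Jf_{S}(x+J-t)f_{T_n}(t)\mathrm{d}t$$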
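To make the definitions of $N$ and $X$ concrete for a fixed $J$, here is a small simulation sketch. The helper name `wait_after_fixed_time` and the exponential test case are my own assumptions for illustration; with rate-$1$ exponential gaps, memorylessness says $X$ should be exponential with mean $1$ and $N$ should average about $J$.

```python
import random


def wait_after_fixed_time(draw_gap, J, rng):
    """Return (N, X) for one simulated run with a fixed arrival time J.

    N is the number of buses on [0, J), i.e. min{k >= 1 : T_k >= J} - 1.
    X is the wait from J until the next bus.
    draw_gap(rng) is a hypothetical callback returning one inter-arrival time.
    """
    t = 0.0  # running arrival time T_k
    n = 0    # arrivals strictly before J so far
    while True:
        t += draw_gap(rng)
        if t >= J:
            return n, t - J
        n += 1


if __name__ == "__main__":
    rng = random.Random(0)
    J = 50.0
    # Assumed test case: exponential gaps with rate 1 (a Poisson process).
    runs = [wait_after_fixed_time(lambda r: r.expovariate(1.0), J, rng)
            for _ in range(20000)]
    print("mean N:", sum(n for n, _ in runs) / len(runs))
    print("mean X:", sum(x for _, x in runs) / len(runs))
```

Replacing the gap sampler with a non-exponential one and increasing `J` lets you watch the empirical distribution of $X$ settle toward its large-$J$ limit.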