Finding Size-Bias Distributions


For a RV $W$ with mean $\mu$, let $W^*$ denote the $W$-size biased distribution (so that $EG(W^*)=\frac{E(WG(W))}{\mu}$ for all functions $G$ for which the expectations exist).

I would like to learn how to find a formula for $W^*$ when $W=X_1+\cdots +X_n$ is a sum of dependent RVs. From "Multivariate Normal Approximations by Stein's Method and Size Bias Couplings", https://arxiv.org/pdf/math/0510586v1.pdf page 3:

We replace $X_I$ with $X_I^*$, where the random index $I$ is chosen independently with $\mathbb{P}(I=i)=\frac{EX_i}{\sum_j EX_j}$, and adjust the remaining variables to their conditional distribution given the new value of $X_I$.

I cannot quite understand this statement. I want to be able to apply this to find $W^*$ for particular examples. For example, I'd like to know how to find $W^*$ when $X_i \sim Be(p_i)$ for $i=1,\ldots, n$.

Best answer

You need to assume your original random variable $W$ is nonnegative and has $0<E[W]<\infty$. Let us consider the discrete case under this assumption...

Discrete case

Suppose $W$ takes values in some discrete set of nonnegative values $\mathcal{W}$. For each $w \in \mathcal{W}$ let $P[W=w]$ be the corresponding probability mass. We want to find the mass function of the new random variable $W^*$, which also takes values on $\mathcal{W}$. We have for each $w \in \mathcal{W}$:

$$P[W^*=w] = E[1\{W^*=w\}] \overset{(a)}{=} \frac{E[W 1\{W=w\}]}{E[W]} = \frac{wP[W=w]}{E[W]} $$

where we have used $G(W) = 1\{W=w\}$ in step (a). You can see that $P[W^*=w]$ indeed defines a valid probability mass function because it is nonnegative and satisfies $\sum_{w \in \mathcal{W}} P[W^*=w]=1$. So, you can pick $W^*$ via this new mass function, and we indeed have $E[G(W^*)] = \frac{E[WG(W)]}{E[W]}$ for general functions $G$.
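The mass function above is easy to compute directly. As a sketch, the hypothetical helper below (not from the answer) size-biases an arbitrary discrete pmf, using a uniform distribution on $\{1,2,3\}$ as an assumed example:

```python
def size_biased_pmf(pmf):
    """Given a dict mapping nonnegative values w to P[W=w],
    return the size-biased pmf P[W*=w] = w * P[W=w] / E[W]."""
    mean = sum(w * p for w, p in pmf.items())
    assert mean > 0, "need 0 < E[W] < infinity"
    return {w: w * p / mean for w, p in pmf.items()}

# Assumed example: W uniform on {1, 2, 3}, so E[W] = 2 and
# P[W*=w] = w/6, i.e. larger values become more likely.
pmf = {1: 1/3, 2: 1/3, 3: 1/3}
star = size_biased_pmf(pmf)
```

Note that the output is automatically a valid pmf: nonnegativity is inherited from `pmf`, and the masses sum to $E[W]/E[W]=1$.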

Intuition from renewal theory

If you know renewal theory, you can understand $W^*$ this way: Generate nonnegative i.i.d. random variables $\{W_j\}_{j=1}^{\infty}$, all with probability mass function $P[W=w]$ for $w \in \mathcal{W}$. This defines a renewal system with inter-arrival times given by the $W_j$ values (so a timeline $t \geq 0$ consists of back-to-back renewal periods). Intuitively, you obtain $W^*$ by sampling the renewal system "uniformly over all times $t\geq 0$." In particular, $W^*$ is the "size of the renewal period as seen by a random arrival" and has mass function $$P[W^*=w] = \lim_{t\rightarrow\infty} \frac{1}{t}\int_0^t 1\{\mbox{Renewal frame at time $s$ has size $w$}\} ds $$ where the right-hand-side converges to $\frac{wP[W=w]}{E[W]}$ with probability 1 by the renewal-reward theorem.

If this intuition is helpful, then as a thought experiment, if $W=\sum_{i=1}^n X_i$, you can view each renewal period $W_j$ as chopped up into $n$ mini-periods, where the $i$th mini-period of $W_j$ has size $X_{ji}$. Now, sample the renewal system at a "random time," observe the particular index $J$ of the renewal period you are in, and the particular mini-index $I$ of the mini-period you are in. Indeed (by renewal theory) this index $I$ has distribution $P[I=i] = \frac{E[X_i]}{\sum_{k=1}^n E[X_k]}$ for $i \in \{1, ..., n\}$. You can now either just take $W^*=W_J=\sum_{i=1}^n X_{Ji}$ for the particular renewal interval $J$ you are in, or you can randomly generate the $X_{Ji}$ values according to their joint conditional distribution given the observed $X_{JI}$ value. If this makes no sense and/or if you are not familiar with renewal theory, the next part may be more helpful (although it is also more calculation-intensive).
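The "sample uniformly over time" picture can be checked by simulation: picking a renewal period with probability proportional to its length is equivalent to landing in it at a uniformly random time. A minimal Monte Carlo sketch, with an assumed two-point distribution for $W$:

```python
import random

random.seed(0)

# Assumed example: W = 1 or 3 each with probability 1/2, so E[W] = 2.
values, probs = [1, 3], [0.5, 0.5]

# Generate many i.i.d. renewal periods, then sample periods with
# probability proportional to their length (= uniform-in-time sampling).
periods = random.choices(values, probs, k=200_000)
picked = random.choices(periods, weights=periods, k=200_000)

# Renewal-reward prediction: P[W*=3] = 3 * 0.5 / 2 = 0.75,
# so frac3 should be near 0.75 despite P[W=3] being only 0.5.
frac3 = picked.count(3) / len(picked)
```

The longer periods cover more of the timeline, which is exactly why a "random arrival" over-samples them.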

Generating $W^*$ from $W=\sum_{i=1}^n X_i$

Suppose that for each $i \in \{1, ..., n\}$, the $X_i$ random variable itself takes values in some discrete set $\mathcal{X}_i$ with mass function $P[X_i=x_i]$ for $x_i \in \mathcal{X}_i$. The components of the vector $(X_1, ..., X_n)$ can be dependent, so generally we have a joint mass function: $$P[(X_1,...,X_n)=(x_1,...,x_n)] \quad \forall (x_1,...,x_n) \in \mathcal{X}_1 \times \cdots \times \mathcal{X}_n$$

Randomly pick an index $I \in \{1, ..., n\}$ via mass function $$P[I=i] = \frac{E[X_i]}{\sum_{k=1}^n E[X_k]}.\tag{1}$$ Given $I=i$, generate $X_i^* \in \mathcal{X}_i$ via mass function $$P[X_i^*=x_i] = \frac{x_iP[X_i=x_i]}{E[X_i]}.\tag{2}$$ Given, also, that $X_i^*=x_i$, generate $(X_1^*, X_2^*, ..., X_n^*)$ via the joint conditional mass function $$P[(X^*_1, ..., X^*_n) = (x_1, ..., x_n)|I=i, X_i^*=x_i] = P[(X_1, ..., X_n)=(x_1,..., x_n)|X_i=x_i],\tag{3}$$ which can be computed from the full joint mass function for $(X_1, ..., X_n)$. Define $W^*=\sum_{m=1}^n X_m^*$, where the random $X_m^*$ values are obtained from the chosen random vector $(X_1^*, ..., X_n^*)$. It remains to show that $W^*$ has the desired distribution. We have for each $w \in \mathcal{W}$: \begin{align} &P[W^*=w] \\ &= \sum_{i=1}^n \sum_{x_i \in \mathcal{X}_i}P[W^*=w|I=i, X^*_i=x_i]P[I=i]P[X^*_i=x_i|I=i] \quad \mbox{[law of total probability]} \\ &= \sum_{i=1}^n \sum_{x_i \in \mathcal{X}_i}P[W=w|X_i=x_i]\frac{E[X_i]}{\sum_{k=1}^nE[X_k]} \frac{x_iP[X_i=x_i]}{E[X_i]} \quad \mbox{[the three definitions (1), (2), (3)]}\\ &= \sum_{i=1}^n \sum_{x_i \in \mathcal{X}_i}P[W=w|X_i=x_i]\frac{x_iP[X_i=x_i]}{E[W]} \quad \mbox{[recall $W=\sum_{k=1}^n X_k$]}\\ &= \frac{1}{E[W]}\sum_{i=1}^n \sum_{x_i \in \mathcal{X}_i} x_i E[1\{X_i=x_i\}1\{W=w\}]\\ &=\frac{1}{E[W]}E\left[ \sum_{i=1}^n\sum_{x_i \in \mathcal{X}_i} x_i 1\{X_i=x_i\}1\{W=w\}\right]\\ &= \frac{1}{E[W]}E\left[1\{W=w\}\sum_{i=1}^n X_i \right] \quad [\mbox{since $\sum_{x_i \in \mathcal{X}_i} x_i 1\{X_i=x_i\}=X_i$} ]\\ &= \frac{1}{E[W]}E[W 1\{W=w\}]\\ &= \frac{wP[W=w]}{E[W]} \quad \Box \quad \mbox{[since $E(W1\{W=w\}) = E(w1\{W=w\}) = wE(1\{W=w\})$]} \end{align}
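The three-step recipe (1)–(3) can also be verified numerically. The sketch below (with an assumed dependent joint pmf on $\{0,1\}^2$) computes $P[W^*=w]$ both directly via $wP[W=w]/E[W]$ and via the construction, using the fact that steps (2) and (3) together select the full vector $x$ with probability $x_i P[(X_1,...,X_n)=x]/E[X_i]$ given $I=i$:

```python
# Assumed example joint pmf for (X_1, X_2); any dependent pmf works.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
n = 2
EX = [sum(x[i] * p for x, p in joint.items()) for i in range(n)]  # E[X_i]
EW = sum(EX)

# Direct route: pmf of W = X_1 + X_2, then size-bias it.
pmf_W = {}
for x, p in joint.items():
    w = sum(x)
    pmf_W[w] = pmf_W.get(w, 0.0) + p
direct = {w: w * p / EW for w, p in pmf_W.items()}

# Construction route: step (1) picks I = i with prob EX[i] / EW;
# steps (2)-(3) then yield the vector x with prob x[i] * p / EX[i].
constructed = {}
for i in range(n):
    PI = EX[i] / EW  # step (1)
    for x, p in joint.items():
        if x[i] > 0:
            w = sum(x)
            constructed[w] = constructed.get(w, 0.0) + PI * x[i] * p / EX[i]

# constructed and direct should agree on every w > 0.
```

This mirrors the displayed chain of equalities: summing the construction's probabilities over $i$ and $x_i$ collapses to $wP[W=w]/E[W]$.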

Example

Suppose $n=2$ and $W=X_1+X_2$ where: \begin{align} P[(X_1,X_2)=(0,0)] &= \alpha \\ P[(X_1,X_2)=(0,1)] &= \beta \\ P[(X_1,X_2)=(1,0)] &= \gamma \\ P[(X_1,X_2)=(1,1)] &= \delta \end{align} where $\alpha, \beta, \gamma, \delta$ are nonnegative and sum to 1. Then $W \in \{0, 1, 2\}$ and \begin{align} P[W=0] &= \alpha\\ P[W=1] &= \beta + \gamma \\ P[W=2] &= \delta\\ E[W] &= \beta + \gamma + 2\delta\\ P[X_1=1]=E[X_1] &= \gamma + \delta \\ P[X_2=1]=E[X_2] &= \beta + \delta \end{align} Thus, $W^*$ has mass function: \begin{align} P[W^*=0] &= \frac{0P[W=0]}{E[W]} = 0\\ P[W^*=1] &= \frac{1P[W=1]}{E[W]} = \frac{\beta + \gamma}{\beta + \gamma + 2\delta}\\ P[W^*=2] &=\frac{2P[W=2]}{E[W]} = \frac{2\delta}{\beta + \gamma + 2\delta} \end{align} For this example, the simplest way of generating $W^*$ is to flip a biased coin with $P[\text{heads}]=(\beta + \gamma)/(\beta + \gamma + 2\delta)$ and choose $W^*=1$ if heads and $W^*=2$ otherwise.

On the other hand, you can generate $W^*$ this way: randomly pick an index $I \in \{1, 2\}$ with probabilities: \begin{align} P[I=1] &= \frac{E[X_1]}{E[X_1]+E[X_2]} = \frac{\gamma + \delta}{(\gamma+\delta)+(\beta+\delta)}\\ P[I=2] &= \frac{E[X_2]}{E[X_1]+E[X_2]} = \frac{\beta + \delta}{(\gamma + \delta)+(\beta + \delta)} \end{align}

CASE 1: If $I=1$ then generate a random $X_1^*$ via mass function $P[X_1^*=1]=\frac{1\cdot P[X_1=1]}{E[X_1]} = 1$. That is, just choose $X_1^*=1$. Then, generate $(X_1^*,X_2^*)$ via the conditional probabilities $P[(X_1,X_2)=(x_1,x_2)|X_1=1]$. So we just generate $X_2^* \in \{0,1\}$ with probabilities:
\begin{align} P[X_2^*=1|\mbox{Case 1}] &= P[(X_1,X_2)=(1,1)|X_1=1] =\frac{P[(X_1,X_2)=(1,1)]}{P[X_1=1]} = \frac{\delta}{\gamma + \delta}\\ P[X_2^*=0|\mbox{Case 1}] &= P[(X_1,X_2)=(1,0)|X_1=1] = \frac{P[(X_1,X_2)=(1,0)]}{P[X_1=1]}=\frac{\gamma}{\gamma+\delta} \end{align} and then we form $W^*=X_1^*+X_2^* = 1 + X_2^*$.

CASE 2: If $I=2$ then generate a random $X_2^*$ via mass function $P[X_2^*=1]=\frac{1\cdot P[X_2=1]}{E[X_2]} = 1$. So just choose $X_2^*=1$. Now generate $X_1^* \in \{0,1\}$ via: \begin{align} P[X_1^*=1| \mbox{Case 2}] &= P[(X_1,X_2)=(1,1)|X_2=1] = \frac{P[(X_1,X_2)=(1,1)]}{P[X_2=1]} = \frac{\delta}{\beta + \delta}\\ P[X_1^*=0 | \mbox{Case 2}] &= P[(X_1,X_2)=(0,1)|X_2=1] = \frac{\beta}{\beta + \delta} \end{align} and then define $W^*=X_1^*+X_2^*=X_1^*+1$.
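The two-case procedure can be simulated and checked against the direct mass function. A sketch with assumed numerical values for $\alpha, \beta, \gamma, \delta$:

```python
import random

random.seed(1)

# Assumed joint pmf values (must be nonnegative and sum to 1).
alpha, beta, gamma, delta = 0.4, 0.1, 0.2, 0.3
EX1, EX2 = gamma + delta, beta + delta

def sample_W_star():
    """One draw of W* via the CASE 1 / CASE 2 construction."""
    if random.random() < EX1 / (EX1 + EX2):
        # Case 1: X1* = 1, then X2* ~ P[X2 = . | X1 = 1]
        return 1 + (1 if random.random() < delta / (gamma + delta) else 0)
    # Case 2: X2* = 1, then X1* ~ P[X1 = . | X2 = 1]
    return 1 + (1 if random.random() < delta / (beta + delta) else 0)

samples = [sample_W_star() for _ in range(200_000)]
# Direct formula: P[W*=1] = (beta+gamma)/(beta+gamma+2*delta) = 0.3/0.9 = 1/3,
# so the empirical fraction should be near 1/3.
frac1 = samples.count(1) / len(samples)
```

Both routes produce the same two-point distribution on $\{1,2\}$, as the derivation guarantees.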

Overall, this alternative method seems more complex in the case $n=2$, but it can be less complex for large $n$: computing the probability masses of $W$ directly requires many summations, whereas computing the conditional masses $P[(X_1, ..., X_n) = (x_1, ..., x_n)|X_i=1]$ can be easier, particularly in the special case when the $X_i$ are mutually independent. If $\{X_i\}_{i=1}^n$ are mutually independent and binary, you just generate $I \in \{1, ..., n\}$; if $I=i$, choose $X_i^*=1$ and independently choose each $X_j^*$ for $j \neq i$ according to the marginal mass function $P[X_j=1], P[X_j=0]$.
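For the independent Bernoulli case the original question asks about, this gives a one-line sampler: set $X_I^*=1$ and keep the other coordinates at their marginals, so $W^* = 1 + \sum_{j\neq I} X_j$. A sketch with assumed values of $p_i$, checked against the identity $E[W^*]=E[W^2]/E[W]$:

```python
import random
from collections import Counter

random.seed(2)

# Assumed success probabilities for independent X_i ~ Be(p_i).
p = [0.2, 0.5, 0.7]

def sample_W_star():
    # Pick I with P[I=i] proportional to E[X_i] = p_i ...
    i = random.choices(range(len(p)), weights=p)[0]
    # ... set X_i* = 1, and draw the rest from their marginals.
    return 1 + sum(random.random() < pj for j, pj in enumerate(p) if j != i)

counts = Counter(sample_W_star() for _ in range(200_000))
# Sanity check target: E[W*] = E[W^2]/E[W], with E[W] = 1.4 and
# E[W^2] = Var(W) + E[W]^2 = 0.62 + 1.96 = 2.58 for these p values.
```

Note $W^* \geq 1$ always, consistent with $P[W^*=0]=0$ from the general formula.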