Almost sure convergence of nonrandom sample


This is a question about almost sure convergence. Consider the following set-up:

  1. There are $B$ banks. Each bank has size $S_{b}$, drawn from a size distribution $f_{S}$ with mean $E[S]$; $f_{S}$ is positively skewed but has finite variance.

  2. Whether firm $f$ borrows from bank $b$ is a Bernoulli variable $a_{b}$, with $P(a_{b}=1)=p_{b}=\frac{\tilde{S}_{b}}{Z}B^{-\zeta}$, where:

    $B$ is the number of banks.

    $\tilde{S}_{b}$ is the size of bank $b$ relative to the average bank size: $\tilde{S}_{b}=\frac{S_{b}}{E[S]}$.

    $Z$ is a scaling parameter.

    $\zeta \in (0,1)$.

This set-up implies that, everything else equal, larger banks lend to more firms (compared to smaller banks).

The question is: to what does the following converge? \begin{equation} \frac{1}{\sum_{b=1}^{B}a_{b}(B)}\sum_{b=1}^{B}a_{b}(B)S_{b}\xrightarrow{a.s.} \end{equation}

It will not converge to the population mean $E[S]$, because the sampling procedure tends to over-represent larger banks.

Thanks


Sampling based on bank size skews the sampled mean. Here is a simplified setting that shows what tends to happen:

Let $\{S_1, S_2, S_3, \ldots \}$ be i.i.d. bank sizes with finite mean $E[S]$. Let $\{a_1, a_2, a_3, \ldots\}$ be a sequence of Bernoulli variables where $a_b$ depends only on $S_b$, with $Pr[a_b=1|S_b]=g(S_b)$ for some given function $g(s) \in [0,1]$. An example is $g(s) = s/S_{max}$, assuming $S_b \in [0, S_{max}]$ for some given maximum value $S_{max}$; this says the probability of being sampled grows with bank size. Then $\{a_1, a_2, a_3, \ldots\}$ is an i.i.d. sequence, as is $\{a_1S_1, a_2S_2, a_3S_3, \ldots\}$, and by the strong law of large numbers we have (with probability 1):

$$ \lim_{B\rightarrow\infty} \left[\frac{\sum_{b=1}^Ba_bS_b}{\sum_{b=1}^Ba_b}\right] = \lim_{B\rightarrow\infty} \left[\frac{\frac{1}{B}\sum_{b=1}^Ba_bS_b}{\frac{1}{B}\sum_{b=1}^Ba_b}\right] = \frac{E[a_1S_1]}{E[a_1]}$$

We get $E[a_1S_1]/E[a_1] = E[S_1]$ only if $a_1$ and $S_1$ are uncorrelated, which is typically not the case here. If $a_1$ and $S_1$ are positively correlated, then $E[a_1S_1]/E[a_1] > E[S_1]$, which is intuitive: we tend to sample precisely when the bank size is larger than the mean.

If $g(s) = s/S_{max}$ for $s \in [0, S_{max}]$ then: $$ \frac{E[a_1S_1]}{E[a_1]} = \frac{E[S_1^2]}{E[S_1]} = E[S_1] + \frac{Var(S_1)}{E[S_1]}$$
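As a sanity check, here is a small Monte Carlo sketch of this simplified setting. The uniform size distribution on $[0, S_{max}]$ and the value $S_{max}=10$ are illustrative assumptions, not part of the question:

```python
import numpy as np

rng = np.random.default_rng(0)

S_MAX = 10.0        # assumed maximum bank size
B = 1_000_000       # number of banks

# i.i.d. bank sizes, uniform on [0, S_MAX] (an assumed distribution)
S = rng.uniform(0.0, S_MAX, size=B)

# Bernoulli sampling with Pr[a_b = 1 | S_b] = g(S_b) = S_b / S_MAX
a = rng.random(B) < S / S_MAX

sampled_mean = S[a].mean()                 # (1/sum a_b) * sum a_b S_b
predicted = S.mean() + S.var() / S.mean()  # E[S] + Var(S)/E[S]

print(sampled_mean, predicted)  # both close to 20/3 ≈ 6.67, well above E[S] = 5
```

For the uniform case, $E[S^2]/E[S] = (S_{max}^2/3)/(S_{max}/2) = 2S_{max}/3$, so the size-biased mean sits at $20/3$ rather than at $E[S]=5$.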


Your question setting is more complicated because the coefficients $a_b(B)$ depend on $B$, so the standard LLN cannot be applied directly. Assume $\{S_1, S_2, \ldots\}$ is an i.i.d. sequence with mean $E[S]$ and with $S_b \in [0, S_{max}]$ for some finite maximum size $S_{max}$. For each given $B$, define $\{a_1(B), a_2(B), \ldots, a_B(B)\}$ as Bernoulli variables where $a_b(B)$ depends only on $S_b$ and satisfies (writing $\xi$ for the exponent the question calls $\zeta$): $$ Pr[a_b(B)=1|S_b] = \frac{S_bB^{-\xi}}{ZE[S]} $$ This is a valid probability whenever the constant $Z$ satisfies $ZE[S] \geq S_{max}$. Notice that: $$ E[a_b(B)] = E[E[a_b(B)|S_b]] = \frac{B^{-\xi}}{Z} $$

Define scaled values $\tilde{a}_b(B) = ZB^{\xi}a_b(B)$, so that $E[\tilde{a}_b(B)]=1$. Then: $$ \frac{\sum_{b=1}^Ba_b(B)S_b}{\sum_{b=1}^Ba_b(B)} = \frac{\sum_{b=1}^B\tilde{a}_b(B)S_b}{\sum_{b=1}^B\tilde{a}_b(B)} = \frac{\frac{1}{B}\sum_{b=1}^B\tilde{a}_b(B)S_b}{\frac{1}{B}\sum_{b=1}^B\tilde{a}_b(B)} $$ You want to show that, with probability 1: \begin{align} &\lim_{B\rightarrow\infty} \frac{1}{B}\sum_{b=1}^B\tilde{a}_b(B)=1\\ &\lim_{B\rightarrow\infty} \frac{1}{B}\sum_{b=1}^B\tilde{a}_b(B)S_b= \frac{E[S^2]}{E[S]} \end{align}

Notice that no relationship between $a_1(B)$ and $a_1(B+1)$ has been defined; nevertheless, the Borel-Cantelli lemma gives a way to proceed even without such a definition (see below).
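A quick simulation sketch of this $B$-dependent setting. The values $\xi = 0.3$, sizes uniform on $[0, S_{max}]$ with $S_{max}=10$, and $Z = S_{max}/E[S]$ (the smallest $Z$ with $ZE[S] \geq S_{max}$) are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

S_MAX, XI = 10.0, 0.3        # assumed max size and exponent
E_S = S_MAX / 2.0            # E[S] for sizes uniform on [0, S_MAX]
Z = S_MAX / E_S              # smallest Z with Z * E[S] >= S_MAX

for B in [10**3, 10**4, 10**5, 10**6]:
    S = rng.uniform(0.0, S_MAX, size=B)
    p = S * B**(-XI) / (Z * E_S)   # Pr[a_b(B) = 1 | S_b]
    a = rng.random(B) < p
    ratio = S[a].mean()            # sum a_b(B) S_b / sum a_b(B)
    print(B, ratio)
# the ratio drifts toward E[S^2]/E[S] = 20/3 as B grows
```

Even though each bank's sampling probability shrinks like $B^{-\xi}$, the *conditional* size bias is unchanged, which is why the same limit $E[S^2]/E[S]$ appears.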

For preliminary intuition, notice that the means of the numerator and denominator coincide with the conjectured limiting values: \begin{align} &E\left[\frac{1}{B}\sum_{b=1}^B\tilde{a}_b(B)\right] = 1\\ &E\left[\frac{1}{B}\sum_{b=1}^B\tilde{a}_b(B)S_b\right] = \frac{E[S^2]}{E[S]} \end{align}
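Indeed, both expectations follow directly from the definitions, since $E[\tilde{a}_b(B)|S_b] = ZB^{\xi}\cdot\frac{S_b B^{-\xi}}{ZE[S]} = \frac{S_b}{E[S]}$:

$$ E\left[\tilde{a}_b(B)\right] = E\left[E[\tilde{a}_b(B)|S_b]\right] = \frac{E[S_b]}{E[S]} = 1 $$

$$ E\left[\tilde{a}_b(B)S_b\right] = E\left[S_b\,E[\tilde{a}_b(B)|S_b]\right] = \frac{E[S_b^2]}{E[S]} = \frac{E[S^2]}{E[S]} $$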


For the first limit, fix $\epsilon>0$. Then for each $B$: \begin{align}Pr\left[\left|\frac{1}{B}\sum_{b=1}^B (\tilde{a}_b(B)-1)\right|>\epsilon\right] &\leq \frac{E\left[ \left(\sum_{b=1}^B (\tilde{a}_b(B)-1)\right)^4 \right]}{\epsilon^4B^4} \\ &= \frac{BE[(\tilde{a}_1(B)-1)^4] + 3B(B-1)E[(\tilde{a}_1(B)-1)^2]^2}{\epsilon^4 B^4}\\ &\leq \frac{BO(B^{3\xi}) + O(B^2)O(B^{2\xi})}{\epsilon^4B^4} \\ & \leq \frac{O(1)}{\epsilon^4 B^{2(1-\xi)}} \end{align}

If $\xi \in [0,1/2)$ then $2(1-\xi) > 1$, so $\sum_{B=1}^{\infty}\frac{1}{B^{2(1-\xi)}} < \infty$, and the Borel-Cantelli lemma implies that, with probability 1 (still assuming $\xi \in [0, 1/2)$): $$ \lim_{B\rightarrow\infty} \frac{1}{B}\sum_{b=1}^B\tilde{a}_b(B) = 1 $$

Convergence might also hold for $\xi \in [1/2, 1)$ (I don't know), but in that case $\sum_{B=1}^{\infty}\frac{1}{B^{2(1-\xi)}} = \infty$ and the above argument fails; another proof technique would be needed, and I suspect you would also have to specify the relationship between $a_1(B)$ and $a_1(B+1)$ to explore that regime. The standard LLN proof using finite variance and nonnegativity does not seem to work here, since the coefficients $a_b(B)$ are doubly indexed.
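For what it's worth, the first limit is easy to check numerically for a $\xi$ in $[0,1/2)$. This sketch reuses the same illustrative assumptions as before ($\xi=0.3$, sizes uniform on $[0, S_{max}]$ with $S_{max}=10$, $Z = S_{max}/E[S]$):

```python
import numpy as np

rng = np.random.default_rng(2)

S_MAX, XI = 10.0, 0.3        # assumed max size, with xi in [0, 1/2)
E_S = S_MAX / 2.0            # E[S] for sizes uniform on [0, S_MAX]
Z = S_MAX / E_S              # ensures Z * E[S] >= S_MAX

for B in [10**3, 10**4, 10**5, 10**6]:
    S = rng.uniform(0.0, S_MAX, size=B)
    a = rng.random(B) < S * B**(-XI) / (Z * E_S)
    a_tilde_mean = Z * B**XI * a.mean()   # (1/B) * sum of tilde-a_b(B)
    print(B, a_tilde_mean)
# the scaled average concentrates around 1 as B grows
```

Note the effective sample size $\sum_b a_b(B) \approx B^{1-\xi}/Z$ still diverges for $\xi < 1$, which is what makes concentration plausible; the proof above just needs $\xi < 1/2$ for summability of the tail bounds.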