Estimation of $\operatorname{var}[P(X > 20\mid p)]$ while $X\mid p\sim \text{Bin}(100, p)$ and $p \sim \text{Beta}(60, 30)$

60 Views Asked by At

I have $p \sim \text{Beta}(60, 30)$.

And I have to estimate $\operatorname{var}[P(X > 20) | p]$ while $X \sim \text{Binomial}(100, p)$.

I think I can sample $p$, calculate $P(X > 20)$ by sampling for $X$ for each sample of $p$, derive an array of calculations, and take variance of that array(using a software)

But this involves two processes of sampling, so I wonder if there is anyway to tackle this analytically. Is there any one to help me out?

1

There are 1 best solutions below

2
On

I would have thought this variance is going to be very close to $0$. You seem to have

  • $\mathbb E[p] = \frac23$ and $\mathbb P\left(X \gt 20 \mid p=\frac23\right) \approx 1 - 1.2 \times 10^{-21}$,
  • $\mathbb P\left(p \le \frac12\right) \approx 0.00067$ and $\mathbb P\left(X \gt 20 \mid p=\frac12\right) \approx 1 - 5.6 \times 10^{-10}$.

So as a random variable $\mathbb P(X \gt 20)$ is usually very close to $1$, making its variance small

Here is an numerical integration approach in R, which also estimates the variance of $\mathbb P(X \le 20)$, which should be the same but may avoid some precision errors in calculation. Essentially it splits the unit interval into a number of equal length subintervals, find the probability $p$ lies in each, finds $P(X \gt 20)$ conditioned on the value in the middle of each subinterval and estimates the variance from these

varest <- function(cases, n=100, T=20, alpha=60, beta=30){ 
  p <- ((1:cases) - 1/2) / cases
  probp <- pbeta(p + 1/2/cases, alpha, beta) - pbeta(p - 1/2/cases, alpha, beta) 
  probXgtT <- 1 - pbinom(T, n, p)
  probXleT <- pbinom(T, n, p)
  meanprobXgtT <- sum(probp * probXgtT)
  meanprobXleT <- sum(probp * probXleT)
  varprobXgtT  <- sum(probp * probXgtT^2) - meanprobXgtT^2
  varprobXleT  <- sum(probp * probXleT^2) - meanprobXleT^2
  c(varprobXgtT, varprobXleT)
  }

Trying different numbers of subintervals from $2$ to $2^{20}$ (just over a million) would give

> for (cases in 2^(1:20)){
+   print(varest(cases))
+   }
[1] 1.477861e-05 1.477861e-05
[1] 1.053391e-11 1.053376e-11
[1] 7.167600e-13 7.167463e-13
[1] 2.009504e-14 2.007053e-14
[1] 5.551115e-15 5.528506e-15
[1] 3.663736e-15 3.565419e-15
[1] 2.997602e-15 3.152373e-15
[1] 3.108624e-15 3.053590e-15
[1] 2.997602e-15 3.029168e-15
[1] 2.997602e-15 3.023079e-15
[1] 3.108624e-15 3.021558e-15
[1] 2.886580e-15 3.021178e-15
[1] 3.108624e-15 3.021083e-15
[1] 3.108624e-15 3.021059e-15
[1] 2.997602e-15 3.021053e-15
[1] 2.997602e-15 3.021052e-15
[1] 2.997602e-15 3.021052e-15
[1] 2.997602e-15 3.021051e-15
[1] 2.997602e-15 3.021051e-15
[1] 2.997602e-15 3.021051e-15

which suggests to me that $\operatorname{var}[P(X > 20)] \approx 3 \times 10^{-15}$

Using the same method would suggest that $P(X > 20) \approx 1-2.76 \times 10^{-11}$