Is it possible to calculate this special variance?


Suppose I want to estimate the probability that a random variable $X$ with some continuous distribution takes a value $>a$. I could estimate this from a sample $X_1,...,X_N$ drawn from that distribution, using the estimator $\frac{1}{N}\sum_{i=1}^{N} \mathbb{I}_{(X_i > a)}$.

Now I want to calculate the variance of this estimator. Normally it would depend on the choice of distribution, but here $a$ is chosen to depend on the distribution, so that $P(X > a)=0.1$ always holds. Because of this, I think the variance of the estimator should be the same for every distribution, so I should be able to calculate it as a function of $N$ alone.

If I try, I get this far:

$\operatorname{Var}\left[\frac{1}{N}\sum_{i=1}^{N} \mathbb{I}_{(X_i > a)}\right] = \frac{1}{N^2}\operatorname{Var}\left[\sum_{i=1}^{N} \mathbb{I}_{(X_i > a)}\right] = \frac{1}{N^2}\left(\mathbb{E}\left[\left(\sum_{i=1}^{N} \mathbb{I}_{(X_i > a)}\right)^2\right]-(0.1 N)^2\right)$

Is it possible to continue from here?

On BEST ANSWER

To facilitate analysis, let us denote these indicators you are using as the random variables $K_i \equiv \mathbb{I}(X_i > a)$. Assuming random sampling, regardless of the underlying distribution of $X_1,...,X_N$, you have:

$$K_1,...,K_N \sim \text{IID Bern}(\theta) \quad \quad \quad \theta \equiv \mathbb{P}(X_i > a).$$

So your estimator has a scaled binomial distribution:

$$\hat{\theta} \equiv \bar{K}_N \equiv \frac{1}{N} \sum_{i=1}^N K_i \sim \frac{1}{N} \cdot \text{Bin}(N,\theta).$$

Its mean and variance are given respectively by:

$$\mathbb{E}(\hat{\theta}) = \theta \quad \quad \quad \mathbb{V}(\hat{\theta}) = \frac{\theta (1-\theta)}{N}.$$
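
For the specific case in the question, the variance can be worked out in one line. This is a sketch that uses only the independence of the $K_i$ and the Bernoulli variance $\mathbb{V}(K_i) = \theta(1-\theta)$, with $\theta = 0.1$ taken from the question:

$$\mathbb{V}(\hat{\theta}) = \frac{1}{N^2} \sum_{i=1}^N \mathbb{V}(K_i) = \frac{N \theta (1-\theta)}{N^2} = \frac{\theta (1-\theta)}{N} = \frac{0.1 \cdot 0.9}{N} = \frac{0.09}{N}.$$

Equivalently, the second moment that the question stopped at is $\mathbb{E}\big[(\sum_{i=1}^N K_i)^2\big] = N\theta(1-\theta) + N^2\theta^2$; subtracting $(N\theta)^2$ and dividing by $N^2$ gives the same result.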

This estimator is really just the standard sample estimator for the unknown probability in a set of Bernoulli data. It is unbiased and has variance that converges to zero as $N \rightarrow \infty$. It has good convergence properties and is a commonly used estimator for this type of problem. What is important to note here is that the underlying distribution of the values $X_1,...,X_N$ doesn't actually matter --- once we look at the indicators, these have a Bernoulli distribution with probability equal to the underlying probability of interest.
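
As a rough numerical check of the claim that the underlying distribution does not matter, the following simulation sketch compares the simulated variance of the estimator against $\theta(1-\theta)/N$ for three distributions. The choices here are illustrative assumptions and not part of the question: the three distributions, the values `N = 200` and `REPS = 2000`, the seed, and the helper name `estimator_variance` are all hypothetical. In each case $a$ is set to the 90% quantile so that $\mathbb{P}(X > a) = 0.1$.

```python
import math
import random
from statistics import NormalDist, variance


def estimator_variance(draw, a, n, reps):
    """Sample variance, over `reps` replications, of the estimator mean(1{X_i > a})."""
    estimates = [sum(draw() > a for _ in range(n)) / n for _ in range(reps)]
    return variance(estimates)


rng = random.Random(0)       # fixed seed, chosen arbitrarily for reproducibility
N, REPS = 200, 2000          # illustrative sample size and number of replications
theta = 0.1                  # P(X > a), as specified in the question
theory = theta * (1 - theta) / N

# Each entry: (function drawing one X, threshold a with P(X > a) = theta)
cases = {
    "exponential(1)": (lambda: rng.expovariate(1.0), -math.log(theta)),
    "uniform(0,1)": (rng.random, 1 - theta),
    "standard normal": (lambda: rng.gauss(0.0, 1.0), NormalDist().inv_cdf(1 - theta)),
}

results = {
    name: estimator_variance(draw, a, N, REPS) for name, (draw, a) in cases.items()
}

for name, v in results.items():
    print(f"{name}: simulated {v:.6f}, theory {theory:.6f}")
```

The simulated variances should all sit close to the same theoretical value, up to Monte Carlo noise, since in every case the indicators are Bernoulli with the same $\theta$.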