Calculating the probability that one analyst is correct over another


I have a question which goes like this.

Two analysts are in dispute about some data they expect to arise in an experiment. In total, they will receive 20 observations. One analyst believes that these should be a random sample from an exponential distribution with mean 1. The second analyst believes instead that the data come from a normal distribution with mean 2 and standard deviation 1. They come to you for advice on how to use the data to resolve their dispute.

(You can assume that the sum of 20 independent observations from a unit exponential distribution has a Gamma(20,1) distribution.)

Your first suggestion is to calculate the average of the observations. You will endorse the first analyst’s view if the average is less than 1.5, and endorse the second analyst’s view if the average is greater than this.

Calculate the probability that you will endorse the second analyst’s view if, in fact, the first analyst is correct.

Is it correct to apply the central limit theorem here, given that we only have 20 observations? So far I have considered standardizing the sum of the random variables and using the CLT, but I am unsure whether this is correct.

If anyone could point me in the right direction, I will be extremely thankful.

There are 1 best solutions below

First, if you have a random sample $X_i$ of size $n = 20$ from $\mathsf{Exp}(\text{rate}= 1),$ then the sample mean has $\bar X \sim \mathsf{Gamma}(\text{shape}=20,\text{rate}=20).$
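To see where the rate of 20 comes from, here is a short worked derivation. It takes the Gamma(20, 1) distribution of the sum as given in the question; the symbol $S$ for the sum is introduced here only for illustration.

```latex
% S is the sum of the n = 20 observations (notation assumed for this sketch).
\begin{align*}
S = \sum_{i=1}^{20} X_i &\sim \mathsf{Gamma}(\text{shape}=20,\ \text{rate}=1), \\
f_{\bar X}(x) = 20\, f_S(20x)
  &= 20 \cdot \frac{(20x)^{19} e^{-20x}}{\Gamma(20)}
   = \frac{20^{20}}{\Gamma(20)}\, x^{19} e^{-20x}, \qquad x > 0, \\
\text{so}\quad \bar X = S/20 &\sim \mathsf{Gamma}(\text{shape}=20,\ \text{rate}=20), \\
E(\bar X) = \tfrac{20}{20} = 1, &\qquad
SD(\bar X) = \sqrt{\tfrac{20}{20^2}} = \tfrac{1}{\sqrt{20}} \approx 0.2236 .
\end{align*}
```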

Consider the following simulation in R statistical software, where the sample average is denoted a.

m = 10^5; n = 20; x = rexp(m*n)   # m samples of size n, rate 1
MAT = matrix(x, nrow=m)  # 100,000 by 20 matrix of std exponential data
a = rowMeans(MAT)        # 100,000 sample means
mean(a);  sd(a)
## 0.9994324    # approx. E(sample mean) = 1
## 0.2242217    # approx. SD(sample mean) = 1/sqrt(20)
hist(a, prob=TRUE, col="skyblue2")                    # histogram on the density scale
curve(dgamma(x,20,20), add=TRUE, lwd=2, col="blue")   # exact Gamma(20, 20) density

[Figure: histogram of the simulated sample means with the Gamma(20, 20) density curve overlaid]

Second, here are plots of the density functions of the sample mean under each analyst's model: $\bar X \sim \mathsf{Gamma}(20,20)$ in blue and $\bar Y \sim \mathsf{Norm}(\mu=2, \sigma=1/\sqrt{20})$ in dashed maroon.

curve(dnorm(x, 2, 1/sqrt(20)), 0,3, col="maroon", lwd=2, lty="dashed", ylab="Density")
curve(dgamma(x, 20, 20), add=TRUE, lwd=2, col="blue")

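[Figure: density of the Gamma(20, 20) sample mean (blue) and of the Norm(2, 1/sqrt(20)) sample mean (dashed maroon) on the interval 0 to 3]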

So it does seem reasonable to distinguish between the distributions according as the sample mean is above or below $1.5.$

Specifically, the probability of judging the population distribution to be normal when in fact it is exponential is $P(\bar X > 1.5) \approx 0.022.$

1 - pgamma(1.5, 20, 20) 
## 0.02187347
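As a cross-check, the same probability can be reached in two other ways: through the sum, using the Gamma(20, 1) distribution given in the question, and through the standard identity that relates a gamma tail with integer shape to a Poisson CDF. The following is only a sketch; the variable names `cutoff`, `p_mean`, `p_sum`, and `p_pois` are made up for illustration.

```r
# Cross-check of P(Xbar > 1.5) when the data are unit exponential.
n <- 20          # sample size
cutoff <- 1.5    # decision threshold for the sample mean

# Upper tail of the sample mean, Gamma(shape = n, rate = n)
p_mean <- pgamma(cutoff, shape = n, rate = n, lower.tail = FALSE)

# Same event in terms of the sum: S = n * Xbar ~ Gamma(shape = n, rate = 1)
p_sum <- pgamma(n * cutoff, shape = n, rate = 1, lower.tail = FALSE)

# Poisson identity for integer shape: P(S > s) = P(Poisson(s) <= n - 1)
p_pois <- ppois(n - 1, lambda = n * cutoff)

print(c(p_mean, p_sum, p_pois))
```

All three values should agree with the result of `1 - pgamma(1.5, 20, 20)` above.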

The separation is better than I expected, so no great harm would be done by using a less-than-perfect normal approximation to the above probability. The approximate probability is about 0.013, compared with the exact value of about 0.022.

1 - pnorm(1.5, 1, 1/sqrt(20))
## 0.01267366
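For reference, the standardization behind that normal approximation can be written out as follows. This is a sketch of the CLT step the question asks about, using the mean 1 and standard deviation $1/\sqrt{20}$ of $\bar X$; $\Phi$ denotes the standard normal CDF.

```latex
% CLT approximation: treat Xbar as approximately Norm(1, 1/sqrt(20)).
\begin{align*}
P(\bar X > 1.5)
  &\approx 1 - \Phi\!\left(\frac{1.5 - 1}{1/\sqrt{20}}\right)
   = 1 - \Phi\!\left(\frac{\sqrt{20}}{2}\right)
   = 1 - \Phi\!\left(\sqrt{5}\right) \\
  &\approx 1 - \Phi(2.236) \approx 0.0127 .
\end{align*}
```

Standardizing the sum instead of the mean gives the same $z$-value, since $(30 - 20)/\sqrt{20} = \sqrt{5}$ as well.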

But I persist in my campaign to use software to get exact probabilities instead of using questionable normal approximations.