Suppose we have a normal distribution with $\mu = 5$ and $\sigma = 5$, and we draw 100 random samples from it.
- Person 1 is given the 100 samples at once and is told to estimate $\mu$ and $\sigma$.
- Person 2 is given the same 100 samples, but broken into 10 sets of 10. Person 2 estimates $\mu$ and $\sigma$ from each set and then averages the estimates.
To write the estimation strategies more concisely:
- Person 1 ($n=100$):
$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i$$ $$\hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \hat{\mu})^2$$
- Person 2 ($m = 10$ sets of $k = n/m = 10$ samples each, where $x_{ij}$ is the $i$-th sample in set $j$):
$$\hat{\mu}_{\text{avg}} = \frac{1}{m}\sum_{j=1}^{m} \hat{\mu}_j = \frac{1}{m}\sum_{j=1}^{m} \left(\frac{1}{k}\sum_{i=1}^{k} x_{ij}\right)$$
$$\hat{\sigma}^2_{\text{avg}} = \frac{1}{m}\sum_{j=1}^{m} \hat{\sigma}^2_j = \frac{1}{m}\sum_{j=1}^{m} \left(\frac{1}{k-1}\sum_{i=1}^{k} (x_{ij} - \hat{\mu}_j)^2\right)$$
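As a worked check on these formulas, here is a short derivation sketch. It assumes equal-sized, non-overlapping sets and writes $k = n/m = 10$ for the size of each set; the variance comparison additionally assumes i.i.d. normal data. With equal set sizes the two mean estimates coincide exactly:

$$\hat{\mu}_{\text{avg}} = \frac{1}{m}\sum_{j=1}^{m} \frac{1}{k}\sum_{i=1}^{k} x_{ij} = \frac{1}{mk}\sum_{j=1}^{m}\sum_{i=1}^{k} x_{ij} = \hat{\mu}$$

The variance estimates differ, because the pooled sum of squares splits into a within-set part and a between-set part:

$$(n-1)\,\hat{\sigma}^2 = (k-1)\sum_{j=1}^{m} \hat{\sigma}^2_j + k\sum_{j=1}^{m} (\hat{\mu}_j - \hat{\mu})^2 = m(k-1)\,\hat{\sigma}^2_{\text{avg}} + k\sum_{j=1}^{m} (\hat{\mu}_j - \hat{\mu})^2$$

Person 2's estimate uses only the within-set part. Both estimators are unbiased for $\sigma^2$, but under normality their variances are

$$\operatorname{Var}(\hat{\sigma}^2) = \frac{2\sigma^4}{n-1} = \frac{2\sigma^4}{99}, \qquad \operatorname{Var}(\hat{\sigma}^2_{\text{avg}}) = \frac{2\sigma^4}{m(k-1)} = \frac{2\sigma^4}{90}$$

so in this setup any difference between the two strategies shows up only in the variance estimate, and it is modest (a ratio of $99/90 = 1.1$).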
Here is an R simulation for this situation:
```r
library(ggplot2)  # needed for plot_diff() below

mu <- 5
sigma <- 5
n_samples <- 100
n_pairs <- 10        # size of each set given to Person 2
n_iterations <- 100

# Vectors holding the absolute estimation errors for each iteration
person1_means_diff <- numeric(n_iterations)
person1_vars_diff <- numeric(n_iterations)
person2_means_diff <- numeric(n_iterations)
person2_vars_diff <- numeric(n_iterations)

# Set labels: consecutive sets of n_pairs samples each
groups <- rep(1:(n_samples / n_pairs), each = n_pairs)

for (i in 1:n_iterations) {
  # Generate samples
  samples <- rnorm(n_samples, mean = mu, sd = sigma)

  # Person 1: estimate from all samples at once
  person1_means_diff[i] <- abs(mean(samples) - mu)
  person1_vars_diff[i] <- abs(var(samples) - sigma^2)

  # Person 2: estimate within each set, then average the estimates
  pair_means <- sapply(split(samples, groups), mean)
  pair_vars <- sapply(split(samples, groups), var)
  person2_means_diff[i] <- abs(mean(pair_means) - mu)
  person2_vars_diff[i] <- abs(mean(pair_vars) - sigma^2)
}

df_person1_diff <- data.frame(Iteration = 1:n_iterations, Mean = person1_means_diff, Variance = person1_vars_diff)
df_person2_diff <- data.frame(Iteration = 1:n_iterations, Mean = person2_means_diff, Variance = person2_vars_diff)

# Returns a list of two ggplot objects: errors in the mean and in the variance.
# Usage: plot_diff(df_person1_diff, "Person 1")
plot_diff <- function(df, person) {
  p1 <- ggplot(df, aes(x = Iteration)) +
    geom_line(aes(y = Mean), color = "red") +
    labs(title = paste("Absolute Differences in Means for", person), x = "Iteration", y = "Absolute Difference in Mean") +
    theme_bw()
  p2 <- ggplot(df, aes(x = Iteration)) +
    geom_line(aes(y = Variance), color = "blue") +
    labs(title = paste("Absolute Differences in Variances for", person), x = "Iteration", y = "Absolute Difference in Variance") +
    theme_bw()
  list(p1, p2)
}
```
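The per-iteration differences from the simulation above can also be summarized numerically rather than read off a plot. The following is only a minimal sketch under my own assumptions, not part of the original simulation: `compare_strategies`, `n_reps`, and `set_size` are hypothetical names introduced for illustration, and the sets are assumed to be equal-sized and consecutive. It reports the largest gap between the two mean estimates and the root-mean-square error of each estimator.

```r
# Sketch: summarize both strategies over many replications.
# compare_strategies, n_reps and set_size are illustrative names (assumptions).
compare_strategies <- function(n_reps = 2000, n = 100, set_size = 10,
                               mu = 5, sigma = 5) {
  stopifnot(n %% set_size == 0)  # sets must divide the sample evenly
  groups <- rep(seq_len(n / set_size), each = set_size)

  # One column per replication; rows are named by estimator
  est <- replicate(n_reps, {
    x <- rnorm(n, mean = mu, sd = sigma)
    c(mean1 = mean(x),                      # Person 1: pooled mean
      mean2 = mean(tapply(x, groups, mean)), # Person 2: average of set means
      var1  = var(x),                        # Person 1: pooled variance
      var2  = mean(tapply(x, groups, var)))  # Person 2: average of set variances
  })

  # True values, in the same order as the rows of est
  truth <- c(mu, mu, sigma^2, sigma^2)
  rmse <- sqrt(rowMeans((est - truth)^2))

  list(max_mean_gap = max(abs(est["mean1", ] - est["mean2", ])),
       rmse = rmse)
}

set.seed(1)
res <- compare_strategies()
print(res)
```

With equal set sizes, `max_mean_gap` should be zero up to floating-point error, so any difference between the two strategies would appear in the `var1` and `var2` entries of `rmse`. Replacing the `rnorm` call with another sampler is one way to try other distributions.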
My question: for finite sample sizes, are there any probability distributions (e.g. mixture distributions, https://en.wikipedia.org/wiki/Mixture_distribution) or situations in which Person 1's strategy significantly outperforms Person 2's strategy?
Thanks!