How to calculate the expectation and variance of moment estimator of uniform distribution $U(a,b)$?


We know that the method-of-moments estimators of the parameters of the uniform distribution $U(a,b)$ are $$ \hat{a}=\overline X-\sqrt{3}S,\quad \hat{b}=\overline X+\sqrt{3}S, $$ where $\overline X$ is the sample mean and $S=\sqrt{\frac1n \sum_{i=1}^n (X_i -\overline X)^2}$. I want to calculate $E(\hat{a})$, $E(\hat{b})$, $Var(\hat{a})$ and $Var(\hat{b})$, but this problem does not seem easy to deal with...

I have been thinking for a long time and still don't know how to solve it...

Thank you in advance for your help!

There are 2 best solutions below


You may already have found $E(\bar X) = \mu = (a+b)/2.$ Finding $E(S^2)$ takes a little more work.

It may help to write the numerator of $S^2$ as

$$\begin{aligned} \sum(X_i - \bar X)^2 &= \sum(X_i^2 - 2\bar X X_i + \bar X^2)\\ &= \sum X_i^2 - 2\bar X\sum X_i + n\bar X^2\\ &= \sum X_i^2 - 2n\bar X^2 + n\bar X^2 = \sum X_i^2 - n\bar X^2, \end{aligned}$$ where all sums are taken over $i = 1,\dots,n$ and we use $n\bar X= \sum X_i.$

Then you only have to find $E(\bar X^2)$ and $E(\sum X_i^2).$ I will let you deal with that part on your own.
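
For reference, here is one way the remaining expectations can be worked out. This is a sketch added for completeness rather than part of the hint above; it assumes the $X_i$ are i.i.d. with mean $\mu=(a+b)/2$ and variance $\sigma^2=(b-a)^2/12$, and that $S^2$ uses the divisor $n$, as in the question.

$$\begin{aligned} E\Big(\sum X_i^2\Big) &= n(\sigma^2+\mu^2),\\ E(\bar X^2) &= Var(\bar X) + \mu^2 = \frac{\sigma^2}{n}+\mu^2,\\ E(S^2) &= \frac1n\Big[n(\sigma^2+\mu^2) - n\Big(\frac{\sigma^2}{n}+\mu^2\Big)\Big] = \frac{n-1}{n}\,\sigma^2 = \frac{(n-1)(b-a)^2}{12n}. \end{aligned}$$

For example, $a=1$, $b=2$, $n=12$ gives $E(S^2)=11/144\approx 0.0764.$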


Addendum:

For the specific case $a = 1, b=2, n = 12,$ the following simulation in R gives reasonably good approximations for various relevant quantities. You can check whether your general formulas roughly match the simulated results for these parameters.

With a million iterations, most simulated values should be accurate to a couple of decimal places. However, remember that method-of-moments estimators are not always unbiased. $\bar X$ is unbiased for $\mu = (a+b)/2,$ but the estimate of $\sigma$ involves a nonlinear operation (the square root) and so it is biased.
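
The direction of this bias can be seen from Jensen's inequality. What follows is a short sketch, not part of the simulation, assuming i.i.d. observations, $n \ge 2$, and $S$ defined with divisor $n$ as in the question. Because the square root is strictly concave and $S^2$ is not constant,

$$E(S) < \sqrt{E(S^2)} = \sigma\sqrt{\frac{n-1}{n}} < \sigma, \qquad\text{so}\qquad E(\hat b) = \mu + \sqrt3\,E(S) < \mu+\sqrt3\,\sigma = b \quad\text{and}\quad E(\hat a) > a.$$

With $a=1$, $b=2$, $n=12$ the upper bound on $E(S)$ is $\sqrt{11/144}\approx 0.2764.$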

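set.seed(2020)
a = 1;  b = 2;  n = 12
B = 10^6;  m = m.2 = s.2 = numeric(B)   # B iterations; storage for results
for(i in 1:B) {
  x = runif(n, a, b)                     # one sample of size n from U(a,b)
  m[i] = mean(x);  m.2[i] = mean(x^2)    # sample mean; mean of squares
  s.2[i] = ((n-1)/n)*var(x)              # S^2 with divisor n, as in the question
  }
mean(m);  mean(m.2);  mean(s.2);  mean(sqrt(s.2))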
[1] 1.500153    # aprx E(X-bar) = (a+b)/2 = 1.5
[1] 2.333788    # aprx E(mean of X^2) = E(X^2) = (a^2+a*b+b^2)/3 = 7/3
[1] 0.07637581  # aprx E(S^2) = ((n-1)/n)*(b-a)^2/12 = 11/144
[1] 0.2733935   # aprx E(S)
mean(m) + sqrt(3)*mean(sqrt(s.2))
[1] 1.973684    # aprx b.est;  b=2 
0
On

Investigating this empirically, we may as well look at the case $X \sim U(0,1)$ and the distribution of $Y= \overline X-\sqrt{3}S$. You will then have $E(\hat{a})=a+(b-a)E(Y)$, $E(\hat{b})=b-(b-a)E(Y)$ and $Var(\hat{a})=Var(\hat{b})=(b-a)^2\, Var(Y)$.
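
These scaling relations can be justified in a few lines. The sketch below introduces $U_i \sim U(0,1)$ i.i.d. as illustrative notation, with $\overline U$ and $S_U$ their sample mean and (divisor-$n$) standard deviation, and writes $X_i = a+(b-a)U_i$:

$$\begin{aligned} \overline X &= a+(b-a)\overline U, \qquad S = (b-a)S_U,\\ \hat a &= a + (b-a)\big(\overline U-\sqrt3 S_U\big) = a+(b-a)Y,\\ \hat b &= a + (b-a)\big(\overline U+\sqrt3 S_U\big). \end{aligned}$$

Replacing each $U_i$ by $1-U_i$ leaves the joint distribution unchanged, maps $\overline U$ to $1-\overline U$ and keeps $S_U$ fixed, so $\overline U+\sqrt3 S_U$ has the same distribution as $1-Y$. Hence $\hat b$ is distributed as $b-(b-a)Y$, which gives the stated mean and variance.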

In the simple case of $n=1$, you have $S=0$ and $Y=X$, so $Y$ is uniform on $[0,1]$ with $E[Y]=\frac12$ and $Var(Y)=\frac1{12}$.

When $n=2$ it seems $Y$ has an almost triangular distribution between $-\frac{\sqrt{3}-1}{2} \approx -0.366$ and $1$, with a mode (not quite sharp) somewhere near $0.01$, and mean of $\frac12-\frac1{\sqrt{12}} \approx 0.21$ and variance $\frac1{12}$ again.
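
The $n=2$ mean and variance can also be checked by hand. A sketch, using $D = X_1-X_2$ as illustrative notation and the standard facts $E|D|=\frac13$ and $E(D^2)=\frac16$ for two independent $U(0,1)$ variables:

$$\begin{aligned} S &= \tfrac12|D|, \qquad Y = \overline X - \tfrac{\sqrt3}{2}|D|,\\ E(Y) &= \tfrac12 - \tfrac{\sqrt3}{2}\cdot\tfrac13 = \tfrac12-\tfrac1{\sqrt{12}},\\ Var(Y) &= Var(\overline X) + \tfrac34 Var(|D|) - \sqrt3\,Cov(\overline X,|D|) = \tfrac1{24} + \tfrac34\Big(\tfrac16-\tfrac19\Big) - 0 = \tfrac1{12}. \end{aligned}$$

The covariance vanishes because replacing each $X_i$ by $1-X_i$ maps $\overline X$ to $1-\overline X$ while leaving $|D|$ unchanged.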

For large $n$, I think $Y$ can range between (almost) $-0.5$ and $1$, with an expectation slightly above $\frac{0.29}{n}$ and a variance slightly above $\frac{0.13}{n}$.
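
A first-order delta-method calculation suggests where these constants come from. This is only a heuristic sketch that drops terms of order $n^{-2}$; it uses $\sigma^2=\frac1{12}$ and fourth central moment $\mu_4=\frac1{80}$ for $U(0,1)$, so that $\mu_4/\sigma^4=\frac95$, together with $Var(S^2)\approx(\mu_4-\sigma^4)/n$:

$$\begin{aligned} E(S) &\approx \sqrt{E(S^2)} - \frac{Var(S^2)}{8\sigma^3} \approx \sigma\Big(1-\frac1{2n}\Big) - \frac{\sigma}{8n}\Big(\frac95-1\Big) = \sigma\Big(1-\frac{3}{5n}\Big),\\ E(Y) &= \tfrac12-\sqrt3\,E(S) \approx \tfrac12 - \tfrac12\Big(1-\frac{3}{5n}\Big) = \frac{3}{10n},\\ Var(S) &\approx \frac{Var(S^2)}{4\sigma^2} \approx \frac{1}{60n},\\ Var(Y) &= Var(\overline X) + 3\,Var(S) - 2\sqrt3\,Cov(\overline X,S) \approx \frac{1}{12n}+\frac{1}{20n} = \frac{2}{15n}. \end{aligned}$$

Here $\sqrt3\,\sigma=\frac12$, and $Cov(\overline X,S)=0$ by the symmetry $X_i \to 1-X_i$. The approximate values $0.3/n$ and $0.133/n$ are in line with the empirical $0.29/n$ and $0.13/n$ quoted above.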

My R code

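library(matrixStats)   # provides rowCumsums() and colVars()
set.seed(1)
cases <- 10^5          # number of simulated samples
maxn <- 100            # largest sample size n considered
a <- 0
b <- 1
# each row is one sample; its first n columns are used as a sample of size n
matdat <- matrix(runif(cases*maxn,a,b),ncol=maxn)
matcumsums <- rowCumsums(matdat)       # running sums of X_i
matcumsums2 <- rowCumsums(matdat^2)    # running sums of X_i^2
# divide column n by n to get the running means
matmeans <- matcumsums * matrix(rep(1/(1:maxn),cases), ncol=maxn, byrow=TRUE)
# S^2 with divisor n: mean of squares minus squared mean
matvars <- matcumsums2 * matrix(rep(1/(1:maxn),cases), ncol=maxn, byrow=TRUE) - 
           matmeans^2 
matesta <- matmeans - sqrt(3*matvars)  # Y = X-bar - sqrt(3)*S for n = 1..maxn
colMeans(matesta)                      # simulated E(Y) for each n
colVars(matesta)                       # simulated Var(Y) for each n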

with simulated means

  [1] 0.499624655 0.210202738 0.127434881 0.089991392 0.069324295 0.056699796
  [7] 0.047447203 0.041174866 0.036002970 0.032069160 0.029030755 0.026417754
 [13] 0.024179518 0.022296740 0.020700905 0.019385791 0.018197157 0.017314217
 [19] 0.016347837 0.015474388 0.014661617 0.013974090 0.013369880 0.012716072
 [25] 0.012189324 0.011770728 0.011293438 0.010840350 0.010421719 0.010034514
 [31] 0.009639677 0.009391561 0.009078119 0.008845000 0.008574553 0.008314190
 [37] 0.008065365 0.007863782 0.007673246 0.007450993 0.007269078 0.007110111
 [43] 0.006928896 0.006765058 0.006600819 0.006453230 0.006287381 0.006188128
 [49] 0.006090569 0.005929131 0.005797132 0.005634333 0.005570216 0.005467236
 [55] 0.005362655 0.005299918 0.005214456 0.005124611 0.005045968 0.004954728
 [61] 0.004895074 0.004825454 0.004750904 0.004663461 0.004595883 0.004490434
 [67] 0.004406731 0.004342739 0.004294927 0.004246458 0.004178983 0.004110876
 [73] 0.004049843 0.003993215 0.003956870 0.003902800 0.003861287 0.003789137
 [79] 0.003726787 0.003680667 0.003633526 0.003558665 0.003518717 0.003484000
 [85] 0.003450912 0.003409833 0.003379188 0.003352407 0.003301703 0.003282139
 [91] 0.003235420 0.003211106 0.003150513 0.003125351 0.003084038 0.003070595
 [97] 0.003035180 0.003000492 0.002974085 0.002942333

and simulated variances

  [1] 0.083775940 0.083787889 0.055965331 0.040656462 0.031385999 0.025486059
  [7] 0.021427148 0.018523830 0.016216256 0.014447130 0.013033191 0.011860605
 [13] 0.010884667 0.010062527 0.009335277 0.008735194 0.008198547 0.007726129
 [19] 0.007313706 0.006934076 0.006596030 0.006290006 0.006000930 0.005746835
 [25] 0.005512934 0.005295265 0.005101631 0.004910501 0.004735188 0.004571302
 [31] 0.004416566 0.004277066 0.004145073 0.004021727 0.003903547 0.003786156
 [37] 0.003682305 0.003580856 0.003488230 0.003398641 0.003315400 0.003237306
 [43] 0.003156615 0.003079142 0.003008861 0.002939991 0.002875689 0.002818242
 [49] 0.002761688 0.002700429 0.002646291 0.002594507 0.002541885 0.002492609
 [55] 0.002448299 0.002405762 0.002364762 0.002324767 0.002284448 0.002247144
 [61] 0.002206634 0.002172539 0.002139429 0.002103797 0.002072896 0.002038866
 [67] 0.002006683 0.001976749 0.001948909 0.001921107 0.001894383 0.001866487
 [73] 0.001841805 0.001817190 0.001792718 0.001768731 0.001744652 0.001722186
 [79] 0.001697249 0.001676544 0.001655759 0.001634675 0.001616250 0.001596556
 [85] 0.001577407 0.001558821 0.001541288 0.001522475 0.001506370 0.001491121
 [91] 0.001474489 0.001457634 0.001443099 0.001428441 0.001413804 0.001398885
 [97] 0.001384747 0.001370784 0.001356471 0.001341688

and multiplying these by $n$ or $n-1$ gives the following chart for means of $Y$

plot(colMeans(matesta)*((1:maxn)), ylim=c(0.2,0.55))
points(colMeans(matesta)*((1:maxn)-1),col="red")

[Figure: simulated mean of $Y$ multiplied by $n$ (black) and by $n-1$ (red), plotted against $n$]

and for variances

plot(colVars(matesta)*(1:maxn), ylim=c(0.08,0.18))
points(colVars(matesta)*((1:maxn)-1),col="red")

[Figure: simulated variance of $Y$ multiplied by $n$ (black) and by $n-1$ (red), plotted against $n$]