Finding the variance of a statistic.


$X_1,\cdots,X_n$ are independent random variables from $N(\mu,\sigma^2)$ distribution. Define $$T=\frac{1}{2(n-1)}\sum_{i=1}^{n-1}(X_{i+1}-X_i)^2$$ I have shown that it is an unbiased estimator of the variance. I need to compare its variance to that of the sample variance. Now how do I find $Var(T)$?

Finding $E(T^2)$ simply by squaring the above expression and then taking the expectation is becoming very clumsy!


Simplify the problem by writing $X_i = \mu + \sigma Y_i$, where $Y_1, Y_2, \dots$ are IID $N(0,1)$ variables, and set $Z_i = Y_{i+1}-Y_i \sim N(0,2)$ for $i = 1, \dots, n-1$. Then

$$T = \frac{\sigma^2}{2(n-1)}\sum_{i=1}^{n-1}Z_i^2 \implies E[T^2] = \frac{\sigma^4}{4(n-1)^2}\Big\{\sum_{i=1}^{n-1}E(Z_i^4)+\sum_{i \neq j}E(Z_i^2 Z_j^2)\Big\}$$

Here $E(Z_i^4) = 3 \cdot 2^2 = 12$. The cross terms need care, because adjacent differences are correlated: $Cov(Z_i, Z_{i+1}) = -1$, so for the $2(n-2)$ ordered adjacent pairs, $E(Z_i^2 Z_{i+1}^2) = 4 + 2\,Cov(Z_i,Z_{i+1})^2 = 6$, while for the remaining $(n-1)(n-2)-2(n-2)$ ordered pairs the $Z$'s are independent and $E(Z_i^2)E(Z_j^2) = 4$. Hence

$$E[T^2] = \frac{\sigma^4}{4(n-1)^2}\{12(n-1)+12(n-2)+4(n-2)(n-3)\} = \frac{n^2+n-3}{(n-1)^2}\sigma^4$$

$$\implies Var(T) = E[T^2] - \sigma^4 = \frac{(3n-4)\sigma^4}{(n-1)^2}$$
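A quick numerical sanity check (my addition, not part of the original answer; it assumes NumPy is available): simulating $T$ for $n = 5$, $\sigma = 1$ gives a variance near $(3n-4)\sigma^4/(n-1)^2 = 11/16$, the value obtained in Method 2 below. The adjacent squared differences are correlated, so the cross terms do not simply factor.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma, reps = 5, 1.0, 200_000

# Each row is one sample of size n; T is the scaled sum of
# squared successive differences along each row.
x = rng.normal(0.0, sigma, size=(reps, n))
t = (np.diff(x, axis=1) ** 2).sum(axis=1) / (2 * (n - 1))

print(t.mean())  # should be near sigma^2 = 1 (T is unbiased)
print(t.var())   # should be near (3n-4)/(n-1)^2 = 11/16 = 0.6875
```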


Method 1: Inductive

If $X_1,..., X_n$ are independent $N(0,\sigma^2)$ random variables, then their joint pdf, say $f(x_1,...,x_n)$ is:

$$f(x_1,...,x_n) = \prod _{i=1}^n \frac{1}{\sqrt{2 \pi } \sigma } \exp \left(-\frac{x_i^2}{2 \sigma ^2}\right)$$

[As the problem at hand involves differences of Normal random variables, I have set the mean $\mu$ to zero without loss of generality.] For given values of $n$, we can now quite easily calculate $E[T^2]$. For example, if $n = 4$, the statistic $T$ is:

$$T = \frac{1}{6}\left\{(X_{2}-X_{1})^{2}+(X_{3}-X_{2})^{2}+(X_{4}-X_{3})^{2}\right\}$$

Then, we seek $E[T^2]$, which for $n = 4$ evaluates to $\frac{17\sigma^4}{9}$, where I am using the Expect function in the mathStatica package for Mathematica to do the nitty-gritties for me.

Repeating the same calculation of $E[T^2]$ for $n = 2, 3, \dots, 9$ yields the sequence:

$$\{\frac{3\sigma^4}{1}, \frac{9\sigma^4}{4}, \frac{17\sigma^4}{9}, \frac{27\sigma^4}{16}, \frac{39\sigma^4}{25}, \frac{53\sigma^4}{36}, \frac{69\sigma^4}{49}, \frac{87\sigma^4}{64} , \dots\}$$

which, by induction, yields the general solution:

$$E[T^2] = \frac{n^2+n-3}{(n-1)^2} \sigma^4$$

The numerator sequence $\{3, 9, 17, 27, 39, 53, \dots\}$ is the pronic sequence $n(n+1)$ shifted down by 3: http://en.wikipedia.org/wiki/Pronic_number

Finally:

$$\begin{align*}\displaystyle \operatorname{Var}(T) &= E[T^2] - (E[T])^2 \\ &= E[T^2] - \sigma^4 \\ & = \frac{3n-4}{(n-1)^2} \sigma^4 \end{align*}$$
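For the comparison the question actually asks for (my addition): under normality the usual sample variance $S^2$ satisfies $Var(S^2) = \frac{2\sigma^4}{n-1}$, so the relative efficiency of $T$ is

$$\frac{\operatorname{Var}(S^2)}{\operatorname{Var}(T)} = \frac{2\sigma^4/(n-1)}{(3n-4)\sigma^4/(n-1)^2} = \frac{2(n-1)}{3n-4} \;\longrightarrow\; \frac{2}{3} \quad \text{as } n \to \infty$$

so $T$ is strictly less efficient than $S^2$ for every $n > 2$ (the two estimators coincide at $n = 2$), with asymptotic relative efficiency $2/3$.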


Method 2: Zonal Polynomials

*I showed the problem to my co-author, Murray Smith, who came up with the following wonderful solution using zonal polynomials. Here is Murray's solution:*

The problem falls into the category of expectations of a quadratic form in normal variables. It's just a matter of putting it into that context by defining an appropriate set of random variables. Once that is done, the required expectations (here only the first two) are easy enough to write down. The statistic being studied is:

$$T=\frac{1}{2(n-1)}\sum_{i=1}^{n-1}\left( X_{i+1}-X_{i}\right) ^{2}$$

where the vector $X=(X_{1},...,X_{n})^{\prime }\sim N(\mu i_{n},\sigma ^{2}I_{n})$, and $i_{n}$ and $I_{n}$ denote, respectively, the $n$-dimensional vector of ones and the $n \times n$ identity matrix.

Define these variables:

$$Y_{i}=\frac{X_{i}-\mu }{\sigma }\sim N(0,1) \qquad \qquad \text{ for }i=1,...,n$$

and

$$Z_{i}=Y_{i+1}-Y_{i}\sim N(0,2) \qquad \qquad \text{ for }i=1,...,n-1$$

where it should be noted that $Cov(Z_{i},Z_{i+1})=Cov(Z_{i},Z_{i-1})=-1$, and all covariances at larger lags are zero. In matrix terms:

$$Z=(Z_{1},...,Z_{n-1})^{\prime }\sim N(0_{n-1},D)$$

where $0_{n-1}$ is the $(n-1)$ zero vector and the $(n-1)\times (n-1)$ matrix:

$$D=\left[ \begin{array}{rrrrrr} 2 & -1 & 0 & & & 0 \\ -1 & 2 & -1 & & & \\ 0 & -1 & 2 & \ddots & & \\ 0 & 0 & \ddots & \ddots & -1 & 0 \\ & & & -1 & 2 & -1 \\ 0 & & & 0 & -1 & 2% \end{array} \right]$$

Then, the statistic:

$$\begin{eqnarray*} T &=&\frac{\sigma ^{2}}{2(n-1)}Z^{\prime }Z \\ &=&\frac{\sigma ^{2}}{2(n-1)}A^{\prime }DA \end{eqnarray*}$$

where $A=D^{-1/2}Z\sim N(0_{n-1},I_{n-1}).$ The moments $E\left[ A^{\prime }DA\right] $ and $E\left[ \left( A^{\prime }DA\right) ^{2}\right] $ are given respectively by:

\begin{equation*} tr(D) \qquad \text{and} \qquad 2\,tr(D^{2})+\left( tr(D)\right) ^{2} \end{equation*}

where $tr(\cdot)$ denotes the matrix trace operator (the sum of the elements on the leading diagonal). These expressions correspond to the first two top-order zonal polynomials in $D$. For our particular form of $D$, they evaluate to $2(n-1)$ and $4(n-1)^{2}+12(n-3)+20$, respectively. From these it is easy to get

\begin{equation*} E\left[ T\right] =\sigma ^{2} \qquad \text{and} \qquad Var\left( T\right) =\frac{(3n-4)\sigma ^{4}}{(n-1)^{2}} \end{equation*}

... exactly the same as Method 1 above.
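The trace identities are easy to check numerically; here is a short sketch (my addition, not part of Murray's solution, assuming NumPy is available) that builds the tridiagonal matrix $D$ and verifies both moment formulas and the resulting variance:

```python
import numpy as np

def tridiag_D(n):
    """Build the (n-1)x(n-1) tridiagonal matrix D: 2 on the diagonal, -1 beside it."""
    m = n - 1
    return 2 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)

for n in (4, 7, 10):
    D = tridiag_D(n)
    trD = np.trace(D)
    second = 2 * np.trace(D @ D) + trD ** 2        # E[(A'DA)^2]
    assert trD == 2 * (n - 1)                      # E[A'DA] = tr(D)
    assert second == 4 * (n - 1) ** 2 + 12 * (n - 3) + 20
    # Var(T) = sigma^4 * (second - tr(D)^2) / (4(n-1)^2) = (3n-4) sigma^4 / (n-1)^2
    var_T = (second - trD ** 2) / (4 * (n - 1) ** 2)
    assert abs(var_T - (3 * n - 4) / (n - 1) ** 2) < 1e-12

print("trace identities verified")
```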


Notes:

I should perhaps add that Murray and I are the authors of the mathStatica software used in the first method. The above results have also been checked by Monte Carlo simulation in Mathematica. This can be done, say for $n = 7, \sigma = 1$, with:

data = Table[ 1/(2 (7 - 1)) Total[Differences[
               RandomVariate[NormalDistribution[0, 1], 7]]^2], {50000}];
Moment[data, 2]

which returns a Monte Carlo estimate for $E[T^2]$:

1.45119
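A rough Python equivalent of that Mathematica check (my addition, assuming NumPy is available; $n = 7$, $\sigma = 1$) estimates the raw second moment $E[T^2]$, whose exact value is $\frac{53}{36} \approx 1.472$:

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps = 7, 200_000

# Simulate T for each replication: scaled sum of squared successive differences
x = rng.normal(0.0, 1.0, size=(reps, n))
t = (np.diff(x, axis=1) ** 2).sum(axis=1) / (2 * (n - 1))

print((t ** 2).mean())  # Monte Carlo estimate of E[T^2], near 53/36
```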