What is the expected 4th order cumulant for a Gaussian variable sample?

I understand that, theoretically, the cumulants of order 3 and higher are all 0 for a Gaussian (Normal) distribution. However, when I calculate the 4th order cumulant for each column of a matrix of normally distributed numbers, I obtain the graph shown below, with the cumulant value oscillating wildly around 0 (the oscillations grow much larger as the variance grows). Is this to be expected?

[image]

There is 1 best solution below

I'm probably going to get this wrong, but here goes! I am assuming that you are calculating 5000 different estimates of the cumulant? So, the heights of each line are the individual estimates?

You don't say how you are computing your estimate of the cumulant, but the "standard" estimators are the k-statistics. See:

https://mathworld.wolfram.com/k-Statistic.html

For the 4th order cumulant, you can calculate $k_4$, which is an unbiased estimator of the 4th order cumulant $\kappa_4$.
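As a rough illustration of what computing $k_4$ involves, here is a minimal pure-Python sketch. It assumes the central-moment form of the estimator, $k_4 = n^2[(n+1)m_4 - 3(n-1)m_2^2]/((n-1)(n-2)(n-3))$ with $m_r = \frac{1}{n}\sum_i (x_i - \bar{x})^r$; the function name `k4`, the seed and the sample size are just choices for this example, not anything taken from the question.

```python
import random


def k4(xs):
    """Fourth k-statistic: an unbiased estimator of the 4th cumulant."""
    n = len(xs)
    if n < 4:
        raise ValueError("k4 needs at least 4 observations")
    mean = sum(xs) / n
    # Central sample moments m_r = (1/n) * sum((x - mean)**r)
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    numerator = n * n * ((n + 1) * m4 - 3 * (n - 1) * m2 * m2)
    return numerator / ((n - 1) * (n - 2) * (n - 3))


random.seed(0)  # arbitrary seed, only so the run is repeatable
sample = [random.gauss(0.0, 1.0) for _ in range(100)]  # assumed n = 100
print(k4(sample))  # one estimate; a different sample gives a different value
```

If SciPy is available, I believe `scipy.stats.kstat(data, n=4)` computes the same statistic, so it may be worth comparing against that rather than trusting a hand-rolled version.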

$k_4$ is a statistic and its value will depend on the sample it is calculated on. Use a different sample and you'll get a different estimate. Since it's an unbiased statistic (from the link) we know that $E(k_4) = \kappa_4 = 0$ for a Gaussian random variable. However, it's the expectation of $k_4$ that is zero, not the actual values. This is why the values in your plot are centred around zero.

How much are they spread around zero? Well, the Mathworld link handily has the formula (15) for the variance of $k_4$, and the ONLY term in it that is non-zero for a Gaussian is

$$\frac{24 n (n+1) \kappa_2^4}{(n-1)(n-2)(n-3)}$$

Here, the second cumulant is $\kappa_2 = \sigma^2$, the variance of the Gaussian. (It's the only non-zero term because all of the others contain cumulants of order 3 or higher, which are all zero for the Gaussian.) For large $n$ this is roughly $24\sigma^8/n$.

So, to find out why your graph oscillates roughly between -0.1 and 0.1, I guessed that you used standard normal random variables, so $\kappa_2 = \sigma^2 = 1$, and that each column holds about $n = 10000$ values. If you substitute those values into the formula above, the "usual" confidence interval formula (point estimate $\pm$ twice the square root of the sampling variance) gives about $(-0.098, +0.098)$ as an approximate 95% interval, which seems pretty close to what your plot is showing. (For comparison, $n = 100$ would give about $(-1.015, +1.015)$, since the spread only shrinks like $1/\sqrt{n}$.)
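To sanity-check this kind of calculation, the sketch below evaluates the Gaussian-case variance of $k_4$, taken here as $24n(n+1)\kappa_2^4/((n-1)(n-2)(n-3))$, and compares its square root with the spread of simulated estimates. All function names are made up for this example, and $n = 50$ with 4000 replications is chosen only to keep the pure-Python run fast; it is not a guess about the data in the question.

```python
import math
import random


def k4(xs):
    """Fourth k-statistic (unbiased estimator of the 4th cumulant)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    numerator = n * n * ((n + 1) * m4 - 3 * (n - 1) * m2 * m2)
    return numerator / ((n - 1) * (n - 2) * (n - 3))


def var_k4_gaussian(n, sigma2=1.0):
    """Sampling variance of k4 when the data are Gaussian with variance sigma2."""
    return 24.0 * n * (n + 1) * sigma2 ** 4 / ((n - 1) * (n - 2) * (n - 3))


def half_width(n, sigma2=1.0):
    """Half-width of the rough 95% interval: twice the standard deviation of k4."""
    return 2.0 * math.sqrt(var_k4_gaussian(n, sigma2))


def simulated_sd(n, reps, sigma=1.0, seed=0):
    """Standard deviation of k4 over `reps` independent Gaussian samples of size n."""
    rng = random.Random(seed)
    ests = [k4([rng.gauss(0.0, sigma) for _ in range(n)]) for _ in range(reps)]
    mean = sum(ests) / reps
    return math.sqrt(sum((e - mean) ** 2 for e in ests) / (reps - 1))


n, reps = 50, 4000  # illustrative values only
print("theoretical sd:", math.sqrt(var_k4_gaussian(n)))
print("simulated sd:  ", simulated_sd(n, reps))
print("rough 95% half-width:", half_width(n))
```

The two standard deviations should agree to within simulation noise; if they do not, the most likely culprit is an estimator other than $k_4$ (for example, a biased moment-based one).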

Since this confidence interval and the variance of $k_4$ both depend on $\sigma^2$, increasing the variance of the Gaussian random variable will increase the size of the values in your plot.
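To spell out how fast that growth is (a short worked step, writing $c_n$ for the factor that depends only on the sample size): for a Gaussian, the variance of $k_4$ depends on $\sigma$ only through $\kappa_2^4$, so

$$\operatorname{var}(k_4) = c_n\,\kappa_2^4 = c_n\,\sigma^8 \quad\Longrightarrow\quad \operatorname{sd}(k_4) = \sqrt{c_n}\,\sigma^4 .$$

So doubling the standard deviation $\sigma$ (quadrupling the variance) should make the oscillations about $2^4 = 16$ times larger, which would explain why they grow so quickly.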

Another way of displaying the information would be in a histogram or density estimate plot as I don't think your estimates have meaning as a time series plot (but I'm being unnecessarily picky, so sorry about that).