Background
In the following linear regression with i.i.d. errors $\epsilon_i$ $(i = 1, \cdots, n)$ with mean $0$ and finite variance $\sigma^2$, \begin{align*} Y_i = X_i^\intercal\beta + \epsilon_i \end{align*} we know the least-squares estimator for $\beta$ is \begin{align*} \hat{\beta}=(X^\intercal X)^{-1}X^\intercal Y \end{align*} where $X = (X_1, \cdots, X_n)^\intercal \in \mathbb{R}^{n\times p}$ and $Y = (Y_1,\cdots, Y_n)^\intercal \in \mathbb{R}^n$. Now, we can easily derive the variance of $\hat{\epsilon} = Y - X\hat{\beta}$ with \begin{align*} \text{Var}(\hat{\epsilon}) &= \text{Var}((I-H)Y) \\ &\overset{\heartsuit}{=} (I-H)\text{Var}(Y)(I-H)^\intercal \\ &= \sigma^2(I-H)(I-H)^\intercal \\ &\overset{\spadesuit}{=} \sigma^2(I-H) \end{align*} where $H = X(X^\intercal X)^{-1}X^\intercal$ is the projection matrix of rank $p$ onto the column space of $X$; since $I - H$ is symmetric and idempotent, equality $(\spadesuit)$ follows. Hence, this yields an unbiased estimator for $\sigma^2$, since \begin{align*} \text{Trace}(\text{Var}(\hat{\epsilon})) = \sigma^2\text{Trace}(I-H) = \sigma^2(n-p) \implies \widehat{\sigma^2}=(n-p)^{-1}\|Y - \hat{Y}\|_2^2 \end{align*}

An attempt at higher moments
How would we generalize this computation to come up with unbiased estimators for the third central moment \begin{align*} \mu_3 \overset{\text{def}}{=} \mathbb{E}\epsilon^3_i \end{align*} or beyond? Here's my attempt. Define \begin{align*} \mathcal{S}(\hat{\epsilon}) = \mathbb{E}(\hat{\epsilon} - \mathbb{E}\hat{\epsilon})^{\otimes 3} \end{align*} I don't understand tensor products quite well, but I'm guessing \begin{align*} \mathcal{S}(\hat{\epsilon}) = (I-H) \underset{(I-H)}{\mathcal{S}(Y)}(I-H)^\intercal = \mu_3(I-H) \underset{(I-H)}{I_{n\times n \times n}}(I-H)^\intercal \end{align*} in the same way as equality $(\heartsuit)$ under "Background". I included the additional underset $(I-H)$ to elicit the idea that we are now "matrix" multiplying along the additional dimension induced by the 3-dimensional tensor. But I don't quite understand what it means to multiply in this direction. Am I on the right track? Are there other matrix tricks I could use?
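To see the "multiplying along each dimension" idea concretely, here is a minimal numpy sketch (all variable names are my own). It builds the diagonal tensor $I_{n\times n\times n}$, multiplies $M = I - H$ along each of the three modes, and checks that the result has entries $\sum_{v} M_{iv}M_{jv}M_{kv}$; it also sanity-checks the second-moment facts from the background.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 6, 2
X = rng.standard_normal((n, p))
H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix X(X'X)^{-1}X'
M = np.eye(n) - H                        # residual-maker M = I - H

# Sanity checks from the second-moment computation in the background
assert np.allclose(M @ M.T, M)           # (I-H)(I-H)' = I-H (symmetric idempotent)
assert np.isclose(np.trace(M), n - p)    # Trace(I-H) = n - p

# Diagonal "identity" tensor I_{n x n x n}: ones at positions (v, v, v)
I3 = np.zeros((n, n, n))
I3[np.arange(n), np.arange(n), np.arange(n)] = 1.0

# Multiply M along each of the three modes (mode-1, mode-2, mode-3 products)
S = np.einsum('ia,abc->ibc', M, I3)   # mode-1: contract first index
S = np.einsum('jb,ibc->ijc', M, S)    # mode-2: contract second index
S = np.einsum('kc,ijc->ijk', M, S)    # mode-3: contract third index

# Same thing in closed form: S_ijk = sum_v M_iv M_jv M_kv
assert np.allclose(S, np.einsum('iv,jv,kv->ijk', M, M, M))
```

So multiplying "in the extra direction" is just another index contraction, and applying $M$ along all three modes of the diagonal tensor collapses to the single sum over $v$.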
Edit: I found that the tensor product is simply a 3-mode product, resulting in \begin{align*} [\mathcal{S}(\hat{\epsilon})]_{ijk} = \mu_3 \sum_{v=1}^{n}M_{iv}M_{jv}M_{kv} \end{align*} where $M = I - H$ is still a projection matrix. How to proceed from here?
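As a numerical check of that entrywise formula, here is a Monte Carlo sketch assuming centered $\mathrm{Exp}(1)$ errors, for which $\mu_3 = 2$ (names and the error distribution are my own choices, not from the question):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 5, 2
X = rng.standard_normal((n, p))
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)   # M = I - H

# Centered Exp(1) errors: mean 0, variance 1, mu3 = E eps^3 = 2
mu3 = 2.0
reps = 200_000
eps = rng.exponential(1.0, size=(reps, n)) - 1.0
ehat = eps @ M.T                  # residuals: eps-hat = M eps (X beta cancels)

# Empirical third-moment tensor E[ehat_i ehat_j ehat_k]
S_mc = np.einsum('ri,rj,rk->ijk', ehat, ehat, ehat) / reps

# Closed form from the edit: S_ijk = mu3 * sum_v M_iv M_jv M_kv
S = mu3 * np.einsum('iv,jv,kv->ijk', M, M, M)
print(np.max(np.abs(S_mc - S)))   # max entrywise deviation, O(reps^{-1/2})
```

One natural next step, mirroring the trace argument for $\sigma^2$: the diagonal of the formula gives $\mathbb{E}\hat{\epsilon}_i^3 = \mu_3 \sum_v M_{iv}^3$, so summing over $i$ suggests normalizing $\sum_i \hat{\epsilon}_i^3$ by $\sum_{i,v} M_{iv}^3$.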