What types of noise allow linear regression interpolation to be unbiased?

In a statistics book I learned that if there are $n$ random variables $X_{t(0)},\dots,X_{t(n-1)}$ (with $t(i)\in\mathbf{R}$) which are independently distributed with distribution $\mathcal{N}(at(i)+b,1)$ (I believe the variance can also vary), then the variable $Y$ obtained by fitting a linear regression to the $X_{t(i)}$ and evaluating the fitted line at time $s$ has mean $as+b$.

They didn't give a proof; is there a reference where I can find one? And if the noise is not normally distributed, does this approach still work?

Best answer

For simplicity I will consider the case where $b = 0$, though the argument extends to include an intercept.

Then, given $(t_i)_{i=1}^n$ and writing $x_i = x_{t(i)}$, the assumption $x_i \sim \mathcal{N}(a t_i, 1)$ can be written as

$$ x_i = a t_i + \xi_i, \qquad \xi_i \sim \mathcal N(0,1).$$

For a regression model without intercept (i.e. $b = 0$), the regression line is determined by the estimate $\hat a$, which is given by the formula

$$ \hat a = \frac{\sum_{i=1}^n x_i t_i}{\sum_{i=1}^nt_i^2}.$$
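(This formula is stated without derivation in the answer; it is the standard least-squares solution. $\hat a$ minimises the residual sum of squares, and setting the derivative to zero recovers it:)

$$ 0 = \frac{d}{da}\sum_{i=1}^n (x_i - a t_i)^2 \Big|_{a = \hat a} = -2 \sum_{i=1}^n t_i (x_i - \hat a\, t_i) \;\Longrightarrow\; \hat a = \frac{\sum_{i=1}^n x_i t_i}{\sum_{i=1}^n t_i^2}.$$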

So we would like to show that this is unbiased, i.e. that $\mathbf E[\hat a] = a$. To see this we plug in the formula for $x_i$ given above

$$ \begin{aligned} \mathbf E[\hat a] & = \frac{1}{\sum_{i=1}^n t_i^2} \sum_{i=1}^n t_i \mathbf E[x_i] \\ & = \frac{1}{\sum_{i=1}^n t_i^2} \sum_{i=1}^n t_i(at_i + \mathbf E[\xi_i] ) \\ & = \frac{1}{\sum_{i=1}^n t_i^2} \sum_{i=1}^n a t_i^2 \\ & = a \end{aligned} $$ which is as required.

Note that in the above we did not require that the errors are normally distributed: in fact, as long as the errors $\xi_i$ have mean $0$, the linear regression above will be unbiased.
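This can be checked numerically. The following sketch (my own illustration, with a hypothetical design $t = (1,\dots,5)$ and slope $a = 2$) uses uniform noise on $[-1, 1]$, which has mean $0$ but is not normal:

```python
import numpy as np

rng = np.random.default_rng(0)
a = 2.0                        # true slope (b = 0, as in the answer)
t = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Non-normal, mean-zero noise: uniform on [-1, 1].
trials = 200_000
xi = rng.uniform(-1.0, 1.0, size=(trials, t.size))
x = a * t + xi                 # x_i = a t_i + xi_i

# Least-squares slope without intercept: sum(x_i t_i) / sum(t_i^2)
a_hat = (x @ t) / np.sum(t**2)

print(a_hat.mean())            # close to a = 2.0: unbiased despite non-normal noise
```

Note that the variance of the noise never enters the unbiasedness calculation, only its mean.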

What can change if either

  1. the error is not normal, or
  2. the error is normal but its distribution depends on $t_i$,

is whether the linear regression line remains the best linear unbiased estimate, i.e. the one with lowest variance. This is generally not the case.

As an example (which I do not prove), if we have

$$ x_i = a t_i + \sqrt{t_i} \xi_i, \qquad \xi_i \sim \mathcal N(0,1),$$

then, as before, the linear regression line remains unbiased; however, the alternative estimate

$$ \tilde a = \frac{\sum_i{x_i}}{\sum_i{t_i}}$$

is the best linear unbiased estimate (BLUE); in particular,

$$\text{Var}(\tilde a) \leq \text{Var}(\hat a).$$
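(The answer does not prove this inequality; one way to verify it, assuming all $t_i > 0$, is to compute both variances directly. Since $\text{Var}(x_i) = t_i$ here,

$$ \text{Var}(\hat a) = \frac{\sum_i t_i^2 \, \text{Var}(x_i)}{\left(\sum_i t_i^2\right)^2} = \frac{\sum_i t_i^3}{\left(\sum_i t_i^2\right)^2}, \qquad \text{Var}(\tilde a) = \frac{\sum_i \text{Var}(x_i)}{\left(\sum_i t_i\right)^2} = \frac{\sum_i t_i}{\left(\sum_i t_i\right)^2},$$

and the Cauchy–Schwarz inequality $\left(\sum_i t_i^2\right)^2 = \left(\sum_i t_i^{1/2}\, t_i^{3/2}\right)^2 \leq \left(\sum_i t_i\right)\left(\sum_i t_i^3\right)$ rearranges to $\text{Var}(\tilde a) \leq \text{Var}(\hat a)$.)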

Notes

  1. In the above treatment I have made the simplifying assumption that the $t_i$ are fixed and non-random. This simplifies the detail, but has little overall impact on the answer since we would instead consider the conditional expectation $\mathbf E[\hat a| \underline t]$.
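Conditioning on a fixed design as in this note, the variance comparison in the example can also be checked by simulation. This sketch (my own, with hypothetical values $t = (1,\dots,5)$ and $a = 2$) draws noise scaled by $\sqrt{t_i}$ and compares the two estimates:

```python
import numpy as np

rng = np.random.default_rng(1)
a = 2.0
t = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # fixed, non-random design

trials = 200_000
xi = rng.standard_normal((trials, t.size))
x = a * t + np.sqrt(t) * xi               # Var(x_i) = t_i

a_hat = (x @ t) / np.sum(t**2)            # ordinary least-squares estimate
a_tilde = x.sum(axis=1) / t.sum()         # alternative (BLUE) estimate

print(a_hat.mean(), a_tilde.mean())       # both close to a = 2.0: both unbiased
print(a_hat.var(), a_tilde.var())         # Var(a_tilde) < Var(a_hat)
```

Both estimates are unbiased, but the sample variance of $\tilde a$ comes out strictly smaller than that of $\hat a$, matching the inequality above.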