So, I am slowly getting introduced to generalized method of moments (GMM), but I am getting confused over some issues, and this is one of them:
I heard that GMM addresses the problem that a single estimator may not be able to satisfy several moment conditions at once, for example $E(x_t\epsilon_t) = 0$ and $E(\epsilon_t) = 0$. But I am having a hard time understanding how the GMM estimator function is created in this case: we can write down two moment functions (the two above) that are zero in expectation, but how are they combined into the single objective function that GMM requires?
In other words, in slide 13 of http://homepage.univie.ac.at/robert.kunst/gmm.pdf, there is OLS as GMM, but I am having a hard time understanding how a function is being created. Can anyone explain this?
Edit: OK, so it seems that what I am really struggling with is this: in OLS, it is often said that we need to satisfy both the moment condition above and a variance condition (that the variance of the error is always the same). But when OLS is cast as GMM, instrumental variables $k_t$ are used so that $E(k_t \epsilon_t) = 0$, and no further conditions are imposed. So what's going on?
Let's consider OLS first, then see how it can be viewed as a special case for GMM. I will not discuss GMM in general, just instrumental variables (IV). IV is an immediate generalization of OLS when the regressors are no longer predetermined.
OLS
No statistical assumptions
If we remove all statistical assumptions (except that the underlying model is linear), then the problem is a linear algebraic one:
$$ Y = X \beta, $$
where $Y \in \mathbb{R}^n$, $X \in \mathbb{R}^{n \times p}$ (so there are $p$ regressors and $n$ observations). Assuming $X$ is full-rank, then the least-squares solution, from just linear algebra, is
$$ \hat{\beta} = (X^TX)^{-1}X^TY. $$
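As a purely algebraic sanity check, here is a minimal sketch of that formula in NumPy (the data below are made up for illustration): with no noise and a full-rank $X$, solving the normal equations recovers $\beta$ exactly.

```python
import numpy as np

# Illustrative data: n = 100 observations, p = 3 regressors (values are arbitrary).
rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.standard_normal((n, p))
beta = np.array([1.0, -2.0, 0.5])
Y = X @ beta  # exact linear relation, no error term

# Least-squares solution from the normal equations: beta_hat = (X'X)^{-1} X'Y.
# (Solving the system is numerically preferable to forming the inverse.)
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

print(beta_hat)  # recovers [1.0, -2.0, 0.5] up to floating point
```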
Small sample properties
Now consider the linear model
$$ y_i = X_i \beta + \epsilon_i, $$
where $(x_i, \epsilon_i)$, $i = 1, \cdots, n$, are drawn from a probability space ("the model" or "DGP") $( \prod_{i=1}^{n} \mathbb{R}^{p + 1}, \mu)$.
Statistical assumptions one needs to put on the model to get good small sample properties for $\hat{\beta}$:
1. Full-rank assumption. The random matrix $X$ must be full-rank $\mu$-a.s. for $\hat{\beta}$ to be well-defined.
2. Strict exogeneity: the conditional expectation $E[\epsilon|X] = 0 \in \mathbb{R}^n$. This implies that $\hat{\beta}$ is unbiased.
3. Conditional homoskedasticity: $Var(\epsilon|X) = \sigma^2 I \in \mathbb{R}^{n \times n}$. This makes $\hat{\beta}$ BLUE: it has the smallest variance among linear estimators that are unbiased for every value of $\beta$.
If assumption 3 is strengthened so that $\epsilon \in \mathbb{R}^n$ is multivariate normal, then the estimation problem is parametric and $\hat{\beta}$ becomes the MLE. Because the only source of error in $\hat{\beta}$ is $\epsilon$, this distributional assumption also pins down the distributions of test statistics such as the t- and F-statistics.
Large sample properties
Now replace the model by $( \prod_{i=1}^{\infty} \mathbb{R}^{p + 1}, \mu)$ and consider the behavior of $\hat{\beta}$ as the sample size $n \rightarrow \infty$. Again start with the linear algebra and rewrite
$$ \hat{\beta} = (\frac{X^TX}{n})^{-1}\frac{X^TY}{n}. $$
The terms $\frac{X^TX}{n}$ and $\frac{X^TY}{n}$ are just sample means. To connect these sample means to the true means, one needs assumptions on the model.
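To make the "just sample means" point concrete, here is a small sketch (with made-up data): $\frac{X^TX}{n}$ is the sample mean of the $p \times p$ outer products $x_i^T x_i$, $\frac{X^TY}{n}$ is the sample mean of the vectors $x_i^T y_i$, and the rescaled formula gives exactly the same estimator as the original one.

```python
import numpy as np

# Illustrative data (not from the original text).
rng = np.random.default_rng(1)
n, p = 500, 2
X = rng.standard_normal((n, p))
Y = X @ np.array([0.3, 1.2]) + rng.standard_normal(n)

# (X'X / n): sample mean of the p x p outer products x_i' x_i.
mean_xx = np.mean([np.outer(X[i], X[i]) for i in range(n)], axis=0)
# (X'Y / n): sample mean of the p-vectors x_i' y_i.
mean_xy = np.mean([X[i] * Y[i] for i in range(n)], axis=0)

# Dividing both factors by n cancels, so the estimator is unchanged.
beta_rescaled = np.linalg.solve(mean_xx, mean_xy)
beta_plain = np.linalg.solve(X.T @ X, X.T @ Y)
print(np.allclose(beta_rescaled, beta_plain))  # True
```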
1. Strict stationarity. The sequence of random vectors $(x_i, \epsilon_i)$, $i = 1, 2, \cdots$, needs to form a strictly stationary "time series". This makes the expectations $E(x_i^T x_i) \in \mathbb{R}^{p \times p}$ and $E(x_i \epsilon_i) \in \mathbb{R}^p$ independent of $i$.
2. The second-moment matrix $E(x_i^T x_i) \in \mathbb{R}^{p \times p}$ is positive definite, and $E(x_i \epsilon_i) = 0 \in \mathbb{R}^p$. The second condition, $E(x_i \epsilon_i) = 0$ (predeterminedness), is the one that makes OLS a special case of IV. It is weaker than strict exogeneity.
3. Ergodicity. The sequence of random vectors $(x_i, \epsilon_i)$, $i = 1, 2, \cdots$, needs to form an ergodic "time series".
Assumptions 1. and 3. allow you to use the Law of Large Numbers on the estimation error
$$ \hat{\beta} - \beta = (\frac{X^TX}{n})^{-1}\frac{X^T\epsilon}{n}. $$
Because sample means converge to the true means ($\mu$-a.s.), you have consistency of OLS.
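A quick simulation sketch of this consistency argument (the DGP below is made up, chosen so that $E(x_i \epsilon_i) = 0$ holds by construction): the estimation error $\|\hat{\beta} - \beta\|$ shrinks as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(2)
beta = np.array([2.0, -1.0])

def ols_error(n):
    """Draw one sample of size n and return the OLS estimation error."""
    X = rng.standard_normal((n, 2))
    eps = rng.standard_normal(n)  # independent of X, so E(x_i eps_i) = 0
    Y = X @ beta + eps
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    return np.linalg.norm(beta_hat - beta)

errors = {n: ols_error(n) for n in (100, 10_000, 1_000_000)}
print(errors)  # the error shrinks as n grows, roughly like 1/sqrt(n)
```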
IV
So for OLS, the key assumptions on the relationship between the regressors $x_i$ and the error $\epsilon_i$ that give you nice properties are strict exogeneity $E[\epsilon|X] = 0$ (for the small-sample properties) and predeterminedness $E(x_i \epsilon_i) = 0$ (for consistency).
But in econometrics sometimes you may not have these. For example, in your model
$$ y_i = X_i \beta + \epsilon_i, $$
there may be error in measuring the regressor. Instead of observing $X_i$, you observe $W_i = X_i + \eta_i$. Rewriting the model in terms of the observed regressor, $y_i = W_i \beta + (\epsilon_i - \eta_i \beta)$, it's not hard to see that $W_i$ is not orthogonal to the new error term $\epsilon_i - \eta_i \beta$ (both contain $\eta_i$), so predeterminedness fails.
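A simulation sketch of that failure (the DGP is illustrative, with one regressor and $Var(x) = Var(\eta) = 1$): OLS on the mismeasured regressor is biased toward zero, and the bias does not go away with more data.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
beta = 1.0
x = rng.standard_normal(n)
eps = rng.standard_normal(n)
eta = rng.standard_normal(n)   # measurement noise
y = x * beta + eps
w = x + eta                    # we only observe the noisy regressor

# One-regressor OLS of y on w: (w'w)^{-1} w'y.
beta_ols = (w @ y) / (w @ w)

# With Var(x) = Var(eta) = 1, the probability limit is
# beta * Var(x) / (Var(x) + Var(eta)) = 0.5, not 1.0 (attenuation bias).
print(beta_ols)
```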
This is where IV comes in. Suppose you have another set of variables $z_i \in \mathbb{R}^q$ (the instruments, with $q \geq p$) such that (assuming now $\{ (z_i, x_i, \epsilon_i) \}$ is still ergodic and stationary)

1. Orthogonality: $E(z_i^T \epsilon_i) = 0 \in \mathbb{R}^q$, and
2. Relevance: $E(z_i^T x_i) \in \mathbb{R}^{q \times p}$ has full column rank.

Then the IV estimator
$$ \hat{\beta}_{IV} = (X^T Z Z^T X)^{-1} X^T Z Z^T Y $$
is consistent by exactly the same arguments used for consistency of OLS. If you replace $z_i$ by $x_i$, IV becomes OLS.
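Continuing the measurement-error sketch above, here is the IV estimator in action. The instrument $z$ is a hypothetical second, independently noisy measurement of $x$: it is correlated with $x$ (relevance) but independent of both $\epsilon$ and $\eta$ (orthogonality to the composite error $\epsilon - \eta\beta$).

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000
beta = 1.0
x = rng.standard_normal(n)
eps = rng.standard_normal(n)
eta = rng.standard_normal(n)
y = x * beta + eps
w = x + eta                     # observed (mismeasured) regressor
z = x + rng.standard_normal(n)  # instrument: a second noisy measurement of x

# Apply the formula beta_IV = (W'Z Z'W)^{-1} W'Z Z'Y from the answer
# (here p = q = 1, so all the matrix products are scalars).
W = w[:, None]
Z = z[:, None]
beta_iv = np.linalg.solve(W.T @ Z @ Z.T @ W, W.T @ Z @ Z.T @ y)

print(beta_iv)  # close to the true beta = 1.0, unlike OLS on w
```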