Meaning of "asymptotic distribution" of estimator in Generalized Linear Models.


We have a generalized linear model. The maximum-likelihood estimator of the parameters then has the asymptotic distribution $N(\beta, J^{-1})$: $$\hat \beta \rightarrow_d N(\beta, J^{-1}),$$

  • $\beta$ - the true parameters (we know them in some cases, e.g. when doing simulations),
  • $J$ - the Fisher information matrix $J = X^T S X$, where $X$ is the design matrix of size $n \times p$ and $S$ is a diagonal weight matrix (assume we know how to calculate this term).
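To make the pieces concrete, here is a minimal sketch of computing $J = X^T S X$ and its inverse. It assumes a logistic GLM (so $S = \operatorname{diag}(\mu_i(1-\mu_i))$ with $\mu = \sigma(X\beta)$); the design matrix, dimensions, and true $\beta$ are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup (invented for illustration): design matrix X
# with an intercept column, and true coefficients beta.
n, p = 500, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([0.5, -1.0, 2.0])

# For a binomial GLM with logit link, the weight matrix is
# S = diag(mu * (1 - mu)), where mu = sigmoid(X @ beta).
mu = 1.0 / (1.0 + np.exp(-X @ beta))
S = np.diag(mu * (1.0 - mu))

J = X.T @ S @ X               # Fisher information, J = X^T S X
cov_asym = np.linalg.inv(J)   # asymptotic covariance of beta-hat

print(cov_asym)
```

Note that $J$ depends on $\beta$ through $S$; in a simulation we can evaluate it at the known true $\beta$, while with real data we would plug in $\hat\beta$.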

Say that we run simulations: given a fixed $X$ (though we could subset the rows of $X$ to use more or fewer of them), we generate new $Y$ $k$ times and estimate $\hat \beta$ from each. I am not sure I understand correctly what the asymptotic distribution means here. It's either:

a. increasing the number of observations (taking larger subsets of the rows of $X$) guarantees that the empirical covariance matrix of $\hat \beta$ gets closer and closer to $J^{-1}$, or

b. repeating the experiment more times (this time $k$ gets bigger) guarantees that the averaged empirical covariance matrix of $\hat \beta$ gets closer and closer to $J^{-1}$.

I suspect that (a) is the true answer, that (b) is also somehow true but with some detail changed, and that (b) is related to the CLT while (a) is not. Could anyone make this clear?
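The experiment described above can be sketched directly. The following is a minimal simulation, assuming a logistic GLM fitted by Newton-Raphson (IRLS); all dimensions and parameter values are invented. With $X$ fixed, it regenerates $Y$ $k$ times, refits $\hat\beta$ each time, and compares the empirical covariance of the $\hat\beta$'s against $J^{-1}$ evaluated at the true $\beta$.

```python
import numpy as np

rng = np.random.default_rng(1)

def irls_logit(X, y, n_iter=25):
    """Newton-Raphson (IRLS) MLE for a logistic GLM."""
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ b))
        W = mu * (1.0 - mu)
        J = X.T @ (W[:, None] * X)              # Fisher information at b
        b = b + np.linalg.solve(J, X.T @ (y - mu))
    return b

# Hypothetical setup (invented for illustration): fixed design X, true beta.
n, k = 2000, 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([0.3, -0.8])
mu = 1.0 / (1.0 + np.exp(-X @ beta))

# Regenerate Y k times with X fixed and refit each time.
est = np.array([irls_logit(X, rng.binomial(1, mu)) for _ in range(k)])

# J^{-1} at the true beta, for comparison.
J_inv = np.linalg.inv(X.T @ ((mu * (1.0 - mu))[:, None] * X))

print(np.cov(est.T))   # should be close to J_inv when n is large
print(J_inv)
```

Growing $k$ here only reduces Monte Carlo noise in the empirical covariance; how well that covariance matches $J^{-1}$ is governed by $n$.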

1 Answer

If I understand your question correctly, it seems you are mixing two concepts here:

  1. Asymptotic properties of estimated parameters;
  2. Bootstrap.

The asymptotic distribution is based on the population/sample concept. Let's say you have two random variables $(X, Y)$ (not the data), and the population model is $Y = \alpha + \beta X + \epsilon$. From this population model, you get a sample $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$. You want to estimate the population parameters $\alpha$ and $\beta$ from this sample; call the estimates $a$ and $b$, or $\hat{\alpha}$ and $\hat{\beta}$. What asymptotic normality says is that as the sample size $n \rightarrow \infty$, the joint distribution of $a$ and $b$ converges to a normal distribution. In reality, we only have data; the population is something we assume exists. From this point of view, (a) is correct.

When you fix data $x$ and generating $k$ sets of data $y$, that's bootstrap. For each set of data $y$, you get an estimate of $\hat{a}$ and $\hat{b}$. In total, you will have $k$ sets of $\hat{a}$ and $\hat{b}$, then you can make inference using these $\hat{a}$s and $\hat{b}s$, e.g. the median, percentile. And hopefully (or proved under certain conditions), the joint distribution of $\hat{a}$ and $\hat{b}$ is converging to that of $a$ and $b$, as $k \rightarrow \infty$.