We are given $N$ variables $x(1,t), \dots, x(N,t)$ observed at discrete points in time $t$. Our goal is to explain a variable $y(t)$ using linear regression, that is,
$$y(t) = a + \sum_{k=1}^N I(k) b(k) x(k,t) + e(t),$$
where $e(t)$ is an error term, $I(k)$ is an indicator variable that equals $1$ if the variable $x(k,\cdot)$ is included in the model and zero otherwise, $b(k)$ are the regression coefficients, and $a$ is a constant. Assume the standard conditions hold so that ordinary least squares can be applied to estimate the intercept $a$ and the coefficients $b(k)$.
As an example, if $N=4$ and we choose $I(1) = I(4) = 1$ and $I(2) = I(3) = 0$, the resulting model is $$y(t) = a + b(1)x(1,t) + b(4)x(4,t) + e(t).$$ In the following, we consider the general case with $N$ variables.
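The subset model above can be fitted directly with ordinary least squares. The following is a minimal sketch using NumPy on synthetic data; all names and numerical values (sample size, true coefficients, noise level) are illustrative assumptions, not part of the problem.

```python
import numpy as np

# Illustrative setup: N = 4 candidate variables, T observations.
rng = np.random.default_rng(0)
N, T = 4, 500
X = rng.normal(size=(T, N))                  # columns are x(1,.), ..., x(N,.)
a_true = 1.0                                 # assumed true intercept
b_true = np.array([2.0, 0.0, 0.0, -3.0])     # assumed true coefficients
y = a_true + X @ b_true + 0.1 * rng.normal(size=T)

# Indicators I(k): include x(1,.) and x(4,.), exclude x(2,.) and x(3,.).
I = np.array([1, 0, 0, 1], dtype=bool)
design = np.column_stack([np.ones(T), X[:, I]])  # intercept + selected columns

# OLS estimates of a, b(1), b(4) via least squares.
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
a_hat, b_hat = coef[0], coef[1:]
print(a_hat, b_hat)
```

With this construction, the estimates should be close to the assumed values $a = 1$, $b(1) = 2$, $b(4) = -3$, since the excluded variables have zero true coefficients.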
a) Assume that Ben in our team developed a statistical test that you can apply to your models. The test produces a test statistic for each model, and the true distribution of the test statistic is known. The null hypothesis is that the model is "not good"; the alternative is that the model is "good". You apply the test to all of the $K$ models that you constructed and choose the confidence level $1-\alpha$ equal to $95\%$, i.e. a significance level of $\alpha = 5\%$.
(i) If $K$ is very large, how many models do you expect to be considered "good" by the test? (ii) Assume that $100\%$ of the models you test are "not good". How many models will be considered "good" in this case?
With $N$ variables there are $2^N$ models in total, including the null model (intercept only) and the model with all $N$ variables; in this question $K = 2^N$. Let $X$ be the number of rejected null hypotheses in the $K$ tests. If every $H_0$ is true (all models are "not good") and the tests are independent, then $X \sim \mathrm{Bin}(K, \alpha)$, hence the expected number of models falsely declared "good" out of the $K$ possible models is $$ \mathbb{E}[X] = K\alpha. $$ That is, under $H_0$ you expect to falsely reject $H_0$ in a fraction $\alpha$ of the tests, i.e. in $5\%$ of them. This answers both (i), for large $K$, and (ii).
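The multiple-testing effect can be checked by simulation. The sketch below assumes that under a true null hypothesis the p-value of each test is Uniform$(0,1)$ and that the $K$ tests are independent; the choice $N = 14$ is illustrative.

```python
import numpy as np

# Under H0, p-values of a well-calibrated test are Uniform(0,1), so the
# number of rejections at level alpha is Binomial(K, alpha) with mean K*alpha.
rng = np.random.default_rng(1)
N = 14                  # illustrative number of candidate variables
K = 2 ** N              # all 2^N candidate models
alpha = 0.05            # significance level (confidence level 95%)

p_values = rng.uniform(size=K)          # simulated p-values, all nulls true
num_good = int(np.sum(p_values < alpha))  # models falsely declared "good"
print(num_good, K * alpha)              # observed count vs expected K*alpha
```

The observed count fluctuates around $K\alpha$ with standard deviation $\sqrt{K\alpha(1-\alpha)}$, which for $K = 16384$ is roughly $28$, so a single run should land close to the expectation of about $819$.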