I have a question I am a little unsure about.
We have a linear regression model $y_{i}=B_{0}+B_{1}x_{i1}+B_{2}x_{i2}+u_{i}$, where $x_{i1}$ is an endogenous regressor and $x_{i2}$ is an exogenous regressor (control variable).
We have the test regression (1) $x_{i1} = \pi_{0}+\pi_{1}z_{i1}+\gamma_{2}x_{i2}+v_{i}$, where we assume that $z_{i1}$ is a relevant, exogenous instrument.
Why is the control variable $x_{i2}$ included in the test regression (1)? In other words, why is it more useful to know that we reject $H_{0}: \pi_{1} = 0$ in $x_{i1}=\pi_{0}+\pi_{1}z_{i1}+\gamma_{2}x_{i2}+v_{i}$ than that we reject $H_{0}: \pi_{1} = 0$ in $x_{i1} = \pi_{0}+\pi_{1}z_{i1}+v_{i}$?
My thinking was that if we include $x_{i2}$ in the test regression (1), then we are ensuring that $x_{i1}$ is regressed on all exogenous regressors, because we then obtain fitted values $\hat{x}_{i1}$ which, in the second stage of 2SLS, we use to obtain a consistent estimator $\hat{B}_{2SLS}$. If we simply use the test regression $x_{i1}=\pi_{0}+\pi_{1}z_{i1}+v_{i}$, then $z_{i1}$ is no longer exogenous, because our exogenous control $x_{i2}$ is now contained within the error term $v_{i}$. As a result, $x_{i1}$ is no longer being regressed on only exogenous variables. In this case, the fitted values $\hat{x}_{i1}$ we obtain will be used in the second stage of 2SLS to give an estimator $\hat{B}_{2SLS}$ that is no longer consistent.
But I am not sure if I am on the correct lines. Hope you guys can help.
Many thanks!
$X_2$ might be a variable that affects both $X_1$ and $Z$ (i.e., a common parent of the instrument $Z$ and the exposure $X_1$). If you don't control for it, you may observe a spurious correlation between $Z$ and $X_1$ that is created via $X_2$, even when there is no direct effect/association between $X_1$ and $Z$. In such a case, your instrument is useless/invalid, even though you rejected the null hypothesis $H_0: \pi_1 = 0$.
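
A small simulation may help make this concrete. The sketch below is only an illustration under made-up assumptions: the coefficients (0.8 and 1.0), the sample size, and the seed are arbitrary, and `ols_t_stats` is a hypothetical helper written for this example, not a library function. Here $X_2$ drives both $Z$ and $X_1$, and the true $\pi_1$ is zero by construction.

```python
import numpy as np

def ols_t_stats(y, X):
    """OLS coefficients and classical (homoskedastic) t-statistics.
    X must already contain a column of ones for the intercept."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)          # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)     # covariance of the coefficients
    return beta, beta / np.sqrt(np.diag(cov))

rng = np.random.default_rng(0)   # arbitrary seed (assumption)
n = 5000                         # arbitrary sample size (assumption)

x2 = rng.normal(size=n)
z = 0.8 * x2 + rng.normal(size=n)    # X2 is a parent of Z
x1 = 1.0 * x2 + rng.normal(size=n)   # X2 is a parent of X1; Z has no direct effect

const = np.ones(n)

# Test regression WITHOUT the control: x1 on (1, z)
beta_short, t_short = ols_t_stats(x1, np.column_stack([const, z]))

# Test regression WITH the control: x1 on (1, z, x2)
beta_long, t_long = ols_t_stats(x1, np.column_stack([const, z, x2]))

print("without x2: pi_1 estimate =", beta_short[1], " t =", t_short[1])
print("with x2:    pi_1 estimate =", beta_long[1], " t =", t_long[1])
```

In this setup, the regression that omits $X_2$ should reject $H_0: \pi_1 = 0$ very strongly, while the regression that includes $X_2$ should give an estimate of $\pi_1$ close to zero. That is the sense in which rejecting the null without the control can be uninformative about the instrument's relevance.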