I'm working on Exercise 3.2 from Elements of Statistical Learning. It asks to find a $95\%$ confidence interval for a linear regression prediction (ordinary least squares are used) using two different methods.
In the first part we're looking for the confidence interval for a single prediction $x_0^T \hat{\beta}$; as far as I understand, it's basically
$$x_0^T \hat{\beta} \pm z_{0.975} \cdot \hat{\sigma} \sqrt{x_0^T \left(\mathbf{X}^T \mathbf{X}\right)^{-1} x_0},$$
where $\mathbf{X} \in \mathbb{R}^{N \times (p + 1)}$ is the design matrix, $z_{0.975} \approx 1.96$ is the $0.975$ quantile of the standard normal distribution, and
$$\hat{\sigma}^2 = \frac{1}{N - p - 1} \sum_{i = 1}^N \left( y_i - \hat{y}_i \right)^2.$$
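To make this concrete, here is a quick numerical sketch of the first interval on synthetic data (all names and sizes below are my own, purely for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
N, p = 100, 2

# Synthetic design matrix with an intercept column, and a synthetic response
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.7, size=N)

# OLS fit and the unbiased estimate of sigma^2
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (N - p - 1)

# Interval at a query point x0 (first component is the intercept)
x0 = np.array([1.0, 0.5, -1.0])
z = stats.norm.ppf(0.975)  # approx 1.96
XtX_inv = np.linalg.inv(X.T @ X)
half_width = z * np.sqrt(sigma2_hat * x0 @ XtX_inv @ x0)
lo, hi = x0 @ beta_hat - half_width, x0 @ beta_hat + half_width
print(lo, hi)
```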
In the second part, we're looking for a confidence interval generated by a confidence set for the whole vector $\beta$:
$$C_\beta = \left\{ \beta: \; f(\beta) = (\beta - \hat{\beta})^T \mathbf{X}^T \mathbf{X} (\beta - \hat{\beta}) \leq \hat{\sigma}^2 \chi_{p+1}^{2 \; (0.975)}\right\},$$
where $\chi_{p+1}^{2 \; (0.975)}$ is the $0.975$ quantile of $\chi_{p + 1}^2$. Here I'm having trouble. First of all, how does one calculate a confidence interval based on $C_\beta$? My guess was that it's equivalent to finding the following:
$$\min_{\beta \in C_\beta} x_0^T \beta \quad \text{and} \quad \max_{\beta \in C_\beta} x_0^T \beta,$$
which at the same time is equivalent to
$$\begin{cases} \alpha x_0 = \nabla f (\beta) \\ \beta \in \partial C_\beta \end{cases}$$
for a fixed $x_0$ (here I used the method of Lagrange multipliers to solve the optimization problem). I've solved this and got the following confidence interval for $x_0^T \hat{\beta}$:
$$x_0^T \hat{\beta} \pm \hat{\sigma} \sqrt{\chi_{p+1}^{2 \; (0.975)} \, x_0^T \left(\mathbf{X}^T \mathbf{X}\right)^{-1} x_0}$$
Is this right? My main concern is that it's just a $\chi^2$ correction of the first confidence interval, in which we used an estimate of $\sigma$ rather than its true value. Is it indeed a confidence interval generated by $C_\beta$? And if it's not, could someone help me understand what it actually is?
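For what it's worth, both candidate intervals share the factor $\hat{\sigma} \sqrt{x_0^T (\mathbf{X}^T \mathbf{X})^{-1} x_0}$, so comparing their widths reduces to comparing the multipliers; a tiny sketch (the value of $p$ below is arbitrary, for illustration only):

```python
import numpy as np
from scipy import stats

p = 2  # number of predictors, chosen arbitrarily for illustration

z = stats.norm.ppf(0.975)                       # multiplier in the first interval
chi = np.sqrt(stats.chi2.ppf(0.975, df=p + 1))  # multiplier in the second

# Both intervals share the factor sigma_hat * sqrt(x0^T (X^T X)^{-1} x0),
# so the ratio of half-widths is just the ratio of the multipliers
print(chi / z)  # > 1: the confidence-set interval is wider
```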
Thanks!
If your model is $Y=X\beta+\varepsilon$ with $\varepsilon\sim N_n(0,\sigma^2 I_n)$, then the mean response at the point $x_0$ (a particular row of $X$) is $$y_0= x_0^T\beta$$
The above is estimated from the fitted model by $$\hat y_0= x_0^T\hat\beta$$
(Note that you are looking for confidence intervals for $x_0^T\beta$, not for $x_0^T\hat\beta$ as you write.)
Verify that $E(\hat y_0)= x_0^T\beta$ and $\operatorname{Var}(\hat y_0)=\sigma^2 x_0^T(X^T X)^{-1} x_0$.
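The variance claim can be checked numerically via the identity $\operatorname{Cov}(\hat\beta)=(X^TX)^{-1}X^T(\sigma^2 I)X(X^TX)^{-1}=\sigma^2(X^TX)^{-1}$; here is a small sketch with a synthetic design (all names and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # synthetic design
sigma2 = 0.49

# Sandwich form of Cov(beta_hat) versus the simplified sigma^2 (X^T X)^{-1}
XtX_inv = np.linalg.inv(X.T @ X)
H = XtX_inv @ X.T
cov_sandwich = sigma2 * H @ H.T
print(np.max(np.abs(cov_sandwich - sigma2 * XtX_inv)))  # ~ 0 up to roundoff

# Hence Var(y0_hat) = x0^T Cov(beta_hat) x0 = sigma^2 x0^T (X^T X)^{-1} x0
x0 = np.array([1.0, 0.3, -0.7])
var_y0 = sigma2 * x0 @ XtX_inv @ x0
print(var_y0)
```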
If $\sigma$ is known, it is true that a $95\%$ confidence interval for $y_0$ would have limits approximately
$$\hat y_0\mp 1.96\sqrt{\sigma^2 x_0^T(X^T X)^{-1} x_0}$$
But if you estimate $\sigma$ by $\hat\sigma$, then the confidence interval should logically have limits $$\hat y_0\mp t_{0.025,\,n-p-1}\sqrt{\hat\sigma^2 x_0^T(X^T X)^{-1} x_0}\,\,,$$
where $t_{\alpha,\,n-p-1}$ is the $(1-\alpha)$th quantile of a $t$ distribution with $n-p-1$ degrees of freedom (assuming you have $p+1$ parameters to estimate and $n$ is the total number of observations).
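To see how much the $t$ multiplier matters relative to the normal one, compare the two quantiles directly (the sizes below are illustrative):

```python
from scipy import stats

n, p = 100, 2          # illustrative sample size and predictor count
dof = n - p - 1
t_mult = stats.t.ppf(0.975, df=dof)   # multiplier when sigma is estimated
z_mult = stats.norm.ppf(0.975)        # multiplier when sigma is known
print(t_mult, z_mult)  # t is slightly larger for finite degrees of freedom
```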
As for the second method, I don't see what else to write for the confidence set apart from the trivial $$\{x_0^T\beta : \beta\in C_{\beta}\}$$