I'm going through Rao's *Linear Statistical Inference and Its Applications*. On page 223, Rao states:

> ... (the minimum of the sum of squares) is attained at $\pmb{\beta} = \widehat{\pmb{\beta}}$ and is $\textbf{unique}$ for all solutions $\widehat{\pmb{\beta}}$ (that satisfy the normal equations). A least squares estimator of the parametric function $\textbf{P}'\pmb{\beta}$ is defined to be $\textbf{P}'\widehat{\pmb{\beta}}$, where $\widehat{\pmb{\beta}}$ is any solution of the normal equations.
My questions here are:
- I'm confused about the meaning of uniqueness here. It seems the LS estimator is not necessarily unique when the matrix $\textbf{X}$ is singular.
- Is $\textbf{P}$ here referring to a row of $\textbf{X}$ (i.e., like a single observation) in the model? The subsequent argument on the same page, at point (i), is really confusing to me.
This is addressed at the top of page 223, where it is stated that the observational equations $Y = X\beta$ are not consistent in general, but the normal equations $X^TX\beta = X^TY$ always admit a solution; that is, there is always some $\beta$ satisfying the normal equations.
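To make this concrete, here is a minimal numpy sketch (the toy data are my own, not Rao's): an inconsistent system $Y = X\beta$ whose normal equations nonetheless have an exact solution.

```python
import numpy as np

# Three observations, one parameter: the observational equations
# Y = X*beta ask for a single beta matching all three values exactly,
# which is impossible here, so the system is inconsistent.
X = np.array([[1.0], [1.0], [1.0]])
Y = np.array([1.0, 2.0, 4.0])

# The normal equations X'X beta = X'Y still have a solution:
# X'X = 3 and X'Y = 7, so beta_hat = 7/3 (the sample mean).
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

print(beta_hat)                      # 7/3, the mean of Y
print(np.allclose(X @ beta_hat, Y))  # False: residuals remain
```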
They are not claiming that $\hat{\beta}$ is unique; they are claiming that the minimum value of the function $f(\beta) = (Y-X\beta)^T (Y-X\beta)$ is uniquely given by $f(\hat{\beta})$, where $\hat{\beta}$ is any solution to the normal equations. The reason is that every solution $\hat{\beta}$ produces the same fitted vector $X\hat{\beta}$ (the orthogonal projection of $Y$ onto the column space of $X$), and hence the same residual sum of squares.
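A small numpy sketch of this point (the design matrix is a made-up example with a deliberately redundant column): two different solutions of the normal equations give different $\hat{\beta}$ but the same minimum of $f$.

```python
import numpy as np

# A singular design: the third column is the sum of the first two,
# so X'X is not invertible and the normal equations have infinitely
# many solutions.
X = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [1.0, 2.0, 3.0]])
Y = np.array([1.0, 2.0, 2.0, 4.0])

# One particular solution via the Moore-Penrose pseudoinverse ...
beta1 = np.linalg.pinv(X) @ Y

# ... and another, obtained by shifting along the null space of X
# (X @ v = 0), which leaves X @ beta, and hence the normal
# equations, unchanged.
v = np.array([1.0, 1.0, -1.0])   # X @ v = 0
beta2 = beta1 + 5.0 * v

def rss(beta):
    """The sum of squares f(beta) = (Y - X beta)'(Y - X beta)."""
    r = Y - X @ beta
    return r @ r

# Both satisfy the normal equations X'X beta = X'Y ...
assert np.allclose(X.T @ X @ beta1, X.T @ Y)
assert np.allclose(X.T @ X @ beta2, X.T @ Y)

# ... the solutions differ, but the minimum value of f is the same.
print(np.allclose(beta1, beta2))            # False
print(np.allclose(rss(beta1), rss(beta2)))  # True
```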
$P$ here is a fixed matrix (a constant matrix), and the idea is to estimate $P^T\beta$ by $P^T \hat{\beta}$, where $\hat{\beta}$ is any solution to the normal equations. In other words, we use the plug-in estimator. Note that $P^T\hat{\beta}$ does not depend on which solution is picked precisely when $P^T\beta$ is estimable, i.e. when $P^T = L^TX$ for some $L$ (so the rows of $P^T$ lie in the row space of $X$).
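Continuing the same kind of toy example (a singular design with a redundant column, my own construction rather than Rao's), this sketch shows that an estimable $P^T\beta$ gives the same plug-in estimate for every solution of the normal equations, while a non-estimable one does not.

```python
import numpy as np

# Singular design: third column = first column + second column.
X = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [1.0, 2.0, 3.0]])
Y = np.array([1.0, 2.0, 2.0, 4.0])

# Two distinct solutions of the normal equations: the pseudoinverse
# solution, plus a shift along the null space of X.
beta1 = np.linalg.pinv(X) @ Y
v = np.array([1.0, 1.0, -1.0])   # X @ v = 0
beta2 = beta1 + 3.0 * v

# An estimable parametric function: P' = L'X with L = e_1, so P is
# the first row of X and P'beta = beta_0 + beta_2.
P = X[0]
print(np.allclose(P @ beta1, P @ beta2))   # True: invariant

# A non-estimable function, e.g. picking out beta_2 alone, changes
# with the choice of solution.
e3 = np.array([0.0, 0.0, 1.0])
print(np.allclose(e3 @ beta1, e3 @ beta2))  # False
```

The invariance holds because $P^T = L^TX$ implies $P^T\hat{\beta} = L^TX\hat{\beta}$, and $X\hat{\beta}$ is the same projection of $Y$ for every solution.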