Let $D = \{(x_1, y_1), (x_2, y_2), \ldots , (x_n, y_n)\}$ where $x_i \in \mathbb{R}^d$ and $y_i \in \mathbb{R}$. One may use linear regression to predict $y$ as $w^Tx$ for some parameter vector $w \in \mathbb{R}^{d}$. Consider the matrix $X \in \mathbb{R}^{n\times d}$ with the $x_i$ as rows and the vector $y \in \mathbb{R}^n$ with entries $y_i$. Given the OLS loss function $$ \arg\min_w \hat{R}(w)=\arg\min_w \sum^n_{i=1}(y_i - w^Tx_i)^2$$
a) Show that for $n < d$, the OLS loss function does not admit a unique solution
b) Under what assumptions on $X$ does this equation admit a unique solution $w^*$?
i. There exists a unique solution if $n \geq d$ and the columns of $X$ are linearly independent
ii. There exists a unique solution $\iff$ $X^TX$ is invertible
iii. For $n > d$, there will always be a unique solution if $X$ is full rank.
I'm a little unsure how to write a proof for this question. My initial idea is to derive a unique solution under the assumption that $X^TX$ is invertible, and then use the properties of linear systems to show that in a) the system has rank less than $d$ and thus admits infinitely many solutions, and that b) will hold under assumption ii. Is there a better/more formal way of proving a) and b)?
Note that $$\hat{R}(w) = \|y - X w \|_2^2.$$ Differentiating with respect to $w$ we get $$\nabla_w \hat{R} = -2 X^T (y-Xw).$$ Setting the gradient to zero, we get the normal equation $$X^TXw = X^Ty \qquad (\star)$$ Recall that $\mathrm{rank}(X^TX) = \mathrm{rank}(X)$. Thus $X^TX$ is invertible if and only if $X$ has full column rank (i.e. rank $d$).
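If you want to sanity-check this derivation numerically, here is a minimal NumPy sketch (the random design and all variable names are my own) that solves the normal equation for a full-rank $X$ and verifies that the gradient vanishes at the solution:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 3                       # more samples than features
X = rng.standard_normal((n, d))    # full column rank almost surely
y = rng.standard_normal(n)

# Solve the normal equation (X^T X) w = X^T y directly.
w_star = np.linalg.solve(X.T @ X, X.T @ y)

# Gradient at w_star: -2 X^T (y - X w) should vanish.
grad = -2 * X.T @ (y - X @ w_star)
print(np.allclose(grad, 0))        # True
```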
Now let me answer your questions.
If $n < d$, then $\mathrm{rank}(X^TX) = \mathrm{rank}(X) \leq n < d$, and so $X^TX$ is not invertible. The system $(\star)$ is nevertheless always consistent, since $X^Ty$ lies in $\mathrm{col}(X^T) = \mathrm{col}(X^TX)$. Because the null space of $X^TX$ is nontrivial, adding any null-space vector to one solution gives another solution, so $(\star)$ has infinitely many of them. In practice, if this happens, one may use the pseudoinverse of $X^TX$ (or of $X$) to pick out the minimum-norm solution.
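You can see this non-uniqueness concretely. The sketch below (again with names of my own choosing, using an SVD to get a null-space basis) builds two distinct parameter vectors that produce identical predictions when $n < d$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 3, 5                        # fewer samples than features
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Minimum-norm solution via the pseudoinverse of X.
w_min = np.linalg.pinv(X) @ y

# Null-space basis of X via SVD: rows of Vt past rank(X) span null(X).
_, _, Vt = np.linalg.svd(X)
Z = Vt[np.linalg.matrix_rank(X):].T    # d x (d - n), nontrivial since n < d

# Adding a null-space vector leaves the fit unchanged.
w_alt = w_min + Z[:, 0]

print(np.allclose(X @ w_min, X @ w_alt))   # True: same predictions
print(np.allclose(w_min, w_alt))           # False: different solutions
```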
This happens when $X^TX$ is invertible, which by the rank identity above happens exactly when $X$ has full column rank $d$. Conditions (i), (ii), and (iii) are equivalent ways of saying this: when $n \geq d$, "full rank" means rank $d$, i.e. linearly independent columns.
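To illustrate the equivalence (a sketch under the assumption that a random Gaussian design has full column rank, which holds almost surely), the following checks that $\mathrm{rank}(X^TX) = \mathrm{rank}(X) = d$ and that solving $(\star)$ agrees with a standard least-squares routine:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 50, 3
X = rng.standard_normal((n, d))    # n > d, full column rank a.s.
y = rng.standard_normal(n)

assert np.linalg.matrix_rank(X) == d        # columns independent (i), (iii)
XtX = X.T @ X
assert np.linalg.matrix_rank(XtX) == d      # rank(X^T X) = rank(X), so invertible (ii)

w1 = np.linalg.solve(XtX, X.T @ y)          # unique solution of the normal equation
w2, *_ = np.linalg.lstsq(X, y, rcond=None)  # generic least-squares solver agrees
print(np.allclose(w1, w2))                  # True
```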