Suppose I have a set of samples $(x_i,y_i), 1 \leq i \leq n$. I am interested in solving the following optimization problem: $$ \min_a \sum_{i=1}^n (y_i-a^\top x_i)^2, \quad \text{s.t. } \|a\|_{2} = 1. $$
If we assume that $\sum_i x_i x_i^\top$ is invertible, can one prove that the solution to the above optimization problem is $$ a^\ast=\frac{\left(\sum_i x_i x_i^\top \right)^{-1} \left(\sum_i x_i y_i\right)}{\left\|\left(\sum_i x_i x_i^\top \right)^{-1} \left(\sum_i x_i y_i\right)\right\|_2}? $$ Does the above solution still hold if we relax the constraint to $\|a\|_{2} \leq 1$? Assuming it does not hold, is projected gradient descent guaranteed to find the true minimum in this inequality-constrained case, since that problem is convex?
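For concreteness, here is the projected gradient scheme I have in mind, as a minimal NumPy sketch (synthetic data; step size $1/L$ with $L$ the Lipschitz constant of the gradient; all names and sizes are my own choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 3
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

G, b = X.T @ X, X.T @ y                  # sum_i x_i x_i^T and sum_i x_i y_i

def project(a):
    """Euclidean projection onto the unit ball {a : ||a||_2 <= 1}."""
    nrm = np.linalg.norm(a)
    return a if nrm <= 1.0 else a / nrm

L = np.linalg.eigvalsh(2.0 * G).max()    # Lipschitz constant of the gradient
a = np.zeros(d)
for _ in range(5000):
    grad = 2.0 * (G @ a - b)             # gradient of sum_i (y_i - a^T x_i)^2
    a = project(a - grad / L)
```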

No, it is essentially never the solution, so you will not be able to prove that. Writing $G = \sum_i x_i x_i^\top$ and $b = \sum_i x_i y_i$, the first-order (KKT) condition for the equality-constrained problem reads $(G + \lambda I)\,a = b$ for some Lagrange multiplier $\lambda$, i.e. $a^\ast = (G + \lambda I)^{-1} b$ is a Tikhonov-regularized solution with $\lambda$ chosen so that $\|a^\ast\|_2 = 1$. Normalizing the unconstrained solution $G^{-1} b$ preserves its direction, whereas $(G + \lambda I)^{-1} b$ generally points in a different direction; the two coincide only in special cases, e.g. when $b$ is an eigenvector of $G$.
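To see this numerically, here is a self-contained sketch (synthetic data; the seed, dimensions, and helper names are my own choices). Writing $G = \sum_i x_i x_i^\top$ and $b = \sum_i x_i y_i$, it compares the normalized least-squares direction from the question against the stationary point of the constrained problem, $(G + \lambda I)^{-1} b$, with the multiplier $\lambda$ found by bisection on $\|(G+\lambda I)^{-1} b\|_2 = 1$:

```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed, arbitrary choice
n, d = 200, 3
X = rng.normal(size=(n, d)) * np.array([1.0, 3.0, 0.5])  # anisotropic design
y = X @ np.array([2.0, -1.0, 1.5]) + 0.1 * rng.normal(size=n)

G = X.T @ X                              # sum_i x_i x_i^T
b = X.T @ y                              # sum_i x_i y_i

loss = lambda a: np.sum((y - X @ a) ** 2)

# Candidate from the question: the normalized unconstrained solution.
a_ols = np.linalg.solve(G, b)
a_hat = a_ols / np.linalg.norm(a_ols)

# Stationarity on the sphere: (G + lam*I) a = b for some multiplier lam.
# ||a(lam)||_2 is decreasing in lam >= 0, so locate lam with ||a(lam)||_2 = 1
# by bisection (valid here because ||G^{-1} b||_2 > 1).
def a_of(lam):
    return np.linalg.solve(G + lam * np.eye(d), b)

lo, hi = 0.0, 1e6
for _ in range(200):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if np.linalg.norm(a_of(mid)) > 1.0 else (lo, mid)
a_star = a_of(0.5 * (lo + hi))

print(loss(a_hat), loss(a_star))
```

On this example the constrained stationary point $a^\star$ attains a strictly lower loss than the normalized direction $\hat a$, which is what rules the proposed formula out in general.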