I have the following equation which I wish to solve:
$$\frac{\partial}{\partial \mathbf w}(\mathbf y - \mathbf X\mathbf w)^T(\mathbf y - \mathbf X \mathbf w) = 0$$
Here $\mathbf y$ is $n \times 1$, $\mathbf X$ is $n \times 2$, and $\mathbf w$ is $2 \times 1$.
My solution (done on paper because MathJax is a bit difficult for me to use):
Also, is my reasoning for step 4 correct?

In going from line $3$ to line $4$, note that $$ \frac{\partial}{\partial w} (y^TXw) = X^Ty. $$ With this correction you get the right answer, $$ \hat{w} = (X^TX)^{-1}X^Ty. $$
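For completeness, here is a sketch of the whole computation, assuming $X^TX$ is invertible (i.e., $X$ has full column rank). Since $w^TX^Ty$ is a scalar, it equals its transpose $y^TXw$, so $$ (y - Xw)^T(y - Xw) = y^Ty - 2y^TXw + w^TX^TXw. $$ Using $\frac{\partial}{\partial w}(y^TXw) = X^Ty$ and $\frac{\partial}{\partial w}(w^TAw) = 2Aw$ for symmetric $A = X^TX$, $$ \frac{\partial}{\partial w}(y - Xw)^T(y - Xw) = -2X^Ty + 2X^TXw = 0 \quad\Longrightarrow\quad X^TXw = X^Ty, $$ which gives $\hat{w} = (X^TX)^{-1}X^Ty$.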
Explicit derivation: assume the first column of $X$ is a column of ones (the intercept) and write $x_{ji}$ for the value of the $j$th predictor on observation $i$. Then $$ y^TXw = w_1\sum_{i=1}^ny_i + w_2\sum_{i=1}^ny_ix_{1i}+\cdots+w_p\sum_{i=1}^ny_ix_{p-1,i}. $$ Taking the derivative w.r.t. the vector $w \in \mathbb{R}^p$ gives the gradient, i.e., a vector with $p$ rows and $1$ column, namely $$ \begin{pmatrix} \sum y_i \\ \sum y_i x_{1i}\\ \vdots \\ \sum y_i x_{p-1,i} \end{pmatrix}, $$ where the $j$th row is the derivative of $y^TXw$ w.r.t. $w_j$. This vector is exactly $X^Ty$: since $X^T$ is $p\times n$ and $y$ is $n \times 1$, $X^Ty$ is $p \times 1$, as required.
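If you want to convince yourself numerically, the following is a minimal sketch (not part of the derivation itself) that checks $\frac{\partial}{\partial w}(y^TXw) = X^Ty$ by finite differences and compares $(X^TX)^{-1}X^Ty$ with NumPy's least-squares solver. The sizes `n`, `p`, the random data, and the helper name `f` are all assumptions made for illustration.

```python
import numpy as np

# Illustrative random data (assumed, not from the question).
rng = np.random.default_rng(0)
n, p = 50, 3
# First column of ones plays the role of the intercept.
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = rng.normal(size=n)
w = rng.normal(size=p)


def f(w):
    """The scalar y^T X w."""
    return y @ X @ w


# Central finite differences, one coordinate direction at a time.
eps = 1e-6
grad_fd = np.array(
    [(f(w + eps * e) - f(w - eps * e)) / (2 * eps) for e in np.eye(p)]
)
grad_exact = X.T @ y  # the claimed gradient, a vector of length p

# Solve the normal equations X^T X w = X^T y (assumes X^T X is invertible).
w_hat = np.linalg.solve(X.T @ X, X.T @ y)
# Reference solution from NumPy's least-squares routine.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(grad_fd, grad_exact, atol=1e-5))
print(np.allclose(w_hat, w_lstsq))
```

Solving the linear system with `np.linalg.solve` rather than forming the inverse explicitly is a numerical choice; both correspond to the same formula for $\hat{w}$.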