$$\frac{d}{dw} [w^TX^TXw - 2w^TX^Ty+y^Ty] = 2(X^TXw-X^Ty)$$
I do not understand how the RHS was obtained -- are there certain matrix differentiation properties which can be used to show this? Why does differentiating w.r.t. $w$ get rid of the $w^T$ (and not $w$) from each of the terms?
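Use the identity $$ \frac{\partial}{\partial x}(x^TBx)=(B+B^T)x, $$ where the derivative of a scalar with respect to a column vector $x$ is taken to be the column vector of partial derivatives $\partial/\partial x_k$. The result therefore has the same shape as $x$, which is why the answer is written in terms of $w$ rather than $w^T$. With $B = X^TX$, which is symmetric, the first term in your problem gives $$ \frac{\partial}{\partial w}(w^TX^TXw) = \left(X^TX+(X^TX)^T\right)w = 2X^TXw. $$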
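One way to see where this identity comes from (a sketch, not part of the original answer) is to write the quadratic form in index notation and differentiate with respect to one component $x_k$ at a time, using $\partial x_i/\partial x_k = \delta_{ik}$:

$$
\begin{aligned}
x^TBx &= \sum_{i}\sum_{j} x_i B_{ij} x_j,\\
\frac{\partial}{\partial x_k}(x^TBx) &= \sum_{i}\sum_{j}\left(\delta_{ik}B_{ij}x_j + x_iB_{ij}\delta_{jk}\right) = \sum_j B_{kj}x_j + \sum_i B_{ik}x_i = (Bx)_k + (B^Tx)_k.
\end{aligned}
$$

Stacking the components $k = 1, \dots, n$ into a column vector gives $(B+B^T)x$; when $B$ is symmetric, as $X^TX$ is, this reduces to $2Bx$.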
The last term, $y^Ty$, does not depend on $w$, so its derivative is $\boldsymbol{0}$.
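For the middle term, use $$ \frac{\partial (x^T a)}{\partial x}=a $$ with the constant vector $a = X^Ty$. No generalization to a matrix is needed, since $X^Ty$ is itself a vector, so the middle term gives: $$ \frac{\partial}{\partial w}(-2w^T X^T y)=-2X^Ty. $$ Adding the three pieces gives $2X^TXw - 2X^Ty + \boldsymbol{0} = 2(X^TXw-X^Ty)$.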
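If you want to convince yourself numerically, here is a small sketch (not from the original answer) that assumes NumPy is available. It compares the closed form $2(X^TXw - X^Ty)$ against central finite differences on random data; the helper names `loss`, `analytic_grad` and `numeric_grad`, the $5 \times 3$ size and the seed are all arbitrary, hypothetical choices.

```python
import numpy as np

def loss(w, X, y):
    # f(w) = w^T X^T X w - 2 w^T X^T y + y^T y  (equal to ||y - Xw||^2)
    return w @ X.T @ X @ w - 2 * w @ X.T @ y + y @ y

def analytic_grad(w, X, y):
    # Closed-form gradient from the derivation: 2(X^T X w - X^T y)
    return 2 * (X.T @ X @ w - X.T @ y)

def numeric_grad(f, w, eps=1e-6):
    # Central finite differences, perturbing one coordinate of w at a time
    g = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (f(w + e) - f(w - e)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))
y = rng.standard_normal(5)
w = rng.standard_normal(3)

g_exact = analytic_grad(w, X, y)
g_fd = numeric_grad(lambda v: loss(v, X, y), w)
print(np.allclose(g_exact, g_fd, atol=1e-5))
```

Because $f$ is quadratic in $w$, the central difference has no truncation error, so any disagreement beyond floating-point rounding would point to a mistake in the closed form.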