I am following the derivation in this question. So we have: $$\begin{align} \text{RSS}(\beta) & = (y-X\beta)^T(y-X\beta) & (1) \end{align}$$
Differentiating w.r.t. $\beta$ and then $\beta^T$, we have:
$$\begin{align} &\dfrac{\partial\text{RSS}}{\partial\beta} = -2X^{T}(y-X\beta) &&& (2)\\ &\dfrac{\partial^2\text{RSS}}{\partial \beta\text{ }\partial \beta^{T}} = 2X^{T}X\text{.} &&& (3) \end{align}$$
I understand the first differentiation from this explanation. However, I do not know how you can go from $(2)$ to $(3)$. Could someone explain?
Also, how do you get this? $$y^TX\beta=\beta^TX^Ty$$
TIA!
For the second part, they are each other's transpose:
$$(y^TX\beta)^T=((y^TX)\beta)^T=\beta^T(y^TX)^T=\beta^T(X^Ty)=\beta^TX^Ty$$
And since both are scalars, and a scalar equals its own transpose, we have the equality.
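If it helps to see this with numbers, here is a minimal sketch in plain Python. The values of `X`, `y`, and `beta` are made up purely for illustration, and the helper functions are hypothetical names, not from any library.

```python
# Hypothetical small example: X is 3x2, y has 3 entries, beta has 2 entries.
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
y = [1.0, 0.0, -1.0]
beta = [0.5, -2.0]

def matvec(M, v):
    """Matrix-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(u, v):
    """Inner product u^T v of two vectors."""
    return sum(a * b for a, b in zip(u, v))

def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

lhs = dot(y, matvec(X, beta))             # y^T (X beta)
rhs = dot(beta, matvec(transpose(X), y))  # beta^T (X^T y)
print(lhs, rhs)
```

Both quantities are $1\times 1$, which is why transposing one gives the other without changing its value.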
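Expanding $(2)$ gives $-2X^\top y + 2X^\top X\beta$. The $-2X^\top y$ term does not depend on $\beta$, so it vanishes when you differentiate again. Apart from the constant factor $2$, your question then boils down to "why is $\frac{\partial}{\partial \beta} A \beta = A$ true?" where $A=X^\top X$. You can check this by writing out the matrix multiplication $A\beta$ and taking partial derivatives of each component with respect to each $\beta_i$.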
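To spell out that componentwise check, here is a sketch. The entries $A_{ij}$ of $A = X^\top X$ and the length $p$ of $\beta$ are notation introduced only for this illustration.

$$\begin{align} \dfrac{\partial\text{RSS}}{\partial\beta} &= -2X^\top y + 2X^\top X\beta = -2X^\top y + 2A\beta \\ (A\beta)_i &= \sum_{j=1}^{p} A_{ij}\beta_j \quad\Longrightarrow\quad \dfrac{\partial (A\beta)_i}{\partial \beta_k} = A_{ik} \\ \dfrac{\partial^2\text{RSS}}{\partial \beta\text{ }\partial \beta^{T}} &= 2A = 2X^\top X \end{align}$$

The middle line says the matrix of partial derivatives of $A\beta$ has $(i,k)$ entry $A_{ik}$, i.e. it is $A$ itself. Since $A = X^\top X$ is symmetric, the result is the same whichever layout convention (rows or columns indexed by $\beta_k$) you use.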