In finding the Residual Sum of Squares (RSS), we have:
\begin{equation} \hat{Y} = X^T\hat{\beta} \end{equation}
where the estimated parameter $\hat{\beta}$ is used to predict the output $\hat{Y}$ for the input vector $X^T$. The residual sum of squares is
\begin{equation} RSS(\beta) = \sum_{i=1}^n (y_i - x_i^T\beta)^2 \end{equation}
which in matrix form would be
\begin{equation} RSS(\beta) = (y - X \beta)^T (y - X \beta) \end{equation}
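As a quick sanity check that the summation form and the matrix form of the RSS agree, here is a small NumPy sketch. The toy data, the random seed and the sizes ($n = 5$, $p = 3$) are arbitrary assumptions for illustration; the rows of `X` play the role of the $x_i^T$.

```python
import numpy as np

# Hypothetical toy data: n = 5 observations, p = 3 features (assumed for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))   # row i is x_i^T
y = rng.normal(size=5)
beta = rng.normal(size=3)

# RSS as a sum over observations: sum_i (y_i - x_i^T beta)^2
rss_sum = sum((y[i] - X[i] @ beta) ** 2 for i in range(len(y)))

# RSS in matrix form: (y - X beta)^T (y - X beta)
r = y - X @ beta
rss_matrix = r @ r

print(np.isclose(rss_sum, rss_matrix))  # True
```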
Differentiating w.r.t. $\beta$ and setting the derivative to zero, we get
\begin{equation} X^T(y - X\beta) = 0 \end{equation}
My question is: how is the last step done? How does differentiating lead to the last equation?
This follows from the standard multiplication and differentiation rules for matrices.
We have
$$\begin{aligned} RSS(\beta) &= (y - X \beta)^T (y - X \beta) = (y^T - \beta^TX^T)(y - X \beta) \\ &= y^Ty - y^TX\beta - \beta^TX^Ty + \beta^TX^TX\beta \end{aligned}$$
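The expansion can be checked numerically. This sketch, on arbitrary random data (an assumption for illustration), compares the direct RSS with the sum of the four expanded terms, and also confirms that the two cross terms $y^TX\beta$ and $\beta^TX^Ty$ are the same scalar, which is why they end up contributing the same gradient.

```python
import numpy as np

# Arbitrary toy data, assumed for illustration only.
rng = np.random.default_rng(1)
X = rng.normal(size=(6, 2))
y = rng.normal(size=6)
beta = rng.normal(size=2)

r = y - X @ beta
rss_direct = r @ r

# The four terms of the expansion
t1 = y @ y                    # y^T y
t2 = y @ X @ beta             # y^T X beta
t3 = beta @ X.T @ y           # beta^T X^T y (same scalar as t2)
t4 = beta @ X.T @ X @ beta    # beta^T X^T X beta
rss_expanded = t1 - t2 - t3 + t4

print(np.isclose(rss_direct, rss_expanded), np.isclose(t2, t3))  # True True
```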
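The term $y^Ty$ does not depend on $\beta$, and for a constant vector $a$ we have $\frac{\partial}{\partial\beta}(a^T\beta) = \frac{\partial}{\partial\beta}(\beta^Ta) = a$; here $a = X^Ty$, since $y^TX\beta = (X^Ty)^T\beta$. Then $$\frac {\partial RSS(\beta)}{\partial \beta} = -X^Ty-X^Ty+2X^TX\beta$$
The last term follows from $\frac{\partial}{\partial\beta}(\beta^TA\beta) = (A + A^T)\beta$, which equals $2A\beta$ because the matrix $A = X^TX$ is symmetric.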
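One way to convince yourself that the gradient $-2X^Ty + 2X^TX\beta$ is right is to compare it with a central finite-difference approximation. The helper names (`rss`, `rss_grad`, `numerical_grad`), the step size `h` and the random data below are hypothetical choices for this sketch, not part of the original answer.

```python
import numpy as np

def rss(beta, X, y):
    # RSS(beta) = (y - X beta)^T (y - X beta)
    r = y - X @ beta
    return r @ r

def rss_grad(beta, X, y):
    # Analytic gradient from the derivation: -2 X^T y + 2 X^T X beta
    return -2 * X.T @ y + 2 * X.T @ X @ beta

def numerical_grad(f, beta, h=1e-6):
    # Central differences, one coordinate of beta at a time
    g = np.zeros_like(beta)
    for j in range(beta.size):
        e = np.zeros_like(beta)
        e[j] = h
        g[j] = (f(beta + e) - f(beta - e)) / (2 * h)
    return g

# Hypothetical random data for the check
rng = np.random.default_rng(2)
X = rng.normal(size=(8, 3))
y = rng.normal(size=8)
beta = rng.normal(size=3)

g_analytic = rss_grad(beta, X, y)
g_numeric = numerical_grad(lambda b: rss(b, X, y), beta)
print(np.allclose(g_analytic, g_numeric, atol=1e-5))  # True
```

Because the RSS is quadratic in $\beta$, central differences are exact up to floating-point rounding, so the two gradients should agree closely.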
So $$\frac {\partial RSS(\beta)}{\partial \beta} =0 \Rightarrow -2X^Ty+2X^TX\beta =0 \Rightarrow -X^Ty+X^TX\beta = 0$$
$$\Rightarrow X^T(-y + X\beta) = 0\Rightarrow X^T(y-X\beta)=0$$
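The final equation says the residual $y - X\hat\beta$ is orthogonal to every column of $X$. A minimal NumPy sketch, assuming $X$ has full column rank so that $X^TX$ is invertible (random data used purely for illustration), solves the normal equations $X^TX\beta = X^Ty$ and checks both that condition and agreement with `np.linalg.lstsq`:

```python
import numpy as np

# Random data, assumed for illustration; X should have full column rank.
rng = np.random.default_rng(3)
X = rng.normal(size=(20, 4))
y = rng.normal(size=20)

# Solve the normal equations X^T X beta = X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# The residual is orthogonal to the columns of X: X^T (y - X beta_hat) = 0
residual = y - X @ beta_hat
print(np.allclose(X.T @ residual, 0))  # True

# Cross-check against NumPy's least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))  # True
```

Forming $X^TX$ explicitly is fine for a small, well-conditioned example like this; for ill-conditioned problems a QR- or SVD-based solver such as `lstsq` is usually the safer choice.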