In statistics, the residual sum of squares is given by the formula
$$ \operatorname{RSS}(\beta) = (\mathbf{y} - \mathbf{X}\beta)^T(\mathbf{y} - \mathbf{X}\beta)$$
I know how to differentiate scalar functions, but how do I take the derivative of this with respect to $\beta$? I am trying to minimize RSS with respect to $\beta$, so I am setting the derivative equal to 0.
I know that some form of the product rule has to hold, so here is my first step:
$$-\mathbf{X}^T(\mathbf{y}-\mathbf{X}\beta) + (\mathbf{y}-\mathbf{X}\beta)^T(-\mathbf{X})= 0$$
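First, distribute the transpose over the first bracket:
$\operatorname{RSS}=(\mathbf{y}^T - \beta^T \mathbf{X}^T)(\mathbf{y} - \mathbf{X}\beta)$
Multiplying out:
$\operatorname{RSS}=\mathbf{y}^T\mathbf{y}-\beta^T \mathbf{X}^T\mathbf{y}-\mathbf{y}^T\mathbf{X}\beta+\beta^T\mathbf{X}^T \mathbf{X}\beta$
$\beta^T \mathbf{X}^T\mathbf{y}$ and $\mathbf{y}^T\mathbf{X}\beta$ are both scalars and are equal (see the example further down). Thus
$\operatorname{RSS}=\mathbf{y}^T\mathbf{y}-2\beta^T \mathbf{X}^T\mathbf{y}+\beta^T\mathbf{X}^T \mathbf{X}\beta$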
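Now you can differentiate with respect to $\beta$ and set the result equal to zero:
$\frac{\partial \operatorname{RSS}}{\partial \beta}=-2\mathbf{X}^T\mathbf{y}+2\mathbf{X}^T \mathbf{X}\beta=0$
Dividing by 2 and moving the first term to the right-hand side:
$\mathbf{X}^T \mathbf{X}\beta=\mathbf{X}^T\mathbf{y}$
If $\mathbf{X}^T \mathbf{X}$ is invertible (that is, $\mathbf{X}$ has full column rank), multiply both sides from the left by $(\mathbf{X}^T \mathbf{X})^{-1}$:
$(\mathbf{X}^T \mathbf{X})^{-1}\mathbf{X}^T \mathbf{X}\beta=(\mathbf{X}^T \mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$
$(\mathbf{X}^T \mathbf{X})^{-1}\mathbf{X}^T \mathbf{X}= I$ (the identity matrix).
Finally you get $\beta=(\mathbf{X}^T \mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$. This stationary point is a minimum because RSS is a convex quadratic in $\beta$: its Hessian $2\mathbf{X}^T \mathbf{X}$ is positive semidefinite.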
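To make the result concrete, here is a minimal sketch in plain Python that solves the normal equations for a tiny data set and checks that the gradient vanishes at the solution. The data, the helper names (`transpose`, `matmul`, `solve_2x2`, `rss`) and the restriction to two coefficients are assumptions made for illustration; exact fractions are used so the check is exact rather than approximate.

```python
from fractions import Fraction

def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def solve_2x2(A, b):
    """Solve A beta = b for a 2x2 matrix A using Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    if det == 0:
        raise ValueError("X^T X is singular; X does not have full column rank")
    return [
        (b[0] * A[1][1] - A[0][1] * b[1]) / det,
        (A[0][0] * b[1] - b[0] * A[1][0]) / det,
    ]

def rss(X, y, beta):
    """Residual sum of squares (y - X beta)^T (y - X beta)."""
    residuals = [yi - sum(x * b for x, b in zip(row, beta)) for row, yi in zip(X, y)]
    return sum(r * r for r in residuals)

# Hypothetical data: an intercept column plus one regressor.
X = [[Fraction(1), Fraction(x)] for x in (0, 1, 2, 3)]
y = [Fraction(v) for v in (1, 3, 4, 8)]

Xt = transpose(X)
XtX = matmul(Xt, X)                                       # 2x2 matrix X^T X
Xty = [row[0] for row in matmul(Xt, [[yi] for yi in y])]  # vector X^T y

# Normal equations: X^T X beta = X^T y
beta_hat = solve_2x2(XtX, Xty)

# Gradient -2 X^T y + 2 X^T X beta evaluated at beta_hat
gradient = [
    -2 * Xty[i] + 2 * sum(XtX[i][j] * beta_hat[j] for j in range(2))
    for i in range(2)
]

print(beta_hat, gradient, rss(X, y, beta_hat))
```

Moving `beta_hat` slightly in any direction should increase `rss`, which matches the claim that the stationary point is a minimum. For real problems a linear-algebra library's least-squares routine is the usual choice rather than forming the inverse explicitly.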
Equality of $\beta^T \mathbf{X}^T\mathbf{y}$ and $\mathbf{y}^T\mathbf{X}\beta$
Here is an example with two observations and two coefficients:
$\left( \begin{array}{c c} b_1 & b_2 \end{array} \right) \cdot \left( \begin{array}{c c} x_{11} & x_{21} \\ x_{12} & x_{22}\end{array} \right) \cdot \left( \begin{array}{c} y_1 \\ y_2 \end{array} \right) $
$=\left( \begin{array}{c c} b_1x_{11}+b_2x_{12} & b_1x_{21}+b_2x_{22} \end{array} \right) \cdot \left( \begin{array}{c} y_1 \\ y_2 \end{array} \right)$
$=b_1 x_{11}y_1+b_2 x_{12}y_1+b_1x_{21}y_2+b_2x_{22}y_2\quad (\color{blue}{I})$
$\left( \begin{array}{c c} y_1 & y_2 \end{array} \right) \cdot \left( \begin{array}{c c} x_{11} & x_{12} \\ x_{21} & x_{22}\end{array} \right) \cdot \left( \begin{array}{c} b_1 \\ b_2 \end{array} \right) $
$=\left( \begin{array}{c c} y_1x_{11}+y_2x_{21} & y_1x_{12}+y_2x_{22} \end{array} \right) \cdot \left( \begin{array}{c} b_1 \\ b_2 \end{array} \right)$
$=y_1 x_{11}b_1+y_2 x_{21}b_1+y_1x_{12}b_2+y_2x_{22}b_2 \quad (\color{blue}{II})$
$\color{blue}{I}$ and $\color{blue}{II}$ are equal: they contain the same four terms, only in a different order.
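The same equality can be seen without writing out components. A $1\times 1$ matrix is equal to its own transpose, and transposing a product reverses the order of the factors, so for any conformable $\beta$, $\mathbf{X}$ and $\mathbf{y}$:
$$\beta^T \mathbf{X}^T \mathbf{y} = \left(\beta^T \mathbf{X}^T \mathbf{y}\right)^T = \mathbf{y}^T \mathbf{X} \beta$$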
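Derivative Rules
$\frac{\partial \beta^T \mathbf{X}^T \mathbf{y} }{\partial \beta }=\mathbf{X}^T\mathbf{y}$
$\frac{\partial \beta^T \mathbf{X}^T \mathbf{X} \beta }{\partial \beta }=2\mathbf{X}^T\mathbf{X}\beta$ (this uses the fact that $\mathbf{X}^T\mathbf{X}$ is symmetric)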
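If you want to convince yourself of these two rules numerically, one option is to compare the resulting gradient $-2X^Ty+2X^TX\beta$ with a central finite-difference approximation of RSS. The sketch below does this in plain Python; the data, the test point `beta`, the step size `h` and the function names are all assumptions chosen for illustration.

```python
def rss(X, y, beta):
    """Residual sum of squares (y - X beta)^T (y - X beta)."""
    return sum((yi - sum(x * b for x, b in zip(row, beta))) ** 2 for row, yi in zip(X, y))

def analytic_gradient(X, y, beta):
    """Gradient from the two derivative rules: -2 X^T y + 2 X^T X beta."""
    p = len(beta)
    Xb = [sum(x * b for x, b in zip(row, beta)) for row in X]            # X beta
    Xty = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]  # X^T y
    XtXb = [sum(row[j] * v for row, v in zip(X, Xb)) for j in range(p)]  # X^T X beta
    return [-2 * Xty[j] + 2 * XtXb[j] for j in range(p)]

def numeric_gradient(X, y, beta, h=1e-4):
    """Central finite differences of rss, one coordinate of beta at a time."""
    grad = []
    for j in range(len(beta)):
        up = list(beta)
        up[j] += h
        down = list(beta)
        down[j] -= h
        grad.append((rss(X, y, up) - rss(X, y, down)) / (2 * h))
    return grad

# Hypothetical data and an arbitrary point beta (not the minimizer).
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 4.0, 8.0]
beta = [0.5, -1.0]

g_exact = analytic_gradient(X, y, beta)
g_approx = numeric_gradient(X, y, beta)
print(g_exact, g_approx)
```

Because RSS is quadratic in $\beta$, the central difference has no truncation error, so the two gradients should agree up to floating-point rounding.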