For linear regression in the form
$$|y\rangle = X|\beta\rangle +|\epsilon\rangle $$
where
$$|y\rangle = \begin{bmatrix} y_1 \\\vdots \\ y_n\end{bmatrix}, \ X=\begin{bmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n\end{bmatrix}, \ |\beta\rangle = \begin{bmatrix}\alpha \\ \delta \end{bmatrix}, \ |\epsilon\rangle = \begin{bmatrix} \epsilon_1 \\ \vdots \\ \epsilon_n \end{bmatrix} $$
We calculate the best fit using least squares, minimising the squared error
$$\langle \epsilon|\epsilon\rangle = (\langle y | - \langle \beta|X^T)(|y \rangle-X|\beta \rangle)$$
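Expanding this out (and using that $\langle y|X|\beta\rangle$ is a scalar, hence equal to its own transpose $\langle\beta|X^T|y\rangle$):
$$\langle \epsilon|\epsilon\rangle = \langle y|y\rangle - 2\langle\beta|X^T|y\rangle + \langle\beta|X^TX|\beta\rangle$$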
Calculating the derivative with respect to $\langle\beta|$, we get
$$\frac{\partial \langle \epsilon|\epsilon\rangle }{\partial \langle \beta|} = -2X^T| y \rangle + (X^{T} X + XX^{T})| \beta \rangle $$
Now this is the bit I'm confused about: in the material I find online, they simplify $(X^TX + XX^T)|\beta\rangle = 2X^TX|\beta\rangle$, so that after setting the derivative to zero we end up with $|\beta\rangle = (X^TX)^{-1}X^T|y\rangle$. However, this simplification only holds when $X$ is symmetric, and for the $X$ shown above that is not necessarily true, correct? Furthermore, for the $X$ above, $XX^T$ is an $n\times n$ matrix, so $XX^T|\beta\rangle$ doesn't even make sense, does it?
So I quickly realised my mistake:
$$ \frac{\partial \langle\beta|X^TX|\beta \rangle}{\partial\langle \beta| } = (X^TX+(X^TX)^T)|\beta\rangle = 2X^TX|\beta\rangle$$
I basically forgot that when transposing a product the factors also switch order, i.e. $(X^TX)^T = X^T(X^T)^T = X^TX$, so $X^TX$ is symmetric and the two terms combine.
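As a quick numeric sanity check, here is a minimal sketch (assuming NumPy; the data and variable names are just made up for illustration) showing that $X^TX$ is symmetric, that $XX^T$ has the incompatible $n\times n$ shape, and that the normal-equation solution $(X^TX)^{-1}X^T|y\rangle$ matches a least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(0)

# Design matrix with an intercept column, as in the setup above: X is n x 2.
n = 50
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

# Hypothetical data generated with intercept 1.5 and slope 2.0 plus noise.
y = 1.5 + 2.0 * x + rng.normal(scale=0.1, size=n)

# X^T X is 2x2 and symmetric, so (X^T X)^T == X^T X.
A = X.T @ X
assert np.allclose(A, A.T)

# X X^T, by contrast, is n x n, so X X^T |beta> is not even conformable.
print((X @ X.T).shape)  # (50, 50)

# Normal-equation solution |beta> = (X^T X)^{-1} X^T |y>,
# computed via a linear solve rather than an explicit inverse.
beta_normal = np.linalg.solve(A, X.T @ y)

# It agrees with NumPy's least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(beta_normal, beta_lstsq)
print(beta_normal)  # roughly [1.5, 2.0]
```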