Consider a vector $y$ containing the $y$ values of some putative linear relation of the form $y=ax+b$, and matrices $A$ and $B$: $$ A=\left(\begin{matrix} x_1 & 1\\ x_2 & 1 \\ \vdots & \vdots\\ x_n & 1 \\ \end{matrix}\right) $$
$$B= \left(\begin{matrix} a\\ b \end{matrix}\right)$$
In very basic linear regression theory it is stated that when the residuals are $r = AB - y$, then $SSQ = r^T r = (AB-y)^T (AB-y)$, which makes sense. However, this is then simplified to $y^Ty - 2B^TA^Ty + B^TA^TAB$. I don't really understand where the factor of two in front of $B^TA^Ty$ comes from; I would simply get $y^Ty - B^TA^Ty + ABy^T + B^TA^TAB$ instead. When I tried filling in some numbers for the matrices $A$, $B$ and the vector $y$, I indeed found that $ABy^T$ equals $B^TA^Ty$. What is the reasoning/theory behind this?
If you use a linear regression, you only obtain an estimate of $B$. Let $\widehat{B}$ be this estimate. The vector of residuals is: $r = y - A\widehat{B}$.
Using your notation, if $A$ is an $n \times p$ matrix and $B$ a $p \times 1$ vector, then $AB$ is an $n \times 1$ matrix. We also have that $y^{T}$ is a $1 \times n$ matrix, which means that $ABy^{T}$ is an $n \times n$ matrix. Instead of $+ ABy^{T}$, you should have $- y^{T} A B$, which is a $1 \times 1$ matrix.
Since a $1 \times 1$ matrix equals its own transpose, you can then write: $$y^{T} A B = (y^{T} A B)^{T} = B^{T} A^{T} y,$$ which is where the $-2B^{T}A^{T}y$ term comes from.
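As a quick numerical sanity check, here is a small NumPy sketch (the data values are made up for illustration) verifying both that $y^TAB = B^TA^Ty$ and that the expanded form matches $r^Tr$:

```python
import numpy as np

# Hypothetical small data set: x values and y roughly following y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([[3.1], [4.9], [7.2], [8.8]])   # n x 1 column vector

A = np.column_stack([x, np.ones_like(x)])    # n x 2 design matrix (x_i, 1)
B = np.array([[2.0], [1.0]])                 # candidate coefficients (a, b)

# y^T A B is a 1 x 1 matrix, so it equals its own transpose B^T A^T y
print(y.T @ A @ B)       # same 1 x 1 value ...
print(B.T @ A.T @ y)     # ... as this

# Direct SSQ versus the expanded form y^T y - 2 B^T A^T y + B^T A^T A B
r = A @ B - y
ssq_direct = r.T @ r
ssq_expanded = y.T @ y - 2 * B.T @ A.T @ y + B.T @ A.T @ A @ B
print(np.allclose(ssq_direct, ssq_expanded))
```

Note that $ABy^T$, by contrast, is an $n \times n$ matrix, so it cannot appear in the scalar expansion.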