$Y$ and $\beta$ are $1 \times n$ matrices and $W$ is a diagonal $n \times n$ matrix.
What is the best way to think about how to simplify this expression and its derivative to get the expression below? What are the simple rules I should remember to get this?
$2A^TWA\beta - 2A^TW^TY$
Throughout, I implicitly sum over repeated indices. You're differentiating the scalar $Z_i W_{ij}Z_j$, where $Z_i:=Y_i-A_{ik}\beta_k$, with respect to the vector $\beta$; the result is a vector whose $k$th component is the derivative with respect to $\beta_k$. Writing $\partial_k:=\tfrac{\partial}{\partial\beta_k}$, we have $\partial_k Z_i=-A_{ik}$, so the product rule gives $\partial_k(Z_i W_{ij}Z_j)=-A_{ik}W_{ij}Z_j-Z_i W_{ij}A_{jk}$. Now you can use the rules of matrix multiplication to rewrite this neatly as $-[Z^T(W+W^T)A]_k$, making the derivative $-Z^T(W+W^T)A=(A\beta-Y)^T(W+W^T)A$. This is a row vector; transposing it and using $W^T=W$ (your $W$ is diagonal) gives $2A^TW(A\beta-Y)=2A^TWA\beta-2A^TW^TY$, the expression you quote.
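If you'd like to convince yourself numerically, here is a minimal sketch (not part of the derivation itself) that compares the formula $-Z^T(W+W^T)A$ against central finite differences, and then checks that for a diagonal $W$ it agrees with $2A^TWA\beta-2A^TW^TY$. The sizes `n` and `p`, the random test data, and the helper names `objective`, `gradient` and `numeric_gradient` are all assumptions made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 3                      # hypothetical sizes: A is n x p, beta has p entries
A = rng.normal(size=(n, p))
Y = rng.normal(size=n)
beta = rng.normal(size=p)

def objective(b, W):
    """The scalar Z^T W Z with Z = Y - A b."""
    Z = Y - A @ b
    return Z @ W @ Z

def gradient(b, W):
    """The derived formula -Z^T (W + W^T) A, returned as a 1-D array."""
    Z = Y - A @ b
    return -(Z @ (W + W.T) @ A)

def numeric_gradient(b, W, eps=1e-6):
    """Central differences, one coordinate direction of beta at a time."""
    return np.array([
        (objective(b + eps * e, W) - objective(b - eps * e, W)) / (2 * eps)
        for e in np.eye(len(b))
    ])

# General (non-symmetric) W: the formula should match finite differences.
W_general = rng.normal(size=(n, n))
err_general = np.max(np.abs(numeric_gradient(beta, W_general) - gradient(beta, W_general)))

# Diagonal W: the formula should reduce to the expression quoted in the question.
W_diag = np.diag(rng.uniform(0.5, 2.0, size=n))
quoted = 2 * A.T @ W_diag @ A @ beta - 2 * A.T @ W_diag.T @ Y
err_diag = np.max(np.abs(quoted - gradient(beta, W_diag)))

print(err_general, err_diag)     # both should be tiny (rounding error only)
```

Because the objective is quadratic in $\beta$, central differences have no truncation error here, so any visible discrepancy would point to a mistake in the formula rather than in the step size.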
Now for a sanity check, which is worthwhile with any calculation this complex. If $W$ were a scalar instead, we'd have $\partial_k (Z^TWZ)=W\partial_k (Z^TZ)=-2[Z^TWA]_k$. But when we reinstate $W$'s matrix status, we note its antisymmetric part doesn't even contribute to the scalar we're differentiating (swapping the dummy indices $i,\,j$ in $Z_iW_{ij}Z_j$ shows that only $W_{ij}+W_{ji}$ matters), so without loss of generality $W$ should be replaced throughout with its symmetric part $(W+W^T)/2$. That gives $-Z^T(W+W^T)A$ instead, as we've found.
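The claim about the antisymmetric part is also easy to check numerically. The following is only an illustrative sketch with random data; the names `W_sym`, `W_anti` and the size `n` are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4                            # hypothetical size
W = rng.normal(size=(n, n))      # a generic, non-symmetric matrix
Z = rng.normal(size=n)

W_sym = (W + W.T) / 2            # symmetric part
W_anti = (W - W.T) / 2           # antisymmetric part; W = W_sym + W_anti

full_form = Z @ W @ Z            # Z^T W Z
sym_form = Z @ W_sym @ Z         # Z^T W_sym Z
anti_form = Z @ W_anti @ Z       # Z^T W_anti Z, should vanish

print(full_form - sym_form, anti_form)   # both zero up to rounding
```

So the scalar only ever sees $(W+W^T)/2$, which is why the scalar-$W$ shortcut gives the right answer once $W$ is replaced by its symmetric part.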