I am trying to differentiate
$$ \mathbf{b} \mapsto \mathbf{(a+b)}^\intercal \mathbf{A(a+b)} $$
where $\mathbf{a}$ and $\mathbf{b}$ are $n \times 1$ vectors and $\mathbf{A}$ is an $n \times n$ symmetric positive semidefinite matrix.
I expand it out to
$$\mathbf{a}^\intercal \mathbf{Aa} + \mathbf{a}^\intercal \mathbf{Ab} + \mathbf{b}^\intercal \mathbf{Aa} + \mathbf{b}^\intercal \mathbf{Ab}$$
and then differentiated with respect to $\mathbf{b}$, which I calculated as $\mathbf{a}^\intercal \mathbf{A}$ + $\mathbf{Aa}$ + $\mathbf{2Ab}$, but this does not make sense to me because the first term $\mathbf{a}^\intercal \mathbf{A}$ is now $1 \times n$ but the second and third terms $\mathbf{Aa}$ and $\mathbf{2Ab}$ are $n \times 1$, so the dimensions do not match. What am I doing wrong?
In vector/matrix calculus, we often break things up using the chain rule, rather than by expanding out terms.
Think of it like this: we have $a, b \in \mathbb{R}^n, A \in \mathbb{R}^{n \times n}$.
Now consider the following functions:
$$f: \mathbb{R}^n \rightarrow \mathbb{R}^n. f(b) = a+b$$
$$g: \mathbb{R^n} \rightarrow \mathbb{R}. g(x) = x^TAx$$ where $A$ is a symmetric matrix.
By the chain rule: $\frac{dg(f(b))}{d b} = \frac{dg}{df} \frac{df}{db}$ where $\frac{dg}{df} \in \mathbb{R}^{1 \times n}, \frac{df}{db} \in \mathbb{R}^{n \times n}$ and so $\frac{dg(f(b))}{db} \in \mathbb{R}^{1 \times n}$.
Let's calculate those separately:
$$\frac{df}{ db} = \begin{bmatrix} \frac{\partial f_1}{\partial b_1} & \dots & \frac{\partial f_1}{\partial b_n}\\ \vdots & \ddots & \vdots\\ \frac{\partial f_n}{\partial b_1} & \dots & \frac{\partial f_n}{\partial b_n} \end{bmatrix} \in \mathbb{R}^{n \times n} $$
Now, for each partial derivative in the Jacobian above, we have $\frac{\partial f_i}{\partial b_j} = \frac{\partial }{\partial b_j}a_i + \frac{\partial}{\partial b_j}b_i = \delta_{ij}$. Therefore $\frac{df}{db} = I_n \in \mathbb{R}^{n \times n}$. Now, $$\frac{dg}{df} = \begin{bmatrix} \frac{\partial g}{\partial f_1} & \frac{\partial g}{\partial f_2} & \cdots & \frac{\partial g}{\partial f_n} \end{bmatrix} \in \mathbb{R}^{1 \times n}$$
Note that $g(f) = f^TAf = \sum_{i=1}^{N}{f_i [Af]_i} = \sum_{i=1}^{N}{\sum_{j=1}^{N}{f_iA_{ij}f_j}}$. Therefore:
$$ \frac{\partial g}{ \partial f_k} = \sum_{i, j}{A_{ij} \frac{\partial}{\partial f_k}(f_if_j)} = \sum_{i, j}{A_{ij}(f_i\delta_{jk} + f_j\delta_{ik})} = $$
$$\sum_{i, j}{A_{ij}f_i\delta_{jk}} + \sum_{i, j}{A_{ij}f_j\delta_{ik}} = \sum_{i}{A_{ik}f_i} + \sum_{j}{A_{kj}f_j} $$
Since $A$ is symmetric, $A_{ik} = A_{ki}$ so we have:
$$\frac{\partial g}{\partial f_k} = \sum_{i}{A_{ki}f_i} + \sum_{j}{A_{kj}f_j} $$
Reindexing we are left with:
$$\frac{\partial g}{\partial f_k} = 2 \sum_{i}{A_{ki}f_i} = 2[Af]_k$$
Collecting all the partial derivatives in the Jacobian vector:
$$ \begin{bmatrix} \frac{\partial g}{\partial f_1} & \cdots & \frac{\partial g}{\partial f_n} \end{bmatrix} = 2 \begin{bmatrix} [Af]_1 & \cdots & [Af]_n \end{bmatrix} = 2(Af)^T = 2f^TA^T = 2f^TA \in \mathbb{R}^{1 \times n}$$
Finally, substituting the Jacobians in the chain rule:
$$ \frac{dg(f(x))}{dx} = \frac{dg}{df} \cdot \frac{df}{dx} =2f^TA \cdot I_n = 2f^TA = 2(a+b)^TA \in \mathbb{R}^{1 \times n}$$
as required.