According to this source (The Matrix Cookbook), for $f(X) = \mathbf{a}^T X \mathbf{b}$ where $X \in \mathbb{R}^{n \times m}, \mathbf{a} \in \mathbb{R}^n, \mathbf{b} \in \mathbb{R}^m$, the derivative is $$ \frac{\partial f}{\partial X} = \mathbf{a} \mathbf{b}^T$$ which seems to me to be a scalar, since it's basically a dot product. I see how we can arrive at this result by applying the usual matrix derivative rules. However, based on my calculus knowledge, taking the derivative (gradient) of a function $f: \mathbb{R}^{n \times m} \to \mathbb{R}$ should yield a vector/matrix of partial derivatives.
As an example, I tried this with $\mathbf{a} = \begin{bmatrix} a_1 & a_2 \end{bmatrix}, \mathbf{b} = \begin{bmatrix} b_1 & b_2 \end{bmatrix}$, and $X = \begin{bmatrix} x_1 & x_2 \\ x_3 & x_4 \end{bmatrix}$. I expanded the expression and took the gradient to get that $$\nabla f = \begin{bmatrix} \frac{\partial f}{\partial x_1} & \frac{\partial f}{\partial x_2} \\ \frac{\partial f}{\partial x_3} & \frac{\partial f}{\partial x_4} \end{bmatrix} = \begin{bmatrix} a_1b_1 & a_1b_2 \\ a_2b_1 & a_2b_2 \end{bmatrix}.$$
So, my question is, is this the derivative, or is it $\mathbf{a}\mathbf{b}^T$? Should the derivative be a vector or a scalar?
P.S. I also noticed that $\mathbf{a}\mathbf{b}^T = \operatorname{trace}(\nabla f)$ for the matrix I computed for $\nabla f$. Is there any significance to that?
The problem is that "vector" usually refers to a column vector (otherwise the expression $a^TXb$ wouldn't even make sense, since the dimensions wouldn't match). With column vectors, $ab^T$ is an $n \times m$ outer product, not a dot product, so it is a matrix rather than a scalar. If you compute $$ab^T = \begin{pmatrix}a_1\\a_2\end{pmatrix}\begin{pmatrix}b_1 & b_2\end{pmatrix} = \begin{pmatrix} a_1b_1 & a_1b_2\\ a_2b_1 & a_2b_2 \end{pmatrix}$$ then you get the same result as your computation.
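If you want to convince yourself numerically, here is a minimal NumPy sketch that compares the outer product $ab^T$ against a finite-difference gradient of $f(X) = a^T X b$. The helper names (`f`, `numerical_gradient`), the sizes, and the random test data are my own choices for illustration and are not taken from The Matrix Cookbook.

```python
import numpy as np

def f(X, a, b):
    # a has shape (n,), X has shape (n, m), b has shape (m,); the result is a scalar.
    return a @ X @ b

def numerical_gradient(X, a, b, eps=1e-6):
    # Central differences, one entry of X at a time.
    grad = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = eps
            grad[i, j] = (f(X + E, a, b) - f(X - E, a, b)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
n, m = 3, 2  # arbitrary illustrative sizes
a = rng.standard_normal(n)
b = rng.standard_normal(m)
X = rng.standard_normal((n, m))

outer = np.outer(a, b)            # a b^T, an n x m matrix
num = numerical_gradient(X, a, b)  # matrix of partial derivatives

print(outer.shape)
print(np.allclose(num, outer, atol=1e-6))

# Regarding the P.S.: when n == m, the trace of the outer product is the dot product.
a2 = rng.standard_normal(2)
b2 = rng.standard_normal(2)
print(np.isclose(np.trace(np.outer(a2, b2)), a2 @ b2))
```

The last check is probably what is behind the trace observation: reading $a$ and $b$ as row vectors makes $ab^T$ the dot product $a_1b_1 + a_2b_2$, which is exactly the trace of the outer-product matrix.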