So we have the function $f(\textbf{B}) = a_0 + \textbf{a}^T\textbf{B}$
and we want to compute $\frac{\partial f}{\partial \textbf{B}}$.
My intuition tells me this should result in $\textbf{a}^T \textbf{1}$. But according to my worksheet the answer is simply $\textbf{a}$.
This doesn't make sense to me: the function outputs a scalar, so why would taking the partial derivative output a vector?
In simple words, the gradient needs to match the dimension of the matrix/vector you are taking the derivative with respect to.
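
To make this concrete, here is a componentwise sketch. It assumes $\textbf{a}$ and $\textbf{B}$ are both column vectors in $\mathbb{R}^n$ (the question does not state the dimension, so $n$ is an assumption) and that the worksheet uses the gradient (denominator-layout) convention:

$$
f(\textbf{B}) = a_0 + \sum_{i=1}^{n} a_i B_i,
\qquad
\frac{\partial f}{\partial B_j} = a_j,
\qquad
\frac{\partial f}{\partial \textbf{B}} =
\begin{bmatrix} \frac{\partial f}{\partial B_1} \\ \vdots \\ \frac{\partial f}{\partial B_n} \end{bmatrix}
= \begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix}
= \textbf{a}.
$$

The output of $f$ is still a scalar. The derivative has one entry per component of $\textbf{B}$ because each component can be varied independently. The quantity $\textbf{a}^T\textbf{1} = \sum_i a_i$ is the rate of change of $f$ when every component of $\textbf{B}$ increases together at the same rate, so it is a directional derivative along $\textbf{1}$ rather than the full gradient.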