To my understanding, the gradient essentially packages all the partial derivatives of a function into a single structure. In 3 dimensions, the resulting vector points in the direction of steepest ascent, and its magnitude is the slope in that direction. For example, consider the scalar-valued function

$$\large f: \mathbb{R}^{n}\rightarrow \mathbb{R}$$

Its gradient with respect to $\large \mathbf{\vec x}$ is given as:

$$ \large \nabla _{\mathbf{\vec x}}f(\mathbf{\vec x}) = \left[\frac{\partial f(\mathbf{\vec x})}{\partial x_{i}}\right]_{n} = \begin{bmatrix} \frac{\partial f(\mathbf{\vec x})}{\partial x_{1}} & \frac{\partial f(\mathbf{\vec x})}{\partial x_{2}} & \frac{\partial f(\mathbf{\vec x})}{\partial x_{3}} & \dots & \frac{\partial f(\mathbf{\vec x})}{\partial x_{n}} \end{bmatrix} $$

So the output of the gradient is also a vector. Here, the vector contains the partial derivatives of $\large f(\mathbf{\vec x})$ with respect to each element of $\large \mathbf{\vec x}$. Similarly:

$$ \large \nabla _{\mathbf{A}}f(\mathbf{A})= \left[\frac{\partial f(\mathbf{A})}{\partial \mathbf{A}_{ij}}\right]_{m \times n} \mspace{30mu}\text{where}\mspace{20mu} f: \mathbb{M}_{m, n}(\mathbb{R})\rightarrow \mathbb{R} $$

Here, the matrix contains the partial derivatives of $\large f(\mathbf{A})$ with respect to each element of $\large \mathbf{A}$.
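To make the two definitions above concrete, here is a minimal numerical sketch in plain Python. It approximates each partial derivative with a central difference and stores it in a structure of the same shape as the input. The helper names (`numerical_gradient`, `numerical_matrix_gradient`) and the test functions (`sum_of_squares`, `frobenius_squared`) are my own illustrative choices, not part of any library.

```python
def numerical_gradient(f, x, h=1e-6):
    """Approximate the gradient of a scalar function f at a vector x (a list).

    Returns a list with one partial derivative per entry of x.
    """
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        # Central difference for the partial derivative with respect to x_i.
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


def numerical_matrix_gradient(f, A, h=1e-6):
    """Approximate the gradient of a scalar function f at a matrix A.

    A is a list of rows; the result has the same m x n shape as A.
    """
    rows, cols = len(A), len(A[0])
    grad = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            A_plus = [row[:] for row in A]
            A_minus = [row[:] for row in A]
            A_plus[i][j] += h
            A_minus[i][j] -= h
            # Central difference for the partial derivative with respect to A_ij.
            grad[i][j] = (f(A_plus) - f(A_minus)) / (2 * h)
    return grad


def sum_of_squares(x):
    """Example f: R^n -> R, whose exact gradient is 2x."""
    return sum(v * v for v in x)


def frobenius_squared(A):
    """Example f: M_{m,n}(R) -> R, whose exact gradient is 2A."""
    return sum(v * v for row in A for v in row)


x = [1.0, 2.0, 3.0]
A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

grad_x = numerical_gradient(sum_of_squares, x)            # a vector with 3 entries
grad_A = numerical_matrix_gradient(frobenius_squared, A)  # a 2 x 3 matrix
```

In both cases the result has the shape of the variable being differentiated: `grad_x` has as many entries as `x`, and `grad_A` has the same number of rows and columns as `A`.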
Here's my proposition: for any scalar-valued function, the structure of its gradient depends only on the variable with respect to which the gradient is taken, not on the type of input the function accepts. For example, the gradient of a scalar-valued matrix function with respect to a vector should also be a vector.

$$ \large \nabla _{\mathbf{\vec x}}f(\mathbf{A}) = \left[\frac{\partial f(\mathbf{A})}{\partial x_{i}}\right]_{n} $$

Is this valid?
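As a small worked check of the proposition, suppose (purely as an assumed example, not something stated above) that the matrix depends on the vector through $\mathbf{A} = \mathbf{\vec x}\,\mathbf{\vec x}^{T}$ and that $f(\mathbf{A}) = \operatorname{tr}(\mathbf{A})$. Applying the chain rule entry by entry gives:

$$ \large \frac{\partial f(\mathbf{A})}{\partial x_{i}} = \sum_{j,k} \frac{\partial f(\mathbf{A})}{\partial \mathbf{A}_{jk}} \frac{\partial \mathbf{A}_{jk}}{\partial x_{i}} = \sum_{j,k} \delta_{jk}\left(\delta_{ji}\,x_{k} + x_{j}\,\delta_{ki}\right) = 2x_{i} $$

so that

$$ \large \nabla _{\mathbf{\vec x}}f(\mathbf{A}) = \left[2x_{i}\right]_{n} $$

which is indeed a vector with $n$ entries, matching $f(\mathbf{A}) = \sum_{i} x_{i}^{2}$ differentiated directly. Note that the expression only carries information when $\mathbf{A}$ actually depends on $\mathbf{\vec x}$; if it does not, every partial derivative is zero and the gradient is the zero vector of length $n$.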