I am comfortable with derivatives in single- and multivariable calculus. I am also somewhat familiar with taking the derivative of a function w.r.t. a vector.
Now I'm looking at a paper that uses matrix calculus and am trying to develop an intuition for the definitions of the tangent matrix (taking derivative of a matrix w.r.t. a scalar) and gradient matrix (taking derivative of a scalar w.r.t. a matrix).
If the derivative of a function equals its rate of change w.r.t. some variable, what does it mean to take the derivative of a matrix w.r.t. a scalar, or of a scalar w.r.t. a matrix? Why are these matrices defined this way?
https://en.wikipedia.org/wiki/Matrix_calculus#Derivatives_with_matrices
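The space of $n \times n$ matrices is naturally identified with the Euclidean space $\mathbb{R}^{n^2}$: each of the $n^2$ entries is a coordinate with respect to the standard basis of matrices that have a single entry equal to $1$ and zeros elsewhere. Hence a function giving matrices as a function of time can be thought of as a path in that space, and differentiating with respect to time gives the velocity vector. This vector has $n^2$ components, one per entry, so it can itself be written as a matrix whose $(i,j)$ entry is the rate of change of the $(i,j)$ entry of the path. That is the tangent matrix; it's just a matter of notation.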
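To make the "velocity of a path" picture concrete, here is a small sketch in plain Python. The path `path(t)` (a 2×2 rotation matrix) and the helper names are my own illustrative choices, not something from the paper or the Wikipedia article. It approximates the tangent matrix by differencing each entry separately and compares the result with the entrywise derivative computed by hand.

```python
import math

def path(t):
    """A hypothetical matrix-valued path: rotation by the angle t."""
    return [[math.cos(t), -math.sin(t)],
            [math.sin(t),  math.cos(t)]]

def tangent_matrix(A, t, h=1e-6):
    """Central-difference approximation of dA/dt, one entry at a time."""
    plus, minus = A(t + h), A(t - h)
    return [[(p - m) / (2 * h) for p, m in zip(row_p, row_m)]
            for row_p, row_m in zip(plus, minus)]

def exact_tangent(t):
    """Entrywise derivative of path(t), computed by hand."""
    return [[-math.sin(t), -math.cos(t)],
            [ math.cos(t), -math.sin(t)]]

approx = tangent_matrix(path, 0.3)
exact = exact_tangent(0.3)

# Largest entrywise gap between the numerical and the hand-computed velocity.
max_error = max(abs(a - e)
                for row_a, row_e in zip(approx, exact)
                for a, e in zip(row_a, row_e))
print(max_error)
```

Nothing here uses the matrix structure: the same code would work if the four entries were stored as a flat vector of length 4, which is the point of the identification with Euclidean space.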
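Similarly, a function that gives scalars from matrices can be thought of as a function on $\mathbb{R}^{n \times n}$, that is, an ordinary scalar function of $n^2$ real variables. Its gradient has one partial derivative per entry of the input, and arranging those partial derivatives in the same shape as the input gives the gradient matrix.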
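The same idea can be sketched for a scalar function of a matrix. The function `f` below (the sum of squared entries, i.e. $\operatorname{tr}(X^T X)$) is only an illustrative choice of mine; for it, the partial derivative with respect to $X_{ij}$ is $2X_{ij}$. I place that partial derivative in position $(i,j)$ of the result, which is an assumption about layout: some conventions use the transpose instead.

```python
def f(X):
    """A hypothetical scalar function of a matrix: the sum of squared entries."""
    return sum(x * x for row in X for x in row)

def gradient_matrix(f, X, h=1e-6):
    """Approximate the partial derivative of f w.r.t. each entry X[i][j]."""
    grad = []
    for i, row in enumerate(X):
        grad_row = []
        for j in range(len(row)):
            # Perturb only entry (i, j), leaving all other entries fixed.
            X_plus = [r[:] for r in X]
            X_minus = [r[:] for r in X]
            X_plus[i][j] += h
            X_minus[i][j] -= h
            grad_row.append((f(X_plus) - f(X_minus)) / (2 * h))
        grad.append(grad_row)
    return grad

X = [[1.0, 2.0],
     [3.0, -1.0]]
grad = gradient_matrix(f, X)

# For this f, each entry of grad should be close to twice the matching entry of X.
for grad_row, x_row in zip(grad, X):
    print(grad_row, [2 * x for x in x_row])
```

Each entry of `grad` answers the usual multivariable question: how fast does `f` change if only that one coordinate is nudged?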