We all know that there are several types of matrix derivative. The scalar-by-matrix derivative $\frac{\partial y}{\partial \textbf{X}}$ is defined as follows:
$$
\begin{pmatrix}
\partial y / \partial x_{11} & \partial y / \partial x_{21} & \ldots & \partial y / \partial x_{p1} \\
\partial y / \partial x_{12} & \partial y / \partial x_{22} & \ldots & \partial y / \partial x_{p2} \\
\vdots & \vdots & \ddots & \vdots \\
\partial y / \partial x_{1q} & \partial y / \partial x_{2q} & \ldots & \partial y / \partial x_{pq}
\end{pmatrix}
$$
So why do we need this, if we can just put all the variables of $y$ into a vector and take the scalar-by-vector derivative?
I have probably missed the main idea behind the scalar-by-matrix derivative. Can you give me a numerical example? I can't find a concrete one in any textbook.

You may as well ask why we need matrices at all, when we can just put all the variables in one long vector.
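Since a numerical example was requested, here is a minimal sketch. It assumes a function chosen purely for illustration, $y = \textbf{a}^T \textbf{X} \textbf{b}$ with $p = 2$ and $q = 3$; the vectors `a`, `b`, the matrix `X`, and the helper names are hypothetical and not taken from any textbook. It estimates each $\partial y / \partial x_{ij}$ by central differences and arranges the results in the $q \times p$ layout from the question.

```python
# Hypothetical example: y = a^T X b, with X of shape p x q (p = 2, q = 3).
a = [1.0, 2.0]            # length p
b = [3.0, 4.0, 5.0]       # length q
X = [[0.5, -1.0, 2.0],
     [1.5,  0.0, -0.5]]   # p x q
p, q = 2, 3

def y(M):
    """Scalar function of the matrix M: a^T M b."""
    return sum(a[i] * M[i][j] * b[j] for i in range(p) for j in range(q))

def partial(i, j, h=1e-6):
    """Central-difference estimate of dy/dx_ij at X."""
    plus = [row[:] for row in X]
    minus = [row[:] for row in X]
    plus[i][j] += h
    minus[i][j] -= h
    return (y(plus) - y(minus)) / (2 * h)

# Layout from the question: row j, column i holds dy/dx_ij (a q x p matrix).
dy_dX = [[partial(i, j) for i in range(p)] for j in range(q)]

# For this particular y, dy/dx_ij = a_i * b_j, so the matrix equals b a^T.
closed_form = [[b[j] * a[i] for i in range(p)] for j in range(q)]

# The same six numbers as one long vector (rows of X stacked).
flat = [partial(i, j) for i in range(p) for j in range(q)]

for row in dy_dX:
    print([round(v, 6) for v in row])
```

The flattened vector `flat` contains exactly the same six partial derivatives, so nothing is gained or lost numerically. What the matrix arrangement adds is structure: for this choice of $y$ the whole derivative can be written compactly as $\textbf{b}\textbf{a}^T$, which is much harder to see in the long vector.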