The author of this question came close to defining the derivative of a function of a dual variable by considering matrices isomorphic (algebraically and topologically) to the dual numbers: $$(a+\epsilon b) \sim \begin{bmatrix} a & 0 \\ b & a \\ \end{bmatrix}.$$
So we can use the fact that the derivative (in the Fréchet sense) can be defined for functions $F$ whose argument and value are both matrices of this form: $$F\big(\begin{bmatrix} x+s & 0 \\ y+t & x+s \\ \end{bmatrix}\big)-F\big(\begin{bmatrix} x & 0 \\ y & x \\ \end{bmatrix}\big)=\begin{bmatrix} u' & 0 \\ v' & u' \\ \end{bmatrix}\begin{bmatrix} s & 0 \\ t & s \\ \end{bmatrix}+o\bigg(\bigg|\bigg|\begin{bmatrix} s & 0 \\ t & s \\ \end{bmatrix}\bigg|\bigg|\bigg),$$ where $\bigg|\bigg|\begin{bmatrix} s & 0 \\ t & s \\ \end{bmatrix}\bigg|\bigg|=\max\{|s|,|t|\}$ and all entries of all matrices are real.
Therefore, the existence of such a matrix $\begin{bmatrix} u' & 0 \\ v' & u' \\ \end{bmatrix}$ (which we will call the derivative at $\begin{bmatrix} x & 0 \\ y & x \\ \end{bmatrix}$) means that $F$ is differentiable at $\begin{bmatrix} x & 0 \\ y & x \\ \end{bmatrix}$.
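A quick numerical illustration of this definition (a sketch; the names `dual` and `F` are my own, and $F$ is chosen here as the matrix polynomial $M^2+3M$, whose underlying scalar function is $f(t)=t^2+3t$): evaluating $F$ at the matrix form of $x+\epsilon$ recovers both $f(x)$ and $f'(x)$ in the entries.

```python
import numpy as np

def dual(a, b):
    # Matrix representation of the dual number a + eps*b.
    return np.array([[a, 0.0], [b, a]])

def F(M):
    # Sample function: the matrix polynomial M^2 + 3M,
    # i.e. the scalar function f(t) = t^2 + 3t lifted to matrices.
    return M @ M + 3 * M

x = 2.0
D = F(dual(x, 1.0))   # evaluate F at the matrix form of x + eps
f_x  = D[0, 0]        # recovers f(x)  = x^2 + 3x -> 10.0
fp_x = D[1, 0]        # recovers f'(x) = 2x + 3   -> 7.0
```

This is exactly the mechanism behind forward-mode automatic differentiation with dual numbers: the lower-left entry of the result carries the derivative.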
I'm interested in to what extent this approach can be generalized to define the derivative of a matrix-valued function of a matrix argument. I mean the case when the derivative is an object of the same nature as the variables (as opposed to the definition of the derivative of a function $f:\mathbb{R}^{n}\rightarrow\mathbb{R}^{m}$, which is a (Jacobian) matrix).
Can anyone share links to material on such kinds of derivatives?
That will quickly go wrong. One step more complex than your example is the case of complex-valued functions of a complex variable. The matrix equivalent of such a number is: $$(a+i b) \sim \begin{bmatrix} a & b \\ -b & a \\ \end{bmatrix}.$$
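A quick sanity check of this representation (a sketch; the name `mat` is my own): the map $z \mapsto \begin{bmatrix} a & b \\ -b & a \end{bmatrix}$ respects complex multiplication, i.e. the matrix of a product is the product of the matrices.

```python
import numpy as np

def mat(z):
    # Matrix representation of the complex number z = a + ib.
    a, b = z.real, z.imag
    return np.array([[a, b], [-b, a]])

z, w = 1 + 2j, 3 - 1j
# The representation is multiplicative: mat(z) @ mat(w) == mat(z * w).
lhs_m = mat(z) @ mat(w)
rhs_m = mat(z * w)
```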
And for such a function we know that not one but two derivatives are needed: $$F\big(\begin{bmatrix} x+s & y+t \\ -y-t & x+s \\ \end{bmatrix}\big)-F\big(\begin{bmatrix} x & y \\ -y & x \\ \end{bmatrix}\big)=\qquad\qquad\qquad \\[20pt] \begin{bmatrix} c & d \\ -d & c \\ \end{bmatrix}\begin{bmatrix} s & t \\ -t & s \\ \end{bmatrix} +\begin{bmatrix} u & v \\ -v & u \\ \end{bmatrix}\begin{bmatrix} s & -t \\ t & s \\ \end{bmatrix}+o\bigg(\bigg|\bigg|\begin{bmatrix} s & t \\ -t & s \\ \end{bmatrix}\bigg|\bigg|\bigg),$$ which we usually write as: $$ f(z+\Delta) - f(z) = c \ \Delta + u\ \Delta^* + o(|\Delta|)\\[15pt] \text{or:}\quad f(z+\Delta) - f(z) = \frac{df}{dz} \ \Delta + \frac{df}{dz^*}\ \Delta^* + o(|\Delta|)\\ $$
As an example, take the following functions and their pairs of derivatives: $$ \begin{matrix} f(z) & & df/dz & & df/dz^* \\[10pt] {\rm Re}(z) && \frac12 && \frac12 \\ {\rm Im}(z) && -\frac12\ i && \frac12\ i \\ z && 1 && 0 \\ z^2 && 2\,z && 0 \\ z^* && 0 && 1 \\ |z|^2 && z^* && z \\ |z| && \frac12 \frac{\Large z^*}{\Large |z|} && \frac12 \frac{\Large z}{\Large |z|} \\ |z|^3 && \frac32 |z| z^* && \frac32\, |z|\, z \\ && {\rm etc.} && \end{matrix}$$
As can be seen, only analytic functions, like $z$ or $z^2$, have $df/dz^*=0$, so they need only $df/dz$ to describe their derivative (which we then call "the complex derivative"). Likewise, purely anti-analytic functions, like $z^*$ or $(z^*)^2$, need only $df/dz^*$. In general, however, two complex numbers are needed, or in matrix language: two matrices are needed to describe the first-order variation of these matrix-valued functions of a matrix. (See also question 2126598.)
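The table entries can be checked numerically (a sketch, using the row $f(z)=|z|^2$ with $df/dz=z^*$ and $df/dz^*=z$; the point $z$ and step $\Delta$ are arbitrary choices of mine):

```python
import cmath

def f(z):
    return abs(z) ** 2

z  = 1.5 - 0.5j
dz = 1e-6 * cmath.exp(0.7j)   # small step in an arbitrary direction

lhs = f(z + dz) - f(z)
# Two-term (Wirtinger) expansion: df/dz * dz + df/dz* * dz*
rhs = z.conjugate() * dz + z * dz.conjugate()

# The mismatch is |dz|^2 (about 1e-12 here), i.e. o(|dz|) as required.
```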
For larger matrices the number of required derivatives increases further; there is simply more information needed to describe the derivative than can be contained in a single matrix.