Suppose I have differentiable functions (in the sense of the Fréchet derivative) $f\colon \mathbb{R} \to \mathbb{R}^{n\times n} $ and $g\colon \mathbb{R}^{n} \to \mathbb{R}$, where $f$ is a linear operator, and I want to compute the (Fréchet) derivative of their composition, i.e. of $f \circ g\colon \mathbb{R}^{n} \to \mathbb{R}^{n\times n} $. Using the chain rule for normed spaces I obtain \begin{align*} \mathrm{D}(f \circ g)(\mathbf{x})h = \underbrace{ (\mathrm{D}f)(g(\mathbf{x}))}_{\in \mathbb{R}^{n\times n} } \cdot \underbrace{ \mathrm{D}g(\mathbf{x})}_{\in \mathbb{R}^{1\times n} }h, \quad h \in \mathbb{R}^{n} .\end{align*} How can this hold, since the dimensions of the product do not match up?
Edit: Consider the function \begin{align*} f(\mathbf{x}) = \mathbf{A}(\mathbf{x})\mathbf{x} \end{align*} with \begin{align*} \mathbf{A}\colon \mathbb{R}^{n} \to \mathbb{R}^{n\times n} , \quad \mathbf{A}(\mathbf{x}) = \begin{bmatrix} \alpha(\mathbf{x}) & 1 & 0& 0 & \cdots & 0 \\ 1 & \alpha(\mathbf{x}) & 1 & 0 & \cdots & 0 \\ 0 & 1 & \alpha(\mathbf{x}) & 1 & \cdots & 0 \\ \vdots & \cdots & \ddots & \ddots & \ddots & \vdots \\ 0 & 0 & 0 & 1 & \alpha(\mathbf{x})& 1 \\ 0 & 0 & 0 & 0 & 1 & \alpha(\mathbf{x}) \end{bmatrix} \end{align*} where $\alpha(\mathbf{x}) = \left\|\mathbf{x}\right\|_{2} $. My professor now rewrites the above as \begin{align*} \mathbf{A}(\mathbf{x})\mathbf{x} = \mathbf{T}\mathbf{x} + \mathbf{x}\left\|\mathbf{x}\right\|_{2} , \quad \mathbf{T}:=\left[\begin{array}{cccccc} 3 & 1 & & & & \\ 1 & 3 & 1 & & & \\ & \ddots & 3 & \ddots & & \\ & & \ddots & \ddots & \ddots & \\ & & & 1 & 3 & 1 \\ & & & & 1 & 3 \end{array}\right] .\end{align*} He then finds \begin{align*} \mathrm{D}f(\mathbf{x}) \mathbf{h}=\mathbf{T h}+\|\mathbf{x}\|_{2} \mathbf{h}+\mathbf{x} \frac{\mathbf{x}^{\top} \mathbf{h}}{\|\mathbf{x}\|_{2}} =\left(\mathbf{A}(\mathbf{x})+\frac{\mathbf{x} \mathbf{x}^{\top}}{\|\mathbf{x}\|_{2}}\right) \mathbf{h} \end{align*}
from which I concluded that \begin{align*} \mathrm{D}(\mathbf{A}(\mathbf{x}))= \frac{\mathbf{x}\mathbf{x}^{\mathsf{T}}}{\left\|\mathbf{x}\right\|_{2} } .\end{align*}
For the purposes of this type of multivariate calculus (e.g. Fréchet derivatives), a domain or codomain of $m\times n$ real matrices is identified with $\Bbb R^{mn}$, not with $\Bbb R^{m\times n}$. You "flatten" your matrices before you apply derivatives and chain rules to them, at least if you want your derivative at a given point to be represented by a standard rectangular grid of numbers.
If you don't want to flatten your matrices before you do calculus in them, then your derivatives will be cuboids of higher dimension. You're now venturing into what I would call tensor calculus territory.
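To make the bookkeeping concrete for the chain rule in the question, here is a sketch of both viewpoints. The symbols $f'(t)$ (the entrywise derivative of $f$ at $t$, an $n\times n$ matrix) and $\operatorname{vec}$ (stacking the columns of a matrix into one long vector) are notation introduced here for illustration and do not appear in the question. \begin{align*} \mathrm{D}f(t)s &= s\,f'(t) \in \mathbb{R}^{n\times n}, \quad s \in \mathbb{R}, \\ \mathrm{D}(f\circ g)(\mathbf{x})h &= \underbrace{\bigl(\mathrm{D}g(\mathbf{x})h\bigr)}_{\in \mathbb{R}}\, f'(g(\mathbf{x})) \in \mathbb{R}^{n\times n}, \quad h \in \mathbb{R}^{n}, \\ \operatorname{vec}\bigl(\mathrm{D}(f\circ g)(\mathbf{x})h\bigr) &= \underbrace{\operatorname{vec}\bigl(f'(g(\mathbf{x}))\bigr)}_{\in \mathbb{R}^{n^{2}\times 1}}\, \underbrace{\mathrm{D}g(\mathbf{x})}_{\in \mathbb{R}^{1\times n}}\, h. \end{align*} So the $n\times n$ object in the question's formula is not a matrix acting on a vector: it is scaled by the number $\mathrm{D}g(\mathbf{x})h$. Only after flattening does the product become an ordinary $n^{2}\times n$ Jacobian times $h$.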
Edit: After looking at your example, here is what I think happens: $\mathbf A$ is a composition $\Bbb R^n\to \Bbb R\to \Bbb R^{n\times n}$, the way you describe. But $f$ is a function $\Bbb R^n\to \Bbb R^n$, and as such its Fréchet derivative may be realized as an $n\times n$ matrix. The matrix you have been given, $\mathrm{D}f(\mathbf{x}) = \mathbf{A}(\mathbf{x})+ \dfrac{\mathbf{x}\mathbf{x}^{\mathsf{T}}}{\left\|\mathbf{x}\right\|_{2} }$, is this derivative: it is the derivative of $f$, not of $\mathbf{A}$.
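One way to see where the extra term comes from is the product rule, $\mathrm{D}f(\mathbf{x})\mathbf{h} = \mathbf{A}(\mathbf{x})\mathbf{h} + (\mathrm{D}\mathbf{A}(\mathbf{x})\mathbf{h})\,\mathbf{x}$, where $\mathrm{D}\mathbf{A}(\mathbf{x})\mathbf{h} = \frac{\mathbf{x}^{\top}\mathbf{h}}{\|\mathbf{x}\|_{2}}\,\mathbf{I}$ because only the diagonal of $\mathbf{A}$ depends on $\mathbf{x}$. As a sanity check, the following is a minimal NumPy sketch that compares the closed-form matrix with a central finite-difference Jacobian. The function names, the dimension $n = 5$, the random test point and the step size are all choices made here for illustration, and the check assumes $\mathbf{x} \neq \mathbf{0}$.

```python
import numpy as np

def A(x):
    """Tridiagonal matrix with alpha(x) = ||x||_2 on the diagonal and 1 next to it."""
    n = x.size
    off = np.ones(n - 1)
    return np.linalg.norm(x) * np.eye(n) + np.diag(off, 1) + np.diag(off, -1)

def f(x):
    """f(x) = A(x) x, a map from R^n to R^n."""
    return A(x) @ x

def Df_closed_form(x):
    """Candidate derivative A(x) + x x^T / ||x||_2 (requires x != 0)."""
    return A(x) + np.outer(x, x) / np.linalg.norm(x)

def Df_finite_diff(x, eps=1e-6):
    """Central finite-difference approximation of the n x n Jacobian of f."""
    n = x.size
    J = np.empty((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = eps
        J[:, j] = (f(x + e) - f(x - e)) / (2 * eps)
    return J

# Illustrative test point (hypothetical choice: n = 5, fixed seed).
rng = np.random.default_rng(0)
x = rng.standard_normal(5)

max_err = np.max(np.abs(Df_closed_form(x) - Df_finite_diff(x)))
print("largest entrywise difference:", max_err)
```

If the two matrices agree up to finite-difference error, that supports reading $\mathbf{A}(\mathbf{x}) + \mathbf{x}\mathbf{x}^{\top}/\|\mathbf{x}\|_{2}$ as the Jacobian of $f$, with the rank-one term being the contribution of $\mathrm{D}\mathbf{A}$ applied to $\mathbf{x}$ rather than $\mathrm{D}\mathbf{A}$ itself.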