I have been trying to find the Fréchet derivative of the function $\psi(x) = \frac{1}{\left \| x \right \|^p}Ax$, where $x \in \mathbb{R}^n$ and $A \in \mathbb{R}^{m \times n}$. One possibility would be to use the derivative of continuous bilinear operators on Banach spaces: write $\psi(x)=B(f(x),g(x))$, where $f:\mathbb{R}^n \rightarrow \mathbb{R},\; x \mapsto\frac{1}{\left \| x \right \|^p}$, $g:\mathbb{R}^n \rightarrow \mathbb{R}^m,\; x \mapsto Ax$, and $B:(\mathbb{R} \times \mathbb{R}^m) \simeq \mathbb{R}^{m+1} \rightarrow \mathbb{R}^m,\; B(x,y)=xy$. The formula I arrived at is the following ($Df$ denotes the total derivative of $f$): $$D\psi(x)=g(x)\,Df(x)+f(x)\,Dg(x)$$ But the Wikipedia article on the product rule states that $$D(f \cdot g)=Df \cdot g+f \cdot Dg$$ (where "$\cdot$" denotes scalar multiplication), which is different, since $Df$ and $g$ do not commute. I checked the dimensions of the objects involved to see where the difference lies, and found that with the Wikipedia version the matrix multiplication dimensions do not even match. Does anyone see whether there is a mistake in my reasoning, and if so, where?
By the way, I am looking for an answer that does NOT use partial derivatives. I can of course solve this problem using them, but I'm looking for a more general approach that doesn't just rely on heavy calculations.
That's right: if we interpret $Df(x) \cdot g(x)$ as a matrix multiplication, it would be a $1 \times n$ matrix (row vector) times an $m \times 1$ matrix (column vector), which isn't even defined, so it has to be the other way around, as you suggested.
The clearest way for me to apply the product rule is always this: if $\psi(x) = B(f(x), g(x))$, then \begin{align} D \psi_x(\cdot) = B(Df_x(\cdot), g(x)) + B(f(x), Dg_x(\cdot)) \end{align} This is true even when the maps are between Banach spaces, and it is easy to remember, because we keep the order of multiplication in $B(\cdot, \cdot)$ and just differentiate term by term.
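Applied to the $\psi$ above (assuming $\|\cdot\|$ is the Euclidean norm, so that $D(\|\cdot\|)_x(h) = \langle x, h\rangle / \|x\|$; for other norms $Df$ changes accordingly), the chain rule gives $Df_x(h) = -p\,\|x\|^{-p-2}\langle x, h\rangle$ and $Dg_x(h) = Ah$, hence \begin{align} D\psi_x(h) = B(Df_x(h), g(x)) + B(f(x), Dg_x(h)) = -\frac{p\,\langle x, h\rangle}{\|x\|^{p+2}}\,Ax + \frac{Ah}{\|x\|^{p}}. \end{align}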
Note that this of course doesn't contradict what you wrote; the matrix representation of the first term, $B(Df_x(\cdot), g(x))$, is $[g(x)] \cdot [Df_x]$ ($m \times 1$ times $1 \times n$, an outer product), while for the second term, $B(f(x), Dg_x(\cdot))$, it is $f(x) \cdot [Dg_x]$ (a number times an $m \times n$ matrix).
In any case, I wouldn't say the equation $D(f \cdot g) = Df \cdot g + f \cdot Dg$ is incorrect, just that it is not properly evaluated everywhere (hence it needs to be read with care). If we evaluate everywhere, then it reads: \begin{align} D(f \cdot g)_x[h] &= Df_x[h] \cdot g(x) + f(x) \cdot Dg_x[h] \end{align} This is of course correct, because rather than writing $B(\cdot\,,\cdot)$ for the bilinear multiplication, we simply use a $\cdot$ to indicate it.
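As a sanity check, here is a small numerical sketch (illustrative only; it assumes the Euclidean norm and arbitrary sample dimensions) that builds the Jacobian in exactly the matrix form described above, $[g(x)][Df_x] + f(x)[Dg_x]$, and compares it against a finite difference:

```python
import numpy as np

# Sketch: verify D(psi)_x = g(x) Df_x + f(x) Dg_x numerically, where
# psi(x) = Ax / ||x||^p, f(x) = ||x||^(-p), g(x) = Ax (Euclidean norm assumed).
# The dimensions m, n and exponent p below are arbitrary choices.

rng = np.random.default_rng(0)
m, n, p = 3, 4, 2.0
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)

def psi(x):
    return A @ x / np.linalg.norm(x) ** p

nx = np.linalg.norm(x)
Df_row = -p * nx ** (-p - 2) * x               # the 1 x n matrix [Df_x]
J = np.outer(A @ x, Df_row) + nx ** (-p) * A   # m x 1 outer 1 x n, plus f(x) * A

# Compare J h with a central finite difference along a random direction h
h = rng.standard_normal(n)
eps = 1e-6
fd = (psi(x + eps * h) - psi(x - eps * h)) / (2 * eps)
assert np.allclose(J @ h, fd, atol=1e-6)
```

The first summand is the $m \times n$ outer product $[g(x)] \cdot [Df_x]$; writing it as $[Df_x] \cdot [g(x)]$ would fail already at the `np.outer` step, which is precisely the dimension mismatch discussed above.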