Chain rule for derivatives of matrix equations?


I was wondering whether there is a general chain rule that can be applied to something like this:

$$\frac{\partial}{\partial \mathbb{x}}\Vert A\mathbb{x} - \mathbb{y} \Vert_{2}^{2}$$

where $$ \mathbb{x} \in \mathbb{R}^n, \mathbb{y} \in \mathbb{R}^m, A \in \mathbb{R}^{m \times n} $$

I tried applying the chain rule as I know it from a typical calculus course: $$ \frac{\partial}{\partial \mathbb{x}}f(\mathbb{g}(\mathbb{x})) = \frac{\partial f}{\partial \mathbb{g}}\frac{\partial \mathbb{g}}{\partial \mathbb{x}} $$

But when attempting this, I ended up with $$ 2(A\mathbb{x} - \mathbb{y})A $$

which doesn't dimensionally make sense. I understand that matrix multiplication is order-dependent, but I couldn't figure out a general pattern that would resemble the common chain rule we learn in calculus. Can someone please give me an explanation?
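
For reference, here is a sketch of the dimension bookkeeping for this example. It assumes the numerator-layout (Jacobian) convention, in which the derivative of a scalar with respect to a vector is a row vector; that convention is an assumption, not something fixed by the question. The intermediate names $f$ and $\mathbb{g}$ are the ones from the chain rule above, with $\mathbb{g}(\mathbb{x}) = A\mathbb{x} - \mathbb{y}$ and $f(\mathbb{g}) = \Vert \mathbb{g} \Vert_{2}^{2} = \mathbb{g}^{T}\mathbb{g}$.

$$ \frac{\partial f}{\partial \mathbb{g}} = 2\mathbb{g}^{T} \in \mathbb{R}^{1 \times m}, \qquad \frac{\partial \mathbb{g}}{\partial \mathbb{x}} = A \in \mathbb{R}^{m \times n} $$

$$ \frac{\partial f}{\partial \mathbb{x}} = \frac{\partial f}{\partial \mathbb{g}}\frac{\partial \mathbb{g}}{\partial \mathbb{x}} = 2(A\mathbb{x} - \mathbb{y})^{T}A \in \mathbb{R}^{1 \times n} $$

Under this convention the shapes compose as $(1 \times m)(m \times n) = 1 \times n$, so the product is defined once the transpose on the first factor is kept. Written as a column vector, the same result is the gradient $$ \nabla_{\mathbb{x}} f = 2A^{T}(A\mathbb{x} - \mathbb{y}) \in \mathbb{R}^{n} $$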