I am struggling to understand the chain rule for vectors. Suppose I have two functions $f:\mathbb{R}^m\rightarrow \mathbb{R}$ and $g:\mathbb{R}^m\rightarrow \mathbb{R}^m$;
Is it true that:
$$\frac{\delta f(x)}{\delta x} = \frac{\delta f(x)}{\delta g(x)}\frac{\delta g(x)}{\delta x}$$
where $\frac{\delta a(x)}{\delta b}$ is the matrix with $\frac{\partial a_{i}(x)}{\partial b_j}$ as its $(i,j)^{th}$ element. (The subscript here denotes an element, i.e. $a_3$ is the $3^{rd}$ element of $a$.) If so, could anyone give any intuition as to why this is the case?
The comments are correct: you have a typo, and the derivative is a linear map, which is why the product is matrix multiplication. We can think about this by starting with the vectors and then going term by term. Feel free to skip to the second answer if you want a tl;dr.
Disclaimer:
(A repeated index in a product means that you sum over it. This is called the Einstein summation convention. For instance, I could write $g_k \hat{e}_k$ instead of putting a summation symbol over $k$. It's just supposed to save time and space in your writing.)
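As a quick sanity check of the convention, here is a small NumPy sketch (the arrays are arbitrary examples, not anything from the question): a repeated index like $a_k b_k$ means summing over $k$, which is exactly what `einsum` with a repeated subscript does.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# "a_k b_k": the repeated index k is summed over -- this is just a dot product.
repeated_index = np.einsum('k,k->', a, b)
explicit_sum = sum(a[k] * b[k] for k in range(3))

print(repeated_index, explicit_sum)  # both give 32.0
```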
You really have two functions like this: $$f(\vec{x})$$
$$\vec{g}(\vec{x})$$
Where $\vec{x}$ is in $\mathbb{R}^m$
So it sounds like you want "$\frac{d}{d\vec{x}}$", which is just the gradient, written $\nabla$.
So $$\nabla f(\vec{g}(\vec{x}))=\nabla_{\vec{g}} f(\vec{g})\nabla\vec{g}$$
The first gradient is just the derivative with respect to each component of $\vec{g}$. The second term is a gradient of a vector, so it's going to be an $m \times m$ matrix in your case. Let's see how by breaking it up. $\nabla \vec{g}$, taken component by component, is just: $$\nabla g_j$$ which is the gradient of each scalar component $g_j$ of $\vec{g}$. I.e. $\vec{g} = \sum_k g_k \hat{e}_k$, where $\hat{e}_k$ are the basis vectors of your space.
Then we can go one layer down into each derivative of the gradient. Since $\nabla = \frac{\partial}{\partial x_i}\hat{e}_i$, then each component will just be:
$$\frac{\partial g_j}{\partial x_i}$$
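To make this matrix concrete, here is a numerical sketch that approximates the entries $\frac{\partial g_j}{\partial x_i}$ by central differences for one particular $\vec{g}:\mathbb{R}^2\rightarrow\mathbb{R}^2$ (this specific $g$ and the helper `jacobian` are illustrative choices, not anything from the text):

```python
import numpy as np

def g(x):
    # An example g: g_1 = x_1 * x_2, g_2 = x_1 + x_2**2
    return np.array([x[0] * x[1], x[0] + x[1] ** 2])

def jacobian(func, x, h=1e-6):
    """Entry (j, i) approximates dg_j/dx_i via central differences."""
    J = np.zeros((len(func(x)), len(x)))
    for i in range(len(x)):
        e = np.zeros(len(x))
        e[i] = h
        J[:, i] = (func(x + e) - func(x - e)) / (2 * h)
    return J

x = np.array([1.0, 2.0])
print(jacobian(g, x))
# Analytically the matrix is [[x2, x1], [1, 2*x2]] = [[2, 1], [1, 4]] here.
```

The two indices $j$ (row) and $i$ (column) are exactly the two indices in $\frac{\partial g_j}{\partial x_i}$.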
You can see it has two indices, and thus forms a matrix. There's one last subtlety to take care of. Is it:
$$\frac{\partial f}{\partial g_i}\frac{\partial g_j}{\partial x_i}$$
or
$$\frac{\partial f}{\partial g_j}\frac{\partial g_j}{\partial x_i}$$?
In one case we sum over $i$, the index of the derivatives, and in the other we sum over $j$, the index of the components of $\vec{g}$.
Well, it should be summed over the components of $\vec{g}$, since we wrote $\nabla_{\vec{g}} f$, which has the same number of components as $\vec{g}$. Thus the end result, written component by component, is:
$$\left(\nabla f(\vec{g}(\vec{x}))\right)_i=\frac{\partial f}{\partial g_j}\frac{\partial g_j}{\partial x_i}$$
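We can check this identity numerically: compute the gradient of $f\circ\vec{g}$ directly, and compare it with $\sum_j \frac{\partial f}{\partial g_j}\frac{\partial g_j}{\partial x_i}$ assembled from the pieces. The particular $f$, $g$, and helper functions below are example choices for the sketch, not from the text.

```python
import numpy as np

def f(u):                              # f: R^2 -> R
    return u[0] ** 2 + 3 * u[1]

def g(x):                              # g: R^2 -> R^2
    return np.array([x[0] * x[1], x[0] + x[1] ** 2])

def num_grad(func, x, h=1e-6):
    """Central-difference gradient of a scalar-valued function."""
    grad = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (func(x + e) - func(x - e)) / (2 * h)
    return grad

def num_jac(func, x, h=1e-6):
    """Entry (j, i) approximates dg_j/dx_i via central differences."""
    cols = [(func(x + e) - func(x - e)) / (2 * h)
            for e in h * np.eye(len(x))]
    return np.stack(cols, axis=1)

x = np.array([1.0, 2.0])
lhs = num_grad(lambda t: f(g(t)), x)       # direct gradient of f(g(x))
rhs = num_jac(g, x).T @ num_grad(f, g(x))  # sum_j (df/dg_j)(dg_j/dx_i)
print(np.allclose(lhs, rhs, atol=1e-4))    # True
```

Note that the contraction over $j$ is exactly the matrix product of the transposed Jacobian with the gradient of $f$ evaluated at $\vec{g}(\vec{x})$.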
SECOND ANSWER
I feel like the first answer might be hard to swallow because of the weird gradients I did, so let's take a slightly different path. We still want $$\nabla f(\vec{g}(\vec{x}))$$
But let's just look at the components of each vector object. Instead of looking at the whole gradient and the whole $\vec{g}$, let's look at a particular component $\frac{\partial}{\partial x_k}$ of the gradient and a particular component $g_l$ of $\vec{g}$. Here $k$ and $l$ can differ in general, because we will be taking all derivatives of all components of $\vec{g}$. Then, component-wise, the equation is just:
$$\frac{\partial}{\partial x_k}f(\vec{g}(\vec{x})) = \frac{\partial f}{\partial g_l}\frac{\partial g_l}{\partial x_k}$$
We sum over $l$, and the resulting vector is indexed by $k$, so one way of writing the answer is just $$\nabla f(\vec{g}(\vec{x}))=\frac{\partial f}{\partial g_l}\frac{\partial g_l}{\partial x_k}\hat{e}_k = \frac{\partial f}{\partial g_l}\nabla g_l$$
which still gives a vector. This makes sense because $f$ is scalar-valued and the gradient turns it into a vector.