Let's say I have a vector $x \in \mathbb{R}^d$ and a 3-tensor $W \in \mathbb{R}^{d \times d \times d}$
I define $f(x) = \sum_i \sum_j \sum_k x_i x_j x_k W_{i,j,k}$
How would I compute $\frac{\partial f}{\partial x}$?
I would have thought it would be $\frac{\partial f}{\partial x_i} = \sum_j \sum_k x_j x_k W_{i,j,k}$
But when I compute both the numerical gradient and the analytical gradient above, they don't match.
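Here is a minimal NumPy sketch of what I'm seeing (random $W$, small $d$; the central-difference step `eps` is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
x = rng.standard_normal(d)
W = rng.standard_normal((d, d, d))

# f(x) = sum_{i,j,k} x_i x_j x_k W_{ijk}
f = lambda x: np.einsum('i,j,k,ijk->', x, x, x, W)

# my proposed analytical gradient: sum_{j,k} x_j x_k W_{ijk}
g_analytic = np.einsum('j,k,ijk->i', x, x, W)

# central-difference numerical gradient
eps = 1e-6
g_num = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                  for e in np.eye(d)])

print(np.allclose(g_num, g_analytic, atol=1e-4))  # False
```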
The Einstein summation convention was created for this kind of calculation. $$\eqalign{ f &= x_ix_jx_k \,W_{ijk} \\ \\ \frac{\partial f}{\partial x_p} &= \left(\frac{\partial x_i}{\partial x_p}\right)x_jx_k W_{ijk} + x_i\left(\frac{\partial x_j}{\partial x_p}\right)x_k W_{ijk} + x_ix_j\left(\frac{\partial x_k}{\partial x_p}\right) W_{ijk} \\ &=(\delta_{ip})x_jx_k W_{ijk} + x_i(\delta_{jp})x_k W_{ijk} + x_ix_j(\delta_{kp}) W_{ijk} \\ &= x_jx_k W_{pjk} + x_ix_k W_{ipk} + x_ix_j W_{ijp} \\ &= x_jx_k W_{pjk} + x_jx_k W_{jpk} + x_jx_k W_{jkp} \\ }$$ Or using explicit summations $$\eqalign{ \frac{\partial f}{\partial x_p} &= \sum_j\sum_k x_jx_k\big(W_{pjk}+W_{jpk}+W_{jkp}\big) \\ }$$ Your expression $\sum_j\sum_k x_jx_k W_{pjk}$ is only the first of these three terms, so it agrees with the numerical gradient only when $W$ is symmetric under permutations of its indices.
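The corrected formula can be checked against a numerical gradient; a minimal NumPy sketch (random $W$, small $d$, central differences with an arbitrary step `eps`):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 5
x = rng.standard_normal(d)
W = rng.standard_normal((d, d, d))

# f(x) = sum_{i,j,k} x_i x_j x_k W_{ijk}
f = lambda x: np.einsum('i,j,k,ijk->', x, x, x, W)

# gradient: sum_{j,k} x_j x_k (W_{pjk} + W_{jpk} + W_{jkp})
grad = (np.einsum('j,k,pjk->p', x, x, W)
        + np.einsum('j,k,jpk->p', x, x, W)
        + np.einsum('j,k,jkp->p', x, x, W))

# central-difference numerical gradient
eps = 1e-6
g_num = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                  for e in np.eye(d)])

print(np.allclose(g_num, grad))  # True
```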