I came across the following equation for neural networks: $$ J(\theta) = -\frac{1}{m} \left[ \sum_{i=1}^m y^{(i)} \log\!\left(h_\theta(x^{(i)})\right) + \left(1-y^{(i)}\right) \log\!\left(1-h_\theta(x^{(i)})\right) \right] + \frac{\lambda}{2m} \sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \left(\theta_{ji}^{(l)}\right)^2 $$
I don't understand how to evaluate the regularization term $\sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} (\theta_{ji}^{(l)})^2$ because of the nested sums.
How do you compute it?
Thanks
Perhaps it would be easier to understand the concept by looking at an example.
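For instance, take a tiny network with $L = 3$ layers and $s_1 = 2$, $s_2 = 2$, $s_3 = 1$ units (sizes chosen purely for illustration). Writing out the three sums term by term, the regularization term is just every non-bias weight squared, added up:

$$\sum_{l=1}^{2}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}} \left(\theta_{ji}^{(l)}\right)^2 = \underbrace{\left(\theta_{11}^{(1)}\right)^2 + \left(\theta_{21}^{(1)}\right)^2 + \left(\theta_{12}^{(1)}\right)^2 + \left(\theta_{22}^{(1)}\right)^2}_{l=1} + \underbrace{\left(\theta_{11}^{(2)}\right)^2 + \left(\theta_{12}^{(2)}\right)^2}_{l=2}$$

So the nested sums just say: for each layer $l$, square every entry of that layer's weight matrix (bias weights, indexed $i = 0$, are excluded since $i$ starts at $1$) and add them all together.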
It is actually possible to write such sums in a more compact form by alternating upper and lower indices. For more details, see the Einstein summation convention.
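Concretely, in code the triple sum is nothing more than three nested loops over the weight matrices. Here is a minimal NumPy sketch; the function name `regularization_term` and the list-of-matrices layout are my own assumptions for illustration, not something given in the question:

```python
import numpy as np

def regularization_term(Thetas, lam, m):
    """Compute (lambda / 2m) * sum_l sum_i sum_j (theta_ji^(l))^2.

    Thetas is an assumed list of weight matrices, where Thetas[l] has
    shape (s_{l+1}, s_l + 1) and column 0 holds the bias weights,
    which the sum skips because i starts at 1.
    """
    total = 0.0
    for Theta in Thetas:                 # outer sum over layers l = 1 .. L-1
        rows, cols = Theta.shape
        for j in range(rows):            # units j = 1 .. s_{l+1}
            for i in range(1, cols):     # units i = 1 .. s_l (skip bias column 0)
                total += Theta[j, i] ** 2
    return (lam / (2 * m)) * total
```

The loops only make the summation explicit; in practice the same value is obtained in one vectorized line, e.g. `(lam / (2 * m)) * sum(np.sum(T[:, 1:] ** 2) for T in Thetas)`.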