Gradient norm in a neural network is bounded?


Consider a fully connected neural network with a single hidden layer, $f(x,w) = w^T_2 \sigma(w^T_1 x)$, where $w = [ w_2, w_1 ]$ are the network's parameters and $\sigma$ is an activation function (e.g. tanh, sigmoid, ReLU). Let $l(f(x,w), y)$ be the binary cross-entropy loss. Is it true that, for a particular data point $x$, the gradient norm of the loss is bounded over all choices of model parameters $w$, i.e. does there exist a constant $C$ such that $\| \nabla_w l(f(x,w),y)\|_2 \leq C$ for all $w$?
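As a quick empirical probe of the claim (not a proof either way), here is a numpy sketch that computes the gradient analytically for this one-hidden-layer model with tanh activation and a sigmoid + binary cross-entropy head, then scales $w_2$ by increasingly large factors $t$ while keeping $w_1$ and $x$ fixed. The dimensions, seed, and the choice $y = 0$ are arbitrary assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, h = 3, 4                       # input and hidden dimensions (arbitrary)
x = rng.normal(size=d)            # a fixed data point
w1 = rng.normal(size=(d, h))      # first-layer weights, held fixed
w2_base = rng.normal(size=h)      # second-layer weights, to be rescaled

z = w1.T @ x                      # hidden pre-activations
a = np.tanh(z)                    # hidden activations, sigma = tanh
# Flip the sign so the logit is positive; with y = 0 the error term
# p - y then stays bounded away from zero as t grows.
if w2_base @ a < 0:
    w2_base = -w2_base
y = 0.0

def grad_norm(t):
    """Norm of the gradient of BCE loss w.r.t. (w1, w2) at w2 = t * w2_base."""
    w2 = t * w2_base
    f = w2 @ a                            # network output (logit)
    p = 1.0 / (1.0 + np.exp(-f))          # sigmoid probability for BCE
    dldf = p - y                          # dl/df for binary cross entropy
    g_w2 = dldf * a                       # gradient w.r.t. w2 (stays bounded)
    g_w1 = dldf * np.outer(x, w2 * (1.0 - np.tanh(z) ** 2))  # grows with t
    return np.sqrt(np.sum(g_w2 ** 2) + np.sum(g_w1 ** 2))

norms = [grad_norm(t) for t in (1.0, 10.0, 100.0)]
print(norms)
```

In this configuration the $w_1$-gradient carries a factor of $w_2$, so with $y = 0$ and a positive logit the error term $p - y$ does not vanish and the norm grows roughly linearly in $t$, suggesting no single constant $C$ can work for all $w$.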