When training a neural network with regularization (e.g. L2), you have to choose a reasonable value for the regularization parameter. In my experience, people either: 1. fix it at some reasonable value based on what others report, without actually tuning it, or 2. tune it with cross-validation, doing a grid search over the parameter space and taking the argmin.
But then I read somewhere that you could tune the regularization parameter during backpropagation itself, by computing the gradient of the cost with respect to the regularization parameter, just like for any weight. But wouldn't tuning it this way always drive the regularization parameter to 0? For any positive value of the parameter, the regularization term only increases the cost (with L2 or L1 regularization the weight-norm term is always non-negative), so gradient descent on the training cost would just keep shrinking it.
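To make the worry concrete, here is a minimal numeric sketch (toy weights, not any particular network): for a training cost J = error + λ‖w‖², the partial derivative ∂J/∂λ is just ‖w‖², which is non-negative no matter what the data says, so plain gradient descent on λ against the training cost only ever pushes λ down.

```python
import numpy as np

# toy weight vector; any nonzero weights give a strictly positive penalty
w = np.array([0.5, -1.2, 2.0])

# training cost J(w, lam) = data_error + lam * ||w||^2  (L2 regularization);
# the data_error term does not depend on lam at all
grad_lam = np.sum(w ** 2)  # dJ/dlam = ||w||^2 >= 0, regardless of the data
print(grad_lam)            # 5.69

# naive gradient descent on the *training* cost therefore only shrinks lambda
lam, lr = 0.1, 0.01
for _ in range(5):
    lam -= lr * grad_lam
print(lam)                 # 0.1 - 5 * 0.01 * 5.69 = -0.1845, driven past zero
```

So descending the same objective the weights are trained on can never find a useful positive λ; whatever scheme tunes λ by gradient must be descending something else.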
My guess is that you would use only the error (data) term when computing the gradient of the loss w.r.t. lambda, so that lambda goes up when larger weights would lead to higher error, and so on. The weights themselves would still be shrunk as usual.
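A quick sketch of that idea, under my own assumptions (ridge regression stands in for the network so the inner fit is closed-form, and the "gradient w.r.t. lambda" is a central finite difference of the held-out error, a so-called hypergradient): lambda is updated by descending the validation error only, which contains no λ‖w‖² term, so its gradient is not trivially positive.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])  # assumed toy ground truth
y = X @ w_true + rng.normal(scale=2.0, size=100)
X_tr, y_tr, X_va, y_va = X[:60], y[:60], X[60:], y[60:]

def fit(lam):
    # closed-form L2-regularized (ridge) weights on the training split
    d = X_tr.shape[1]
    return np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(d), X_tr.T @ y_tr)

def val_loss(lam):
    # held-out error only: no lambda * ||w||^2 term appears here,
    # so d(val_loss)/d(lambda) can be negative as well as positive
    w = fit(lam)
    return np.mean((X_va @ w - y_va) ** 2)

log_lam, lr, eps = np.log(0.1), 0.5, 1e-4
loss0 = val_loss(np.exp(log_lam))
for _ in range(100):
    lam = np.exp(log_lam)
    # central finite difference for d(val_loss)/d(lambda)
    g = (val_loss(lam + eps) - val_loss(lam - eps)) / (2 * eps)
    log_lam -= lr * g * lam  # descend in log-space; keeps lambda positive
lam_final = np.exp(log_lam)
print(lam_final, loss0, val_loss(lam_final))
```

The inner weights are still fit against the full regularized cost (the `lam * np.eye(d)` term), so they are shrunk as usual; only the outer update for lambda ignores the penalty term.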