Practical advice for setting Lagrange Multipliers for an optimization problem with positivity and range constraints


I am working on a convex optimization problem using the method of moments. Basically I have a discrete stochastic simulation model, and I want to tune its parameters against some data. So I have a simple objective function: the squared loss between the mean and variance of the simulation and those of the data (as shown below).

So the partial objective, or loss function, looks as follows. By "partial" I mean that I have not yet added any constraint terms. Let $p$ be the parameter vector and $f(p)$ the model. I run the stochastic simulation multiple times for a given $p$ and then compute the mean and variance of the simulation output, $\operatorname{mean}(f(p))$ and $\operatorname{var}(f(p))$, at each timestep.

$$ \mathcal{L}(p) = \| y_i - \operatorname{mean}(f(p))\|^2 + 0.1\,\|\operatorname{var}(y_i) - \operatorname{var}(f(p))\|^2 + \dots $$
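A minimal sketch of this loss in NumPy, where `simulate`, `y_mean`, and `y_var` are hypothetical placeholders for my model and the data's per-timestep moments:

```python
import numpy as np

def moment_loss(p, simulate, y_mean, y_var, n_reps=50, var_weight=0.1):
    """Squared loss on the first two moments, per the objective above.

    Assumptions: simulate(p) returns one trajectory (array over
    timesteps); y_mean and y_var are the data's per-timestep mean
    and variance; var_weight is the 0.1 factor from the formula.
    """
    runs = np.array([simulate(p) for _ in range(n_reps)])  # (n_reps, T)
    sim_mean = runs.mean(axis=0)        # mean(f(p)) per timestep
    sim_var = runs.var(axis=0, ddof=1)  # variance(f(p)) per timestep
    return np.sum((y_mean - sim_mean) ** 2) \
        + var_weight * np.sum((y_var - sim_var) ** 2)
```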

My question is about practical tips for choosing the numerical values of the Lagrange multipliers on the constraints.

In the simulation itself, I include draws from binomial and multinomial distributions. Hence some of the parameters are probabilities, meaning they must be positive and fall in the range $(0, 1)$. The problem is that if the optimization algorithm suggests values that are negative or outside that range, the code that runs the model crashes, because you cannot sample from a binomial distribution with a probability of $-0.5$ or the like.
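The crash is easy to reproduce: NumPy's binomial sampler rejects probabilities outside $[0, 1]$ with a `ValueError`. A small helper (hypothetical, just for illustration) that probes whether a proposed value is sampleable:

```python
import numpy as np

rng = np.random.default_rng(0)

def safe_to_sample(p1, n=10):
    """Return True if p1 is a valid binomial probability.

    NumPy raises ValueError for p outside [0, 1], which is exactly
    the failure mode when the optimizer proposes e.g. p1 = -0.5.
    """
    try:
        rng.binomial(n=n, p=p1)
        return True
    except ValueError:
        return False
```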

So I can add Lagrange-multiplier terms to encode these constraints. For example, for a probability parameter $p_1$, I could add the following terms to the objective:

$$ -\lambda_1 \, p_1 $$

to ensure that $p_1$ is positive, and

$$ \lambda_2 \, (p_1 - 1) $$

to ensure that $p_1$ is less than 1.
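A sketch of the penalized objective with those two terms, where `base_loss` stands in for $\mathcal{L}(p)$ above:

```python
def penalized_loss(p, base_loss, lam1, lam2):
    """Objective plus the two multiplier terms written above:
    -lam1 * p1 penalizes negative p1, and lam2 * (p1 - 1)
    penalizes p1 > 1. lam1, lam2 >= 0 are the multipliers
    whose values the question asks how to choose.
    """
    p1 = p[0]
    return base_loss(p) - lam1 * p1 + lam2 * (p1 - 1.0)
```

Note that, as written, these linear terms tilt the objective everywhere, even when $p_1$ is already feasible; a common practical variant is a hinge penalty such as $\lambda \max(0, -p_1)^2$, which is zero inside the feasible region.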

However, my question is how to set the values of the Lagrange multipliers $\lambda_i$. Is there a good starting point or rule of thumb to follow so that I don't accidentally get negative values or values outside the range? Should all the $\lambda_i$ be set to something large, like 100, and then gradually lowered?

Like I said, tuning the Lagrange multipliers against data or a loss is itself a big problem, because if the optimizer proposes invalid values during the tuning process, the entire routine crashes: a binomial or multinomial distribution can't accept probabilities that are negative or above 1.0.