Can I use the BFGS algorithm with an L1-norm penalized LLH to get exact zeros?

I am working through the paper "Predicting the Long term Stock Market Volatility: A GARCH MIDAS Model with Variable Selection".

It uses a penalized log-likelihood of the form PLLH = LLH - L1-norm penalty (equation 7), and it consistently speaks of estimates being "non-zero" or "shrunken to zero", for example:

We choose the optimal tuning parameter using Generalized Information Criteria (GIC), and select the variables with non zero parameter estimates
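Exact zeros matter for this step because the complexity term of a GIC usually counts the non-zero estimates, so a fit whose non-active coefficients sit at 1e-7 counts as using every variable. Below is a minimal Python sketch of tuning-parameter selection by GIC under explicit assumptions: a Gaussian linear model with an orthonormal design (so the lasso has a closed-form soft-thresholding solution) stands in for the GARCH-MIDAS likelihood, and the criterion is taken as $n\log(\mathrm{RSS}/n) + a_n\,\mathrm{df}$ with a hypothetical $a_n$. This is not the paper's exact GIC.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t*||.||_1: returns exactly 0.0 wherever |z_j| <= t."""
    return np.where(np.abs(z) > t, z - t * np.sign(z), 0.0)

def gic(X, y, b, a_n):
    """ASSUMED Gaussian GIC: n*log(RSS/n) + a_n*df, df = number of exactly non-zero estimates."""
    n = len(y)
    rss = np.sum((y - X @ b) ** 2)
    return n * np.log(rss / n) + a_n * np.count_nonzero(b)

# Hypothetical simulation: 3 active and 5 inactive regressors.
rng = np.random.default_rng(1)
n, p = 500, 8
Q, _ = np.linalg.qr(rng.standard_normal((n, p)))
X = np.sqrt(n) * Q                      # assumption: orthonormal design, X'X/n = I
beta_true = np.array([1.0, -1.5, 0.8, 0.0, 0.0, 0.0, 0.0, 0.0])
y = X @ beta_true + rng.standard_normal(n)

z = X.T @ y / n                         # least-squares estimate when X'X/n = I
a_n = np.log(np.log(n)) * np.log(p)     # hypothetical penalty weight
lams = np.geomspace(1.0, 0.01, 20)      # tuning-parameter grid
fits = [soft_threshold(z, lam) for lam in lams]  # exact lasso solutions for this design
scores = [gic(X, y, b, a_n) for b in fits]
best = fits[int(np.argmin(scores))]
print(np.flatnonzero(best))             # selected variables: exactly non-zero estimates
```

Because `df` is `np.count_nonzero(b)`, replacing the exact zeros with values around 1e-7 would give every fit `df = p`, and the criterion could no longer tell different supports apart.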

I coded it in R for a simulation study, so I know the true variables. I also tried a proximal gradient descent variant, but it struggles to converge and takes much longer. So now I am using optim(method="BFGS"), which works quite well, but it never returns exact zeros: the estimates for the non-active variables end up in the range of 1e-7 to 1e-13.

The gradient function is a numerical approximation of the gradient of the PLLH.
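One contributing factor may be the numerical gradient itself. A central difference with step $h$ does not see the kink of $|x|$: for $|x| < h$ it returns $x/h$, a smooth ramp, instead of jumping from $-1$ to $+1$. The optimizer then effectively minimizes a smoothed objective whose minimizer is small but non-zero. The Python sketch below shows this on a hypothetical scalar problem, $\tfrac12(x-a)^2 + \lambda|x|$ with $|a| < \lambda$, whose exact minimizer is $0$; the central-difference scheme and the values of $a$, $\lambda$ and $h$ are assumptions. For reference, R's `optim` falls back to a finite-difference gradient with step `control$ndeps` (default `1e-3`) when no `gr` is supplied.

```python
def num_grad(f, x, h):
    """Central finite-difference approximation of f'(x) with step h (assumed scheme)."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Hypothetical toy values: |a| < lam, so the exact minimiser of loss() is x = 0.
a, lam, h = 0.3, 1.0, 1e-3

def penalty(t):
    """L1 penalty lam*|t| for a single coefficient."""
    return lam * abs(t)

def loss(x):
    """Smooth quadratic plus L1 penalty, a scalar stand-in for -PLLH."""
    return 0.5 * (x - a) ** 2 + penalty(x)

# Near 0 the finite difference of lam*|x| is a ramp x/h, not a jump:
for x in (1e-2, 1e-4, 1e-7, -1e-7):
    print(f"x = {x:+.0e}   numerical d/dx lam*|x| = {num_grad(penalty, x, h):+.4g}")

# For |x| < h the numerical gradient of loss() is (x - a) + lam*x/h,
# which vanishes at a small positive x rather than at 0:
x_stationary = a / (1 + lam / h)
print(x_stationary)                     # small, but not 0
print(num_grad(loss, x_stationary, h))  # ~0 up to rounding: an optimiser can stop here
```

The stationary point scales like $a\,h/\lambda$, so a smaller step $h$ moves it closer to $0$ but does not put it exactly at $0$.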

Is it even possible to get exact zero estimates without a threshold, or did I do something wrong?
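Exact zeros are achievable when the L1 term is handled through its proximal operator (soft-thresholding) rather than through its gradient, which is what proximal gradient methods do. A minimal Python sketch of ISTA for a lasso-penalized least-squares problem, used here as a stand-in for the negative GARCH-MIDAS log-likelihood; the data, `lam` and the iteration count are hypothetical:

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t*||.||_1: returns exactly 0.0 wherever |z_j| <= t."""
    return np.where(np.abs(z) > t, z - t * np.sign(z), 0.0)

def ista(X, y, lam, n_iter=500):
    """Minimise 0.5/n*||y - Xb||^2 + lam*||b||_1 by proximal gradient (ISTA)."""
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2  # 1/L, L = Lipschitz constant of the smooth gradient
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n       # gradient of the smooth (loss) part only
        b = soft_threshold(b - step * grad, step * lam)
    return b

# Hypothetical simulation: 2 active and 4 inactive regressors.
rng = np.random.default_rng(0)
n, p = 500, 6
X = rng.standard_normal((n, p))
beta_true = np.array([1.5, -2.0, 0.0, 0.0, 0.0, 0.0])
y = X @ beta_true + 0.5 * rng.standard_normal(n)

b_hat = ista(X, y, lam=0.2)
print(b_hat)
print(np.flatnonzero(b_hat == 0.0))       # indices estimated as exact zeros
```

Coordinates whose smooth-part gradient stays below `lam` in magnitude are mapped to exactly `0.0` at every iteration, so no threshold is needed. If plain ISTA converges too slowly, its accelerated variant (FISTA) or a backtracking step size are common remedies.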

Answer 1:

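BFGS is a more sophisticated method than plain gradient descent, but it is a descent method nonetheless. Away from the coordinate axes (that is, where no component is $0$), the partial derivatives of the $1$-norm are $\partial \lVert x \rVert_1 / \partial x_i = \operatorname{sign}(x_i)$, which jumps from $-1$ to $+1$ at $x_i = 0$. If a component of the current solution is doomed to be non-active, it will therefore oscillate around $0$: each step pushes it to one side of $0$ or the other without ever hitting $0$ exactly.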
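A small Python sketch of that oscillation on the hypothetical scalar objective $\tfrac12(x-a)^2 + \lambda|x|$ with $|a| < \lambda$ (exact minimizer $0$), using the derivative $(x-a) + \lambda\,\operatorname{sign}(x)$ with a fixed step; the values of $a$, $\lambda$ and the step size are assumptions:

```python
# Hypothetical toy values: |a| < lam, so the exact minimiser of
# 0.5*(x - a)**2 + lam*|x| is x = 0.
a, lam, step = 0.3, 1.0, 0.01

def grad(x):
    """Derivative of 0.5*(x - a)**2 + lam*|x| for x != 0: (x - a) + lam*sign(x)."""
    sign = (x > 0) - (x < 0)
    return (x - a) + lam * sign

x = 1.0
trace = []
for _ in range(400):
    x -= step * grad(x)  # plain (sub)gradient descent step
    trace.append(x)

tail = trace[-10:]
print(tail)  # values alternate in sign within a narrow band around 0
```

The iterate keeps crossing $0$ within a band of roughly the step size; a smaller or decreasing step narrows the band but in general does not make the iterate land on $0$ exactly.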

To sum up, this is perfectly normal and, as you hinted, a threshold should be used afterward if one wants to set non-active components to 0.
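A minimal Python sketch of such post-hoc thresholding; the tolerance `tol` is a hypothetical choice, and the input values are made up to resemble the 1e-7 to 1e-13 range from the question:

```python
import numpy as np

def threshold_estimates(beta_hat, tol=1e-5):
    """Set |beta_j| <= tol to exactly 0; return the thresholded vector and active indices.

    tol is a hypothetical default: pick it well above the optimiser's numerical
    noise and well below the smallest effect size you consider meaningful.
    """
    beta = np.array(beta_hat, dtype=float)  # copy, so the input is not modified
    beta[np.abs(beta) <= tol] = 0.0
    return beta, np.flatnonzero(beta)

# Made-up BFGS-style output: two active coefficients and three near-zeros.
beta_bfgs = [0.84, -1.2, 3e-7, -5e-13, 1e-9]
beta_thr, active = threshold_estimates(beta_bfgs)
print(active)  # [0 1]
```

In a simulation study the gap between the near-zeros and the active coefficients is usually large, which makes the exact choice of `tol` uncritical there; on real data it is worth checking how sensitive the selected set is to it.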

Answer 2:

BFGS converges faster than gradient descent because it builds an approximation to the Hessian matrix as the iteration progresses. This pays off once the iterates enter a region where the true Hessian is positive definite, which is typically the case near a strict local minimum of a smooth function. However, the 1-norm is not smooth at the origin: neither its gradient nor its Hessian exists there, so the Hessian cannot be approximated, and BFGS may fail to converge to the true minimum when that minimum has components exactly at $0$. If you need exact zeros, try a few steps of proximal gradient descent (a gradient step on the smooth log-likelihood part followed by soft-thresholding), starting from the BFGS solution, to send the non-active variables to zero.
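A minimal Python sketch of such a polishing phase. It assumes the steps are proximal gradient steps (a gradient step on the smooth part, then soft-thresholding), since that is what lets a component land exactly on $0$; the quadratic smooth part, `lam` and the starting point are hypothetical stand-ins for the negative log-likelihood and a BFGS solution:

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t*||.||_1: returns exactly 0.0 wherever |z_j| <= t."""
    return np.where(np.abs(z) > t, z - t * np.sign(z), 0.0)

def prox_polish(b0, grad_smooth, step, lam, n_steps=20):
    """Run a few proximal gradient steps starting from b0 (e.g. a BFGS solution)."""
    b = np.asarray(b0, dtype=float).copy()
    for _ in range(n_steps):
        b = soft_threshold(b - step * grad_smooth(b), step * lam)
    return b

# Hypothetical smooth part f(b) = 0.5*b'Qb - q'b, chosen so that the exact
# minimiser of f(b) + lam*||b||_1 is b* = (1, 0, 0).
Q = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.2],
              [0.0, 0.2, 1.0]])
q = np.array([3.0, 0.9, -0.5])
lam = 1.0
step = 1.0 / np.linalg.eigvalsh(Q).max()    # 1/L for this quadratic

b_bfgs = np.array([0.99999, 3e-8, -2e-10])  # made-up "BFGS-like" start with near-zeros
b_polished = prox_polish(b_bfgs, lambda b: Q @ b - q, step, lam)
print(b_polished)  # inactive components are now exactly 0.0
```

Starting close to the solution, the inactive components are typically zeroed within the first step, so the polishing costs little compared with running the proximal method from scratch.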