From the link https://arxiv.org/pdf/1509.09169.pdf on page 33:
" The expression levels of the $j$-th gene are explained by a linear regression model in terms of those of all other genes. Consider the following two ridge regression estimators of the regression parameter of this model, defined as: $$ \arg \max_{\beta}\sum_{i=1}^n (Y_{i,j}-\mathbf{Y}_{i,j}\beta_j)^2 + \lambda||\beta_j||_2^2 $$ and $$ \arg \max_{\beta}\sum_{i=1}^n (Y_{i,j}-\mathbf{Y}_{i, j}\beta_j)^2 + n\lambda||\beta_j||_2^2 $$ Which do you prefer? Motivate."
I even got an answer, namely
" The penalty parameter is a dimensionless scalar. Rescaling from $\tilde{\lambda} = n\lambda$ is thus non-problematic for the interpretation. Moreover, the penalty parameter is chosen through (e.g.) cross-validation. The resulting choice of the penalty parameter will yield the same ridge regression estimates. "
But I'm still stuck. What do they mean by a dimensionless scalar? I have read about cross-validation: it involves splitting the data into training and test sets and averaging to find the best predictor, but how is that linked to this situation? And what kind of estimator is this? It looks like a loss function, $$ \sum_{i=1}^n (Y_i-\mathbf{X}_{i,*}\beta)^2 +\lambda||\beta||_2^2, $$ but usually we take a minimum rather than a maximum.
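To check my reading of the min-vs-max issue, I tried the following sketch (again my own, on hypothetical random data), assuming the closed-form minimizer $\hat{\beta} = (X^\top X + \lambda I)^{-1} X^\top y$: the penalized criterion is unbounded above, and random perturbations of $\hat{\beta}$ only ever increase it, which is consistent with the estimator being an arg min.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 3
X = rng.standard_normal((n, p))  # hypothetical design matrix
y = rng.standard_normal(n)       # hypothetical response
lam = 1.5

def penalized_loss(beta):
    """Ridge criterion: residual sum of squares plus lambda * ||beta||^2."""
    return np.sum((y - X @ beta) ** 2) + lam * np.sum(beta ** 2)

# Closed-form candidate minimizer of the penalized criterion.
beta_hat = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Since the criterion is strictly convex for lam > 0, any perturbation
# away from beta_hat should strictly increase the loss.
worse = all(
    penalized_loss(beta_hat + 0.1 * rng.standard_normal(p)) > penalized_loss(beta_hat)
    for _ in range(100)
)
print(worse)  # True
```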
Can anyone explain what's going on here?