From the link https://arxiv.org/pdf/1509.09169.pdf on page 33:
" The expression levels of the $j$-th gene are explained by a linear regression model in terms of those of all other genes. Consider the following two ridge regression estimators of the regression parameter of this model, defined as: $$ \arg \max_{\beta}\sum_{i=1}^n (Y_{i,j}-\mathbf{Y}_{i,j}\beta_j)^2 + \lambda||\beta_j||_2^2 $$ and $$ \arg \max_{\beta}\sum_{i=1}^n (Y_{i,j}-\mathbf{Y}_{i, j}\beta_j)^2 + n\lambda||\beta_j||_2^2 $$ Which do you prefer? Motivate."
I even got an answer, namely
" The penalty parameter is a dimensionless scalar. Rescaling from $\tilde{\lambda} = n\lambda$ is thus non-problematic for the interpretation. Moreover, the penalty parameter is chosen through (e.g.) cross-validation. The resulting choice of the penalty parameter will yield the same ridge regression estimates. "
But I'm still stuck. What do they mean by a dimensionless scalar? I have read about cross-validation: it involves splitting the data into training and test sets and averaging to find the best predictor, but how is that linked to this situation? And what kind of estimator is this? It looks like a loss function, $$ \sum_{i=1}^n (Y_i-\mathbf{X}_{i,*}\beta)^2 +\lambda||\beta||_2^2, $$ but usually we take a minimum rather than a maximum.
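To check my reading of the min-vs-max issue, I tried the following sketch (again my own, on hypothetical random data), assuming the closed-form minimizer $\hat{\beta} = (X^\top X + \lambda I)^{-1} X^\top y$: the penalized criterion is unbounded above, and random perturbations of $\hat{\beta}$ only ever increase it, which is consistent with the estimator being an arg min.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 3
X = rng.standard_normal((n, p))  # hypothetical design matrix
y = rng.standard_normal(n)       # hypothetical response
lam = 1.5

def penalized_loss(beta):
    """Ridge criterion: residual sum of squares plus lambda * ||beta||^2."""
    return np.sum((y - X @ beta) ** 2) + lam * np.sum(beta ** 2)

# Closed-form candidate minimizer of the penalized criterion.
beta_hat = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Since the criterion is strictly convex for lam > 0, any perturbation
# away from beta_hat should strictly increase the loss.
worse = all(
    penalized_loss(beta_hat + 0.1 * rng.standard_normal(p)) > penalized_loss(beta_hat)
    for _ in range(100)
)
print(worse)  # True
```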
Can anyone explain what's going on here?