Looking for a regression method with a way to enforce reflection/flip invariance of the input


Problem: Given a set of $N$-dimensional 1D features, I seek to predict a single scalar value.

My approach has been to build a linear regression model $\hat{y}$ that predicts the scalar value from the 1D feature: $$ \hat{y}(w, x) = w_0 + w_1 x_1 + \ldots + w_p x_p $$ where $w = (w_1,\ldots, w_p)$ is the vector of regression coefficients, which we seek.

Then the regressor coefficients are computed by solving the minimization problem: $$ \min_{w} || X w - y||_2^2 + \alpha ||w||_2^2 $$

So far this is a classical regression problem, for which I can use any of the ML toolkits.
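For instance, the unconstrained problem is a one-liner in scikit-learn. A minimal sketch on synthetic data (all names and values here are illustrative):

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data: 100 samples of an 8-component 1D feature, one scalar target.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = X @ rng.normal(size=8) + 0.1 * rng.normal(size=100)

# Ridge minimizes ||Xw - y||_2^2 + alpha * ||w||_2^2.
model = Ridge(alpha=1.0).fit(X, y)
```

Nothing in this baseline enforces flip invariance yet.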

However, I have an additional requirement: the model must be invariant to a flipped feature input. Here, flipping means reversing the order of the feature components, i.e. a feature $X = (X_0,\ldots, X_n)$ becomes $(X_n,\ldots, X_0)$.

Or, formally, we can define Flip as: $Flip((X_0,\ldots, X_n)) = (X_n,\ldots, X_0)$

Hence, $$ \hat{y}(w, x) = \hat{y}(w, Flip(x)) $$
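To see that an unconstrained linear model generally violates this requirement, here is a toy numerical check (values purely illustrative), with the flip written as numpy slicing:

```python
import numpy as np

w = np.array([1.0, 2.0, 3.0, 4.0])   # generic, non-symmetric coefficients
x = np.array([0.5, -1.0, 2.0, 0.0])

flip = lambda v: v[::-1]             # Flip((x_0,...,x_n)) = (x_n,...,x_0)
print(w @ x, w @ flip(x))            # prints 4.5 3.0 -- not flip-invariant
```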

Any thoughts on what approach to take to incorporate such a constraint into the regression method?

P.S.: Currently I am using a Lasso regressor with a high penalty ($\alpha$), the reasoning being that the high $\alpha$ will lead to a regressor with coefficients near zero. Such a regressor will be robust to flipped input.


Best Answer

One very simple way to achieve this would be to put the condition into the parameters.

Instead of $w=(w_1,\ldots,w_p)$, take $w=(w_1,w_2,\ldots, w_{p/2}, \ldots,w_2, w_1)$ (shown for even $p$; for odd $p$ the middle coefficient stands alone), i.e. a parameter vector that is invariant under the flip operation.

This works because $flip(x) = P\cdot x$ where $P=\begin{pmatrix}&& 1\\&\cdots&\\1&&\end{pmatrix}$ is the matrix with ones on the anti-diagonal. Clearly $P$ is symmetric, so $w^T flip(x) = w^T P x = (Pw)^T x = flip(w)^T x = w^T x$, where the last step uses $flip(w) = w$.
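A quick numerical check of this identity in numpy (illustrative values):

```python
import numpy as np

# Symmetric parameter vector: w = (w_1, w_2, w_3, w_3, w_2, w_1) = flip(w).
half = np.array([0.3, -1.2, 2.0])
w = np.concatenate([half, half[::-1]])

rng = np.random.default_rng(1)
x = rng.normal(size=6)
assert np.isclose(w @ x, w @ x[::-1])   # w^T flip(x) = w^T x
```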

PS: If you should ever need invariance under arbitrary permutations, check out Deep-Sets

EDIT: Implementation-wise, I think the easiest way would be to simply set

$$ \hat y = X\beta, \quad \beta=\frac{1}{2}(w+Pw),\quad L = \|y-\hat y\|_2^2 + \alpha\|\beta\|_2^2 $$

Note that $flip(\beta) = \beta$ for any $w$ (since $P^2=I$). Let $\tilde X = \frac{1}{2}X\cdot(I+P)$ (So $X\beta = \tilde X w$), then

$$ \frac{\partial}{\partial w} L = -2\tilde X^T (y-\tilde X w) + \alpha(I+P) w $$

Here $\frac{\partial}{\partial w}\alpha\|\beta\|_2^2 = \alpha(I+P)w$, since $(I+P)^2 = 2(I+P)$ gives $\|\beta\|_2^2 = \tfrac{1}{2}w^T(I+P)w$. Furthermore:

$$ \frac{\partial}{\partial w} L = 0 \iff \big(\tilde X^T\tilde X + \tfrac{\alpha}{2} (I+P)\big)w = \tilde X^Ty $$

Note that the matrix on the left is singular (it annihilates every antisymmetric $w$, which affects neither $\tilde X w$ nor $\beta$), but the system is consistent, and every solution $\hat w$ yields the same $\beta$; a minimum-norm solution can be taken via the pseudoinverse, $\hat w = \big(\tilde X^T\tilde X + \tfrac{\alpha}{2} (I+P)\big)^{+}\tilde X^Ty$.
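In practice one can avoid the singular system entirely: $\tilde X = \frac{1}{2}X(I+P)$ just averages each feature vector with its own flip, and running plain ridge on $\tilde X$ returns $\beta$ directly, because antisymmetric coefficient components are unused by $\tilde X$ yet still penalized. A numpy sketch on synthetic data (names and values illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, alpha = 50, 6, 0.5
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

# X_tilde = (1/2) X (I + P): each feature vector averaged with its own flip.
X_sym = 0.5 * (X + X[:, ::-1])

# Plain ridge on the symmetrized features; the minimizer is automatically
# flip-symmetric, i.e. it is beta itself.
beta = np.linalg.solve(X_sym.T @ X_sym + alpha * np.eye(p), X_sym.T @ y)

assert np.allclose(beta, beta[::-1])                  # flip(beta) = beta
x_new = rng.normal(size=p)
assert np.isclose(beta @ x_new, beta @ x_new[::-1])   # flip-invariant prediction
```

At prediction time no symmetrization of the input is needed: since $flip(\beta) = \beta$, the dot product $\beta^T x$ is already flip-invariant.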