Least Angle Regression for Complex Rational Models


Background

I am working on a project where I need to fit the following rational model, in which the entries of $\Psi$ are real but $y$, $p$, and $q$ are complex:

$$ y_{i} = \frac{\sum_{j=1}^{n_{p}}\Psi_{ij}p_{j}}{1+\sum_{k=1}^{n_{q}}\Psi_{ik}q_{k}} \phantom{x} \forall \phantom{x}i \in \left\{1,2,...,n\right\} $$

The model is chosen because the underlying physics dictate that $y$ will take this form. There are limited observations so I wish to perform sparse regression to avoid overfitting.

My Attempt So Far

I tried to simplify by multiplying both sides by the denominator and rearranging terms to obtain the following:

$$ 0 = y_{i} - \sum_{j=1}^{n_{p}}\Psi_{ij}p_{j} + y_{i}\sum_{k=1}^{n_{q}}\Psi_{ik}q_{k} \phantom{x} \forall \phantom{x}i \in \left\{1,2,...,n\right\} $$

Then I define a diagonal matrix $\mathbf{M}$ such that $M_{ii}=y_{i}$, a "regressor" matrix $\mathbf{X}=\begin{bmatrix}\mathbf{\Psi}_{p}&-\mathbf{M}\mathbf{\Psi}_{q}\end{bmatrix}$, and the coefficient vector $\mathbf{\beta}=\begin{bmatrix}\mathbf{p}^{T}&\mathbf{q}^{T}\end{bmatrix}^{T}$. Consequently, I can represent the equation using matrix-vector notation:

$$ \mathbf{0} = \mathbf{y}-\mathbf{X}\mathbf{\beta} $$
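As a concrete sketch of this construction in NumPy (the function name `build_regressors` and the toy data are my own, not part of the original setup):

```python
import numpy as np

def build_regressors(Psi_p, Psi_q, y):
    """Assemble X = [Psi_p, -diag(y) Psi_q] for the linearised system 0 = y - X beta."""
    M = np.diag(y)                      # diagonal matrix with M_ii = y_i
    return np.hstack([Psi_p, -M @ Psi_q]).astype(complex)

# Tiny illustration: n = 3 observations, n_p = 2 numerator and n_q = 1 denominator terms
rng = np.random.default_rng(0)
Psi_p = rng.standard_normal((3, 2))
Psi_q = rng.standard_normal((3, 1))
y = rng.standard_normal(3) + 1j * rng.standard_normal(3)
X = build_regressors(Psi_p, Psi_q, y)   # shape (n, n_p + n_q) = (3, 3)
```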

Now that the equation takes the familiar form of linear regression, I thought it might be a good idea to use an existing linear sparse-regression algorithm, e.g. LARS. Since the original algorithm is intended for real numbers, I modified it slightly to work with complex numbers:

Initialise all coefficients with zeros

Just like with LARS, the model is empty at the beginning and at each step the coefficients are modified $$\hat{\mathbf{\beta}}=\mathbf{0}\implies\hat{\mathbf{y}}=\mathbf{X}\hat{\mathbf{\beta}}=\mathbf{0}\implies\mathbf{e}=\mathbf{y}-\hat{\mathbf{y}}=\mathbf{y}$$

Select regressor(s)

The regressors are columns of $\mathbf{X}$. We choose the column(s) $\mathbf{x}_{l}$ that maximize the "covariance" $\left|\mathbf{x}_{l}^{H}\mathbf{e}\right|$. The chosen regressor(s) are stored as columns of $\mathbf{X}_{A}$.
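A minimal sketch of this selection step (the function name is my own, and it assumes the columns of `X` have been normalised beforehand):

```python
import numpy as np

def select_regressor(X, e, active):
    """Return the inactive column index maximising |x_l^H e|."""
    corr = np.abs(X.conj().T @ e)       # |covariance| of every column with the error
    corr[list(active)] = -np.inf        # never re-select already-active columns
    return int(np.argmax(corr))

# Illustration with an orthonormal X: column 1 has the largest |x^H e|
X = np.eye(3, dtype=complex)
e = np.array([1.0, 3.0j, 2.0])
```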

Compute the next addition to the model

The idea is to grow the coefficients of the regressors that have the largest covariance with the remaining error. Let the additional term be proportional to $\mathbf{u}_{A}=\mathbf{X}_{A}\mathbf{w}_{A}$. The weight $\mathbf{w}_{A}$ is chosen so that the covariances of all columns of $\mathbf{X}_{A}$ with the error decrease but remain equal, and the normalising factor $\alpha$ is chosen so that $\|\mathbf{u}_{A}\|_{2}=1$:

$$ \begin{aligned} \mathbf{w}_{A}&=\alpha\cdot\left(\mathbf{X}_{A}^{H}\mathbf{X}_{A}\right)^{-1}\mathbf{X}_{A}^{H}\mathbf{e} \\ \alpha &= \left[\left(\mathbf{X}_{A}^{H}\mathbf{e}\right)^{H}\left(\mathbf{X}_{A}^{H}\mathbf{X}_{A}\right)^{-1}\left(\mathbf{X}_{A}^{H}\mathbf{e}\right)\right]^{-0.5} \end{aligned} $$
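These two formulas translate directly into a Hermitian solve. A sketch, assuming $\mathbf{X}_{A}$ has full column rank (the function name is mine):

```python
import numpy as np

def equiangular_direction(X_A, e):
    """Return w_A, u_A = X_A w_A with ||u_A||_2 = 1, and the normaliser alpha."""
    c = X_A.conj().T @ e                            # X_A^H e
    G = X_A.conj().T @ X_A                          # Gram matrix X_A^H X_A
    w_raw = np.linalg.solve(G, c)                   # (X_A^H X_A)^{-1} X_A^H e
    alpha = 1.0 / np.sqrt((c.conj() @ w_raw).real)  # [c^H G^{-1} c]^{-1/2}
    w_A = alpha * w_raw
    u_A = X_A @ w_A
    return w_A, u_A, alpha

rng = np.random.default_rng(1)
X_A = rng.standard_normal((5, 2)) + 1j * rng.standard_normal((5, 2))
e = rng.standard_normal(5) + 1j * rng.standard_normal(5)
w_A, u_A, alpha = equiangular_direction(X_A, e)
```

One can verify that $\mathbf{X}_{A}^{H}\mathbf{u}_{A}=\alpha\,\mathbf{X}_{A}^{H}\mathbf{e}$, i.e. the active covariances all shrink by the same factor as the error moves along $\mathbf{u}_{A}$, which is the equiangular property the weights are designed for.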

Add $\gamma\cdot\mathbf{u}_{A}$ to the model

This step is equivalent to adding $\gamma\cdot\mathbf{w}_{A}$ to the coefficients associated with the regressors chosen as columns of $\mathbf{X}_{A}$. The updated model and error are given below:

$$ \begin{aligned} \hat{\mathbf{y}}_{new} &= \hat{\mathbf{y}}_{old} + \gamma \mathbf{u}_{A} \\ &= \hat{\mathbf{y}}_{old} + \gamma \mathbf{X}_{A} \mathbf{w}_{A}\\ \\ \mathbf{e}_{new} &= \mathbf{e}_{old} - \gamma \mathbf{u}_{A}\\ &= \mathbf{e}_{old} - \gamma \mathbf{X}_{A} \mathbf{w}_{A} \end{aligned} $$

The scalar $\gamma$ is chosen as the smallest positive step at which some inactive column of $\mathbf{X}$, denoted $\mathbf{x}_{r}$, attains the same magnitude of covariance with the error as the active columns $\mathbf{x}_{s}$ of $\mathbf{X}_{A}$:

$$ \left|\mathbf{x}_{s}^{H}\mathbf{e} - \gamma \mathbf{x}_{s}^{H}\mathbf{u}_{A}\right| = \left|\mathbf{x}_{r}^{H}\mathbf{e} - \gamma \mathbf{x}_{r}^{H}\mathbf{u}_{A}\right| $$

This results in a quadratic equation:

$$ \begin{aligned} 0 &= a\gamma^{2} + b\gamma + c \\ \\ a &= \alpha^{2}\cdot\mathbf{e}^{H}\mathbf{x}_{s}\mathbf{x}_{s}^{H}\mathbf{e} - \mathbf{u}_{A}^{H}\mathbf{x}_{r}\mathbf{x}_{r}^{H}\mathbf{u}_{A} \\ b &= 2\Re{\left(\mathbf{u}_{A}^{H}\mathbf{x}_{r}\mathbf{x}_{r}^{H}\mathbf{e}\right)} -2\alpha\cdot\mathbf{e}^{H}\mathbf{x}_{s}\mathbf{x}_{s}^{H}\mathbf{e} \\ c &= \mathbf{e}^{H}\mathbf{x}_{s}\mathbf{x}_{s}^{H}\mathbf{e} - \mathbf{e}^{H}\mathbf{x}_{r}\mathbf{x}_{r}^{H}\mathbf{e} \end{aligned} $$

We solve the quadratic equation and select the smallest positive root $\gamma$. Then we return to the step "Select regressor(s)" and loop until the error is small enough.
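A sketch of the step-size computation under these definitions. It relies, as the derivation does, on $\mathbf{x}_{s}^{H}\mathbf{u}_{A}=\alpha\,\mathbf{x}_{s}^{H}\mathbf{e}$ for the active column; the function name and the worked example are my own:

```python
import numpy as np

def step_size(x_s, x_r, e, u_A, alpha):
    """Smallest positive gamma at which |x_r^H (e - gamma u_A)| matches the active covariance."""
    cs = x_s.conj() @ e                 # x_s^H e  (scalar)
    cr = x_r.conj() @ e                 # x_r^H e
    ur = x_r.conj() @ u_A               # x_r^H u_A
    a = alpha**2 * abs(cs)**2 - abs(ur)**2
    b = 2.0 * (np.conj(ur) * cr).real - 2.0 * alpha * abs(cs)**2
    c = abs(cs)**2 - abs(cr)**2
    roots = np.roots([a, b, c])
    positive = [g.real for g in roots if abs(g.imag) < 1e-10 and g.real > 1e-12]
    return min(positive) if positive else None

# Worked real-valued check: x_s = e_1 = u_A (so alpha = 1), x_r = e_2
x_s = np.array([1.0, 0.0]); x_r = np.array([0.0, 1.0])
e = np.array([1.0, 0.5]); u_A = np.array([1.0, 0.0])
gamma = step_size(x_s, x_r, e, u_A, 1.0)   # quadratic roots 0.5 and 1.5 -> gamma = 0.5
```

In a full implementation this would be evaluated over all inactive columns $\mathbf{x}_{r}$ and the overall minimum taken, as in standard LARS.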

Results So Far

I tested the algorithm on mock observations generated from $4$ uniform random samples of $x$. The true models and the fitted models were, respectively:

$$y=\frac{1}{1+0.2x} \implies \hat{y}=\frac{0.98 + 0.0 x + 0.0 x^{2}}{1 + 0.199 x + 0.0 x^{2} + 0.0 x^{3}}$$ $$y=\frac{1}{1+0.2x^{2}} \implies \hat{y}=\frac{1.12 + 0.0 x + 0.0 x^{2}}{1 + 0.0 x + 0.198 x^{2} + 0.0 x^{3}}$$

The algorithm consistently yielded "accurate" models over multiple repetitions. However, it failed for the following:

$$ y = \frac{1}{1+0.2x+0.4x^{2}} $$

Upon repetition, it yielded non-zero coefficients on different terms each time. I thought this might be because there are more coefficients and hence more observations are needed; however, increasing the number of samples did not change anything.

Question

Is there any further modification I can apply to the algorithm to improve its performance, or are there other algorithms better suited to this kind of problem?