Multivariate linear regression with fewer trained parameters


I have a multivariate linear regression problem to solve for identifying a dynamic greybox model.

Normally I would formulate it in this form:

Given $n$ observations, $m$ response variables, and $p$ predictors, we have the regression model:

$y_{ik} = b_{0k} + \sum_{j=1}^p b_{jk} x_{ij} + e_{ik}$ for $i \in (1,..,n)$, and $k \in (1,..,m)$

in matrix form:

$ Y = X B + E $ with $ Y \in \mathbb{R}^{n\times m}$, $X \in \mathbb{R}^{n\times (p+1)} $, $B \in \mathbb{R}^{(p+1) \times m} $, $E \in \mathbb{R}^{n\times m} $

The solution of the least-squares problem can be found with: $$\hat{B} = (X'X)^{-1}X'Y$$
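As a sketch, this multi-response fit can be computed with numpy; the data and dimensions below are illustrative placeholders, not from the original problem:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, m = 50, 3, 2                 # observations, predictors, responses

X = np.hstack([np.ones((n, 1)), rng.standard_normal((n, p))])  # intercept column first
B_true = rng.standard_normal((p + 1, m))
Y = X @ B_true + 0.01 * rng.standard_normal((n, m))

# Numerically stable least-squares fit (preferred over inverting X'X explicitly)
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

# The normal-equations formula B_hat = (X'X)^{-1} X'Y gives the same estimate
B_ne = np.linalg.solve(X.T @ X, X.T @ Y)
```

`np.linalg.lstsq` accepts a matrix right-hand side, so all $m$ response columns are fitted in one call.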

Instead, I am searching for a solution in the least-squares sense with fewer free parameters than $(p+1) \times m$.

I can define the parameter vector $u$ such that $u^T G_r$ gives row $r$ of $B$. (The matrices $G_1, G_2, \ldots$ are known; they come from the greybox formulation.) In matrix notation: $$B = \sum_{r=1}^{p+1} z_r u^T G_r$$ where $z_r$ is the $r^{th}$ column of the identity matrix $I_{p+1}$ and $u$ is the vector of $n_{param}$ parameters.

An example of $B$ is: $$B_{example}= \begin{bmatrix} u_1 & -u_1 & 0 & 0 \\ 0 & u_2 & -u_2 & 0 \\ 0 & 0 & u_3 & -u_3 \\ -u_4 & 0 & 0 & u_4 \\ u_5 & u_6 & u_7 & u_8 \end{bmatrix}$$

How can I find the vector $u$ of parameters in practice?
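One practical route, since $B$ is linear in $u$: vectorize. Using $\operatorname{vec}(z_r u^T G_r) = (G_r^T \otimes z_r)\,u$, we get $\operatorname{vec}(B) = M u$ with $M = \sum_r (G_r^T \otimes z_r)$, and then $u$ solves the ordinary least-squares problem $\operatorname{vec}(Y) \approx (I_m \otimes X)\, M u$. A numpy sketch of this idea, where the $G_r$ matrices below are hypothetical ones that reproduce the example $B$ structure above:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, m, n_param = 40, 4, 4, 8     # matches the 5x4 example B (p + 1 = 5 rows)

# Hypothetical G_r matrices: row r of B equals u^T G[r]
G = np.zeros((p + 1, n_param, m))
for r in range(3):                  # rows 1-3: (u_r, -u_r) adjacent pattern
    G[r, r, r] = 1.0
    G[r, r, r + 1] = -1.0
G[3, 3, 0] = -1.0                   # row 4: [-u4, 0, 0, u4]
G[3, 3, 3] = 1.0
G[4, 4:8, :] = np.eye(4)            # row 5: [u5, u6, u7, u8]

def build_B(u):
    # B = sum_r z_r u^T G_r, i.e. row r of B is u^T G_r
    return np.vstack([u @ G[r] for r in range(p + 1)])

u_true = rng.standard_normal(n_param)
X = np.hstack([np.ones((n, 1)), rng.standard_normal((n, p))])
Y = X @ build_B(u_true) + 0.01 * rng.standard_normal((n, m))

# vec(B) = M u, with M = sum_r (G_r^T kron z_r); vec() is column-major
I = np.eye(p + 1)
M = sum(np.kron(G[r].T, I[:, [r]]) for r in range(p + 1))

# vec(Y) = (I_m kron X) M u + vec(E): ordinary least squares in u
A = np.kron(np.eye(m), X) @ M
u_hat, *_ = np.linalg.lstsq(A, Y.flatten(order="F"), rcond=None)
```

For large $n$, $m$ one would avoid forming the Kronecker products explicitly, but the small dense version shows the reduction from $(p+1)m$ to $n_{param}$ unknowns.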


If you want to extract one equation from your system $ Y = X B + E $ of $m$ equations, you can do so by multiplying the system on the right by the $i^{th}$ column $e_i$ of the identity matrix $I_m$. This yields $$ Y_i = XB_i + E_i, $$ to which you can apply linear least squares to obtain $\hat B_i=(X^TX)^{-1}X^TY_i$ for $i=1,...,m.$ These $m$ solutions are the same as those obtained for the whole system.
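This equivalence is easy to check numerically; a minimal numpy sketch with random placeholder data:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, m = 30, 2, 3
X = np.hstack([np.ones((n, 1)), rng.standard_normal((n, p))])
Y = rng.standard_normal((n, m))

# Joint least-squares fit of all m equations at once
B_joint, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Equation-by-equation fit: regress each column Y e_i on X separately
B_cols = np.column_stack(
    [np.linalg.lstsq(X, Y[:, i], rcond=None)[0] for i in range(m)]
)
```

The two estimates agree column for column, because the objective separates over the $m$ responses.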
If you want to impose (and test) restrictions on $B_i$ (within a given equation), this can be achieved through restricted least squares by imposing constraints of the form $R_iB_i=r_i$. This yields
$$\hat B_{Ri}=\hat B_{i}-(X^TX)^{-1}R_i^T[R_i(X^TX)^{-1}R_i^T]^{-1}(R_i\hat B_{i}-r_i). $$ You can also impose restrictions between the $B_i$ of different equations if you consider $BR=r$.
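The restricted least-squares formula can be sketched in numpy; the restriction below ($b_2 + b_3 = 0$) and all data are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 30, 4                        # k = p + 1 coefficients in one equation
X = rng.standard_normal((n, k))
y = rng.standard_normal(n)

# Unrestricted OLS estimate for this single equation
b_hat = np.linalg.lstsq(X, y, rcond=None)[0]

# Hypothetical restriction R b = r: one linear constraint b_2 + b_3 = 0
R = np.array([[0.0, 1.0, 1.0, 0.0]])
r = np.array([0.0])

# b_R = b_hat - (X'X)^{-1} R' [R (X'X)^{-1} R']^{-1} (R b_hat - r)
XtX_inv = np.linalg.inv(X.T @ X)
correction = XtX_inv @ R.T @ np.linalg.solve(R @ XtX_inv @ R.T, R @ b_hat - r)
b_restricted = b_hat - correction
```

The restricted estimate satisfies the constraint exactly, at the cost of a residual sum of squares no smaller than the unrestricted fit.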