Linear model matrix identification with least squares


I need to identify a linear model using the least squares method. The model to identify is a matrix $A$. My linear system is:

$[A]_{_{n \times m}} \cdot [x]_{_{m \times 1}} = [y]_{_{n \times 1}}$

where $n$ and $m$ define the matrix sizes. In my notation, the known arrays $x$ and $y$ are column vectors.

To identify $[A]$ I have a set of $p$ equations:

$[A] \cdot \vec{x}_1 = \vec{y}_1$

$[A] \cdot \vec{x}_2 = \vec{y}_2$

...

$[A] \cdot \vec{x}_p = \vec{y}_p$

Knowing that my system is overdetermined ($p > n, m$) and that each pair $(\vec{x}_i, \vec{y}_i)$ is known, I want to identify my linear model matrix $[A]$ with least squares.

My approach:

I have arranged my known equations as follows:

$[A] \cdot [\vec{x}_1\>\vec{x}_2\>...\>\vec{x}_p]=[\vec{y}_1\>\vec{y}_2\>...\>\vec{y}_p]$

My initial linear system becomes a matrix equation:

$[A]_{_{n \times m}} \cdot [X]_{_{m \times p}} = [Y]_{_{n \times p}}$
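For concreteness, here is a minimal numerical sketch of setting up and solving this stacked system in NumPy. All names and the synthetic data are my own, not from the question; note that $A\,X = Y$ is equivalent to $X^T A^T = Y^T$, which is the standard "matrix times unknown equals data" least squares form:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, p = 3, 4, 10                     # A is n x m, with p > m measurement pairs

A_true = rng.standard_normal((n, m))   # hypothetical ground-truth model
X = rng.standard_normal((m, p))        # columns are the known vectors x_1 ... x_p
Y = A_true @ X                         # columns are the corresponding y_1 ... y_p

# Solve A X = Y for A via the transposed system X^T A^T = Y^T
A_est = np.linalg.lstsq(X.T, Y.T, rcond=None)[0].T

print(np.allclose(A_est, A_true))      # noiseless data, so A is recovered exactly
```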

The problem:

A) Is it correct to find $[A]$ using the Moore-Penrose inverse of $[X]$?

In the simplest scalar case of $a \cdot x = b$, the different $(x_1, y_1)\ldots(x_p, y_p)$ pairs are arranged in rows instead of columns, which makes sense to me:

$[x_1 \> x_2 \> ... \> x_p]^T \cdot a = [y_1 \> y_2 \> ... \> y_p]^T$

This confuses me.

B) Also, is least squares the right approach? I am not constrained to least squares.

On BEST ANSWER

Normally, a least squares problem for a linear equation is of the form $A\,x = b$, with $A\in\mathbb{R}^{n \times m}$ and $b\in\mathbb{R}^{n}$ known and $x\in\mathbb{R}^{m}$ unknown. The least squares solution for $x$ minimizes $\|A\,x-b\|$, or equivalently $\left(A\,x-b\right)^\top \left(A\,x-b\right)$, and is given by $x=\left(A^\top A\right)^{-1} A^\top b$.
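A quick numerical check of this closed form (my own illustration with random data): the normal-equation solution coincides with the library least squares routine.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 8, 3                            # overdetermined: more equations than unknowns
A = rng.standard_normal((n, m))
b = rng.standard_normal(n)

# Normal-equation solution x = (A^T A)^{-1} A^T b
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Reference least squares solution
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]

print(np.allclose(x_normal, x_lstsq))  # True
```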

Your problem can be brought into this form by using vectorization and the Kronecker product. Namely, $\mathcal{A}\,\mathcal{X} = \mathcal{Y}$, with $\mathcal{A}\in\mathbb{R}^{n \times m}$ unknown and $\mathcal{X}\in\mathbb{R}^{m \times p}$ and $\mathcal{Y}\in\mathbb{R}^{n \times p}$ known, can be written as

$$ \underbrace{\left(\mathcal{X}^\top \otimes I\right)}_A \underbrace{\mathrm{vec}(\mathcal{A})}_x = \underbrace{\mathrm{vec}(\mathcal{Y})}_b, \tag{1} $$

with $I$ the identity matrix of size $n \times n$. This can now be solved using the normal least squares solution, after which $\mathrm{vec}(\mathcal{A})$ can be reshaped into a matrix of the appropriate size.

When comparing this to your direct approach, applying the Moore-Penrose inverse directly to $\mathcal{X}$ does seem to yield the same solution as $(1)$. However, it is not obvious to me that this should be the case: the sum of squares one wants to minimize can be formulated as $\mathrm{Tr}\left((\mathcal{A}\,\mathcal{X} - \mathcal{Y})^\top (\mathcal{A}\,\mathcal{X} - \mathcal{Y})\right)$, but the partial derivative with respect to $\mathcal{A}$ is awkward to formulate, since $\mathcal{A}$ is a matrix.
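The claimed agreement between the vectorized route $(1)$ and the direct pseudoinverse route is easy to test numerically. A sketch with synthetic data (all names mine); note that $\mathrm{vec}$ stacks columns, which is Fortran (column-major) order in NumPy:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, p = 3, 4, 12
X = rng.standard_normal((m, p))
Y = rng.standard_normal((n, p))        # generic Y: solvable only in the LS sense

# Vectorized form (1): (X^T kron I_n) vec(A) = vec(Y)
K = np.kron(X.T, np.eye(n))
vecA = np.linalg.lstsq(K, Y.reshape(-1, order="F"), rcond=None)[0]
A_kron = vecA.reshape((n, m), order="F")

# Direct form: A = Y X^+
A_pinv = Y @ np.linalg.pinv(X)

print(np.allclose(A_kron, A_pinv))     # the two routes agree
```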


Your answer gave me a lot of insight. I will rewrite your answer to confirm what I have understood and add something to it. As you stated, my problem can be defined via the Kronecker product and vectorization, because what I really have is:

$$\begin{bmatrix} [A]&0&\cdots&0\\ 0&[A]&\cdots&0\\ \vdots&&\ddots&\vdots\\ 0&0&\cdots&[A] \end{bmatrix}_{n p\times m p} \begin{bmatrix} \vec{x}_1\\ \vec{x}_2\\ \vdots\\ \vec{x}_p \end{bmatrix}_{m p \times 1} = \begin{bmatrix} \vec{y}_1\\ \vec{y}_2\\ \vdots\\ \vec{y}_p \end{bmatrix}_{n p \times 1}\tag{1}$$

This is ground truth from my construction, because I get $n \times p$ equations (and $n \times m$ unknowns, so we need $p \ge m$). But this is really equivalent to:

$$\mathrm{vec}([A]_{_{n \times m}} \cdot [X]_{_{m \times p}}) = \mathrm{vec}([Y]_{_{n \times p}}) \rightarrow \mathrm{vec}([A]_{_{n \times m}} \cdot [X]_{_{m \times p}} \cdot [I]_{p}) = \mathrm{vec}([Y]_{_{n \times p}}) \tag{2}$$

which is a known form of the Kronecker product. I get:

$$([I]_p\otimes [A]_{n\times m})\,\mathrm{vec}([X]_{m \times p})=\mathrm{vec}([Y]_{n\times p}) \tag{3}$$

Or, rearranging $(2)$ instead as $\mathrm{vec}([I]_{n}\cdot [A]_{_{n \times m}} \cdot [X]_{_{m \times p}}) = \mathrm{vec}([Y]_{_{n \times p}})$, it is also equivalent to:

$$([X]^T_{p\times m}\otimes [I]_{n})\,\mathrm{vec}([A]_{n \times m})=\mathrm{vec}([Y]_{n\times p}) \tag{4}$$

We can agree that $(1)$, $(2)$, $(3)$, and $(4)$ are equivalent.
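These vec/Kronecker identities can be verified numerically (my own check; recall that $\mathrm{vec}$ is column-major stacking, i.e. `order='F'` in NumPy):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, p = 2, 3, 4
A = rng.standard_normal((n, m))
X = rng.standard_normal((m, p))

vec = lambda M: M.reshape(-1, order="F")   # column-stacking vectorization

lhs = vec(A @ X)                           # vec(A X)
eq3 = np.kron(np.eye(p), A) @ vec(X)       # (I_p kron A) vec(X)
eq4 = np.kron(X.T, np.eye(n)) @ vec(A)     # (X^T kron I_n) vec(A)

print(np.allclose(lhs, eq3) and np.allclose(lhs, eq4))  # True
```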

With this solid basis we can continue, as you stated, with the least squares procedure by finding the minimum of:

$$J=\mathrm{Tr}\left((A\,X - Y)^\top (A\,X - Y)\right) \tag{5}$$ Here is what I add: according to The Matrix Cookbook, more precisely relation $(119)$ in the section on derivatives of traces (or by expanding the trace into three terms and differentiating each):

$$\frac{dJ}{dA}=2\,A\,X\,X^T-2\,Y\,X^T$$
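This gradient formula can be sanity-checked against central finite differences (my own verification sketch with synthetic data):

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, p = 2, 3, 5
A = rng.standard_normal((n, m))
X = rng.standard_normal((m, p))
Y = rng.standard_normal((n, p))

J = lambda A: np.trace((A @ X - Y).T @ (A @ X - Y))   # cost (5)
grad = 2 * A @ X @ X.T - 2 * Y @ X.T                  # claimed dJ/dA

# Central finite differences, entry by entry
eps = 1e-6
grad_fd = np.zeros_like(A)
for i in range(n):
    for j in range(m):
        E = np.zeros_like(A)
        E[i, j] = eps
        grad_fd[i, j] = (J(A + E) - J(A - E)) / (2 * eps)

print(np.allclose(grad, grad_fd, atol=1e-4))  # True
```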

So we end up with the same relation as in the Moore-Penrose derivation:

$$\frac{dJ}{dA}=0 \rightarrow A=Y\,X^T\,(X\,X^T)^{-1}$$

So, in conclusion: $A\,X=Y \rightarrow A=Y\,X^{+}$, where $X^{+} = X^T\,(X\,X^T)^{-1}$ is the Moore-Penrose inverse of $X$.
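As a final numerical confirmation (again with synthetic data of my own), the closed form obtained from $\frac{dJ}{dA}=0$ matches the Moore-Penrose route:

```python
import numpy as np

rng = np.random.default_rng(5)
n, m, p = 3, 4, 9
X = rng.standard_normal((m, p))        # full row rank almost surely, so X X^T is invertible
Y = rng.standard_normal((n, p))

A_closed = Y @ X.T @ np.linalg.inv(X @ X.T)   # from setting dJ/dA = 0
A_pinv = Y @ np.linalg.pinv(X)                # Moore-Penrose route

print(np.allclose(A_closed, A_pinv))   # True
```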