Reverse engineer linear model $Ax+c=y$. Given $x$ and $y$, find $A$ and $c$


Consider the standard linear model, given by the matrix equation

\begin{align*} \mathbf{A}\vec{b}_{i}+\vec{c} & =\vec{y}_{i}\,, \end{align*} where the vector $\vec{b}_{i}$ denotes the input to the model, $\vec{y}_{i}$ the output of the model, while $\mathbf{A}$ and $\vec{c}$ are the matrix and constant-offset vector of the linear model. Typically, in least squares, $\mathbf{A}$, $\vec{c}$, and $\vec{y}_{i}$ are known, and one wishes to find $\vec{b}_{i}$.

Here, instead, we wish to consider the model matrix and vector, $\mathbf{A}$ and $\vec{c}$, as a black box that we can empirically test. We run $N$ experiments with a set of known inputs $\vec{b}_{i}\in\left\{ \vec{b}_{1},\ldots,\vec{b}_{N}\right\} $ and a corresponding set of known, observed outcomes $\vec{y}_{i}\in\left\{ \vec{y}_{1},\ldots,\vec{y}_{N}\right\} $. Based on the known vector data, we wish to ``reverse-engineer'' the black box consisting of the unknown $\mathbf{A}$ and $\vec{c}$.

This seems like a textbook problem that should be treated somewhere, and probably has a name. I hope someone could help educate me on this and point me to a reference or textbook.

In the absence of that, let me propose an approach to the solution, though perhaps there is a much better way. For each data set $i$ we have a matrix equation of the following form, which here we take to be 3-dimensional:

\begin{align} \begin{pmatrix}A_{xx} & A_{xy} & A_{xz}\\ A_{yx} & A_{yy} & A_{yz}\\ A_{zx} & A_{zy} & A_{zz} \end{pmatrix}\begin{pmatrix}b_{i,x}\\ b_{i,y}\\ b_{i,z} \end{pmatrix}+\begin{pmatrix}c_{x}\\ c_{y}\\ c_{z} \end{pmatrix} & =\begin{pmatrix}y_{i,x}\\ y_{i,y}\\ y_{i,z} \end{pmatrix}\,.\label{eq:z1} \end{align} We can vectorize the matrix and the constant vector (slightly changing notation): \begin{align*} \boldsymbol{\beta} & \equiv\left(A_{xx},A_{xy},\ldots,A_{zz},c_{x},c_{y},c_{z}\right)^{\mathrm{T}},\\ \mathbf{X}_{i} & \equiv\begin{pmatrix}b_{i,x} & b_{i,y} & b_{i,z} & & & & & & & 1 & & \\ & & & b_{i,x} & b_{i,y} & b_{i,z} & & & & & 1 & \\ & & & & & & b_{i,x} & b_{i,y} & b_{i,z} & & & 1 \end{pmatrix},\\ \mathbf{y}_{i} & \equiv\begin{pmatrix}y_{i,x}\\ y_{i,y}\\ y_{i,z} \end{pmatrix}\,, \end{align*} where the blank entries of $\mathbf{X}_{i}$ are zeros. The equation above then takes the form \begin{eqnarray*} \mathbf{X}_{i}\boldsymbol{\beta} & = & \mathbf{y}_{i}\quad\forall i\,. \end{eqnarray*} To take account of all the data, we vertically concatenate all the matrix equations and define the partitioned matrix of known inputs to the system \begin{eqnarray*} \mathbf{X} & \equiv & \begin{pmatrix}\mathbf{X}_{1}\\ \hline \vdots\\ \hline \mathbf{X}_{N} \end{pmatrix}, \end{eqnarray*} which is a $3N\times12$ matrix, since there are 3 rows per data set and 12 unknowns. The corresponding known output data matrix is \begin{align*} \mathbf{y} & \equiv\begin{pmatrix}\mathbf{y}_{1}\\ \hline \vdots\\ \hline \mathbf{y}_{N} \end{pmatrix}\,. \end{align*} Thus the complete matrix equation is in the standard least-squares form \begin{eqnarray*} \mathbf{X}\boldsymbol{\beta} & = & \mathbf{y}\,. \end{eqnarray*} This allows us to employ the standard method of least squares for an overdetermined system of equations,

\begin{eqnarray*} \hat{\boldsymbol{\beta}} & = & (\mathbf{X}^{{\rm T}}\mathbf{X})^{-1}\mathbf{X}^{{\rm T}}\mathbf{y}, \end{eqnarray*} where $\hat{\boldsymbol{\beta}}$ is the least-squares solution vector, which contains all the information needed to recover $\mathbf{A}$ and $\vec{c}$ in the first equation (this assumes $\mathbf{X}^{{\rm T}}\mathbf{X}$ is invertible, which for generic data in the 3-dimensional case requires $N\geq4$).

Would this be an advisable way to solve this kind of problem in a computer program (e.g., numpy in Python)? This seems like a textbook kind of problem, but after some digging around I couldn't find anything. Perhaps you can educate me on what this type of problem is called, and what a good go-to reference to read up on it would be. Thank you; this would be much appreciated.
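For concreteness, the vectorized scheme above can be sketched in NumPy as follows. The ground-truth `A_true` and `c_true` here are hypothetical, used only to generate synthetic data, and `np.linalg.lstsq` is used instead of forming $(\mathbf{X}^{\mathrm{T}}\mathbf{X})^{-1}$ explicitly, which is numerically safer:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground truth, used only to generate synthetic data (3-D case).
A_true = rng.standard_normal((3, 3))
c_true = rng.standard_normal(3)

N = 20                                   # number of experiments
B = rng.standard_normal((N, 3))          # row i is the input b_i
Y = B @ A_true.T + c_true                # row i is the output y_i = A b_i + c

# Build the stacked (3N x 12) design matrix X described above:
# each X_i is kron(I_3, b_i^T) with a 3x3 identity block appended for c.
X = np.vstack([np.hstack([np.kron(np.eye(3), b[None, :]), np.eye(3)])
               for b in B])
y = Y.ravel()                            # stack the y_i vertically

# Solve the overdetermined system in the least-squares sense.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
A_hat = beta[:9].reshape(3, 3)
c_hat = beta[9:]
```

With noiseless data and $N\geq4$ generic inputs, `A_hat` and `c_hat` recover the ground truth exactly (up to round-off).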

There are 3 answers below.

BEST ANSWER

Augment the matrix $A$ with the vector $c$ as an extra column. Then augment each vector $b_i$ with an extra component equal to unity. The $y_i$ vectors are unchanged.

Now the equations read $$Ab_i = y_i$$ where the quantities on the LHS are augmented as described above.

But the indexed vectors are really just columns of their constituent matrices $B$ and $Y$ (with $e_i$ the $i$-th standard basis vector), i.e. $$\eqalign{ b_i &= Be_i \cr y_i &= Ye_i \cr }$$

And so we have arrived at a purely matrix equation, which can be solved by standard techniques. $$\eqalign{ AB &= Y \cr ABB^T &= YB^T \cr A &= YB^T(BB^T)^{-1} \cr }$$
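In NumPy, the augmented-matrix recipe might look like the following sketch; the `A_true`/`c_true` ground truth is assumed here only to generate test data, and `np.linalg.solve` is used in place of the explicit inverse $(BB^T)^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 3, 10

# Assumed ground truth, used only to generate test data.
A_true = rng.standard_normal((n, n))
c_true = rng.standard_normal(n)

B = rng.standard_normal((n, N))          # column i is the input b_i
Y = A_true @ B + c_true[:, None]         # column i is the output y_i

# Augment: a row of ones on B, so that [A | c] [b_i; 1] = A b_i + c.
B_aug = np.vstack([B, np.ones(N)])       # (n+1) x N

# A_aug = Y B^T (B B^T)^{-1}, computed via a linear solve for stability.
A_aug = np.linalg.solve(B_aug @ B_aug.T, B_aug @ Y.T).T

A_hat, c_hat = A_aug[:, :n], A_aug[:, n]
```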

ANSWER

It can be much simpler. Use $n+1$ test vectors: one of all zeroes, and $n$ each with a single $1$ and the rest $0$. Put all these vectors into an $n\times(n+1)$ matrix called $\mathrm{X}$ that looks like $[\mathrm{I}\ 0]$ or

$$\mathrm{X} = \left[\begin{matrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ \end{matrix}\right]$$

Plug each of the vectors into the transformation and collect the outputs as the columns of a matrix $\left[M\ c\right]$. The zero input returns the offset directly: $c$ is the $c$ from the transformation equation. Each unit-vector input $e_i$ returns $Ae_i + c$, i.e. column $i$ of $A$ plus $c$, so $\mathrm{A} = M - c\,\mathbf{1}^{\top}$ (subtract $c$ from every column of $M$).
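A small Python sketch of this probing scheme; the `black_box` function stands in for the unknown transformation, and `A_true`/`c_true` are assumed only so the example is testable:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3

# A_true and c_true are assumed only to make black_box concrete;
# in practice black_box is the unknown system being probed.
A_true = rng.standard_normal((n, n))
c_true = rng.standard_normal(n)

def black_box(x):
    return A_true @ x + c_true

# Probe with the all-zero vector: the output is c itself.
c_hat = black_box(np.zeros(n))

# Probe with each unit vector e_i: the output is (column i of A) + c.
M = np.column_stack([black_box(e) for e in np.eye(n)])
A_hat = M - c_hat[:, None]               # subtract c from every column
```

Note this assumes exact (noiseless) outputs; with noisy measurements the least-squares formulations above are preferable.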

ANSWER

Given a single input-output pair, we have the linear model

$$\rm y = A x + b$$

where $\mathrm x \in \mathbb R^n$ and $\mathrm y \in \mathbb R^m$, $\mathrm A \in \mathbb R^{m \times n}$ and $\mathrm b \in \mathbb R^m$. Vectorizing, we obtain an underdetermined system of $m$ linear equations in $m n + m = m (n+1)$ variables

$$\begin{bmatrix} \mathrm x^\top \otimes \mathrm I_m & \mathrm I_m\end{bmatrix} \begin{bmatrix} \mbox{vec} (\mathrm A)\\ \mathrm b\end{bmatrix} = \mathrm y$$

Since the matrix has full row rank for all $\rm x$, the least-norm solution of the linear system is

$$\begin{array}{rl} \begin{bmatrix} \mbox{vec} (\mathrm A_{\text{LN}})\\ \mathrm b_{\text{LN}}\end{bmatrix} &:= \begin{bmatrix} \mathrm x \otimes \mathrm I_m \\ \mathrm I_m\end{bmatrix} \left( \begin{bmatrix} \mathrm x^\top \otimes \mathrm I_m & \mathrm I_m\end{bmatrix} \begin{bmatrix} \mathrm x \otimes \mathrm I_m \\ \mathrm I_m\end{bmatrix} \right)^{-1} \mathrm y\\ &\,= \begin{bmatrix} \mathrm x \otimes \mathrm I_m \\ \mathrm I_m\end{bmatrix} \left( (\mathrm x^\top \otimes \mathrm I_m) (\mathrm x \otimes \mathrm I_m) + \mathrm I_m \right)^{-1} \mathrm y\\ &\,= \begin{bmatrix} \mathrm x \otimes \mathrm I_m \\ \mathrm I_m\end{bmatrix} \left( (\mathrm x^\top \mathrm x \otimes \mathrm I_m) + \mathrm I_m \right)^{-1} \mathrm y\\ &\,= \frac{1}{ \| \mathrm x \|_2^2 + 1} \begin{bmatrix} \mathrm x \otimes \mathrm I_m \\ \mathrm I_m\end{bmatrix} \mathrm y\\ &\,= \frac{1}{ \| \mathrm x \|_2^2 + 1} \begin{bmatrix} \mathrm x \otimes \mathrm y \\ \mathrm y\end{bmatrix}\end{array}$$

Un-vectorizing, we obtain

$$\boxed{\begin{array}{rl} \mathrm A_{\text{LN}} &:= \color{blue}{\frac{1}{ \| \mathrm x \|_2^2 + 1} \, \mathrm y \mathrm x^\top}\\\\ \mathrm b_{\text{LN}} &:= \color{blue}{\frac{1}{ \| \mathrm x \|_2^2 + 1} \, \mathrm y}\end{array}}$$

Verifying,

$$\mathrm A_{\text{LN}} \mathrm x + \mathrm b_{\text{LN}} = \frac{1}{ \| \mathrm x \|_2^2 + 1} \left( \mathrm y \mathrm x^\top \mathrm x + \mathrm y \right) = \left(\frac{ \| \mathrm x \|_2^2 + 1 }{ \| \mathrm x \|_2^2 + 1 } \right) \mathrm y = \mathrm y$$
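A quick numerical check of the boxed least-norm formulas (the dimensions `m`, `n` and the sample `x`, `y` are arbitrary here), including a cross-check against the pseudoinverse of the vectorized system:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 4, 3                     # arbitrary dimensions for the check
x = rng.standard_normal(n)
y = rng.standard_normal(m)

s = x @ x + 1.0                 # ||x||^2 + 1
A_ln = np.outer(y, x) / s       # A_LN = y x^T / (||x||^2 + 1)
b_ln = y / s                    # b_LN = y / (||x||^2 + 1)

# The boxed formulas reproduce y exactly: A_LN x + b_LN = y.
residual = A_ln @ x + b_ln - y

# Cross-check: the least-norm solution [vec(A); b] of the
# underdetermined system [x^T kron I_m, I_m] via the pseudoinverse.
M = np.hstack([np.kron(x[None, :], np.eye(m)), np.eye(m)])
sol = np.linalg.pinv(M) @ y
```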