How to solve $\rm D^TDXbb^T = D^Tab^T$?


Let $a$ and $b$ be two vectors and let $D$ and $X$ be two matrices. Minimize the following cost function with respect to $X$:

$$E = \| a - DXb \|_2^2$$

My work:

$$ E = (a-DXb)^T(a-DXb) = a^Ta -2a^TDXb + b^TX^TD^TDXb. $$

Next, I differentiate $E$ with respect to $X$, following the Matrix Cookbook:

$$ \frac{\partial E}{\partial X} = -2D^Tab^T+ 2D^TDXbb^T $$

Equating the derivative to zero yields the following equation:

$$ D^TDXbb^T = D^Tab^T \tag1 $$

My question: How can I solve equation $(1)$?

Thanks.

3 Answers

---

As H. H. Rugh pointed out, the solution will not be unique. Here is one way to numerically compute a solution $X$ when $D$, $a$, and $b$ are known.

Let $D$ be $m \times p$, $X$ be $p \times q$, and $b$ be $q \times 1$ so that $a$ is $m \times 1$.

Let $X=(x_{ij})_{1 \leq i \leq p, 1 \leq j \leq q}$ where the $x_{ij}$'s are to be determined.

For $1 \le i \le p$ and $1 \leq j \leq q$, let $E_{ij}$ denote the $p \times q$ matrix with $1$ at position $(i,j)$ and $0$ elsewhere, and let $V_{ij}$ denote the $m \times 1$ vector $DE_{ij}b$. The $V_{ij}$'s are all known.

So our task is to choose $x_{ij}$ such that $\|\sum_{i=1}^{p}\sum_{j=1}^{q} x_{ij}V_{ij} - a\|$ is as small as possible.

Let $V=[V_{11} \dots V_{1q}\ V_{21} \dots V_{2q} \dots V_{p1} \dots V_{pq}]$ and $X^1 = [x_{11} \dots x_{1q}\ x_{21} \dots x_{2q} \dots x_{p1} \dots x_{pq}]^T$. Then $V$ is an $m \times pq$ matrix and $X^1$ is a $pq \times 1$ vector, and we must choose $X^1$ to minimize $\|VX^1 - a\|$. One such $X^1$ is given by $X^1 = V^\dagger a$, where $V^\dagger$ is the Moore-Penrose inverse of $V$. Reshaping $X^1$ back into a $p \times q$ matrix yields an $X$, and this choice gives the $X$ with the smallest possible value of $\operatorname{trace}(X^TX)$.
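A rough numerical sketch of this construction in NumPy (the shapes $m,p,q$ and the random data are illustrative assumptions, not from the question):

```python
import numpy as np

# Illustrative sizes and data for the construction above.
rng = np.random.default_rng(0)
m, p, q = 6, 3, 2
D = rng.standard_normal((m, p))
b = rng.standard_normal(q)
a = rng.standard_normal(m)

# Column i*q + j of V is V_ij = D @ E_ij @ b.
V = np.zeros((m, p * q))
for i in range(p):
    for j in range(q):
        E = np.zeros((p, q))
        E[i, j] = 1.0
        V[:, i * q + j] = D @ E @ b

# Minimum-norm least-squares solution of V x = a, reshaped into X.
# Row-major reshape matches the column ordering i*q + j used above.
x1 = np.linalg.pinv(V) @ a
X = x1.reshape(p, q)

print("residual:", np.linalg.norm(a - D @ X @ b))
```

Since $b \neq 0$, the columns of $V$ span the column space of $D$, so the residual of this $X$ matches the best possible value $\|(I - P_D)a\|$.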

---

Let's find the least squares solution of the original equation $$DXb = a$$ Start by vectorizing the equation and solving for $X$ $$\eqalign{ (b^T\otimes D)\,{\rm vec}(X) &= {\rm vec}(a) \cr {\rm vec}(X) &= (b^T\otimes D)^+\,{\rm vec}(a) \cr &= (b^{+T}\otimes D^+)\,{\rm vec}(a) \cr &= {\rm vec}(D^+ab^{+}) \cr }$$ where $b^+$ denotes the Moore-Penrose inverse of $b$.

De-vectorizing yields the least squares solution $$\eqalign{ X &= D^+ab^{+} \cr }$$ More generally, we can add terms from the nullspace and still satisfy the original equation. $$\eqalign{ X &= D^+ab^{+} + (I-D^+D)M + N(I-bb^+) \cr }$$ where $(M,N)$ are arbitrary matrices.
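A small sketch checking this closed form numerically (shapes and random data are illustrative assumptions): the Kronecker-vectorized route and $X = D^+ab^+$ agree, and the added nullspace terms leave $DXb$ unchanged.

```python
import numpy as np

# Illustrative sizes and data.
rng = np.random.default_rng(1)
m, p, q = 5, 3, 4
D = rng.standard_normal((m, p))
a = rng.standard_normal((m, 1))
b = rng.standard_normal((q, 1))

Dp = np.linalg.pinv(D)          # D^+
bp = np.linalg.pinv(b)          # b^+, a 1-by-q row vector
X = Dp @ a @ bp                 # least-squares solution

# Same matrix via (b^T kron D)^+ vec(a); vec stacks columns,
# hence the Fortran-order reshape.
vecX = np.linalg.pinv(np.kron(b.T, D)) @ a
X_vec = vecX.reshape((p, q), order="F")
print("match:", np.allclose(X, X_vec))

# Adding nullspace terms leaves D X b unchanged.
M = rng.standard_normal((p, q))
N = rng.standard_normal((p, q))
X_gen = X + (np.eye(p) - Dp @ D) @ M + N @ (np.eye(q) - b @ bp)
print("same fit:", np.allclose(D @ X_gen @ b, D @ X @ b))
```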

---

We have the following matrix equation

$$\mathrm D \mathrm X \mathrm b = \mathrm a$$

where $\mathrm b$ is a nonzero vector. If $\mathrm D$ has full row rank, then $\mathrm D \mathrm D^{\top}$ is invertible and, thus, a solution to the matrix equation above is

$$\bar{\mathrm X} := \mathrm D^{\top} (\mathrm D \mathrm D^{\top})^{-1} \mathrm a (\mathrm b^{\top} \mathrm b)^{-1} \mathrm b^{\top}$$

where $\mathrm D^{\top} (\mathrm D \mathrm D^{\top})^{-1}$ is a right inverse of $\mathrm D$ and $(\mathrm b^{\top} \mathrm b)^{-1} \mathrm b^{\top}$ is a left inverse of $\mathrm b$. Note that

$$\mathrm D \bar{\mathrm X} \mathrm b = \underbrace{\mathrm D \mathrm D^{\top} (\mathrm D \mathrm D^{\top})^{-1}}_{=\mathrm I} \mathrm a \underbrace{(\mathrm b^{\top} \mathrm b)^{-1} \mathrm b^{\top} \mathrm b}_{=1} = \mathrm a$$
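A quick numerical check of $\bar{\mathrm X}$ in the full-row-rank case (the sizes and random data below are illustrative assumptions):

```python
import numpy as np

# Full-row-rank case: m <= p, so D D^T is invertible.
rng = np.random.default_rng(2)
m, p, q = 3, 5, 4
D = rng.standard_normal((m, p))   # full row rank almost surely
a = rng.standard_normal((m, 1))
b = rng.standard_normal((q, 1))

right_inv_D = D.T @ np.linalg.inv(D @ D.T)   # a right inverse of D
left_inv_b = b.T / (b.T @ b).item()          # a left inverse of b

X_bar = right_inv_D @ a @ left_inv_b
print("exact:", np.allclose(D @ X_bar @ b, a))
```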

Left-multiplying both sides of the original matrix equation by $\mathrm D^{\top}$, we obtain a new matrix equation

$$\mathrm D^{\top} \mathrm D \mathrm X \mathrm b = \mathrm D^{\top} \mathrm a$$

If $\mathrm D$ has full column rank, then $\mathrm D^{\top} \mathrm D$ is invertible and, thus, a solution to the new matrix equation is

$$\hat{\mathrm X} := (\mathrm D^{\top} \mathrm D)^{-1} \mathrm D^{\top} \mathrm a (\mathrm b^{\top} \mathrm b)^{-1} \mathrm b^{\top}$$

Note that

$$\mathrm D^{\top} \mathrm D \hat{\mathrm X} \mathrm b = \underbrace{\mathrm D^{\top} \mathrm D (\mathrm D^{\top} \mathrm D)^{-1}}_{= \mathrm I} \mathrm D^{\top} \mathrm a \underbrace{(\mathrm b^{\top} \mathrm b)^{-1} \mathrm b^{\top} \mathrm b}_{=1} = \mathrm D^{\top} \mathrm a$$

However,

$$\mathrm D \hat{\mathrm X} \mathrm b = \mathrm D (\mathrm D^{\top} \mathrm D)^{-1} \mathrm D^{\top} \mathrm a (\mathrm b^{\top} \mathrm b)^{-1} \mathrm b^{\top} \mathrm b = \underbrace{\mathrm D (\mathrm D^{\top} \mathrm D)^{-1} \mathrm D^{\top}}_{=: \mathrm P_{\mathrm D}} \mathrm a = \mathrm P_{\mathrm D} \mathrm a$$

where $\mathrm P_{\mathrm D}$ is a projection matrix that projects onto the column space of $\mathrm D$. Note that $\hat{\mathrm X}$, one solution to the new matrix equation, is

  • only a solution to the original matrix equation if $\mathrm P_{\mathrm D} \mathrm a = \mathrm a$, i.e., if $\mathrm a$ is in the column space of $\mathrm D$.
  • also a solution to the matrix equation $\mathrm D^{\top} \mathrm D \mathrm X \mathrm b \mathrm b^{\top} = \mathrm D^{\top} \mathrm a \mathrm b^{\top}$, whose solution set is the set of least-squares solutions, as shown by the OP in the question statement.

Note also that

$$\| \mathrm a - \mathrm D \hat{\mathrm X} \mathrm b \| = \| \mathrm a - \mathrm P_{\mathrm D} \mathrm a \| = \| (\mathrm I - \mathrm P_{\mathrm D}) \mathrm a\|$$

where $\mathrm I - \mathrm P_{\mathrm D}$ is a projection matrix that projects onto the orthogonal complement of the column space of $\mathrm D$, which is the left null space of $\mathrm D$.
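A sketch of the full-column-rank case confirming this residual identity (sizes and data are illustrative assumptions):

```python
import numpy as np

# Full-column-rank case: m >= p, so D^T D is invertible. The
# residual of X_hat equals the norm of the component of a
# outside the column space of D.
rng = np.random.default_rng(3)
m, p, q = 6, 3, 4
D = rng.standard_normal((m, p))   # full column rank almost surely
a = rng.standard_normal((m, 1))
b = rng.standard_normal((q, 1))

left_inv_D = np.linalg.inv(D.T @ D) @ D.T
left_inv_b = b.T / (b.T @ b).item()
X_hat = left_inv_D @ a @ left_inv_b

P_D = D @ left_inv_D              # projector onto col(D)
res = np.linalg.norm(a - D @ X_hat @ b)
print("residual:", res)
```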

If $\mathrm D$ has neither full column rank nor full row rank, then one may resort to using pseudoinverses.