Is there a clean linear algebra matrix or scalar form for this?


So let's say I have the following matrix equation to produce image $I$: $$ I = W\cdot U \cdot \operatorname{Reshape}(V \cdot S)\\ I \in \mathbb{R}^{p \times 1}\\ W \in \mathbb{R}^{p \times 2n}\\ U \in \mathbb{R}^{2n \times kn}\\ V \in \mathbb{R}^{k \times 50}\\ S \in \mathbb{R}^{50 \times n} $$

$Reshape()$ is an operation that vectorizes a $(k\times n)$ matrix into a $(kn \times 1)$ vector. The worst part is that $U$ is actually a block-diagonal matrix: $n$ blocks of size $2\times k$ sit along the diagonal, and every other entry is zero...

This is a linear regression problem (I need to alternately update $W$, $U$, $V$) that I have to code up, and I am having trouble coming up with either a clean matrix-form or scalar-summation-form solution for each of $W$, $U$, $V$. Also, there are $m$ training images $I$ and $m$ "input" matrices $S$. Is this simply not tractable (is the only way to do this via a linear neural network)?

Edit: the $Reshape()$ operator is the same as the column-stacking vectorization operator $\operatorname{vec}(\cdot)$.
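For concreteness, here is a minimal NumPy sketch of the forward model, with the column-stacking vec implemented via `order="F"`; all dimension values below are illustrative placeholders, not from the actual problem:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, k, d = 5, 4, 3, 50  # d stands in for the 50 in the question

W = rng.standard_normal((p, 2 * n))
V = rng.standard_normal((k, d))
S = rng.standard_normal((d, n))

# U is block-diagonal: n blocks of size 2 x k along the diagonal,
# zeros everywhere else
blocks = [rng.standard_normal((2, k)) for _ in range(n)]
U = np.zeros((2 * n, k * n))
for j, B in enumerate(blocks):
    U[2 * j:2 * j + 2, k * j:k * j + k] = B

# Column-stacking vectorization of the (k x n) product V @ S
vec_VS = (V @ S).reshape(-1, 1, order="F")   # shape (k*n, 1)

I_img = W @ U @ vec_VS                       # the image, shape (p, 1)
```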

Best answer

I will use $\mathcal I$ for the image vector, and reserve $I$ for identity matrices; $e_j$ denotes the $j$th standard basis vector. Per the discussion in the comments, we have $$ \mathcal I = \sum_{j=1}^n W(e_j \otimes (U_jVSe_j)). $$ If we break $W$ up into $W = \sum_{q=1}^n e_q^T \otimes W_q$, i.e. if we take $W_1,\dots,W_n$ to be the block columns of $W$, then $(e_q^T \otimes W_q)(e_j \otimes (U_jVSe_j)) = (e_q^Te_j)\,W_qU_jVSe_j$ vanishes unless $q = j$, so we have $$ \mathcal I = \sum_{j=1}^n \sum_{q=1}^n (e_q^T \otimes W_q)(e_j \otimes (U_jVSe_j)) = \sum_{j=1}^n (W_jU_jVSe_j). $$ We can now solve this equation for any particular $U_q$ by considering the equation $$ W_qU_qVSe_q = \mathcal I - \sum_{j\neq q} (W_jU_jVSe_j). $$ We can solve for $V$ by applying the identity $\operatorname{vec}(AXB) = (B^T \otimes A)\operatorname{vec}(X)$ to each term, which lets us write the equation as $$ \mathcal I = \left(\sum_{j = 1}^n (Se_j)^T \otimes (W_j U_j)\right) \operatorname{vec}(V). $$
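As a numerical sanity check on the last displayed equation, here is a small NumPy sketch; the sizes are arbitrary, and `Wb`/`Ub` are hypothetical names for the block columns $W_j$ and diagonal blocks $U_j$:

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, k, d = 6, 3, 2, 5  # illustrative sizes; d plays the role of 50

# Block columns W_j (p x 2) and diagonal blocks U_j (2 x k)
Wb = [rng.standard_normal((p, 2)) for _ in range(n)]
Ub = [rng.standard_normal((2, k)) for _ in range(n)]
V = rng.standard_normal((k, d))
S = rng.standard_normal((d, n))

# Collapsed forward model: I = sum_j W_j U_j V S e_j
I_vec = sum(Wb[j] @ Ub[j] @ V @ S[:, j] for j in range(n))

# Design matrix acting on vec(V): sum_j (S e_j)^T kron (W_j U_j)
A = sum(np.kron(S[:, j][None, :], Wb[j] @ Ub[j]) for j in range(n))

# Column-stacked vec(V); A @ vec(V) should reproduce the image
vec_V = V.reshape(-1, order="F")
```

With $m$ training images, the corresponding design matrices can be stacked vertically and $\operatorname{vec}(V)$ obtained by least squares (e.g. `np.linalg.lstsq`).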


Recap/derivation of information conveyed in comments on the question:

Let $U_j$ denote the $j$th diagonal block of $U$. We can write $$ U = \sum_{j=1}^n E_{jj} \otimes U_j $$ where $E_{jj}$ denotes the $n\times n$ matrix with a $1$ in the $j,j$ entry and zeros elsewhere, and $\otimes$ denotes the Kronecker product. With that, and using the identity $(A \otimes B)\operatorname{vec}(X) = \operatorname{vec}(BXA^T)$, we have $$ \mathcal I = \sum_{j=1}^n W(E_{jj} \otimes U_j) \operatorname{vec}(VS)\\ = \sum_{j=1}^n W\operatorname{vec}(U_jVSE_{jj})\\ = \sum_{j=1}^n W\operatorname{vec}(U_jVSe_je_j^T)\\ = \sum_{j=1}^n W(e_j \otimes (U_jVSe_j)). $$
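This identity is easy to verify numerically. A NumPy sketch (all sizes and the name `Ub` for the diagonal blocks are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
p, n, k, d = 5, 3, 2, 4

Ub = [rng.standard_normal((2, k)) for _ in range(n)]
W = rng.standard_normal((p, 2 * n))
V = rng.standard_normal((k, d))
S = rng.standard_normal((d, n))

# U = sum_j E_jj kron U_j, i.e. block-diagonal with blocks U_j
U = np.zeros((2 * n, k * n))
for j in range(n):
    U[2 * j:2 * j + 2, k * j:k * j + k] = Ub[j]

# Left side: W U vec(VS), with column-stacking vec
lhs = W @ U @ (V @ S).reshape(-1, order="F")

# Right side: sum_j W (e_j kron (U_j V S e_j))
rhs = np.zeros(p)
for j in range(n):
    ej = np.zeros(n)
    ej[j] = 1.0
    rhs += W @ np.kron(ej, Ub[j] @ V @ S[:, j])
```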

For the $m$ training samples, let $\mathcal I_q$ denote the image vector produced from the $q$th input $S_q$. If $\mathcal I$ now denotes the $p \times m$ matrix whose $q$th column is $\mathcal I_q$, then we have $$ \mathcal I = \sum_{j=1}^n\sum_{q = 1}^m [W(e_j^{(n)} \otimes (U_jVS_qe_j))]\cdot [e_q^{(m)}]^T $$ where the superscripts indicate the lengths of the standard basis vectors.
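The sum of outer products above just places the forward pass of each sample in its own column; a quick NumPy check of that (names `Ss`, `Ub`, and `forward` are hypothetical, and sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
p, n, k, d, m = 5, 3, 2, 4, 6

Ub = [rng.standard_normal((2, k)) for _ in range(n)]
W = rng.standard_normal((p, 2 * n))
V = rng.standard_normal((k, d))
Ss = [rng.standard_normal((d, n)) for _ in range(m)]  # one S_q per sample


def forward(S):
    """Per-sample forward pass: sum_j W (e_j kron (U_j V S e_j))."""
    out = np.zeros(p)
    for j in range(n):
        ej = np.zeros(n)
        ej[j] = 1.0
        out += W @ np.kron(ej, Ub[j] @ V @ S[:, j])
    return out


# Outer-product form: sum_{j,q} [W(e_j kron (U_j V S_q e_j))] e_q^T
Imat = np.zeros((p, m))
for q in range(m):
    eq = np.zeros(m)
    eq[q] = 1.0
    for j in range(n):
        ej = np.zeros(n)
        ej[j] = 1.0
        Imat += np.outer(W @ np.kron(ej, Ub[j] @ V @ Ss[q][:, j]), eq)
```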