I have two feature matrices $\textbf{X}$ and $\textbf{Y}$, obtained by one-hot encoding the rows of two underlying feature matrices $\textbf{X'}$ and $\textbf{Y'}$. Thus they are sparse, with only a few $1$'s in each row. I'm trying to solve the following equation:
$(\textbf{X}\otimes\textbf{Y})\textbf{w}=\textbf{z}$
Using the matrix $(\textbf{X}\otimes\textbf{Y})$ as the design matrix, I get zero training error. However, this matrix is very large. Is there another matrix I can use as a basis?
I have tried pairwise multiplication of the rows of $\textbf{X},\textbf{Y}$, as well as concatenation, i.e., $\phi(\textbf{x}_i,\textbf{y}_j)=[\textbf{x}_i,\textbf{y}_j]$, but neither works very well. Is there a function $\phi(\textbf{X},\textbf{Y})$ that gives me a matrix with a small number of columns which I can use to approximate the space spanned by $\textbf{X}\otimes\textbf{Y}$? I would like to avoid methods such as PCA or kernels.
Couldn't you use the fact that ${\rm vec}\{\mathbf A \mathbf X \mathbf B^{\rm T}\} = (\mathbf B \otimes \mathbf A) \cdot {\rm vec}\{\mathbf X\}$ for matrices $\mathbf A, \mathbf X, \mathbf B$ of compatible dimensions?
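To see the identity in action, here is a quick numerical check (a sketch using NumPy; note that ${\rm vec}\{\cdot\}$ stacks *columns*, which corresponds to NumPy's Fortran/`"F"` memory order, not the default row-major order):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
X = rng.standard_normal((4, 5))
B = rng.standard_normal((6, 5))

# vec{M} stacks the columns of M: Fortran ("F") order in NumPy.
lhs = (A @ X @ B.T).flatten(order="F")          # vec{A X B^T}
rhs = np.kron(B, A) @ X.flatten(order="F")      # (B kron A) vec{X}

print(np.allclose(lhs, rhs))  # True
```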
This way you should be able to reformulate your equation as $\mathbf Y \cdot \mathbf W \cdot \mathbf X^{\rm T} = \mathbf Z$, which you can solve for $\mathbf W$ without ever building $\mathbf X \otimes \mathbf Y$ explicitly: you can invert away $\mathbf X$ and $\mathbf Y$ separately. Here, $\mathbf W$ and $\mathbf Z$ are matrices such that your $\mathbf w$ and $\mathbf z$ are their vectorized versions.
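The "invert away" step above can be sketched as follows. This is an illustration with dense random matrices (assuming $\mathbf X$ and $\mathbf Y$ have full column rank, so the pseudoinverses recover $\mathbf W$ exactly); the shapes and variable names are my own for the example:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 8, 3   # X is n x p
m, q = 7, 4   # Y is m x q
X = rng.standard_normal((n, p))
Y = rng.standard_normal((m, q))

# Construct a ground-truth W and the corresponding Z = Y W X^T.
W_true = rng.standard_normal((q, p))
Z = Y @ W_true @ X.T

# Solve Y W X^T = Z by applying the pseudoinverses of Y and X
# separately -- the Kronecker product is never formed.
W = np.linalg.pinv(Y) @ Z @ np.linalg.pinv(X).T

print(np.allclose(W, W_true))  # True

# Sanity check: w = vec(W) indeed solves (X kron Y) w = vec(Z).
w = W.flatten(order="F")
z = Z.flatten(order="F")
print(np.allclose(np.kron(X, Y) @ w, z))  # True
```

For your sparse one-hot matrices you would replace `pinv` with a sparse least-squares solve (e.g. `scipy.sparse.linalg.lsqr`) applied to each factor, but the structure of the computation is the same.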