The problem is simple: first decompose a training set with non-negative matrix factorisation (NMF), which yields W (the so-called feature matrix) and H such that dot(W, H) approximates data_train. Then I want to find whether samples in an unseen test set (say data_test) exhibit the features in W, i.e. find an H' that assigns each test sample its labels/classifications/whatever the term is.
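For concreteness, here is a minimal sketch of the setup with scikit-learn (shapes and the random data are illustrative; note that sklearn writes X ≈ dot(W, H) with samples as rows, so everything is transposed relative to my features-by-samples convention):

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
data_train = rng.random((40, 100))   # features x samples, as in the question
data_test = rng.random((40, 20))

# sklearn factorises X ~= dot(W_sk, H_sk); fitting on data_train.T makes
# components_.T play the role of W (features x components) in my notation.
model = NMF(n_components=5, init="nndsvda", max_iter=500, random_state=0)
H = model.fit_transform(data_train.T).T   # (5, 100): per-sample weights
W = model.components_.T                   # (40, 5): feature matrix

# For unseen samples, transform() keeps components_ fixed and solves only
# for the new coefficients -- which is exactly the H' asked about below:
H_prime = model.transform(data_test.T).T  # (5, 20)
```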
I believe there should be a neat way to do this, just as one can train a Gaussian mixture model and then use it to predict labels. However, I tried the following methods/measurements on data_train (so that I could compare the results with the ground-truth H):
1) pseudoinverse to solve H' in dot(W, H')=data_test;
2) same equation, non-negative least squares (NNLS) to solve H';
3) treat W as a matrix encoding probability distributions, compute expectations for the samples from the corresponding entries of W and data_test, and threshold them;
4) Spearman correlation between the samples and W, with a threshold;
and the performance is rather poor. With H plotted as the blue line, H and H' do agree on several peaks, but the overall correlation is terrible (pearson(H, H') ≈ 0.3).
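Methods 1) and 2) can be sketched as below (W and data_test here are synthetic stand-ins; on this noiseless toy data both recover H' exactly, which is what makes the failure on my real data puzzling):

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(1)
W = rng.random((40, 5))          # features x components
H_true = rng.random((5, 20))     # components x samples
data_test = W @ H_true           # noiseless toy data

# 1) pseudoinverse: fast, but can yield negative entries in H'
H_pinv = np.linalg.pinv(W) @ data_test

# 2) NNLS: enforce non-negativity, one sample (column) at a time
H_nnls = np.column_stack(
    [nnls(W, data_test[:, j])[0] for j in range(data_test.shape[1])]
)
```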
The NMF decomposition itself is quite stable, but it seems I just can't 'reverse' it, even when solving for H' with NNLS (which, I think, uses a similar algorithm to NMF's update step). In other words, what confuses me is this: the existence of different valid pairs of W and H (since data = dot(dot(W, inverse(Q)), dot(Q, H)) for a suitable invertible Q) makes sense, but if W is fixed, shouldn't I still be able to converge to somewhere near the local minimum for H?
To sum up, my questions are:
1) is it even possible to find a reasonable H' (that is, to reproduce H given W and the data)?
2) if not, how should I interpret W? In some biology papers NMF is used to find signatures (W) in genomic data (e.g. a matrix of expression levels), but if W can't be used to make predictions or be compared across datasets, wouldn't the signatures be meaningless beyond being fancy patterns?
Did some tests and found out what's happening.
First of all, NMF is able to find W or H given V, e.g. by setting update_H to False in Python's sklearn.decomposition.non_negative_factorization, and this is stable. The main issue is that, although data = dot(dot(W, inverse(Q)), dot(Q, H)) holds algebraically, NMF is not robust in the following sense: if we feed the model dot(W, inverse(Q)) as W, it will not converge H to dot(inverse(Q), H) (strategy: minimise the L2 norm). My data had in fact been transformed by such a Q. Though I'm not sure this is the full explanation, since it doesn't make much sense to me that the model would then converge to some other, essentially random, local minimum.
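To make the "solve for one factor with the other fixed" part concrete, here is a sketch (shapes and the noiseless toy data are illustrative; in sklearn's X ≈ dot(W, H) convention, fixing H and solving for W is the transpose of my "fixed W, solve H" problem):

```python
import numpy as np
from sklearn.decomposition import non_negative_factorization

rng = np.random.default_rng(2)
W_true = rng.random((40, 5))     # sklearn convention: X ~= dot(W, H)
H_fixed = rng.random((5, 30))
X = W_true @ H_fixed             # noiseless toy data

# With update_H=False, the supplied H is held constant and only W is
# estimated (sklearn initialises W internally in this mode).
W_est, H_out, n_iter = non_negative_factorization(
    X, H=H_fixed, n_components=5, update_H=False, max_iter=1000, tol=1e-8
)

rel_err = np.linalg.norm(X - W_est @ H_fixed) / np.linalg.norm(X)
```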