I've been reading up on word embeddings and there was one sentence that confused me a bit:
"...handled words not seen during training by learning a linear transformation between their RNN word embedding space and a larger word embedding such as word2vec."
I'm familiar with linear transformations in the mathematical sense, but I don't understand what it means to *learn* one here: learned from what data, and optimized how?
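To make my confusion concrete, here is my current guess at what this might mean, as a minimal numpy sketch. Everything in it is my own assumption, not from the paper: the names X, Y, W, the dimensions, and the random placeholder data. The idea would be to take words that appear in *both* vocabularies, stack their word2vec vectors as X and their RNN embeddings as Y, and fit a matrix W by least squares so that X @ W approximates Y; then an unseen word's word2vec vector could be mapped through W into the RNN's embedding space.

import numpy as np

rng = np.random.default_rng(0)

# Placeholder data for words present in BOTH vocabularies:
# X holds their word2vec vectors, Y their RNN embeddings.
X = rng.standard_normal((5000, 300))   # (n_shared_words, d_word2vec)
Y = rng.standard_normal((5000, 128))   # (n_shared_words, d_rnn)

# Fit W to minimize ||X @ W - Y||^2 (ordinary least squares).
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

# For a word the RNN never saw, but word2vec did: map its
# word2vec vector into the RNN embedding space via W.
x_unseen = rng.standard_normal(300)
y_estimate = x_unseen @ W              # estimated RNN-space embedding

Is that roughly the right idea, or am I off base about what is being learned and from which word pairs?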
Any help would be much appreciated.