I have 2 vectors $ a, b \in \mathbb{R}^{k} $, which can be thought of as feature vectors in machine learning.
Using a simple transformation (linear, affine, concatenation, ...), I want to combine them into a vector $ c \in \mathbb{R}^{l} $, with $ l < k $, while keeping as much information as possible.
If $ l \geq 2k $, then I think I can just concatenate them. But if $ l < k $, what should I do?
- Should I concatenate them, then multiply by a matrix to reduce the dimension to $ l $?
- Or should I multiply each of them by a matrix to reduce the dimension to $ l/2 $, then concatenate the results?
- Or should I multiply each of them by a matrix to reduce the dimension to $ l $, then average the results?

Or does it not matter which one I choose?
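For what it's worth, all three options are linear maps applied to the concatenated vector $[a; b]$, so the first (concatenate, then multiply by a full $l \times 2k$ matrix) is the most general: the other two are special cases with a constrained matrix. A quick numpy sketch (dimensions $k=8$, $l=4$ are arbitrary picks for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
k, l = 8, 4  # illustrative sizes; any l < k works
a, b = rng.standard_normal(k), rng.standard_normal(k)

# Option 1: concatenate, then project with one full (l x 2k) matrix.
W = rng.standard_normal((l, 2 * k))
c1 = W @ np.concatenate([a, b])

# Option 2: project each vector to l/2, then concatenate.
# This equals option 1 with a block-diagonal W.
W1 = rng.standard_normal((l // 2, k))
W2 = rng.standard_normal((l // 2, k))
c2 = np.concatenate([W1 @ a, W2 @ b])
W_block = np.block([[W1, np.zeros((l // 2, k))],
                    [np.zeros((l // 2, k)), W2]])
assert np.allclose(c2, W_block @ np.concatenate([a, b]))

# Option 3: project each vector to l, then average.
# This equals option 1 with W = [Wa/2 | Wb/2].
Wa = rng.standard_normal((l, k))
Wb = rng.standard_normal((l, k))
c3 = (Wa @ a + Wb @ b) / 2
W_avg = np.hstack([Wa, Wb]) / 2
assert np.allclose(c3, W_avg @ np.concatenate([a, b]))
```

So if the matrices are learned from data, option 1 can do at least as well as the other two; the constrained forms mainly save parameters.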
Learn it! This is what autoencoders do.
Make a neural network with three layers: an input layer of $2k$ neurons, a hidden layer of $l$ neurons, and an output layer of $2k$ neurons. Train this network to simply predict the identity function on your dataset.
The result will be that the neural network learns a 'summary' of the data in $l$ features (the hidden layer) that can be used to reconstruct the input.
Now, after training, simply delete the output layer and use the values of the hidden layer as your features.
Alternatively, you can go with good old principal component analysis (PCA).
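The PCA route is even shorter: stack the concatenated pairs as rows and keep the top $l$ components (again with placeholder sizes and random data for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
k, l = 8, 4
X = rng.standard_normal((500, 2 * k))  # rows are concatenated [a; b] pairs

pca = PCA(n_components=l)
C = pca.fit_transform(X)               # (500, l) compressed features
```

PCA is exactly the linear special case of the autoencoder above: with linear activations and squared-error loss, the bottleneck spans the same subspace as the top $l$ principal components.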