Feature vector concatenation


In machine learning (and not only), it is very common to see concatenation of different feature vectors into a single one of higher dimension which is then processed by some function. For example, feature vectors computed for an image at different scales are concatenated to form a multi-scale feature vector which is then further processed.

However, combining vectors by concatenation seems somehow artificial to me (we simply stack them and then use a function that operates on a higher-dimensional space):

$$\mathbf{z} = \mathbf{v} \oplus \mathbf{w} = [v_1, \dots, v_n]^T \oplus [w_1, \dots, w_m]^T = [v_1,\dots, v_n, w_1,\dots, w_m]^T \in \mathbb{R}^{n+m},$$

$$f: \mathbb{R}^{n+m} \to \mathbb{R}^{k}.$$
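As a concrete illustration of the notation above, this is exactly what `np.concatenate` does (NumPy is assumed here purely for the sketch; the question itself names no library):

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])   # v in R^3  (n = 3)
w = np.array([4.0, 5.0])        # w in R^2  (m = 2)

# Concatenation: z = v ⊕ w lives in R^{n+m} = R^5
z = np.concatenate([v, w])
print(z)   # [1. 2. 3. 4. 5.]
```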

First, I would like to ask if there is a formal definition of concatenation as a mapping to a higher-dimensional space (perhaps in the form of a matrix multiplication). What can be said about the space where the concatenated vectors live? In particular, if the second vector $\mathbf{w}$ is fixed, the points represented by the first vector are mapped into $\mathbb{R}^{n+m}$, but they remain confined to an $n$-dimensional affine subspace: a copy of $\mathbb{R}^{n}$ shifted by $\mathbf{w}$, perpendicular to the remaining axes $n+1,\dots,n+m$. It is reminiscent of embedding a manifold in a higher-dimensional space.

Finally, I was wondering if there are alternatives to concatenation for effective combination of feature vectors?


On BEST ANSWER

Is there a formal definition of concatenation as a mapping to higher-dimensional space (perhaps in form of a matrix multiplication)

Formally, the mapping would map a pair of vectors into a higher dimensional vector, so it would be a mapping

$$C:\mathbb R^n\times \mathbb R^m \to \mathbb R^{n+m}$$

but the matrix of the mapping, as always, depends on the bases you choose in the domain and codomain. If you choose the obvious standard basis vectors on both sides, the matrix is simply the $(n+m)\times(n+m)$ identity matrix. In other words, the mapping is, in all honesty, a pretty boring one as far as mathematics is concerned.

In fact, the vector spaces $\mathbb R^n\times\mathbb R^m$ and $\mathbb R^{n+m}$ are so similar that in linear algebra, they are usually regarded as "the same" space. That's actually the reason for the notation $\mathbb R^{k}$ in general.
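The matrix view above can be made explicit: concatenation of the pair $(\mathbf v, \mathbf w)$ is the linear map $\mathbf z = P\mathbf v + Q\mathbf w$, where $P$ and $Q$ embed each factor into its own block of coordinates (a NumPy sketch; the matrices $P$ and $Q$ are my notation, not part of the answer):

```python
import numpy as np

n, m = 3, 2
v = np.array([1.0, 2.0, 3.0])
w = np.array([4.0, 5.0])

# P embeds R^n into the first n coordinates of R^{n+m};
# Q embeds R^m into the last m coordinates.
P = np.vstack([np.eye(n), np.zeros((m, n))])   # shape (n+m, n)
Q = np.vstack([np.zeros((n, m)), np.eye(m)])   # shape (n+m, m)

z = P @ v + Q @ w
assert np.array_equal(z, np.concatenate([v, w]))
```

Placing the two blocks side by side, $[P \;\; Q]$, recovers exactly the $(n+m)\times(n+m)$ identity matrix mentioned above.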

If you want to be strictly formal, you can define $A\times B\times C$ as $(A\times B)\times C$, or you can define it as $A\times (B\times C)$. And the honest truth is that nobody cares which definition you use, because all results are perfectly valid using either one. Similarly, you can define $\mathbb R^n$ as

$$\mathbb R^1=\mathbb R\\ \mathbb R^{k+1}=\mathbb R^k\times\mathbb R$$

or as $$\mathbb R^1=\mathbb R\\ \mathbb R^{k+1}=\mathbb R\times \mathbb R^k$$

and it's all the same from there on.


Finally, I was wondering if there are alternatives to concatenation for effective combination of feature vectors?

What do you mean by "effective"? This question can very quickly fall outside the scope of mathematics, as "effective" is often defined by how useful something is in the real world.

Still, you may want to look up PCA (principal component analysis) or other methods of dimensionality reduction. But as I said, this is more a question for https://datascience.stackexchange.com/
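For instance, here is a minimal SVD-based PCA sketch (the random data and the choice $k=2$ are illustrative assumptions, not a recommendation):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))    # 100 samples of a 5-dim concatenated feature

# Center the data, then project onto the top-k principal directions,
# which are the right singular vectors of the centered data matrix.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2
X_reduced = Xc @ Vt[:k].T        # shape (100, 2): lower-dimensional features
print(X_reduced.shape)           # (100, 2)
```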

On

By definition of the direct sum of two vector spaces, say $\mathbb{R}^n$ and $\mathbb{R}^m$, the direct sum $\mathbb{R}^n\oplus\mathbb{R}^m$ is the set of elements $(u,v)$, where $u\in\mathbb{R}^n$ and $v\in\mathbb{R}^m$. Then you can define an isomorphism $\mathbb{R}^n\oplus\mathbb{R}^m\rightarrow\mathbb{R}^{n+m}$ by the map $(u,v)\mapsto(u_1,\ldots,u_n,v_1,\ldots,v_m)$. The constructions that look "obvious", like simple stacking, are often simply described by isomorphisms. These are the maps that tell you when two structures are the same, sometimes up to an obvious change of notation. Hope this helps :)
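This isomorphism can be checked numerically (a NumPy sketch; the helper name `iso` is hypothetical, standing in for the map $(u,v)\mapsto(u_1,\ldots,u_n,v_1,\ldots,v_m)$):

```python
import numpy as np

def iso(u, v):
    """The map (u, v) -> (u_1, ..., u_n, v_1, ..., v_m)."""
    return np.concatenate([u, v])

u, v = np.array([1.0, 2.0]), np.array([3.0])
u2, v2 = np.array([0.5, 0.5]), np.array([4.0])
a, b = 2.0, -1.0

# Linearity: iso respects linear combinations taken componentwise in the pair.
lhs = iso(a * u + b * u2, a * v + b * v2)
rhs = a * iso(u, v) + b * iso(u2, v2)
assert np.allclose(lhs, rhs)

# Bijectivity: the inverse just splits the coordinates back apart.
z = iso(u, v)
assert np.array_equal(z[:2], u) and np.array_equal(z[2:], v)
```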