Could someone give me a concrete mathematical definition of a "linear translation"?

I'm reading the paper "Distributed Representations of Words and Phrases and their Compositionality". It deals with Natural Language Processing, so it involves the math behind the models used for NLP tasks. I came across the term "linear translation", but I couldn't find a concrete definition of it on Wikipedia or on this forum. Could someone please define it?

For more context, here is the snippet where I found it:

Distributed representations of words in a vector space help learning algorithms to achieve better performance in natural language processing tasks by grouping similar words. One of the earliest use of word representations dates back to 1986 due to Rumelhart, Hinton, and Williams [13]. The follow up work includes applications to automatic speech recognition and machine translation [14, 7], and a wide range of NLP tasks [2, 20, 15, 3, 18, 19, 9].

Recently, Mikolov et al. [8] introduced the Skip-gram model, an efficient method for learning high quality vector representations of words from large amounts of unstructured text data. Unlike most of the previously used neural network architectures for learning word vectors, training of the Skip-gram model (see Figure 1) does not involve dense matrix multiplications. This makes the training extremely efficient: an optimized single-machine implementation can train on more than 100 billion words in one day.

The word representations computed using neural networks are very interesting because the learned vectors explicitly encode many linguistic regularities and patterns. Somewhat surprisingly, many of these patterns can be represented as linear translations. For example, the result of a vector calculation vec("Madrid") - vec("Spain") + vec("France") is closer to vec("Paris") than to any other word vector [9,8].

The answer is given in the snippet you show:

For example, the result of a vector calculation vec("Madrid") - vec("Spain") + vec("France") is closer to vec("Paris") than to any other word vector [9,8].

The operations $-$ and $+$ are those "linear translations": adding a fixed vector shifts every point by the same amount, in contrast to rotations (multiplication by a matrix). So the vector for "Madrid" (the capital of Spain), minus the vector for "Spain", plus the vector for "France" lands closer to the vector for "Paris" than to any other word vector.
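Written out as a formula (this notation is not from the paper): a translation by a fixed vector $\mathbf{t} \in \mathbb{R}^d$ is the map $T_{\mathbf{t}}$ below. Grouping the paper's example so that $\mathbf{t}$ is the "capital of" offset $\operatorname{vec}(\text{"Madrid"}) - \operatorname{vec}(\text{"Spain"})$ shows the analogy as one translation applied to $\operatorname{vec}(\text{"France"})$:

$$T_{\mathbf{t}}(\mathbf{x}) = \mathbf{x} + \mathbf{t}, \qquad \mathbf{x} \in \mathbb{R}^d$$

$$T_{\mathbf{t}}\big(\operatorname{vec}(\text{"France"})\big) = \operatorname{vec}(\text{"France"}) + \operatorname{vec}(\text{"Madrid"}) - \operatorname{vec}(\text{"Spain"}) \approx \operatorname{vec}(\text{"Paris"})$$

Here $\approx$ means "the nearest word vector is", not exact equality. Note that for $\mathbf{t} \neq \mathbf{0}$ the map $T_{\mathbf{t}}$ is not linear in the strict linear-algebra sense, since $T_{\mathbf{t}}(\mathbf{0}) = \mathbf{t} \neq \mathbf{0}$; it is an affine map. So "linear" in the paper's phrase is best read loosely, as "expressible by adding and subtracting vectors".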
See also https://en.wikipedia.org/wiki/Translation_(geometry)
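To see the idea concretely, here is a minimal Python sketch using hypothetical, hand-made 3-dimensional vectors (they are not real Skip-gram embeddings). The names `TOY_VECTORS`, `translate`, and `analogy` are illustrative only. The sketch translates vec("France") by the offset vec("Madrid") - vec("Spain") and then picks the nearest remaining word by cosine similarity; excluding the three input words from the candidates is an assumed convention here.

```python
import math

# Hypothetical toy vectors, hand-picked for illustration only (NOT real embeddings).
# Each capital is roughly its country plus a shared "capital of" offset on axis 3.
TOY_VECTORS = {
    "Spain":   [1.0, 0.2, 0.0],
    "France":  [0.2, 1.0, 0.0],
    "Germany": [0.6, 0.6, 0.0],
    "Madrid":  [1.0, 0.2, 1.0],
    "Paris":   [0.2, 1.0, 1.05],
    "Berlin":  [0.6, 0.6, 1.0],
}


def add(u, v):
    """Component-wise vector sum u + v."""
    return [a + b for a, b in zip(u, v)]


def sub(u, v):
    """Component-wise vector difference u - v."""
    return [a - b for a, b in zip(u, v)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def translate(x, t):
    """Translation T_t(x) = x + t: every point is shifted by the same vector t."""
    return add(x, t)


def analogy(a, b, c, vectors):
    """Return the word whose vector is closest (by cosine) to vec(a) - vec(b) + vec(c).

    The query is vec(c) translated by the offset vec(a) - vec(b).
    The three input words are excluded from the candidates (an assumed convention).
    """
    offset = sub(vectors[a], vectors[b])
    query = translate(vectors[c], offset)
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(vectors[w], query))


if __name__ == "__main__":
    print(analogy("Madrid", "Spain", "France", TOY_VECTORS))  # Paris
```

With real embeddings the offset is only approximately shared across country/capital pairs, which is why the paper says the result is *closer* to vec("Paris") than to any other word vector rather than exactly equal to it.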