What is Vector Offset?


I am reading a paper on computational linguistics, and I am having trouble with some of the basic terms it uses. I searched the internet for them, but I could not find clear definitions.

Here is the paper link

Reading the abstract, these terms look fundamental to the paper:

  • Vector Offset
  • Baseline
  • A consistent vector offset?

This is the abstract:

The offset method for solving word analogies has become a standard evaluation tool for vector-space semantic models: it is considered desirable for a space to represent semantic relations as consistent vector offsets. We show that the method’s reliance on cosine similarity conflates offset consistency with largely irrelevant neighborhood structure, and propose simple baselines that should be used to improve the utility of the method in vector space evaluation.

I do not understand exactly what these terms mean. And what exactly does it mean for a vector offset to be consistent?

Thank you in advance.

Best answer

Vector offset

Let vector $a$ correspond to the word "debug", and vector $a^\star$ to the word "debugging". Their difference, $a^\star - a$, is the vector offset corresponding to the linguistic relationship denoted by the "-ing" suffix.
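As a minimal sketch, assuming pretrained word vectors loaded with gensim (the file name is a placeholder):

```python
from gensim.models import KeyedVectors

# Placeholder path; any word2vec-format embedding file would do.
kv = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)

a, a_star = kv["debug"], kv["debugging"]

# The offset vector that (one hopes) encodes the "-ing" relationship.
offset = a_star - a
```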

A consistent vector offset

If the vector offset is consistent, then when vector $b$ corresponds to the word "scream", the vector $b + (a^\star - a) = b + a^\star - a$ is likely to correspond to the word "screaming". The same holds for any other word that has a related word under the same "-ing" relationship.

If the vector offset is not consistent, then the vector $b^\star$ corresponding to the word "screaming" is more likely to differ from the prediction: $b^\star \ne b + (a^\star - a)$.
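To make "consistent" concrete, here is a minimal sketch that checks whether adding the offset to $b$ lands near $b^\star$; all the vectors are made-up stand-ins for real embeddings:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Hypothetical embeddings (in practice: rows of a trained embedding matrix).
a      = np.array([0.2, 0.5, 0.1])   # "debug"
a_star = np.array([0.3, 0.9, 0.4])   # "debugging"
b      = np.array([0.6, 0.1, 0.3])   # "scream"
b_star = np.array([0.7, 0.5, 0.6])   # "screaming"

offset = a_star - a

# If the offset is consistent, b + offset should point almost the same
# way as b_star, i.e. this cosine should be close to 1.
print(cosine(b + offset, b_star))
```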

Baseline

It is their name for a set of alternative methods of finding words that stand in a specific linguistic relationship to a known word, without relying on the properties of the vector difference between them. They call these "baselines" because they serve as standards against which the results of the vector offset method are compared.

The paper lists six baselines: "vanilla", "add", "only-b", "ignore-a", "add-opposite", and "multiply". The "vanilla" one is the direct vector offset method: $$\bbox{x^\star = \operatorname*{argmax}_{x^\prime} \frac{x^\prime \cdot ( a^\star - a + x )}{\left\lVert x^\prime \right\rVert \; \left\lVert a^\star - a + x \right\rVert}}$$ i.e., $x^\star$ is the candidate $x^\prime$ that maximizes the cosine of the angle between $x^\prime$ and the vector estimated from the known $x$ using the vector offset $a^\star - a$ (which corresponds to the linguistic relationship between the known word $x$ and the word $x^\star$ we are looking for). The five others are variations.
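A sketch of that argmax in code, assuming the vocabulary is a matrix with one (hypothetical) word vector per row:

```python
import numpy as np

def vanilla_offset(a, a_star, x, vocab):
    """Index of the vocabulary row most cosine-similar to a_star - a + x.

    vocab: (V, d) array, one word vector per row (hypothetical data).
    """
    y = a_star - a + x
    # Cosine similarity of every vocabulary vector against y.
    sims = vocab @ y / (np.linalg.norm(vocab, axis=1) * np.linalg.norm(y))
    return int(np.argmax(sims))
```

Note that this sketch, like "vanilla", searches over all vocabulary rows; in the paper's "add" variant the input words $a$, $a^\star$, and $x$ are excluded from the candidates.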

The two reverse ones ("reverse (add)" and "reverse (multiply)") are the same as "add" and "multiply", respectively, except that the expression is used in the opposite direction: to find $x$ when $x^\star$ is known.


It might be a little clearer to write the baselines using better variable names.

For example, let $y = x + a^\star - a$ be the estimated vector for the word we are looking for. It is related to $x$ the same way $a^\star$ is related to $a$, assuming the vectors are defined in a way that makes vector offsets consistent. If the $w_i$ are the vectors of all known words, then $$\bbox{x^\star = \operatorname*{argmax}_{w_i} \frac{w_i \cdot y }{\left\lVert w_i \right\rVert \; \left\lVert y \right\rVert}}$$ i.e., $x^\star$ is the $w_i$ with the largest cosine similarity to our estimated vector $y$.

The other baselines just change how the estimate $y$ on the right-hand side is calculated.
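As an illustration of that last point, here is a sketch of how a few of the baselines change the estimate; the name-to-formula mapping is inferred from the baseline names and the paper's description, so treat it as an assumption rather than a verbatim port:

```python
def estimate(a, a_star, x, baseline="vanilla"):
    """Estimated target vector y under a few of the baselines.

    The mapping of names to formulas is an assumption inferred from
    the baseline names, not a verbatim port of the paper's code.
    """
    if baseline == "vanilla":
        return x + a_star - a       # the standard vector offset
    if baseline == "ignore-a":
        return x + a_star           # drop the -a term
    if baseline == "add-opposite":
        return x - (a_star - a)     # flip the sign of the offset
    if baseline == "only-b":
        return x                    # nearest neighbour of x alone
    raise ValueError(f"unknown baseline: {baseline}")
```

Each of these estimates would then be plugged into the same cosine-similarity argmax shown above.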