How many dimensions does it take to model language for AI?


I was watching the video Using AI to Decode Animal Communication, in which Aza Raskin talks about converting semantic relationships between words into geometric relationships. I don't think he says so explicitly in that video (though he strongly implies it), but a Guardian article explicitly states that they are using multi-dimensional geometry. This led me to wonder: how many dimensions?

I tried looking this up on my own, but my knowledge of math stops at 10th grade and I have zero knowledge of computer programming, so when I read papers that appear to be related, I am utterly lost. Also, I can't find the original paper by Raskin's team. I was hoping I could skim it for something concrete like "it took x dimensions to model the English/Spanish/German/etc. language", but no such luck. I am not even sure whether thinking about the dimensions in terms of real numbers makes sense. So, assuming the answer is as simple as a single number: how many dimensions did they use?

I suspect they need as many dimensions for every single word as there are words in the entire language, but again, this is based on virtually no knowledge of math.

I understand that this may be a question for the Computer Science Stack Exchange. If that is the case, obviously vote to close and I'll post it over there.


1 Answer


Perhaps a simplified model would make this more intuitively clear.

Suppose you have an unknown real-valued function $y = f(x)$ that you want to approximate using a set of data given by $(x_i, y_i)$ pairs. If the function is linear, then you only need two pairs: solving two linear equations gives the coefficients of the linear function. If the function is a polynomial, then the number of pairs needed to solve for its coefficients must be at least the number of coefficients. The dimension of the space of possible polynomial functions is equal to the number of coefficients that must be specified.

However, if the data is approximate to begin with and the function is known to be approximately linear, then you can use linear regression to find the best linear function fitting the data. This reduces the dimension of the space of approximating functions at the cost of a worse approximation. This tradeoff between complexity (here, the dimension of the space of approximations) and the goodness of the approximation is typical.
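To make the tradeoff concrete, here is a small sketch (toy data I made up, not anything from Raskin's work): six noisy samples of a roughly linear function can be matched exactly by a degree-5 polynomial (a 6-dimensional space of functions), or approximated by a line (a 2-dimensional space):

```python
import numpy as np

# Toy data: noisy samples of an (unknown) approximately linear function.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 6)
y = 2 * x + 1 + rng.normal(scale=0.05, size=x.size)

# Exact interpolation: a degree-5 polynomial has 6 coefficients,
# so 6 data points determine it completely (dimension 6).
interp = np.polyfit(x, y, deg=5)

# Linear regression: only 2 coefficients (dimension 2),
# fit by least squares -- a simpler but rougher model.
slope, intercept = np.polyfit(x, y, deg=1)

print(len(interp))         # 6 coefficients
print(slope, intercept)    # close to the true values 2 and 1
```

The line misses the data points slightly, but it lives in a much smaller space; that is exactly the complexity-versus-fit tradeoff described above.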

In the situation that you mention,

> semantic relationships between words into geometric relationships,

the semantic space of words is modeled, as an approximation, by a space of mathematical functions. The higher the dimension of the function space, the better it can model the semantic space, but there is no definite dimension of the unknown underlying semantic space. (In practice, published word-embedding models typically use on the order of a few hundred dimensions, chosen by this kind of tradeoff rather than derived from the language itself.)