What is the mathematical notation for dependency length calculation algorithm?


I'm doing computational linguistics research in Python. I have written an algorithm that calculates the dependency length of any sentence, but I want to describe it in simple mathematical notation. The idea is simple:

Any sentence is a set, and each word in a sentence is an element of that set. Thus, if $x$ is a word in sentence $A$:

$x \in A$

Moreover, each word $x$ contains a subset of the sentence's words: its dependents. In our example sentence $A$ (see figure), the verb 'threw' contains 'John', 'out', and 'trash'. Each element of that subset has a property representing the distance between it and its head. The result I want is the sum of all those distances, which gives the sentence's total dependency length.
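The summation described above can be sketched in Python, the language the question mentions. This is a minimal sketch, not the asker's algorithm: it assumes a hypothetical encoding in which each word is identified by its 1-based position, and `dependents` maps a head's position to the positions of the words it contains. The parse of "John threw out the trash" is an illustrative assumption, chosen to match the description of 'threw' governing 'John', 'out', and 'trash'.

```python
from typing import Dict, List, Set


def total_dependency_length(words: List[str], dependents: Dict[int, Set[int]]) -> int:
    """Sum the head-to-dependent distances in one sentence.

    `dependents` (a hypothetical encoding) maps a head's 1-based position
    to the set of 1-based positions of the words it governs.
    """
    n = len(words)
    total = 0
    for head, deps in dependents.items():
        for dep in deps:
            if not (1 <= head <= n and 1 <= dep <= n):
                raise ValueError(f"position out of range: head={head}, dep={dep}")
            # The distance between a dependent and its head is the gap in positions.
            total += abs(head - dep)
    return total


# Illustrative parse only (an assumption, not taken from the original figure).
words = ["John", "threw", "out", "the", "trash"]
dependents = {
    2: {1, 3, 5},  # 'threw' governs 'John', 'out', 'trash'
    5: {4},        # 'trash' governs 'the'
}
print(total_dependency_length(words, dependents))  # 6
```

Here the distances are 1 + 1 + 3 for the dependents of 'threw' and 1 for 'the', giving 6.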

[Figure: examples of dependency trees, labeled with the distances to be summed.]

Best answer

Sentence $A$ is a sequence of $n$ words, where each word is a set of other words with the uniqueness property that if $w \in x \in A$ and $w \in y \in A$ then $x = y$ (that is, every word has at most one head). We can denote the $k^{\text{th}}$ word in $A$ as $A_k$. The total dependency length of $A$ can then be written as

$$\sum_{k=1}^n\sum_{w \in A_k} \phi(w),$$

where $\phi: A \to \mathbb{N}$ is defined by $\phi(w) = d$ whenever $w = A_j$, the unique word containing $w$ is $A_k$, and $|j-k| = d$. More compactly, the total dependency length of $A$ can be written as

$$\sum_{k=1}^n\sum_{w = A_j \in A_k} |j-k|$$
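The formula can be checked with a short Python sketch. It assumes a hypothetical encoding in which `sentence[k - 1]` holds the positions $j$ of the words $A_j \in A_k$; the helper names `head_of`, `phi`, and `sentence_total_dependency` are illustrative, not from any library. `head_of` enforces the uniqueness property, which is what makes $\phi$ well defined.

```python
from typing import Dict, List, Set


def head_of(sentence: List[Set[int]]) -> Dict[int, int]:
    """Map each dependent position j to the unique k such that A_j is in A_k.

    sentence[k - 1] holds the 1-based positions j of the words A_j in A_k.
    Raises ValueError if the uniqueness property is violated.
    """
    head: Dict[int, int] = {}
    for k, deps in enumerate(sentence, start=1):
        for j in deps:
            if j in head:
                raise ValueError(f"A_{j} is in both A_{head[j]} and A_{k}")
            head[j] = k
    return head


def phi(j: int, head: Dict[int, int]) -> int:
    """phi(A_j) = |j - k|, where A_k is the unique word containing A_j."""
    return abs(j - head[j])


def sentence_total_dependency(sentence: List[Set[int]]) -> int:
    """Evaluate the double sum over k and over the words A_j in A_k."""
    head = head_of(sentence)
    # Outer loop runs over each A_k, inner loop over the A_j it contains.
    return sum(phi(j, head) for deps in sentence for j in deps)


# Illustrative parse (an assumption): A_2 = {A_1, A_3, A_5}, A_5 = {A_4}.
A = [set(), {1, 3, 5}, set(), set(), {4}]
print(sentence_total_dependency(A))  # 6
```

Representing each word by its position, rather than literally by the set of its dependents, keeps words with identical dependent sets (for example, several words with no dependents) distinct, which the indexing $A_1, \dots, A_n$ relies on. For the illustrative parse the sum expands to

$$\sum_{k=1}^{5}\sum_{w = A_j \in A_k} |j-k| = \underbrace{|1-2| + |3-2| + |5-2|}_{k=2} + \underbrace{|4-5|}_{k=5} = 1 + 1 + 3 + 1 = 6.$$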