Counting similar pairs

83 Views Asked by At

I was given a simple programming assignment:

Your task is to quickly find the number of pairs of sentences that are at the word-level edit distance at most 1. Two sentences S1 and S2 they are at edit distance 1 if S1 can be transformed to S2 by: adding, removing or substituting a single word.

Since the text file was very large, I created a new text file with only the unique sentences and solved the solution on this file.

Now, say my program returns that $(s_1,s_2)$ are similir, and in the original file $s_i$ appear $l_i$ times.

What is the total number of similar pairs I had in the original file "induced" from the existence of $(s_1,s_2)$ ? is it just $l_1\cdot l_2 $ ?

There is also the need to add pairs that correspond to duplicates: If $s$ appeared $k$ times I add $${k \choose 2}=k(k-1)/2$$ (I ignore pairs with the same index,e.g if an element appeared once I add $1*(1-1)/2=0)$

Does that also seems right ?