I was given a simple programming assignment:
Your task is to quickly find the number of pairs of sentences that are at the word-level edit distance at most 1. Two sentences S1 and S2 they are at edit distance 1 if S1 can be transformed to S2 by: adding, removing or substituting a single word.
Since the text file was very large, I created a new text file with only the unique sentences and solved the solution on this file.
Now, say my program returns that $(s_1,s_2)$ are similir, and in the original file $s_i$ appear $l_i$ times.
What is the total number of similar pairs I had in the original file "induced" from the existence of $(s_1,s_2)$ ? is it just $l_1\cdot l_2 $ ?
There is also the need to add pairs that correspond to duplicates: If $s$ appeared $k$ times I add $${k \choose 2}=k(k-1)/2$$ (I ignore pairs with the same index,e.g if an element appeared once I add $1*(1-1)/2=0)$
Does that also seems right ?