Similarity measurement for strings of letters


Let's say I have 10 different groups, and each group has its own string sequence, like this:

G1 -> CHFAIEBD

G2 -> HCFJIGBD

G3 -> HCFAIJBD

G4 -> HFCIJEBD

G5 -> .....

G6 -> ....

Is there a statistical test that can show the ordering in these 10 groups is similar? The sequences look similar, but I don't know how to demonstrate that statistically. I would really appreciate your comments on this.

There is 1 solution below


Several algorithms and measures can quantify the difference or similarity between sequences. For short strings like those in the question, the Levenshtein distance could be a good choice. It is a simple measure of the difference between two strings: the minimum number of single-character edits (substitutions, insertions, or deletions) needed to change one string into the other. A smaller distance means the two sequences are more similar. To apply it, you can also use one of the Levenshtein distance calculators available online.
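As a rough illustration, the Levenshtein distance can be computed with a short dynamic-programming routine and applied to every pair of groups. This is only a sketch: the function names `levenshtein` and `pairwise_distances` are made up for this example, and only the four sequences actually listed in the question are used, since G5 onward are elided.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    # previous[j] = distance between the processed prefix of a and b[:j];
    # it starts as the distance from the empty string to each prefix of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


# Only the sequences given in the question (assumption: G5+ would be added here).
groups = {
    "G1": "CHFAIEBD",
    "G2": "HCFJIGBD",
    "G3": "HCFAIJBD",
    "G4": "HFCIJEBD",
}


def pairwise_distances(seqs: dict) -> dict:
    """Levenshtein distance for every unordered pair of groups."""
    return {
        (g1, g2): levenshtein(seqs[g1], seqs[g2])
        for g1, g2 in combinations(seqs, 2)
    }


for (g1, g2), d in pairwise_distances(groups).items():
    print(f"{g1} vs {g2}: {d}")
```

For example, G1 and G3 come out at a distance of 3: two edits for the swapped `CH`/`HC` at the start and one for `E` versus `J`. Keep in mind that these distances describe how different the sequences are; on their own they are not a significance test.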