Longest Common Prefix among $n$ strings with length $k$

Question

Longest Common Prefix among $n$ strings with length $k$

314 Views Asked by Bumbble Comm At 13 Apr 2026 - 10:01

Suppose I have a set of alphabet $\Sigma = \{ A,B,\ldots,Z \}$ with $|\Sigma|=26$. I am generating a set of $n$ random strings $S$, with each string of length $k$ from the alphabet (for the random string, each letter in a position is picked with probability $p=\frac{1}{|\Sigma|}$).

I want to find out:

the expected length of longest common prefix
the maximum length of longest common prefix with high probability.

I am working on a radix-tree based algorithm and want to justify the expected length of search operation theoretically.

Original Q&A

There are 1 best solutions below

**Bumbble Comm** · Answer 1 · 2016-05-12 20:49:08

Suppose we are selecting $n$ strings of length $k$ over an alphabet of $m$ letters. Classifying by the length $q$ of the longest common prefix we obtain for the total count of strings

$$m^k + \sum_{q=0}^{k-1} m^q (m^n - m) m^{n(k-1-q)}.$$

This simplifies to

$$m^k + (m^n - m) m^{n(k-1)} \sum_{q=0}^{k-1} m^q m^{-nq} \\ = m^k + (m^n - m) m^{n(k-1)} \sum_{q=0}^{k-1} m^{q(1-n)} \\ = m^k + (m^n - m) m^{n(k-1)} \frac{m^{(1-n)k}-1}{m^{1-n}-1} \\ = m^k + (m^n - m) m^{n(k-1)} \frac{m^{k-(k-1)n}-m^n}{m-m^n} \\ = m^k - m^{n(k-1)} \times (m^{k-(k-1)n}-m^n) \\ = m^k - m^k + m^{nk} = m^{nk}.$$

We have all of them and the sanity check goes through. The trick here was that we can view these $n$ strings as a rectangular array with $n$ rows and $k$ columns. Supposing that the first $q$ columns are the same the next column must not consist of $n$ identical letters or else the common prefix would extend to include it. That means we have $m^n-m$ choices for the letters in said column (there are $m$ strings of length $n$ containing a single letter.)

Now for the expectation we get

$$k m^k + \sum_{q=0}^{k-1} q m^q (m^n - m) m^{n(k-1-q)} \\ = k m^k + (m^n - m) m^{n(k-1)} \sum_{q=0}^{k-1} q m^q m^{-nq} \\ = k m^k + (m^n - m) m^{n(k-1)} \sum_{q=0}^{k-1} q m^{q(1-n)}.$$

Now we have

$$x\left(\sum_{p=0}^{k-1} x^p\right)' = \sum_{p=0}^{k-1} p x^p = x \left(\frac{x^k-1}{x-1}\right)' = x \frac{k x^{k-1}}{(x-1)} - x \frac{x^k-1}{(x-1)^2}.$$

Apply this to the sum to get two terms, the first of which is

$$k m^k + (m^n - m) m^{n(k-1)} m^{1-n} \frac{k m^{(1-n)(k-1)}}{m^{1-n}-1} \\ = k m^k + (m^n - m) m^{n(k-1)} m \frac{k m^{(1-n)(k-1)}}{m-m^n} \\ = k m^k - m^{n(k-1)} m k m^{(1-n)(k-1)} = 0.$$

The second is

$$- (m^n - m) m^{n(k-1)} m^{1-n} \frac{m^{(1-n)k}-1}{(m^{1-n}-1)^2} \\ = - (m^n - m) m^{n(k-1)} m^{n+1} \frac{m^{(1-n)k}-1}{(m-m^n)^2} \\ = m^{n(k-1)} m^{n+1} \frac{m^{(1-n)k}-1}{m-m^n} \\ = m^{nk} \frac{1-m^{(1-n)k}}{m^{n-1}-1}$$

Divide by $m^{nk}$ to get for the expected longest common prefix

$$\frac{1-m^{(1-n)k}}{m^{n-1}-1}.$$

The following Maple code was used to verify this formula.

Q :=
proc(n, k, m)
option remember;
    local d, ind, pref, pos, res, idx1, idx2;

    res := 0;

    for ind from m^(n*k) to 2*m^(n*k)-1 do
        d := convert(ind, base, m);

        for pref from 0 to k-1 do
            for pos from 1 to n-1 do
                idx1 := (pos-1)*k + pref;
                idx2 := pos*k + pref;

                if d[idx1+1] <> d[idx2+1] then
                    break;
                fi;
            od;

            if pos < n then
                break;
            fi;
        od;

        res := res + pref;
    od;

    res/m^(n*k);
end;

X := (n, k, m) -> (1-m^((1-n)*k))/(m^(n-1)-1);

Longest Common Prefix among $n$ strings with length $k$

There are 1 best solutions below

Related Questions in PROBABILITY

Related Questions in COMPUTER-SCIENCE

Trending Questions

Popular # Hahtags

Popular Questions