Find all the strings in $S$ that form $Q$ after deleting some of their characters

41 Views Asked by At

Given a set of Strings $S$ with a constant-size alphabet, find all the strings in $S$ that form the string $Q$ after deleting some of their characters.

If you are familiar with SQL it can also be asked as: what is the best algorithm to implement SQL like query with multiple % signs?

For example, below is a list of strings that much fly

  • Buterfly
  • falseWay
  • flyfly
  • officePlay

And here are some that don't

  • Fly
  • fl
  • yfl

What is the best algorithm for solving this problem?

* You can assume that $S$ is sorted or structured in any other way.

2

There are 2 best solutions below

4
On

One simple approach is to scan the string from the start looking for "f". If you find that, start from there looking for "l". If you find that, look for "y"

13
On

A simple solution

Assuming that each entry in $S$ contains at most one instance of the wanted string $Q$, you may do the following:

result; //Array that contains the strings that contain Q.
for each string s in S loop do:
count = 0; //Counter that keeps the current character of Q.
  for each character c in s loop do:
    if c==Q[counter] then:
      counter++;
  if counter==Q.length-1 then:
    result.add(s);
return result;

Evidently, the above algorithm has an $O(nm)$ complexity, where $n=\# S$ and $m=\max\{\#s\mid s∈S\}$ - the maximum length of each string in $S$.

A more complex solution

One may wish to allow for multiple occurences of $Q$ in an entry in $S$ and may also want to know the number of ways $Q$ appears in each string. For this purpose, we will need some kind of recursion:

function like(s,Q,i,j,count):
//count is the number of times Q appears in s.
//s is a string in which we seek to find $Q$.
//i is an integer indicating the current character of s.
//j is an integer indicating the current character of Q we are looking for.
if k>s.length-1 then:
  return;
for each k>i-1 loop do:
  if s[k]==Q[i] then:
    if i==Q.length-1 then:
      count++;
      return;
    else:
      like(s,Q,k+1,i+1,count);
return;

Then, we may search $S$ using:

for each string s in S loop do:
  count = zeroes; //Array containing zeroes, idexed by strings in S.
  like(s,Q,0,0,count[s]);

Intuitively, we follow a DFS strategy and loop within the string searching initially for the first character and searching for the first character of $Q$. If we find it, we start from the point we have found it and search for the next character of $Q$ and so on.

Evidently, this is a more expensive strategy which, however, finds all occurences of the string $Q$ in $S$.