How many substrings can be formed from a string of length n? BUT THE SUBSTRINGS LENGTH ARE LIMITED

980 Views Asked by At

string = "abcd" I'm looking for the formula which gives all the possible substrings but the substrings are limited in length.

For example all possible substrings for "abcd" are 4*(4+1)/2 = 10 But here I'm looking for all the possible substrings which have a max length of 2, the result is 7 but what is the formula ?

thanks a lot for your help

EDIT: I'm interested of the number of substring possible, in order, with a max length of 2, for example "abcd" => a, b, c, d, ab, bc, cd => 7 "1111" => 1, 1, 1, 1, 11, 11, 11 => 7

A comment from @lulu that makes my question more clear: I believe the OP is requiring the substring to consist of consecutive elements of the string. Furthermore, judging from the answer to my question about 1111, the OP isn't interested in the string itself...just in its start and end location

2

There are 2 best solutions below

0
On BEST ANSWER

To sum up the discussion in the comments: the reference to substrings is misleading as the OP is not interested in the characters that make up the substring, only in the possible locations of their first and last elements. Thus, the question is asking "given two natural numbers $m≤n$, how many pairs $(i,j)$ are possible with $1≤i<j≤n$ and $j-i≤m$?"

Let $F(n,m)$ denote the desired answer.

Example: for $m=2$, there are $n$ places that might be the first element of a length $1$ substring, and there are $n-1$ that might be the first element of a continuous length $2$ substring. Thus $$F(n,2)=n+(n-1)=2n-1$$

In general, there are $n-(k-1)$ places that might be the start of a continuous length $k$ substring. Thus the answer is $$F(n,m)=n+(n-1)+\cdots + (n-(m-1))=m\times n -\sum_{i=0}^{m-1} i=m\times n-T_{m-1}$$

Where $T_i$ denotes the $i^{th}$ Triangular Number. Thus $$T_i=\frac {i(i+1)}2$$

Examples:

$$F(n,2)=2n-T_1=2n-1$$ $$F(4,4)=4\times 4-T_3=16-6=10$$

5
On

For "abc" valid substrings of length $2$ or less are "a", "b", "c", "ab", "bc", so $5$

For "abcd" we have $7$, as noted in your question, for "abcde" we have $9$, and for "abcdef" we have $11$

From this pattern, we can guess that, for a string of length $m$, there will be $2m - 1$ substrings of length $2$ or less

If we want length $3$ or less, then for "abc", we have $6$, "abcd" we have $9$, for "abcde" we have $12$ and for "abcdef" we have $15$

Here we can guess that there will be $3m-3$ substrings of length $3$ or less in a string of length $m$

Similarly, for length $4$ or less, we have $4m-6$ substrings in a string of length $m$

So, we can see the first term is simply $nm$ and we consider the second term of each formula. We note that for a substring of length $1$, this term would be zero, and so we are looking at the sequence $0,1,3,6,\ldots$. We note that a formula for this sequence (found through Wolfram|Alpha) is $$\frac 12(n-1)n$$

Therefore, we propose that, for substrings of length $n$ or less in a string of length $m$, we have $$nm - \frac12(n-1)n$$

We can then check this by counting other strings and substrings