There is something about counting bitstrings and the format of the solutions I didn't really understand yet.
Given a bitstring problem that asks to elementarily count how many bitstrings of length 36 there are that contain exactly 5 $\{1\}$ in the first 10 posititions and the $\{10100101\}$ substring in the last 20 positions, finding it out is pretty straightforward.
Since the first 10 positions need to contain exactly 5 $\{1\}$, you find out the number of all the possible combinations containing maximum five ones:
$${10 \choose 5}$$
We have no information about the next 16 positions, so that is $2^{16}$ more combinations, and thus far we have:
$${10 \choose 5} \cdot 2^{16}$$
many combinations.
Now the substring:
there are $\{10100101\} \cdot 2^{12}$ bitstrings with no overlapping; $20 - 8 + 1 = 13 \rightarrow {13 \choose 1} \cdot 2^{12}$
there are $\{1010010100101\} \cdot 2^{7}$ bitstrings with single overlapping, same goes as you overlap the starting $1$ and the one at the end, so ${8 \choose 1} \cdot 2^{7}, {6 \choose 1} \cdot 2^{5}$
there are $2\cdot \{10100101\} \cdot 2^{4}$ bitstrings as the substring repeats itself once, so ${6 \choose 2} \cdot 2^{4}$
there are $\{101001010010100101\} \cdot 2^{2}$ bitstrings with double overlapping, and considering also all the other possibilities you have ${3 \choose 1} \cdot 2^{2}, {1 \choose 1} \cdot 2^{0}, {1 \choose 1} \cdot 2^{0}$
Since $\{10100101\} \cdot 2^{12}$ "spans" all the other combinations below and are thus included within it, shouldn't we remove all the overlapping and repeating substrings from the universe $\{10100101\} \cdot 2^{12}$? Given the number of substrings above, how should I then build the solution?
In order to calculate the number of binary strings of length $20$ containing the substring $10100101$ pretty much of all the hard work is already done. We just have to use the inclusion-exclusion principle and put all the intermediate results together.
We can check the result by applying another approach called the Goulden-Jackson Cluster Method.
According to the paper (p.7) the generating function $f(s)$ is \begin{align*} f(s)=\frac{1}{1-ds-\text{weight}(\mathcal{C})}\tag{1} \end{align*} with $d=|\mathcal{V}|=2$, the size of the alphabet and $\mathcal{C}$ the weight-numerator of bad words with \begin{align*} \text{weight}(\mathcal{C})=\text{weight}(\mathcal{C}[10100101]) \end{align*}
The coefficient $[s^n]$ in $f(s)$ gives the number of binary words of length $n$ which does not contain $10100101$. Since we want to count the number of words of length $20$ which do contain the word we take the generating function \begin{align*} \frac{1}{1-2s}=1+2s+4s^2+8s^3+\cdots \end{align*} which counts all binary words and subtract $f(s)$ from it.