Let $\mathcal{S}$ be the set of strings of length at most $100$, drawn from a finite alphabet $A$; for any $s \in \mathcal{S}$, $s = a_1 , . . . , a_{100}$ where each $a_j \in A$. Let $\mathcal{K} : \mathcal{S} × \mathcal{S} \rightarrow \mathcal{R}$ be defined for $s_1 , s_2 \in \mathcal{S}$ by $\mathcal{K}(s_1, s_2)$ as: $\mathcal{K}(s_1, s_2)$ is the number of unique substrings that $s_1$ and $s_2$ have in common.
For example, $A = \{a, e, i, o, u\}, \ s_1 = aauuee, \ s_2 = auue$ implies that $\mathcal{K}(s_1 , s_2 ) = 9$ since the only common substrings are $a,\ u,\ e,\ uu,\ ue,\ uue,\ au,\ auu,\ auue$. I have to prove that $\mathcal{K}$ is a valid kernel.
I did the following:
We need a $\phi : \mathcal{S} → \mathcal{R}^{n}$ for some finite $n$ such that $\mathcal{K}(s_1 , s_2 ) = \phi^{T} (s_1 )\phi(s_2 )$. It is easy to suggest such a $\phi$ for $n = |S|$ $=$ number of possible substrings of size upto $100$ composed of characters from $A$. Specifically, $n= \sum_{j=1}^{100}|A|^{j}$ . We will also consider the set of these strings $\mathcal{S}$ sorted in alphabetical order so that we know what the $i^{th}$ string in it refers to.
$\phi(\mathcal{S}) \in \mathcal{R}^n$ such that the $\phi_i(s) = 1$ if an only if $i^{th}$ substring is contained in $s$ and $\phi_i (s) = 0$ otherwise.
Let $\delta(i, s) = 1$ if and only if the $i^{th}$ string of $\mathcal{S}$ is a substring of $s$ and $\delta(i, s) = 0$ otherwise. It is thereafter easy to verify that $\sum_i \phi^T (s_1 )\phi(s_2 ) = \sum_i \delta(i, s_1 )\delta(i, s_2 ) = \mathcal{K}(s_1 , s_2 )$. The existence of such a $\phi(.)$ proves that $\mathcal{K}(s_1 , s_2 )$ is a kernel.
But how to do this using Mercer's theorem and Gram matrix? How to prove for the positive-semidefinite Gram matrix, without finding the actual $\phi(.)$?
NOTE: This is an instance of cross-posting. I have asked this same question on stats.SE.