Please refer to the question below: is this a valid kernel? As I understand string kernels, the similarity is based on the count of common substrings rather than their positions, because such counts can be expressed as a dot product of two feature vectors.
A valid Mercer kernel can be written as:
$k(x,x^{'})= \phi^{T}(x)\phi(x^{'})$
Bioinformatics is a major application area of machine learning. One common type of biological data is DNA sequences, which are strings over an alphabet of 4 symbols, usually denoted $\{A,C,T,G\}$. Suppose I want to define a kernel function over such data for use in an SVM. I propose a kernel K that counts the number of position-wise matches between two DNA sequences. For instance, $K(ACTGG, ATCG) = 2$, and $K(AACTCG, ACCTGGA) = 4$. Prove whether or not K is a valid kernel.
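To make the definition of K concrete, here is a minimal sketch of the position-wise match count. The function name `position_matches` is hypothetical, and it assumes that sequences of unequal length are compared only over their common prefix length (the question does not say, but both examples are consistent with this).

```python
def position_matches(x: str, y: str) -> int:
    """Count positions i where x[i] == y[i].

    zip() stops at the shorter string, so extra characters in the
    longer sequence are ignored (an assumption about unequal lengths).
    """
    return sum(1 for a, b in zip(x, y) if a == b)


print(position_matches("ACTGG", "ATCG"))      # 2
print(position_matches("AACTCG", "ACCTGGA"))  # 4
```

Both outputs reproduce the values stated in the question.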
I tried constructing a feature map over the alphabet $L=\{A,C,T,G\}$ by counting how many times each character appears, e.g. $\phi(ACTGG) = (1, 1, 1, 2)$ and $\phi(ATCG) = (1, 1, 1, 1)$. However, the dot product of these feature vectors, $\phi^{T}(x)\phi(x^{'})$, gives the total number of matching character pairs regardless of position (here $5$), instead of the number of position-wise matches.
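The character-count feature map described above can be sketched as follows, assuming the symbol order A, C, T, G used in the post. The names `composition` and `composition_kernel` are hypothetical. It shows that the dot product of these vectors yields 5 for the first example pair, not the position-wise value of 2 stated in the question.

```python
from collections import Counter

# Symbol order matches the post: (A, C, T, G).
ALPHABET = "ACTG"


def composition(s: str) -> list[int]:
    """Feature vector: number of occurrences of each symbol in s."""
    counts = Counter(s)
    return [counts[c] for c in ALPHABET]


def composition_kernel(x: str, y: str) -> int:
    """Dot product phi(x)^T phi(y) of the character-count vectors."""
    return sum(a * b for a, b in zip(composition(x), composition(y)))


print(composition("ACTGG"))                       # [1, 1, 1, 2]
print(composition("ATCG"))                        # [1, 1, 1, 1]
print(composition_kernel("ACTGG", "ATCG"))        # 5
print(composition_kernel("AACTCG", "ACCTGGA"))    # 11
```

Because the count vectors discard where each symbol occurs, this kernel is a valid dot product but measures a different similarity than K.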