Is it possible to define a metric over a set of elements $e=(x,y)$ where $x,y\in \{*,0,1\}$, $*$ being the wildcard symbol?
For simplicity, assume all words have length 2, e.g. $0*$, $11$ and $**$.
My first attempt was to redefine the Hamming distance $d$ from
the number of positions at which the corresponding symbols are different
to
the number of positions at which the corresponding symbols are contradictory
But then, for example, $d(0*,11)=1$ while $d(0*,**)+d(**,11)=0$, which violates the triangle inequality, so this $d$ is not a metric.
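The failure can be reproduced with a minimal sketch. The function name `contradiction_distance` and the choice of representing words as Python strings are assumptions made for illustration only.

```python
def contradiction_distance(a, b, wildcard="*"):
    """Count positions where both symbols are concrete and differ."""
    return sum(1 for x, y in zip(a, b) if x != y and wildcard not in (x, y))

# Direct distance versus the detour through the all-wildcard word.
direct = contradiction_distance("0*", "11")
detour = contradiction_distance("0*", "**") + contradiction_distance("**", "11")

print(direct, detour)  # 1 0, so direct > detour and the triangle inequality fails
```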
For later generalization: I would keep fixed-length words, but use a finite alphabet with more than 2 symbols. The wildcard can replace any character, and I want to somehow capture the notion that $e_1$ "contradicts" $e_2$.
More formally, I want to define "equality", that is $d(e_1,e_2)=0$ as "in all positions, the symbols are either equal, or at least one of them is $*$". If this is not possible, perhaps at least have that the distance between any two words where "in all positions, the symbols are either equal, or at least one of them is $*$" is always lower than the distance between any two words where this is not the case.
EDIT:
It was suggested to use $\max_i(d(a_i,b_i))$ for the distance between two words $a,b$, where
- $d(a_i,b_i)=0$ iff $a_i=b_i$
- $d(a_i,b_i)=\frac{1}{2}$ iff $a_i\neq b_i \wedge(a_i= * \vee b_i=*)$
- $d(a_i,b_i)=1$ iff $a_i\neq b_i \wedge a_i\neq * \wedge b_i\neq *$
Following the intuition of an edit distance with an intermediate wildcard, one could also say it is possible either to change a symbol directly at a cost of $1$, or to first change it to the wildcard for $\frac{1}{2}$ and then from the wildcard to the other symbol for another $\frac{1}{2}$.
Then $d(a,b)=\sum_i(d(a_i,b_i))$ seems nicer, but as pointed out, we end up with the problem that, e.g., the distance between the non-contradictory $***$ and $111$ (namely $\frac{3}{2}$) is larger than the distance between the contradictory $111$ and $110$ (namely $1$).
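To make the problem with summation concrete, here is a small sketch of the per-symbol distance and its sum over positions. The names `symbol_distance` and `sum_distance` are hypothetical, and exact fractions are used only to avoid floating-point noise.

```python
from fractions import Fraction

def symbol_distance(x, y, wildcard="*"):
    """Per-symbol distance: 0 if equal, 1/2 if exactly one is the wildcard, 1 otherwise."""
    if x == y:
        return Fraction(0)
    if wildcard in (x, y):
        return Fraction(1, 2)
    return Fraction(1)

def sum_distance(a, b):
    """Combine the per-symbol distances by summation."""
    return sum((symbol_distance(x, y) for x, y in zip(a, b)), Fraction(0))

print(sum_distance("***", "111"))  # 3/2 (non-contradictory pair)
print(sum_distance("111", "110"))  # 1   (contradictory pair)
```

The non-contradictory pair ends up farther apart than the contradictory one, which is exactly the ordering I want to avoid.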
Does this mean it is impossible to combine the symbol distances by summation?
At least not entirely:
- $d(a,b)=0$ iff $a=b$
- $d(a,b)=1$ iff $a\neq b \wedge (\exists i: a_i\neq b_i \wedge a_i\neq*\wedge b_i\neq*)$
- $d(a,b)=(\frac{1}{2}-\frac{1}{2n})+\sum_{i=1}^n \frac{1}{2n}[a_i=*\oplus b_i=*]$ otherwise
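As a sanity check, the three cases above can be brute-forced for short words. This is only a sketch under the assumption that the alphabet is $\{0,1,*\}$ and $n=3$; the name `piecewise_distance` is made up for illustration.

```python
from fractions import Fraction
from itertools import product

def piecewise_distance(a, b, wildcard="*"):
    """Distance following the three cases listed above, for words of equal length n."""
    n = len(a)
    if a == b:
        return Fraction(0)
    # Contradiction: some position holds two different concrete symbols.
    if any(x != y and wildcard not in (x, y) for x, y in zip(a, b)):
        return Fraction(1)
    # Otherwise count positions where exactly one of the two symbols is the wildcard.
    mismatched = sum((x == wildcard) != (y == wildcard) for x, y in zip(a, b))
    return Fraction(1, 2) - Fraction(1, 2 * n) + Fraction(mismatched, 2 * n)

# Check the triangle inequality over all words of length 3.
words = ["".join(p) for p in product("01*", repeat=3)]
triangle_ok = all(
    piecewise_distance(a, c) <= piecewise_distance(a, b) + piecewise_distance(b, c)
    for a in words for b in words for c in words
)

print(triangle_ok)                       # True
print(piecewise_distance("***", "111"))  # 5/6
print(piecewise_distance("111", "110"))  # 1
```

The check passes for this case because every nonzero distance lies between $\frac{1}{2}$ and $1$, so the sum of any two nonzero distances is at least $1$.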
As Rahul mentioned, requiring $d(e_1,e_2)=0$ for all non-contradictory words is not possible because of the triangle inequality.
The weaker requirement, however, should be achievable.
We can define a metric on the set $\{0,1,*\}$ via $$ d(0,1)=1,\quad d(0,*)=\frac12,\quad d(1,*)=\frac12, $$ together with $d(x,x)=0$ and symmetry.
This can even be generalized to words of fixed length $n$: Let $w_x = x_1x_2\dots x_n$ and $w_y = y_1y_2\dots y_n$ be words. Then we can define $$ d'(w_x,w_y) = \max_{i=1,\dots n} d(x_i,y_i) $$ using the metric from above.
This does satisfy the requirements, and words $w_x,w_y$ are "equal" (in the sense that you formulated) if and only if $d'(w_x,w_y)\leq\frac12$.
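For anyone who wants to verify this numerically, the following sketch brute-forces the metric axioms for $d'$ on all words of length 2 and checks the claimed characterization of "equal" words. The helper names `d_prime` and `compatible` are assumptions for illustration, not part of the definition above.

```python
from fractions import Fraction
from itertools import product

def d(x, y):
    """Symbol metric on {0, 1, *}: d(0,1) = 1, d(0,*) = d(1,*) = 1/2, d(x,x) = 0."""
    if x == y:
        return Fraction(0)
    if "*" in (x, y):
        return Fraction(1, 2)
    return Fraction(1)

def d_prime(wx, wy):
    """Word distance: maximum of the symbol distances over all positions."""
    return max(d(x, y) for x, y in zip(wx, wy))

def compatible(wx, wy):
    """In all positions the symbols are equal or at least one of them is the wildcard."""
    return all(x == y or "*" in (x, y) for x, y in zip(wx, wy))

words = ["".join(p) for p in product("01*", repeat=2)]

# Identity of indiscernibles, symmetry and triangle inequality for every triple.
is_metric = all(
    (d_prime(a, b) == 0) == (a == b)
    and d_prime(a, b) == d_prime(b, a)
    and all(d_prime(a, c) <= d_prime(a, b) + d_prime(b, c) for c in words)
    for a in words for b in words
)

# "Equal" in the wildcard sense should coincide with distance at most 1/2.
matches_compatibility = all(
    compatible(a, b) == (d_prime(a, b) <= Fraction(1, 2))
    for a in words for b in words
)

print(is_metric, matches_compatibility)  # True True
```

Changing `repeat=2` to another small length runs the same check for longer words.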