Can Anyone Suggest a Continuous Generalization of Hamming Distance?

191 Views Asked by At

I'm an EE, so please forgive if this question is dumb.

I need a continuous generalization of Hamming distance. It needs to be both integrable and differentiable.

A quick Google Scholar search for "continuous hamming distance" (no quotes in search bar) did not yield anything relevant.

But I'm not a mathematician, so if there is a family of functions which formalizes the concept I don't even know what it might be called.

Perhaps the most obvious choice is to use MMSE:

        N-1
dist = sigma (x0 - x1)^2
        n=0

My question is twofold:

1.) Is there already a well known continuous valued generalization of Hamming distance?

2.) Has anyone rigorously studied functions of the form:

     N-1
y = sigma f(xn0, xn1)^p
     n=0

performed all the normal proofs to spell out when this function is meaningful, figured out the limit as n -> infinity, etc.

As I've said before, I'm not a mathematician, so if different tags should be used please make suggestions. Thank you so much!

1

There are 1 best solutions below

0
On BEST ANSWER

As requested, summary of comments.

I think a `natural' generalization would be

$$\tag{1} d_\epsilon(f,g)=\int dx \ H(|f(x)-g(x)|-\epsilon) $$

Where $H$ is the Heaviside step function. This yields the length over $x$ where $f$ and $g$ differ by more than $\epsilon$. $d$ in equation (1) does not vary smoothly as $f$ and $g$ vary, as a consequence of the `coarse graining'.

If a function other than Heaviside is desired, we can write down a generalization

$$\tag{2} d(f,g)=\int dx \ D(|f(x)-g(x)|) $$

Where $D$ is any increasing function with the properties

$$ D(0)=0 \\ \lim_{x \to \infty}D(x)=1 $$

I have chosen these properties so that heuristically and non-rigorously, $d$ behaves like the Hamming distance, ie.$d(f,f)=0$, $d(f,g)→L$ when $f$ and $g$ are `most different' and $L$ is the length of $x$ over which $f$ and $g$ are defined. $d$ in equation (2) yields a weighted length over which $f$ and $g$ differ. An example function based on the sigmoid would be

$$ D(x)=\tanh(x/\epsilon) $$

Where now $\epsilon$ sets the length scale. As $\epsilon \to 0$, $|f-g|$ must be smaller to be considered `the same'.

The continuous version of the RMSE you mention is

$$\tag{3} d(f-g)=\sqrt{\int dx \left(f(x)-g(x) \right)^2} $$

which is the $L^2$ norm of $f-g$. Such metrics are well studied, the keywords to search for are $L^p$ spaces.

If you want some way to quantify the difference between functions, then equation (3) is convenient because it is well known and all of the proofs have already been worked out, etc. If you specifically want a way to quantify the difference between functions that is to be the analog of the Hamming metric, then you'd want to work with things like equation (1) or (2), the difficulty is then showing precisely in which way these reduce to the Hamming metric and determining if they fulfill the usual nice properties for a metric.