What is the distribution of distances between two random points in RGB space?

266 Views Asked by At

Suppose we pick pairs of triples from $\{ 0, 1, 2, \dots, 255\}^3$ with a uniform distribution and would like to find a closed form for the distribution of the Euclidean distances

$$ d((x_1, x_2, x_3), (y_1, y_2, y_3)) = \left( \sum_{j=1}^3 (x_j-y_j)^2 \right)^{1/2}.$$

The motivation for this question comes from the cubic lattice that is one of the RGB color spaces. It would be nice to have an exact probability distribution because it would likely have interesting applications in image processing.

This question gives the answer for when a cube includes all real numbers in it, and there's Mathai et al.$^\color{magenta}{\star}$, but what about when it is discrete, like for a cubic lattice?


$\color{magenta}{\star}$ Arak M. Mathai, Peter Moschopoulos, Giorgio Pederzoli, Distance between random points in a cube [PDF], Statistica, Volume 59, Number 1, 1999.

3

There are 3 best solutions below

9
On BEST ANSWER

Brute force computation works, with the following trick.

The square root expression for the distance $d$ between $(x_1, x_2, x_3), (y_1, y_2, y_3)$ is equivalent to

$$ d^2 = (x_1 - y_1)^2 + (x_2 - y_2)^2 + (x_3 - y_3)^2 $$

Each term on the RHS is a perfect square and is bounded by $256^2$. There are $256$ perfect squares less than $256^2$ (including $0$). We can trivially enumerate them. For each pair $(x,y), 0 \le x,y \le 255$ we can compute $(x-y)^2$ and build a frequency table $T$ containing entries $\langle (x-y)^2, \nu_{(x-y)^2}\rangle$ where $\nu_{(x-y)^2}$ is the frequency of $(x-y)^2$. We treat this table as a set and identify the entries in it by the pair $(s,\nu)$.

For each triple $((s_1, \nu_1), (s_2, \nu_2), (s_3, \nu_3)) \in T \times T \times T$, we compute the pair

$$ (s,\nu) = (s_1 + s_2 + s_3, \nu_1 \nu_2 \nu_3) $$

Call this result set $U$.

$U$ gives the distribution of the squares $s = d^2 \lt 256^2$. We should aggregate the frequency by $s$.

i.e.,

$$V = \{(s, \nu) : \nu = \sum_{(s^{\prime}, \nu^{\prime}) \in U} \nu^{\prime} \text{ where $s^{\prime} = s$ }\}$$

Since we are interested in the distribution of $d$, the positive square root of $d^2$ only, we use the same aggregate frequency. i.e., the map

$$ (s, \nu) \rightarrow (\sqrt{s}, \nu) $$

The following C# program implements the above:

public class Program
{
    public static IEnumerable<long> DiffOfSquares()
    {
        const int limit = 256;
        for(int i = 0; i < limit; i++)
        {
            for(int j = 0; j < limit; j++) {
                yield return (i - j)*(i - j);
            }
        }
    }

    public static IEnumerable<(long dsquared, long count)> SumOf3DiffOfSquares()
    {
        var q = 
            from dos in DiffOfSquares()
            group dos by dos into g
            select (dos: g.Key, count: g.Count());
        var diffOfSquares = q.ToArray();

        foreach(var t1 in diffOfSquares) {
            foreach(var t2 in diffOfSquares) {
                foreach(var t3 in diffOfSquares)
                {
                    yield return (
                      dsquared: t1.dos + t2.dos + t3.dos, 
                      count: t1.count * t2.count * t3.count
                    );
                }
            }
        }
    }

    public static void Calculate()
    {
        var q = 
            from entry in SumOf3DiffOfSquares()
            group entry by entry.dsquared into g
            select (
              dsquared: g.Key, 
              count: g.Select(x => x.count).Sum()
            );

        foreach(var entry in q)
        {
            Console.WriteLine(entry);
        }
    }

    public static void Main(string[] args)
    {
        Calculate();
    }
}
0
On

My code is ugly and brutish. I am not accepting my own answer. I am going to try to improve it though in the meantime.

Directly below gives a grid table in Mathematica for the frequency of distances that are nonnegative integers,$0,1,2...$ Finding the solution for integers is easier, because it does not involve filtering empty sets.

Insert[Join[{{"d","Frequency"}},
Table[{i,Total[Map[Length,Map[Permutations,Threaded[{256,256,256}]-
PowersRepresentations[i^2, 3, 2]]]
*Apply[Times,(Threaded[{256,256,256}]-PowersRepresentations[i^2, 3, 2]),{1}]]},
    {i,1,255}]]//Grid,Alignment->Left,2]

Directly below gives a theoretical grid table in Mathematica for the frequency of distances that are surds, like $\sqrt{2},\sqrt{3},\sqrt{5}...$. For the RGB color model, there should be at most $194634$ distances. Some distances cannot be written as a sum of three cubes. Ideally, the program would not even check those instead of having to filter. This program times out in WolframCloud though.

surdDistances= Table[{n+Floor[0.5+Sqrt[n]],PowersRepresentations[n+Floor[0.5+Sqrt[n]],3,2]},{n,1,194634}];
            
filteredSurdDistances= Select[surdDistances, Last[#] != {} &];

Insert[Join[{{"d","Frequency"}},
Table[{Sqrt[filteredSurdDistances[[i]][[1]]],
Total[Apply[Times,Threaded[{256,256,256}]-filteredSurdDistances[[i]][[2]],{1}]]},
{i,1,Length[filteredSurdDistances]}]]//Grid, Alignment->Left,2]
0
On

Let $X$ be the random variable you defined. For a natural number $n$, the probability $\mathbb P[X=\sqrt n]$ is going to be some multiple of the number of ways of writing $n$ as a sum of three squares, up to boundary effects. There exist relatively simple ways of counting the number of representations of $n$ as sums of two or four squares, but which rely on the prime factorization of $n$ which already makes them complicated. For three squares, I think it's fair to say that there's no closed form. For example if $n=4^a(8b+7)$ for positive integers $a,b$ then there are no representations as sums of three squares and consequently $\mathbb P[X=\sqrt n]=0$.

These difficulties, of course only exist if you insist on computing the PDF of $X$ in closed form. If you instead ask for $\mathbb P[X\leq t]$ as a function of $t$, you can get fairly good approximations. In fact, that is a three-dimensional version of the Gauss circle problem.