How to efficiently create balanced KD-Trees from a static set of points

4.6k Views Asked by At

From Wikipedia, KD-Trees:

Alternative algorithms for building a balanced k-d tree presort the data prior to building the tree. They then maintain the order of the presort during tree construction and hence eliminate the costly step of finding the median at each level of subdivision. [..] This algorithm presorts n points in each of k dimensions

I fail to understand how that is actually done. Consider this example:

Lets say I have 5 points in an array.

0 = (4, 7)
1 = (2, 9)
2 = (5, 4)
3 = (3, 6)
4 = (2, 1) 

I suppose we now want to create 2 sorted arrays, one by X and one by Y ?

Sort them by X     Sort them by Y
1 = (2, 9)         4 = (2, 1) 
4 = (2, 1)         2 = (5, 4)
3 = (3, 6)         3 = (3, 6)
0 = (4, 7)         0 = (4, 7)
2 = (5, 4)         1 = (2, 9)

We start creating the KD-Tree. Lets split by X axis, at the first step. We select the median of the points, 3 = (3, 6), so the Left and right trees will like this:

Left  (first half)         
1 = (2, 9)   
4 = (2, 1)     

Right  (second half)
0 = (4, 7)      
2 = (5, 4)  

Now we want to sort the Left and the Right trees by Y axis. How are we supposed to make use of the points that we sorted previously by Y axis ?

The only solution in my eyes is to re-sort the Left and Right trees respectively, by Y axis. What is Wikipedia talking about ?

2

There are 2 best solutions below

1
On BEST ANSWER

If you want to make use of pre-sorted lists, the the idea is to use the fact that the median will be located at $i = floor(\frac{N}{2})$, where $N$ is the length of the list you're looking for.

The nice thing about splitting the list is that when we recurse, we can simply use the sub-lists from $[0, i)$ and $[i, N)$ to find the median again.

However, as you noticed, we need to be smart about splitting the two lists in the K-D tree. The general idea is if we pick a point $P = (x_0, y_0)$ to split with, we need to split both the $X$-sorted list with respect to $x_0$ and the $Y$ sorted list with respect to $x_0$. (We split both according to $x_0$, not one with $x_0$ and one with $y_0$)

In the given example,

0 = (4, 7)
1 = (2, 9)
2 = (5, 4)
3 = (3, 6)
4 = (2, 1) 

When we split with the point 4 = (2, 1), the

$X$ list splits as: (it splits at $x_0 = 2$)

1 = (2, 9)       
4 = (2, 1)
----------  
3 = (3, 6)
0 = (4, 7) 
2 = (5, 4)

Clearly, the two sub-lists are still sorted by $x$

the $Y$ list splits at $x_0 = 2$

4 = (2, 1) 
1 = (2, 9)
----------
2 = (5, 4)
3 = (3, 6)
0 = (4, 7)

Notice that the two subsections of the $Y$ are still sorted with respect to the y coordinate, but they were split along the x coordinate.

Implementing this is exactly like implementing the reverse of the merge operation from merge sort.

Now that you have the $X$ and $Y$ sublists for the recursion, you can continue building the K-D tree.

17
On

I think you're confusing a couple of things here:

  1. When you split along an axis, you take a plane perpendicular to that axis and choose points that are "in front of" and "behind" this plane. While you're correct in saying that when you divide along the X axis you can simply look at the $x_0$ from the point $P(x_0, y_0)$, in general, you would like to actually create the equation of a plane and check which side the point is to the plane. Example - point position relative to plane

  2. You split points into buckets in a step-by-step process, one plane at a time. You're trying to do both X and Y at the same time - don't. Do it one at a time, building up one level of the tree for one plane.

So, in your case, first take

0 = (4, 7)
1 = (2, 9)
2 = (5, 4)
3 = (3, 6)
4 = (2, 1) 

Now, split along 4 = (2, 1) as you suggested, giving

Left of X = 2         
   1 = (2, 9)   
   3 = (3, 6)     
Right of X = 2
   0 = (4, 7)      
   2 = (5, 4)

Now that you have the left and right halves, we need to find a division along the Y axis for both the X axis halves (the divisions can be the same, or they can be different for the left and right subtrees of the X axis)

For the left subtree

1 = (2, 9)   
3 = (3, 6)   

We could pick the median of the Y $y_0 = 6$ from $(3, 6)$

Similarly, for the right subtree

0 = (4, 7)      
2 = (5, 4)

We pick the median of Y being $y_0 = 4$ from $(5, 4)$

Hence, the resulting tree will be:

Left of X = 2
    Left of Y = 6
        3 = (3, 6)     
    Right of Y = 6
        1 = (2, 9)   

Right of X = 2
    Left of Y = 4
        2 = (5, 4)
    Right of Y = 4
        0 = (4, 7)