I split de world in X random polygons.
Then I am given a coordinate C1, for instance (-21.45, 7.10), and I want to attribute the right polygon to this coordinate.
The first solution is to apply my ‘point_in_polygon’ algorithm (given a set of coordinates that defines a polygon and a coordinate that defines a point, tell me if the point is inside or not) on each polygon until I find the right one.
But that is very expensive if I have a lot of points to put in a lot of polygons.
An improvement on that relies on the following idea:
To optimise the search, I create a grid (a collection) with a step n, k where I already attribute each pair of coordinates such that:
for i=-180 to 180 step n
for j = -90 to 90 step k
grid.add(i,j)
Then I create a dictionary, and for each pair in the collection I find the corresponding polygon
For each g in grid
For each p in polygons
If point_in_polygon(g,p) == True
my_dict(g) = p
Then, when I receive C1, I look for the closest coordinate in my grid, let’s say g1.
Thanks to my_dict, I can get quickly p1 = my_dict(g1)
Then I compute point_in_polygon(C1, p1) which is likely to be true. If it’s not, I find the closest g which is assigned to a different polygon, and I redo a test. Etc. until I have found the right polygon.
Now, the question is: what are the optimal n, k to create the grid?
So that I can find the right polygon in the minimum number of steps. I don’t want it too low, because the search of the closest g which is assigned to a different polygon might be expensive. I don’t want it too high as well, because then I might be missing some polygons and then the search never converges.
I am not sure if this is a programming problem, a maths problem, or just something I can only(?) find empirically, that's why I ask it here.
Any inputs appreciated!
I guess it is somewhat standard to use a tree-like decision procedure. Using scalar multiplications, it is easy to check whether a point is on the "left" or "right" of a given line. If you cut the plane along a line (for simplicity, consider only lines that are prolonged polygon edges), the two half-maps will typically have less polygons than the full map (but some polygons may occur on both halves, so in worst case, both halves still have the same number of polygons). Pick a line for which the larger of the two polygon counts is as small as possible (as a tie-breaker, perhaps make the larger of the two edge counts as small as possible). Then recurse for the two halves until you arrive at cases with only one polygon remaining. Once you have prepared a decision tree this way, you can determine the polygon a point is in with a handful of scalar product computations.
Remark: The best method to pick a line to check against in each step may depend on the expected distribution of queries. The above tries to keep the worst case low, but if the query points are uniformly distributed over the map, you may prefer being as close as possible to halving the area instead of halving the number of polygons. Completely different methods may apply if queried points tend to be close to previously queried points.