Average distance from $k$ points


I have kind of a strange problem, and I don't understand how to approach it.

Let's say we have a user $u$ on a square map of size $A\cdot A$. The user can be placed on any square of the map. We then have $k$ interesting points uniformly distributed on the map, and we want to estimate the average distance from $u$ to the points closer to him than a certain area, which is given by a "precision" parameter. So let's say the user is placed on a square $a_u$: with $\lambda = 0$ the location of the user is exactly $a_u$, while with $\lambda = 1$ the user's location can be anywhere in $([a_u - 1;a_u + 1],[a_u - 1;a_u + 1])$ (the intervals representing the $x$ and $y$ coordinates).

So to determine the closest points w.r.t. the location of the user, at a maximum distance $\delta$, we have to consider the location with its possible uncertainty. If the location is certain ($a_u$), then we need to compute the distance to each point in $([a_u - \delta ;a_u + \delta],[a_u - \delta ;a_u + \delta])$; if instead there is uncertainty because $\lambda > 0$, then we must consider all the points in $([a_u - \lambda - \delta ;a_u + \lambda + \delta],[a_u - \lambda - \delta ;a_u + \lambda + \delta])$.

How can this be computed? It seems to me that there should be some probability distribution to estimate the location of the user; then, adding the uncertainty due to $\lambda$, one could estimate an average distance between any possible location and any interesting point in $A\cdot A$. I'm looking for ideas.

1 answer:

There are two things to consider when you no longer know the user's position $a_u$ for certain:

  • How to adjust the distance from each interesting point to most accurately predict the distance to the user's actual position, and
  • How to adjust the region to look for interesting points to most accurately predict the average distance from the user's actual position.

Consider the distance problem from the "interesting point" of view. The user's location has some probability distribution around $a_u$. If that distribution is symmetric - the user is equally likely to be on one side of $a_u$ as the other - then the deviations balance: the expected distance from the point to the user is exactly the distance to $a_u$. So no adjustment is needed: the best estimate that can be made for the distance to the actual user is the distance to $a_u$.
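This balancing can be checked numerically. The sketch below (with assumed numbers: nominal position at the origin, uniform uncertainty of half-width `lam`, and a hypothetical interesting point `p` lying outside the uncertainty square) uses the taxicab metric that the rest of the answer settles on; for that metric and such a point, the expectation works out exactly to the distance to the nominal position.

```python
import random

random.seed(0)

# Assumed setup: the user's true position is uniform in a square of
# half-width lam around the nominal position (0, 0); an interesting
# point sits at p, outside the uncertainty square in both coordinates.
lam = 1.0
p = (3.0, 2.0)  # hypothetical point with |p_x| > lam and |p_y| > lam

n = 200_000
total = 0.0
for _ in range(n):
    ux = random.uniform(-lam, lam)  # sampled true user position
    uy = random.uniform(-lam, lam)
    total += abs(p[0] - ux) + abs(p[1] - uy)  # taxicab distance

avg = total / n
print(avg)                    # expected distance to the true position
print(abs(p[0]) + abs(p[1]))  # taxicab distance to the nominal position: 5.0
```

The Monte Carlo average matches the distance to the nominal position, so no correction is needed for such points.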

And a similar thing happens with the region. If you look at a larger region, you are going to pick up more points. These extra points are further away from the user than the ones that would normally be included, so they will increase the average distance. This is not a good estimate of the calculation you would make if you knew the user's exact location; it will generally be too high. So you do not want to enlarge the region in which you look for interesting points.

Instead, consider the difference between the $\delta$-regions around $a_u$ and around the user's actual location. These two regions will generally overlap (assuming $\lambda$ is not large), and since they are the same size, the area of each region not included in the overlap will be the same. Further, the average distance of these two areas from their respective centers, $a_u$ and the actual location, will be the same. Thus the expected contribution of any interesting points to the average distance will be the same.

In fact, since the interesting points are uniformly distributed, the expected average distance from interesting points inside any square of side length $2\delta$ to its center will be constant for all squares fully on the map. You can calculate it by integration - though the answer depends on how you are measuring distance. You didn't say, but it sounds like you are using a taxicab metric, where total distance is the sum of the vertical and horizontal distances: $d((x_1,y_1), (x_2, y_2)) = |x_1 - x_2| + |y_1 - y_2|$. If so, the expected average distance is just $\delta$.
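The integration result above is easy to sanity-check by simulation. This sketch (with an arbitrary choice of $\delta = 2$) samples points uniformly in a square of side $2\delta$ and averages their taxicab distance to the center; since $E|x| = E|y| = \delta/2$ for a coordinate uniform on $[-\delta, \delta]$, the average comes out to $\delta$.

```python
import random

random.seed(0)

# Average taxicab distance from uniform points in a square of side
# 2*delta to the square's center.  By the integral, this equals delta.
delta = 2.0
n = 200_000
total = 0.0
for _ in range(n):
    x = random.uniform(-delta, delta)
    y = random.uniform(-delta, delta)
    total += abs(x) + abs(y)  # taxicab distance to the center (0, 0)

avg = total / n
print(avg)  # close to delta = 2.0
```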


But of course there are a couple of weasellings in that analysis. It assumes that the user's location is always symmetrically distributed in $a_u + ([-\lambda, \lambda], [-\lambda, \lambda])$. When you are in the interior of the map, this will be true. But if you are near enough to an edge of the map, the $\delta$-region around $a_u$ or around the actual location may extend beyond the map, making the key assumption of the above analysis false. This edge effect will cause the expected average distance to drop.
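The edge effect can also be seen numerically. In this sketch (all numbers are assumed for illustration: an $A \times A$ map with $A = 10$, a user $0.1$ from the left edge, $\delta = 1$), the $\delta$-square is clipped by the map boundary, which cuts off part of the far strip on one side, and the average taxicab distance falls below the interior value $\delta$.

```python
import random

random.seed(0)

# Hypothetical numbers: user near the left edge of an A x A map.
A = 10.0
user = (0.1, A / 2)  # 0.1 from the left edge, vertically centered
delta = 1.0

# The delta-square around the user, clipped to the map.
x_lo, x_hi = max(0.0, user[0] - delta), min(A, user[0] + delta)
y_lo, y_hi = max(0.0, user[1] - delta), min(A, user[1] + delta)

n = 200_000
total = 0.0
for _ in range(n):
    x = random.uniform(x_lo, x_hi)  # uniform point in the clipped region
    y = random.uniform(y_lo, y_hi)
    total += abs(x - user[0]) + abs(y - user[1])  # taxicab distance

avg = total / n
print(avg)  # below the interior value delta = 1.0
```

The clipping removes area on the near side of the user (inside $[0, 0.1]$) but shifts the sampled region's mass so that, on balance, the average distance to the user drops below $\delta$, consistent with the answer's closing remark.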