I'm working on a project related to making broadband available in rural areas. As part of this project, I'm looking at the shortest distances between connected properties and those that are not connected. Consider the plot below: image show scatter with 25 white dots and 5 black dots
If the black dots represent connected properties and the white dot unconnected properties, I want to find the shortest tree so that all of the white properties are connected, directly or indirectly, to a black property. To do this I have written a method that finds the closest white dot to a black dot, connects those two dots, changes the white dot to a black dot and then finds the nearest remaining white dot to the larger set of black dots. This procedure repeats until all dots are coloured black:
Does this method find the optimal path for connecting dots in all cases? How can I prove that it is the optimal method? If this is a common problem I'd appreciate a steer to let me know what terminology is used in the area.
The code below is the metod implemented in R. It's not optimal in terms of efficiency, but I hope by understanding the problem better I'll be able to improve the implementation.
library(tidyverse)
library(SearchTrees)
library(sp)
# generate random white dots and less random black dots
set.seed(1)
x <- runif(25, 1,100) %>% as.integer()
y <- runif(25, 1,100) %>% as.integer()
white <- cbind.data.frame(x,y)
x <- c(28,12,25, 80, 60)
y <- c(80,63,25, 30, 60)
black <- cbind.data.frame(x,y)
p <- ggplot(white) +
geom_point(aes(x = x, y = y), color = "white") +
geom_point(data = black, aes(x = x, y = y), color = "black") +
theme_gray()
p
# Calculate nearest neighbours based on starting conditions, then each iteration of updated conditions
lines <- list()
for(i in 1:nrow(white)){
tree <- createTree(coordinates(black))
inds <- knnLookup(tree, newdat=coordinates(white), k=1)
distList <- list()
for(j in 1:nrow(white)){
dist <- (white[j,1] - black[inds[j],1])**2 + (white[j,2] - black[inds[j],2])**2 # don't need to take root as it won't change the order
distList[[i]] <- dist
}
# Add distances to dataframe and select closest
white$distNN <- unlist(distList)
closest <- white[white$distNN == min(white$distNN),] %>% head(1) %>% rownames() %>% as.integer()
# record coordinates for line between white point and closest black point
lines[[i]] <- cbind(white[closest,1:2],black[inds[closest],1:2])
# move 'cloest' white point from white list to black list
black <- rbind(black, white[closest,1:2])
white <- white %>% slice(-closest)
}
network <- as.data.frame(do.call(rbind, lines))
network$group <- 1:nrow(network)
network2 <- rbind(network[,c(1,2,5)], network[,c(3,4,5)])
# plot connected dots
ggplot() +
geom_line(data = network2, aes(x = x, y = y, group = group)) +
geom_point(data = black, aes(x = x, y = y), color = "black") +
theme_gray()
Not a solution, but too long for a comment.
I'm not sure I understand your description. It seems to come to an end after you have connected the first four white dots, but at that point, there must still be many unconnected black dots.
If you mean to say that you continue until all the black dots are connected, then you almost surely don't have an optimum method, because there is a path between every pair of dots. If there is a path of length more than $1$ between two of the (originally) white dots, then one of the links in this path is extraneous, since if we delete it, each balk dot is still connected to a white dot.
The problem you are considering is related to finding a minimum-weight spanning tree, and there are well-known algorithms, such as Prim's algorithm, and Kruskal's algorithm to solve it. However, it doesn't seem that you want a tree, in which there is a path between every pair of vertices.
I would say you are looking for a minimum-weight spanning forest, in which each component contains one of the white vertices. That is, you want four trees, each containing one of the white dots, such that the total weight of the edges is a minimum.
My first stab at it would be to first, construct a minimum spanning tree. There is a uniques path between every pair of white vertices. On each such path that has more than one link, delete the edge of maximum weight.
This needs some work, because it is possible that a white dot is on the path between two other white dots, so we can't exactly carry out the program described in the previous paragraph, and even if we can, it's not clear that it would result it an optimal solution.