A plague spreading on a graph

266 Views Asked by At

This is a problem I heard from a friend. I didn't figure it out, so I want to post it here because it's quite an interesting problem.

The island pictured below is made up of nine towns. One day, a plague arrives at $G$. Though the citizens do their best to contain the plague, each road between an infected and uninfected village has a $10\%$ chance every day to spread it to the uninfected one. How long do we expect it to take until the whole population is infected?"

enter image description here

My attempt:

Probability is not my strong suit, so I couldn't get very far with this problem. The naive way to go about it is to build a probability tree, but I feel like there has to be a simpler way to do with the degrees of the vertices. The problem I keep running into, though, is that you can easily calculate probabilities for the first day - 10% to each neighboring town, 20% overall, but past that you have too many possible infection maps to do any meaningful calculations.

I also just noticed that the last node to be infected (in terms of probability) will be the farthest from $G$, which is $A$ - 4 edges away. Because each day there is a 10% chance of one of those edges being infected, we have an average time of $10*4$=40 days to get to $A$. Is this a correct solution?

If not, what is the solution to this problem?

EDIT: The answers posted thus far have been amazing, but this problem was given to mathematically talented high school students so there has to be a simple solution (in terms of math used, not the argument itself). I'll be offering a bounty for such an answer when it's available.

2

There are 2 best solutions below

6
On

As been already mentioned here, one way to approach this problem is through Markov Chains. Given that solution heavily relies on connectivity graph, I doubt it is possible to get a closed form.

For your particular setup, you have a Markov Chain with $2^9 = 512$ states, each state might also be represented by a vector of $9$ numbers, being $1$ if corresponding town is infected and $0$ otherwise, also this vectors for convenience can be enumerated as decimals $0 - 511$.

Then you will have to build transition matrix, which is based on connectivity graph. I'll give an example on simplified version with $4$ towns, where $A \ \rightarrow \{B\}$, $B \rightarrow \{A, C, D\}$, $C \rightarrow \{B, D\}$, $D \rightarrow \{B, C\}$. Suppose you have state $\{0, 1, 1, 0\}$. Possible transitions from it are to:

  • {1, 1, 1, 0} - plague travels from $B$ to $A$, with probability $0.081$;
  • {0, 1, 1, 1} - plague travels from $B$ to $D$ or from $C$ to $D$, with probability $0.171$;
  • {1, 1, 1, 1} - both $A$ and $D$ get infested, probability $0.019$;
  • {0, 1, 1, 0} - no new infestations, probability $0.729$.

Having completed this for all possible $512$ states (obviously, states $\{0, 0, \ldots, 0\}$ and $\{1, 1, \ldots, 1\}$ are absorbing), you'll get transition matrix $M$, where $M[S_0, S_1]$ is transition probability from state $S_0$ to $S_1$.

After that, if $T \in \mathbb{Z}^+$ is the day when last town(s) get infested, $M^T[S_0, \{1, 1, \ldots, 1\}]$ gives you probability distribution of time $T$. $M^T$ is matrix $M$ multiplied by itself $T - 1$ times, and $S_0$ is your starting state. For any finite $T$ you'll have $M^T[S_0, \{1, 1, \ldots, 1\}] < 1$, but it converges to one fairly quickly.

I actually got interested and took some time to code this in Python. After $500$ days $M^{499}[S_0, \{1, 1, \ldots, 1\} > 1 - 10^{-15}$, pretty much enough, I would say. I got $E[T] \approx 34.554$ days, and probability density graph below. After $79$ days all towns would be infested with probability $> 0.99$.

Probability distribution f(T)

Interestingly enough, starting plague in the most remove town $A$ doesn't help much - $E[T]$ becomes $38.038$ days, only little more than $3$ extra days on average. On the other hand, if plague start in central town $E$, $E[T]$ becomes $24.814$ days, and $64$ days is enough to get all towns infested with probability $> 0.99$.

ADDITION: I've corrected the example on calculating transition probabilities and results and run simulations to check them. Based on $200000$ rounds for each case, I've got $E[T] = 34.539$ days when plague starts in $G$, and $E[T] = 38.063$ days when plague starts in $A$.

0
On

Another way to approach this, more accessible to high school students than Markov chains, is through simulation. Rather than try to solve the problem exactly, simulate the spread of infection many times on a computer and calculate the average time to infect all towns. Here's a little python script that does this.

import random
a,b,c,d,e,f,g,h,i = 'A B C D E F G H I'.split()
edges = {a:{b},b:{a,e},c:{e,f},d:{e,g},e:{b,c,d,h,i},
         f:{c,i},g:{d,h},h:{e,g,i},i:{e,f,h}}
for x,y in edges.items():
     for z in y: assert x in edges[z]
random.seed()
universe={a,b,c,d,e,f,g,h,i}
trials = 1000
average = 0 
for _ in range(trials):
     infected = {a}
     days = 0
     while infected != universe:
          days += 1
          roads = {(town,x) for town in infected for x in edges[town]}
          for road in roads:
               if random.random()<.1:
                    infected.add(road[1])
     average += days
print(average/trials)

I ran this once and got $37.927.$ There are several possible improvements. First, run more than $1000$ trials. Second, make a histogram of the results, instead of just computing the average. Third, you can compute the standard error, to compute a confidence interval for your estimate.

Since the size of the transition matrix grows exponentially with the number of towns, I would guess that if anything like this is really done in epidemiology, it is more likely done with simulation than Markov chains, although I'm sure the simulation would be much more sophisticated than this.

EDIT

With 100,000 trials, I consistently get an average of $38$, very different from fanvacoolt's answer. I can't see any mistake in my code, so I'm going to have to try the Markov chain approach -- but not tonight.

EDIT I wrote a script to compute the expected number of days until all towns are infected, using Markov chains and the fundamental matrix. It produced a result in perfect agreement with the simulations, so it must be correct. Here's the script:

import numpy as np
from itertools import combinations
from itertools import chain

a,b,c,d,e,f,g,h,i = 'A B C D E F G H I'.split()
edges = {a:{b},b:{a,e},c:{e,f},d:{e,g},e:{b,c,d,h,i},
         f:{c,i},g: {d,h},h:{e,g,i},i:{e,f,h}}
universe={a,b,c,d,e,f,g,h,i}
for x,y in edges.items():
    for z in y: assert x in edges[z]
weight={a:1,b:2,c:4,d:8,e:16,f:32,g:64,h:128,i:256}

def powerset(iterable):
    "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
    s = set(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))

def index(s):
    "convert a set of towns to an array index"
    return sum(weight[town] for town in s)

def transitionMatrix():
    "construct and return the trnsition matrix"
    P=np.zeros((512,512)) # transition matrix
    for old in powerset(universe):
        idx = index(old)
        roads = {(town,x)  for town in old for x in edges[town] if x not in old}
        r = len(roads)
        for t in powerset(roads):
            k=len(t)
            prob = .1**k*.9**(r-k)
            spread = {x[1] for x in t}
            new = set(old) | set(spread)
            jdx=index(new)
            P[idx, jdx] += prob
    return P

P = transitionMatrix()
Q= P[1:511,1:511]  # transitions between transient states
del(P)
N = np.linalg.inv(np.eye(510)-Q)  #fundamental matrix
[email protected]((510))

def steps(s):
    "Average number of days until all twons infected starting from set s"
    return N[index(s)-1]  # row 0 was deleted from transition matrix

print(steps({'A'}))

Running this produces 38.0376337969. Even with only $1000$ trials, the simulation script consistently produces values close to $38$.

ONE LAST THING

If we consider only the states that are actually reachable starting with only town A infected, there are 51 states. Here is a python script that computes them.

from itertools import combinations
from itertools import chain

a,b,c,d,e,f,g,h,i = 'A B C D E F G H I'.split()
edges = {a:{b},b:{a,e},c:{e,f},d:{e,g},e:{b,c,d,h,i},
              f:{c,i},g:{d,h},h:{e,g,i},i:{e,f,h}}
universe={a,b,c,d,e,f,g,h,i}

def powerset(iterable):
    "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
    s = set(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))

def possible(s):
    '''
    s is a possible state if it includes A and the induced subgraph is connected
    '''
    if a not in s: return False
    Q = {a}
    visited = set()
    while Q:
        x = Q.pop()
        visited.add(x)
        for y in edges[x]:
            if y in s and y not in visited:
                Q.add(y)
    return len(s)==len(visited)

possibilities = []
for s in powerset(universe):
    if possible(s): possibilities.append(s)

print(len(possibilities))
for p in possibilities:print(''.join(sorted(p)))