How many cells are expected?


Suppose that, in the beginning, one cell was present in a Petri dish, and nutrients for cell division were provided in abundance. It is given that the probability that a cell has divided is $\sin(t^\circ)$, where $t$ (in minutes) is the time elapsed since its last division. What is the expected number of cells in the dish after $t$ minutes? Assume that the cells never die or otherwise cease to exist.

A troubled friend of mine asked me this question. I became as troubled as he was afterwards.

I limited the discussion to $0 \leq t < 90$ for simplicity's sake.

My approach

Denote by $p_m$ the probability of having $m$ cells after $t$ minutes.

Expected number of cells = $\sum^{\infty}_{m=1}mp_m$

Consider $m=1$. If there is still 1 cell in the dish after $t$ minutes, then that one cell must not have divided in the first place, so $p_1=1-\sin(t)$. Looks innocent enough.

Consider $m=2$. The only case is that the cell initially present divided into 2 daughter cells at some time $x$, where $x\leq t$, AND that neither of the 2 daughter cells divided further from time $x$ to $t$. So the probability is $$p_2=\int_0^t \sin(x)[1-\sin(t-x)]^2 \text dx$$

Consider $m=3$. Now one of the two daughter cells must also have divided. Say that this second division occurs $y$ minutes after the first division occurred. Also taking the symmetry between the two daughter cells into consideration, the core of the integral is $$2\sin x \sin y [1-\sin(t-x)][1-\sin(t-x-y)]^2$$

My queries

  1. When $m=3$, over what region should I integrate $\text d x$ and $\text d y$? Can I write something like the following, and if not, how should I formulate $p_3$?

$$p_3 = \int_0^t\int_0^{t-t_1}\int_0^{t_1} 2\sin x \sin y [1-\sin(t-x)][1-\sin(t-x-y)]^2 \text dx \text dy \text dt_1$$

  2. The probability tree gets really complicated when $m \geq 4$, and the integral itself requires $m-1$ variables. Is there a much simpler way of thinking about this problem? (Is it even logically sound to find the probabilities using integration...?)

  3. Can a computer program help predict the expected number of cells? What would the results be?

Any help is heartily appreciated... :(


There are 3 answers below.

Answer 1 (by Did)

For every $t\geqslant0$, let $C(t)$ denote the set of cells present at time $t$, $A_x(t)$ the age of cell $x$ in population $C(t)$, and for every regular function $g$,

$$\mathbb F(g,t)=E\left(\sum_{x\in C(t)}g(A_x(t))\right)$$

For example, the unique cell in $C(0)$ has age $0$ at time $0$, hence $\mathbb F(g,0)=g(0)$ for every $g$.

Each $x$ in $C(t)$ produces a new cell $y$ in $C(t+dt)$, of age $A_y(t+dt)$ which is of order $dt$, with probability $\kappa(A_x(t))dt+o(dt)$, where, by hypothesis, $$\kappa(a)=\cos(a)\,\mathbf 1_{[0,\pi/2]}(a)$$ Furthermore, for every cell $x$ in $C(t)$, $A_x(t+dt)=A_x(t)+dt+o(dt)$. Thus, $$\mathbb F(g,t+dt)=E\left(\sum_{x\in C(t)}g(A_x(t)+dt)+g(0)\kappa(A_x(t))dt\right)+o(dt)$$ which is also $$\mathbb F(g,t+dt)=E\left(\sum_{x\in C(t)}g(A_x(t))+\left[g'(A_x(t))+g(0)\kappa(A_x(t))\right]dt\right)+o(dt)$$ which yields

$$\partial_t\mathbb F(g,t)=\mathbb F(g'+g(0)\kappa,t)$$

Edit: Consider the function $\varsigma(a)=\sin a$, then $\varsigma(0)=0$, $\kappa(0)=1$, $\varsigma'=\kappa$ and $\kappa'=-\varsigma$ on $[0,\pi/2]$, hence, on $[0,\pi/2]$, $$\partial_t\mathbb F(\varsigma,t)=\mathbb F(\kappa,t)\qquad\partial_t\mathbb F(\kappa,t)=-\mathbb F(\varsigma,t)+\mathbb F(\kappa,t)$$ which implies $$\partial^2_{tt}\mathbb F(\kappa,t)=\partial_t\mathbb F(\kappa,t)-\mathbb F(\kappa,t)$$ Using the initial conditions $$\partial_t\mathbb F(\kappa,0)=\mathbb F(\kappa,0)=1$$ this shows that $$\mathbb F(\kappa,t)=e^{t/2}(\cos(\sqrt3t/2)+\sin(\sqrt3t/2)/\sqrt3)$$ or, equivalently, $$\mathbb F(\kappa,t)=\Re \frac{2e^{u^2t}}{u\sqrt3}\qquad u=e^{i\pi/6}=\frac{\sqrt3+i}2$$ Likewise, $\mathbb F(1,0)=1$ and $$\partial_t\mathbb F(1,t)=\mathbb F(\kappa,t)$$ hence, for every $t$ in $[0,\pi/2]$, using the identity $u^3=i$, $$\mathbb F(1,t)=1+\Re \frac{2(e^{u^2t}-1)}{u^3\sqrt3}=1+\frac2{\sqrt3}\Im e^{u^2t}$$ that is, to sum up, for every $t$ in $[0,\pi/2]$,

$$E(\#C(t))=1+\frac2{\sqrt3}e^{t/2}\sin\left(\frac{\sqrt3}2t\right)$$
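As a quick sanity check (my addition, not part of the original answer): the characteristic equation of $\partial^2_{tt}\mathbb F=\partial_t\mathbb F-\mathbb F$ is $r^2-r+1=0$, with roots $r=\frac{1\pm i\sqrt3}{2}=e^{\pm i\pi/3}$, which is where the factors $e^{t/2}$ and $\sin(\sqrt3 t/2)$ come from. Differentiating the closed form gives $$\partial_t\mathbb F(\kappa,t)=e^{t/2}\left(\cos\frac{\sqrt3 t}{2}-\frac{1}{\sqrt3}\sin\frac{\sqrt3 t}{2}\right),$$ so indeed $\mathbb F(\kappa,0)=1$ and $\partial_t\mathbb F(\kappa,0)=1$, matching both initial conditions.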

Answer 2

Did's answer is testament to the fact that the question you are asking is not a simple one. Though the field of branching processes is well studied, your question does not fit into the textbook parts of the theory for (at least) two reasons:

  • This is an example of a continuous-time branching process: i.e. at random points in time, a branch occurs. The theory of continuous-time branching processes is much "thinner" than that of their discrete-time counterparts, the most famous of which is the Galton–Watson process.

  • Most analysis of continuous-time processes assumes that the time between branches is exponentially distributed; this facilitates computations considerably, as it enables models to be re-cast in the context of a well-known class of models (homogeneous continuous-time Markov processes); this is, for instance, the basic building block of standard models in queueing theory.

With this, and Did's answer, in mind: I will not try to solve the problem, but rather point out a couple of things in your calculations.

The important thing to note is that $F(t) = \sin(t)$ is the cumulative distribution function of the division time. That is, if $T$ denotes the random time after a cell appears at which it splits, then

$$\mathbf{P}[T \leq t] = \sin(t).$$

To calculate $p_2$, let us introduce some notation: $T_0$ is the time at which the initial cell splits, and $T_{11},\,T_{12}$ are the times at which the two children split; note that the variables $T_0$, $T_{11}-T_0$, and $T_{12}-T_0$ are all independently distributed according to the distribution $F$. Then, as you correctly identified,

$$p_2 = \mathbf{P}[\{T_{0} \leq t\} \cap \{T_{11} >t\} \cap \{T_{12} > t\}]$$

We can then manipulate this using the law of total probability, conditioning on the specific value of $T_0$, that is

\begin{align*} p_2 &= \int_{0}^t \mathbf{P}[\{T_{0} \leq t\} \cap \{T_{11} >t\} \cap \{T_{12} > t\} \mid T_0 = x] \,\mathbf P[T_0 = x]\, dx \\ & = \int_{0}^t \mathbf{P}[\{T_{11} >t \} \cap \{T_{12} > t\} \mid T_0 = x] \,\mathbf P[T_0 = x]\, dx \\ & = \int_{0}^t \mathbf{P}[T_{11} >t \mid T_0 = x] \, \mathbf{P}[T_{12} >t \mid T_0 = x] \,\mathbf P[T_0 = x]\, dx \end{align*} where in the last line we relied on the independence of $T_{11},T_{12}$ given $T_0 =x$. Then, using the fact that $T_{11} - T_0$ has distribution $F$, we have $$\mathbf P[T_{11} > t \mid T_0 = x] = \mathbf P[T_{11} - T_0 > t - x] = 1 - \mathbf P[T_{11} - T_0 \leq t - x] = 1 - F(t-x). $$ So the integral above becomes $$p_2 = \int_{0}^t \big(1 - F(t-x)\big)^2 \mathbf P[T_0 = x]\, d x = \int_{0}^t \big(1 - \sin(t-x)\big)^2 \mathbf P[T_0 = x]\, d x.$$

So this looks similar to the formula you gave; the important issue is: what is the correct expression for $\mathbf P[T_0 = x]$? Strictly speaking, its value is $0$, since the probability that a continuous random variable equals any given value $x$ is $0$:

$$\mathbf P[ T_0 = x] = \lim_{y \downarrow x} \mathbf P[T_0 \in [x,y]] = \lim_{y \downarrow x} \big(F(y) - F(x)\big) = 0.$$

So where we wrote $\mathbf P[T_0 = x]$ above, we should really have written $f(x)$, the probability density function of $T$ at $x$: $$f(x) = \frac{d}{dy}F(y)\Big|_{y = x}.$$ The function $f$ has the property that for infinitesimally small $dx$, $$ \mathbf P[ x < T < x + dx] = f(x)\,dx;$$ it is this term that we were abbreviating as $\mathbf P[T =x]$ above. In our particular case, $$f(u) = \frac{d}{du} F(u) = \cos(u),$$ and returning to the computation of $p_2$ we have

$$p_2 = \int_{0}^t \cos(x) \big(1 - \sin(t-x) \big)^2 d x.$$

Hopefully that is helpful for your own understanding of probability, if not in helping you fully solve the problem itself.

Answer 3

Let $f(T)$ be the expected number of cells after $T$ time units have elapsed. Clearly $f(0) = 1$. Conditioning on whether the initial cell splits: if it does, the expected number of cells is $\int_0^T 2f(t)p(t)dt$, where the probability density function is $p(t) = \cos(t)$; the probability that it does not split by time $T$ is $1 - \sin(T)$. Hence, we have $$f(T) = 1 - \sin(T) + \int_0^T 2f(t)\cos(t)dt$$ Substituting $g(T) = f(T)\cos(T)$, we have $$g(T) = \cos(T)\Bigl[1 - \sin(T) + 2\int_0^T g(t)dt\Bigr]$$ Differentiating both sides w.r.t. $T$, we have $$\begin{align} g'(T) &= \cos(T)[2g(T) - \cos(T)] - \sin(T)\Bigl[1 - \sin(T) + 2\int_0^T g(t)dt\Bigr] \\ &= \cos(T)[2g(T) - \cos(T)] - g(T)\tan(T) \\ &= -g(T)[\tan(T) - 2\cos(T)] - \cos^2(T) \end{align}$$ This is a first-order differential equation. Solving, we have $$f(T) = \frac{1}{2} \Bigl[1 + e^{2\sin(T)}\Bigr]$$

Afterwards, we can run computer simulations over a large number of small discrete time steps to verify our solution. Here is sample code (C++):

#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

const float epsilon = 1e-4;  // length of one discrete time step
std::default_random_engine gen;
std::uniform_real_distribution<float> dist(0.0, 1.0);

// Simulate one dish up to time `duration`; v holds the age of every cell.
int trial(const float duration = M_PI / 2) {
    std::vector<float> v(1, 0);  // start with a single cell of age 0
    for (float t_e = 0.0; t_e < duration; t_e += epsilon) {
        int n = 0;  // number of divisions during this step
        for (float& x : v) {
            x += epsilon;
            // A cell of age x divides during this step with probability
            // cos(x) * epsilon; on division, reset its age and record
            // that a sibling must be added.
            if (cos(x) * epsilon > dist(gen)) { x = 0.0; ++n; }
        }
        v.resize(v.size() + n);  // new cells start with age 0
    }
    return v.size();
}

int main() {
    int N, S = 0;
    if (scanf("%d", &N) != 1) return 1;  // number of trials
    for (int t = 1; t <= N; ++t) {
        int a = trial();
        printf("T%03d: %02d\n", t, a);
        S += a;
    }
    printf("Mean: %f\n", 1.0 * S / N);
}

Note that the simulated mean is $\mu \approx 4.2$, while the closed form gives $f(\pi / 2) = \frac{1+e^2}{2} \approx 4.1945$.