I am investigating the properties of consensus protocols in distributed systems consisting of $N$ nodes. In this case, consensus just means that every node in the system agrees on which server to use for communication. Every node sends a message to the agreed-upon server at a fixed time interval $t$, all at the same time. Whenever a node sends a message, there is a probability $p$ that its connection to the server has been closed, after which it can no longer send messages. The probability $p$ is the same for all nodes in the system.
One of the nodes in the system is randomly selected to be a 'coordinator node', which will always have up-to-date information on how many nodes are still connected to the server. Once a certain number of nodes have lost connection to the server, the coordinator will find a new server that everyone in the system can connect to and start the process all over again.
The condition for the coordinator to find a new server for all of the nodes can be expressed as
$\text{Number of nodes still connected} \leq \alpha \cdot N$
where $0 \leq \alpha \leq 1$.
Now to my question. It feels like it should be possible to calculate the number of coordinations performed after a given number of time intervals, for a given failure probability $p$.
Essentially what I need is a function that takes a number of time intervals and returns the most likely number of coordinations required during this time.
My first inclination is a simulation-based approach, modelling the system as a simple Markov chain. This is nice, since it is easy to extend the model with more parameters later if needed (although I believe the situation here could also be solved, or at least estimated, analytically). Here's a simple description of how one might approach it.
Let $N_t$ be the number of nodes connected at time $t$. At every discrete time step, each connected node has a probability $p$ of disconnecting. So the transition function looks like $$ P(N_t=n\mid N_{t-1}=s) = \begin{cases} \displaystyle \binom{s}{k}p^k(1-p)^{s-k} & \text{ if } n=s-k,\; k\in\mathbb{Z}_{\geq 0},\; s>\alpha N\\ 0 & \text{ if } n>s \;\land\; s>\alpha N\\ 1 & \text{ if } n=N \;\land\; s\leq \alpha N \end{cases} $$ where the first case is the binomially distributed number of nodes remaining after the time step, the second case captures that the number of nodes cannot increase until a re-coordination happens, and the third case defines a coordination (it says that if $N_{t-1}$ is at or below your threshold, $N_t$ will be $N$ again; you could also allow disconnects during the $t$th time step here if you wanted).
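To make this concrete, here is a small sketch of the transition function in Python. The parameter values ($N=10$, $p=0.1$, $\alpha=0.4$) are made up purely for illustration:

```python
import math

# Hypothetical parameters, chosen only for illustration.
N = 10        # total number of nodes
p = 0.1       # per-step disconnect probability
alpha = 0.4   # coordination threshold fraction

def transition_prob(s, n):
    """P(N_t = n | N_{t-1} = s) for the chain described above."""
    if s <= alpha * N:
        # Coordination: the coordinator moves everyone to a new
        # server, so the next state is N with probability 1.
        return 1.0 if n == N else 0.0
    if n > s:
        # Nodes cannot reconnect without a coordination.
        return 0.0
    k = s - n  # number of nodes that disconnect this step
    return math.comb(s, k) * p**k * (1 - p)**(s - k)
```

As a sanity check, each row of probabilities should sum to 1, e.g. `sum(transition_prob(8, n) for n in range(N + 1))`.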
Then your transition matrix $M$ has entries $m_{ij}=P(N_t=j|N_{t-1}=i)$. Given an initial distribution $x_0$, you can compute the distribution at time $t$ as $x_t=x_0M^t$. Notice that $x_t$ is a vector with $i$th entry $x_t(i)$ giving the probability that $N_t=i$. (Not sure how well you know Markov chains, sorry).
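For example, with NumPy you could build $M$ and propagate an initial distribution like this (again with made-up parameters; the state space is $\{0,\dots,N\}$, so $M$ is $(N{+}1)\times(N{+}1)$):

```python
import math
import numpy as np

N, p, alpha = 10, 0.1, 0.4  # hypothetical parameters for illustration

def transition_prob(s, n):
    """P(N_t = n | N_{t-1} = s) as defined above."""
    if s <= alpha * N:
        return 1.0 if n == N else 0.0
    if n > s:
        return 0.0
    k = s - n
    return math.comb(s, k) * p**k * (1 - p)**(s - k)

# M[i, j] = P(N_t = j | N_{t-1} = i), states 0..N.
M = np.array([[transition_prob(i, j) for j in range(N + 1)]
              for i in range(N + 1)])

x0 = np.zeros(N + 1)
x0[N] = 1.0                              # start fully connected
x5 = x0 @ np.linalg.matrix_power(M, 5)   # distribution of N_5
```

Here `x5[i]` is the probability that exactly `i` nodes are connected after 5 steps, and each row of `M` sums to 1.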
Anyway, from this you can do many things (simulate the chain and count the number of times a coordination occurs, for instance).
More formally, let $$Q_t=\textbf{1}[N_{t-1}\leq \alpha N]$$ be 1 if a coordination occurs at time $t$ and zero otherwise (this condition forces $N_t=N$; note that using $\textbf{1}[N_t=N]$ instead would also count steps where no node happened to disconnect from the fully connected state). Then the total number of coordinations after $T$ time steps is a random variable: $$ C=\sum_{t=1}^T Q_t $$ I think you are interested in the expected value of $C$ (i.e. the number of coordinations you'd expect, given initial state $N$ and fixed $\alpha$, $p$, $T$), written $\mathbb{E}[C]$. You can estimate this by simulating the chain for $T$ time steps $m$ times, counting the number of coordinations $C_i$ in run $i$, and then computing $$ \widehat{C} = \frac{1}{m} \sum_{i=1}^m C_i $$
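A minimal Monte Carlo sketch of that estimator, with the same made-up parameters as before and an arbitrary horizon $T=200$ and $m=2000$ runs:

```python
import random

# Hypothetical parameters, chosen only for illustration.
N, p, alpha, T = 10, 0.1, 0.4, 200

def simulate_coordinations(rng):
    """One run of the chain for T steps; returns the coordination count."""
    n, coords = N, 0
    for _ in range(T):
        if n <= alpha * N:
            n = N          # coordination step: N_t = N
            coords += 1
        else:
            # Each connected node disconnects independently with prob p.
            n -= sum(rng.random() < p for _ in range(n))
    return coords

rng = random.Random(0)   # seeded for reproducibility
m = 2000
estimate = sum(simulate_coordinations(rng) for _ in range(m)) / m
```

With these parameters the chain loses roughly 10% of its nodes per step, so a coordination cycle takes on the order of ten steps and `estimate` comes out around 20 over 200 steps; the value depends entirely on the parameters you plug in.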
Apologies if I misconstrued anything in your post!