Consider a square lattice of cells. Each cell is either empty (state 0) or occupied by one of two species (state 1 or 2).
In the pair approximation model, I would like to compute the clustering of the species, i.e. the clustering of the occupied cells ($+$). It is defined as:
- $C_{++} = \frac{q_{+|+}}{\rho_{+}} = \frac{\rho_{++}}{\rho_{+}^2}$
Where
- $q_{+|+}$ is the conditional probability to find an occupied cell in the surrounding cells knowing that the focal cell is occupied
- $\rho_+$ is the density of occupied cells in the landscape, defined as: $\rho_+ = \rho_1 + \rho_2$
- $\rho_{++}$ is the density of occupied cell pairs in the landscape, defined as: $\rho_{++} = \rho_{11} + \rho_{12} + \rho_{21} + \rho_{22}$
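- $q_{i|j} = \frac{\rho_{ij}}{\rho_{j}}$ is the conditional probability that a neighbouring cell is in state $i$ given that the focal cell is in state $j$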
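As a minimal sketch of how these quantities could be estimated from a lattice snapshot, the code below counts singleton and pair densities with NumPy. It assumes a 4-cell (von Neumann) neighbourhood and periodic boundaries, neither of which is specified above, and the function names `pair_densities` and `clustering_occupied` are purely illustrative.

```python
import numpy as np

def pair_densities(grid, states=(0, 1, 2)):
    """Estimate rho[i] and rho_pair[(i, j)] on a periodic square lattice.

    Assumption: each cell has 4 nearest neighbours (von Neumann neighbourhood).
    """
    n = grid.size
    rho = {s: np.count_nonzero(grid == s) / n for s in states}
    counts = {(a, b): 0 for a in states for b in states}
    # Each (shift, axis) pair aligns every cell with one of its 4 neighbours.
    for shift, axis in [(1, 0), (-1, 0), (1, 1), (-1, 1)]:
        neigh = np.roll(grid, shift, axis=axis)
        for a in states:
            for b in states:
                counts[(a, b)] += int(np.count_nonzero((grid == a) & (neigh == b)))
    # Normalise by the number of ordered (cell, neighbour) pairs.
    rho_pair = {k: c / (4 * n) for k, c in counts.items()}
    return rho, rho_pair

def clustering_occupied(grid):
    """C_{++} = rho_{++} / rho_+^2 with '+' meaning state 1 or 2."""
    rho, rho_pair = pair_densities(grid)
    rho_plus = rho[1] + rho[2]
    rho_pp = sum(rho_pair[(a, b)] for a in (1, 2) for b in (1, 2))
    return rho_pp / rho_plus**2

# Illustrative data only: a randomly filled (well-mixed) lattice.
rng = np.random.default_rng(0)
grid = rng.choice([0, 1, 2], size=(200, 200), p=[0.5, 0.3, 0.2])
print(clustering_occupied(grid))  # close to 1 for a well-mixed lattice
```

Counting all four shifts makes the estimate symmetric, so `rho_pair[(1, 2)]` equals `rho_pair[(2, 1)]` by construction.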
The first approach below describes the clustering correctly, but the second does not, and I cannot find out why.
A first approach (Works)
$q_{+|+} = \frac{\rho_{++}}{\rho_{+}}$
$\rho_{++} = \rho_{11} + \rho_{12} + \rho_{21} + \rho_{22}$
Knowing that $\rho_{12} = \rho_{21}$:
- $q_{+|+} = \frac{\rho_{11} + 2\rho_{12} + \rho_{22}}{\rho_{1} + \rho_{2}}$
Hence:
\begin{align} C_{++} & = \frac{\rho_{11} + 2\rho_{12} + \rho_{22}}{\rho_{1} + \rho_{2}} \times \frac{1}{\rho_{+}} \\ & = \frac{\rho_{11} + 2\rho_{12} + \rho_{22}}{(\rho_{1} + \rho_{2})^2} \end{align}
This approach works: I have checked it against cellular automaton simulations.
A second approach (Does not work)
Let's define $q_{+|+}$.
It is the probability for one cell of state $1$ to be surrounded by occupied cells + the probability for one cell of state $2$ to be surrounded by occupied cells. It means that:
$$ q_{+|+} = q_{+|1} + q_{+|2} $$
and $q_{+|1}$ and $q_{+|2}$ can be defined as:
- $q_{+|1} = q_{1|1} + q_{2|1}$
- $q_{+|2} = q_{1|2} + q_{2|2}$
So:
\begin{align} q_{+|+} & = q_{1|1} + q_{2|1} + q_{1|2} + q_{2|2} \\ & = \frac{\rho_{11}}{\rho_{1}} + \frac{\rho_{12}}{\rho_{1}} + \frac{\rho_{21}}{\rho_{2}} + \frac{\rho_{22}}{\rho_{2}} \\ & = \frac{\rho_{11} + \rho_{12}}{\rho_{1}} + \frac{\rho_{21} + \rho_{22}}{\rho_{2}} \\ & = \frac{\rho_{2}(\rho_{11} + \rho_{12})}{\rho_{1}\rho_{2}} + \frac{\rho_{1}(\rho_{21} + \rho_{22})}{\rho_{1}\rho_{2}} \\ & = \frac{\rho_{2}(\rho_{11} + \rho_{12}) + \rho_{1}(\rho_{21} + \rho_{22}) }{\rho_{1}\rho_{2}} \end{align}
Obviously, this formulation of $q_{+|+}$ differs from the first one. In simulations, it gives a $C_{++}$ twice as high as the first approach.
I do not know where the mistake is in this approach, but I guess that the following assertion is false:
$$q_{+|+} = q_{1|1} + q_{2|1} + q_{1|2} + q_{2|2}$$
It would be very helpful for me to know why.
Indeed, your assertion is wrong:
$$q_{+|+} = \frac{\rho_{++}}{\rho_{+}} = \frac{\rho_{11} + \rho_{12} + \rho_{21} + \rho_{22}}{\rho_{1} + \rho_{2}} \ne \frac{\rho_{11} + \rho_{12}}{\rho_{1}} + \frac{\rho_{21} + \rho_{22}}{\rho_{2}} = \frac{\rho_{+1}}{\rho_{1}} + \frac{\rho_{+2}}{\rho_{2}} = q_{+|1} + q_{+|2}. $$
In particular, if $\rho_{1} = \rho_{2}$, then $\rho_{1} + \rho_{2} = 2\rho_{1} = 2\rho_{2}$, and so the left hand side will work out to exactly $\frac12$ times the right hand side.
(The same will also happen if $q_{+|1} = q_{+|2}$; in particular, that means that the assertion will always be off by exactly a factor of $\frac12$ for well mixed systems, where $q_{a|b} = \rho_a$ for all states $a$ and $b$. On the other extreme, if the states 1 and 2 are highly clustered such that $\rho_{12} = \rho_{21} \approx 0$, you can basically have $q_{+|1}$ and $q_{+|2}$ take any arbitrary values independently of each other and have $q_{+|+}$ be anywhere in between them, depending on the ratio of $\rho_1$ and $\rho_2$.)
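A quick numerical check can make both regimes concrete. The sketch below uses made-up densities (not simulation output) and hypothetical helper names; it evaluates the correct ratio $\rho_{++}/\rho_+$ next to the sum $q_{+|1} + q_{+|2}$ for a well mixed case, where $\rho_{ij} = \rho_i \rho_j$, and for a segregated case, where $\rho_{12} = 0$.

```python
def q_pp_correct(r11, r12, r22, r1, r2):
    """q_{+|+} = rho_{++} / rho_+, using rho_12 = rho_21."""
    return (r11 + 2 * r12 + r22) / (r1 + r2)

def q_pp_naive_sum(r11, r12, r22, r1, r2):
    """The incorrect sum q_{+|1} + q_{+|2}."""
    return (r11 + r12) / r1 + (r12 + r22) / r2

# Well mixed case (illustrative values): rho_ij = rho_i * rho_j.
r1, r2 = 0.3, 0.2
mixed = (r1 * r1, r1 * r2, r2 * r2, r1, r2)
ratio = q_pp_naive_sum(*mixed) / q_pp_correct(*mixed)
print("well mixed, naive / correct:", ratio)  # 2 up to rounding error

# Segregated case (illustrative values): rho_12 = 0.
# Argument order: rho_11, rho_12, rho_22, rho_1, rho_2.
seg = (0.27, 0.0, 0.05, 0.3, 0.1)
q_plus_given_1 = seg[0] / seg[3]
q_plus_given_2 = seg[2] / seg[4]
print("segregated:", q_plus_given_2, q_pp_correct(*seg), q_plus_given_1)
```

In the segregated case the correct value lies between $q_{+|2}$ and $q_{+|1}$, whereas the naive sum exceeds both.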
The source of your confusion seems to be a basic misunderstanding of conditional probabilities. In particular, while it's true that $\mathrm{Pr}[A \text{ or } B \mid C] = \mathrm{Pr}[A \mid C] + \mathrm{Pr}[B \mid C]$ whenever $A$ and $B$ are mutually exclusive events, this additivity only holds when the probabilities are conditioned on the same event $C$. In deriving your incorrect formula, you've basically asserted that $\mathrm{Pr}[C \mid A \text{ or } B] = \mathrm{Pr}[C \mid A] + \mathrm{Pr}[C \mid B]$, which does not hold except in degenerate cases.
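For completeness, here is a sketch of the decomposition that does hold. It follows from adding pair densities, which are additive, rather than conditional probabilities, and uses only the definitions above together with $\rho_{+j} = \rho_{1j} + \rho_{2j} = \rho_j \, q_{+|j}$:

\begin{align} q_{+|+} = \frac{\rho_{++}}{\rho_{+}} & = \frac{\rho_{+1} + \rho_{+2}}{\rho_{1} + \rho_{2}} \\ & = \frac{\rho_{1} \, q_{+|1} + \rho_{2} \, q_{+|2}}{\rho_{1} + \rho_{2}} \end{align}

So $q_{+|+}$ is the average of $q_{+|1}$ and $q_{+|2}$ weighted by $\rho_1$ and $\rho_2$, not their sum. Setting $\rho_1 = \rho_2$ reduces it to $\frac12 (q_{+|1} + q_{+|2})$, which is the factor of $\frac12$ noted above.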