Testing for multiple diseases with dependent prevalences - extending Bayes' rule

A group of people is tested for multiple (8) different diseases $\{A,B,C,D,E,F,G,H\}$ with 8 tests, one for each disease.

First, my notation:

$A$ = the event of having disease $A$

$\overline{A}$ = the event of not having disease $A$

$\delta_A$ = the event of the test for disease $A$ being positive (= the test says the person has disease $A$)

$\overline{\delta_A}$ = the event of the test for disease $A$ being negative (= the test says the person does not have disease $A$)

In the scientific literature, I found estimates for

1) the prevalences of the diseases $(P(A), P(B), P(C), ...)$

2) estimates of the pairwise dependencies of those diseases $(P(A \cap B), P(A \cap C), P(B \cap C),...)$

3) the sensitivity (given that a person has disease A, the probability of correctly diagnosing disease A) and the specificity (given that a person does not have disease A, the probability of correctly diagnosing that the person does not have disease A) of each of the tests $(P(\delta_A|A), P(\overline{\delta_A}|\overline{A}), P(\delta_B|B), P(\overline{\delta_B}|\overline{B}), ...)$. All of those are quite high, e.g. 0.95 or higher. Also, the tests are independent!

For example,

$P(A \cap \delta_A \cap B \cap \overline{\delta_B} \cap C \cap \delta_C \cap \overline{D} \cap \overline{\delta_D} \cap E \cap \delta_E \cap F \cap \overline{\delta_F} \cap G \cap \delta_G \cap \overline{H} \cap \overline{\delta_H})$

is the probability of having disease A and being correctly diagnosed as A, also having disease B but being wrongly diagnosed as not having disease B, having disease C with a correct diagnosis of disease C, etc.

As you can see, there are $2^{16} = 65536$ possible combinations of those events.

I have two questions:

First question: What is the probability of being diagnosed as having none of the diseases?

For this to happen, each of the tests must have a negative result.

This probability is the sum of two parts: the probability of having no disease and being correctly diagnosed as having none of the diseases, and the probability of having at least one disease but being incorrectly diagnosed as negative for every disease you have:

$P(\text{diagnosed as negative for all}) = P(\text{diagnosed as negative for all} \cap \text{having no disease}) + P(\text{diagnosed as negative for all} \cap \text{having at least one disease})$

To estimate this probability, I think it is reasonable to make some assumptions.

1) This probability is dominated by the first part of the sum because of the high specificities and sensitivities of the tests.

2) The probability of having at least 2 diseases and being incorrectly diagnosed as healthy for both of them is so small that it is okay to disregard it. This follows from the fact that the sensitivities are that high (> 0.95). So for example, if someone has diseases A and B, and the sensitivities of the tests are $P(\delta_A|A) = 0.95, P(\delta_B|B) = 0.95$, the probability of both tests being incorrectly negative is $(1-0.95) \cdot (1-0.95) = 0.0025$.
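As a quick check of this arithmetic: the probability that a test misses a disease that is present is one minus its sensitivity. A minimal sketch, assuming hypothetical sensitivities of 0.95 for both tests and tests that are independent given the disease status:

```python
# Hypothetical sensitivities P(delta_A | A) and P(delta_B | B); illustrative values only.
sens_a = 0.95
sens_b = 0.95

# False-negative probability of a single test: 1 - sensitivity.
miss_a = 1 - sens_a
miss_b = 1 - sens_b

# Assuming the tests are independent given the disease status, both diseases
# are missed with the product of the two single-test miss probabilities.
miss_both = miss_a * miss_b

print(round(miss_a, 4), round(miss_both, 4))
```

Missing two diseases at once is thus a factor of $1 - \text{sensitivity}$ less likely than missing one, which is the basis for dropping those terms.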

Therefore, I only need to estimate the probabilities of 9 different combinations:

1) $P(\overline{A} \cap \overline{B} \cap \overline{C} \cap \overline{D} \cap \overline{E} \cap \overline{F} \cap \overline{G} \cap \overline{H} \cap \overline{\delta_A} \cap \overline{\delta_B} \cap \overline{\delta_C}...)$ (no disease and correctly diagnosed)

2) $P(A \cap \overline{B} \cap \overline{C} \cap \overline{D} \cap \overline{E} \cap \overline{F} \cap \overline{G} \cap \overline{H} \cap \overline{\delta_A} \cap \overline{\delta_B} \cap \overline{\delta_C}...)$ (disease A, but incorrectly diagnosed as healthy)

3) $P(\overline{A} \cap B \cap \overline{C} \cap \overline{D} \cap \overline{E} \cap \overline{F} \cap \overline{G} \cap \overline{H} \cap \overline{\delta_A} \cap \overline{\delta_B} \cap \overline{\delta_C}...)$ (disease B, but incorrectly diagnosed as healthy)

...

9) $P(\overline{A} \cap \overline{B} \cap \overline{C} \cap \overline{D} \cap \overline{E} \cap \overline{F} \cap \overline{G} \cap H \cap \overline{\delta_A} \cap \overline{\delta_B} \cap \overline{\delta_C}...)$ (disease H, but incorrectly diagnosed as healthy)

However, those probabilities cannot be computed directly from the ones I have, because I only know the pairwise dependencies of the diseases (like $P(A \cap B)$), but not, for example, $P(A \cap B \cap C)$. At the same time, any estimates must stay consistent with identities such as $P(A \cap B \cap C) = P(A|B \cap C)\cdot P(B \cap C)$.
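Although the pairwise probabilities do not determine $P(A \cap B \cap C)$, they do restrict it to an interval. The sketch below computes bounds that follow from elementary set inclusions and inclusion-exclusion; they are valid but not claimed to be the tightest possible. The function name `triple_bounds` and all numbers are hypothetical, chosen only for illustration.

```python
def triple_bounds(p_a, p_b, p_c, p_ab, p_ac, p_bc):
    """Return (lower, upper) bounds on P(A and B and C).

    Inputs are the three marginals and the three pairwise probabilities.
    """
    # The triple intersection is contained in every pairwise intersection.
    # Inclusion-exclusion with P(A or B or C) <= 1 gives one more upper bound.
    upper = min(
        p_ab,
        p_ac,
        p_bc,
        1 - (p_a + p_b + p_c) + (p_ab + p_ac + p_bc),
    )
    # (A and B) and (A and C) both lie inside A, so their overlap is at least
    # P(A and B) + P(A and C) - P(A); the same holds with B or C as the common set.
    lower = max(
        0.0,
        p_ab + p_ac - p_a,
        p_ab + p_bc - p_b,
        p_ac + p_bc - p_c,
    )
    return lower, upper


# Hypothetical prevalences and pairwise probabilities.
lower, upper = triple_bounds(0.10, 0.08, 0.05, 0.03, 0.02, 0.015)
print(lower, upper)
```

Any estimate of the triple intersection, however it is obtained, should fall inside this interval.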

So I think this question boils down to: How should I estimate those 9 probabilities from the knowledge I have?

I also posted this question in a purely mathematical form, without context, here: "Estimate $P(A \cap B \cap C)$ from $P(A \cap C), P(B \cap C), P(A \cap B)$"

(The reason why I posted it twice is that 1) I think it is an interesting question on its own (without the context from this post), and 2) I think there might be another solution to this problem, so it is also helpful for me if people tell me that this approach is not the best one. But if this kind of duplication is not welcome, I will of course delete one of the questions!)

Second question: What is the probability of having at least one disease, given that all tests are negative?

This is equal to

$1-P(\text{having no disease} \mid \text{all tests are negative})$ =

$1-P(\overline{A} \cap \overline{B} \cap \overline{C} \cap \overline{D} \cap \overline{E} \cap \overline{F} \cap \overline{G} \cap \overline{H} |\overline{\delta_A} \cap \overline{\delta_B} \cap \overline{\delta_C} \cap \overline{\delta_D} \cap \overline{\delta_E} \cap \overline{\delta_F} \cap\overline{\delta_G} \cap \overline{\delta_H})$

I figured out that for two diseases A and B, Bayes' rule extends to

$P(\overline{A} \cap \overline{B}|\overline{\delta_A} \cap \overline{\delta_B}) = \frac{P(\overline{\delta_A} \cap \overline{\delta_B}|\overline{A} \cap \overline{B})P(\overline{A} \cap \overline{B})}{P(\overline{\delta_A} \cap \overline{\delta_B}|\overline{A} \cap \overline{B})P(\overline{A} \cap \overline{B}) + P(\overline{\delta_A} \cap \overline{\delta_B}|\overline{A} \cap B)P(\overline{A} \cap B) + P(\overline{\delta_A} \cap \overline{\delta_B}|A \cap \overline{B})P(A \cap \overline{B}) + P(\overline{\delta_A} \cap \overline{\delta_B}|A \cap B)P(A \cap B)}$

As you see, for 8 diseases there would be $2^8 = 256$ terms in the denominator.

With the assumption that the tests are independent given the disease status, $P(\overline{\delta_A} \cap \overline{\delta_B}|\overline{A} \cap \overline{B}) = P(\overline{\delta_A}|\overline{A} \cap \overline{B})P(\overline{\delta_B}|\overline{A} \cap \overline{B})$, this simplifies to

$P(\overline{A} \cap \overline{B}|\overline{\delta_A} \cap \overline{\delta_B}) = \frac{P(\overline{\delta_A}|\overline{A} \cap \overline{B})P(\overline{\delta_B}|\overline{A} \cap \overline{B})P(\overline{A} \cap \overline{B})}{P(\overline{\delta_A}|\overline{A} \cap \overline{B})P(\overline{\delta_B}|\overline{A} \cap \overline{B})P(\overline{A} \cap \overline{B}) + P(\overline{\delta_A}|\overline{A} \cap B)P(\overline{\delta_B}|\overline{A} \cap B)P(\overline{A} \cap B) + ...}$

With the assumption that the test for one disease is independent of having a different disease $P(\overline{\delta_A}|\overline{A} \cap \overline{B}) = P(\overline{\delta_A}|\overline{A})$, this further simplifies to

$$P(\overline{A} \cap \overline{B}|\overline{\delta_A} \cap \overline{\delta_B}) = \frac{P(\overline{\delta_A}|\overline{A})P(\overline{\delta_B}|\overline{B})P(\overline{A} \cap \overline{B})}{P(\overline{\delta_A}|\overline{A})P(\overline{\delta_B}|\overline{B})P(\overline{A} \cap \overline{B}) + P(\overline{\delta_A}|\overline{A})P(\overline{\delta_B}|B)P(\overline{A} \cap B) + ...}$$
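For two diseases, this formula can be evaluated directly, since the four joint disease probabilities follow from $P(A)$, $P(B)$ and $P(A \cap B)$. A minimal sketch, with a hypothetical function name and made-up prevalences, sensitivities and specificities:

```python
def posterior_no_disease(p_a, p_b, p_ab, sens_a, spec_a, sens_b, spec_b):
    """P(not A and not B | both tests negative) for two diseases."""
    # Joint prior over the four disease states (has_a, has_b),
    # derived from the marginals and P(A and B).
    prior = {
        (False, False): 1 - p_a - p_b + p_ab,
        (True, False): p_a - p_ab,
        (False, True): p_b - p_ab,
        (True, True): p_ab,
    }

    def p_negative(has_disease, sens, spec):
        # Negative result: a miss (1 - sensitivity) if diseased,
        # a correct negative (specificity) if healthy.
        return (1 - sens) if has_disease else spec

    # One term of the denominator per disease state.
    terms = {
        state: p_negative(state[0], sens_a, spec_a)
        * p_negative(state[1], sens_b, spec_b)
        * p
        for state, p in prior.items()
    }
    return terms[(False, False)] / sum(terms.values())


# Hypothetical inputs: P(A)=0.10, P(B)=0.08, P(A and B)=0.03, all test accuracies 0.95.
post = posterior_no_disease(0.10, 0.08, 0.03, 0.95, 0.95, 0.95, 0.95)
print(post)
```

The probability of having at least one of the two diseases despite two negative tests is then `1 - post`.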

In this equation, all probabilities are known. However, in the case of 8 diseases, terms like $P(A \cap \overline{B} \cap C \cap D \cap E \cap F \cap G \cap \overline{H})$ enter the equation, and I get the same problem as in question one, where I have to estimate those probabilities from my limited information. Additionally, I still have 256 terms in the denominator.
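The 256 terms are not a computational obstacle once some joint distribution over the disease states is available; the hard part is only obtaining that joint. The sketch below sums over all $2^8$ states for an arbitrary prior passed in as a function. As a placeholder it uses a prior in which the diseases are independent, which ignores the pairwise dependencies and is not a proposed solution. All names and numbers are hypothetical.

```python
from itertools import product
from math import prod


def all_negative_summary(prior, sens, spec):
    """Return (P(all tests negative), P(at least one disease | all tests negative)).

    prior: function mapping a tuple of booleans (disease present?) to its probability.
    sens, spec: per-test sensitivities and specificities.
    """
    n = len(sens)
    p_all_negative = 0.0
    p_healthy_and_negative = 0.0
    for state in product([False, True], repeat=n):  # 2**n disease states
        # Tests assumed independent given the state, each depending only on its own disease.
        likelihood = prod((1 - sens[i]) if state[i] else spec[i] for i in range(n))
        joint = likelihood * prior(state)
        p_all_negative += joint
        if not any(state):
            p_healthy_and_negative = joint
    return p_all_negative, 1 - p_healthy_and_negative / p_all_negative


# Hypothetical prevalences for the 8 diseases and test accuracies of 0.95.
prevalences = [0.10, 0.08, 0.05, 0.04, 0.03, 0.03, 0.02, 0.01]
sens = [0.95] * 8
spec = [0.95] * 8


def independent_prior(state):
    # Placeholder joint: diseases treated as independent (an assumption for illustration).
    return prod(p if present else 1 - p for present, p in zip(state, prevalences))


p_all_neg, p_disease_given_neg = all_negative_summary(independent_prior, sens, spec)
print(p_all_neg, p_disease_given_neg)
```

Replacing `independent_prior` with any estimated joint distribution answers both questions with the same loop.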

To be honest, I don't really know how to proceed with this question. It seems to be much more difficult than the first one. So any general idea or hint is very welcome!

A little remark: Of course, I do not expect a full solution from anyone. But it seems to me that this is quite an important question (if you are tested for a bunch of diseases, and the doctor tells you that you are healthy, what is the probability of really having none of them?), and I did not find literature about it. So article or book tips are also very welcome!

There are 1 best solutions below

I think I found a solution by myself:

https://math.stackexchange.com/a/3455809/728989

It involves estimating a latent multivariate normal distribution and sampling from it to estimate probabilities like $P(A \cap B \cap C \cap D)$.
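A minimal sketch of the general idea, not taken from the linked answer: each disease is treated as present when a latent standard normal variable exceeds a threshold chosen to match its prevalence, and the dependence between diseases comes from the correlation of the latent variables. The latent correlation matrix below is simply assumed; in practice it would have to be fitted so that the known pairwise probabilities such as $P(A \cap B)$ are reproduced. The function name and all numbers are hypothetical.

```python
from statistics import NormalDist

import numpy as np


def sample_diseases(prevalences, latent_corr, n_samples, seed=0):
    """Draw boolean disease indicators from a thresholded multivariate normal."""
    rng = np.random.default_rng(seed)
    # Threshold t_i such that P(z_i > t_i) equals the prevalence of disease i.
    thresholds = np.array([NormalDist().inv_cdf(1 - p) for p in prevalences])
    z = rng.multivariate_normal(
        np.zeros(len(prevalences)), latent_corr, size=n_samples
    )
    return z > thresholds  # shape (n_samples, number of diseases)


# Hypothetical prevalences for diseases A, B, C, D.
prevalences = [0.10, 0.08, 0.05, 0.04]
# Assumed latent correlations (0.3 between every pair); not fitted to any data.
latent_corr = np.full((4, 4), 0.3)
np.fill_diagonal(latent_corr, 1.0)

samples = sample_diseases(prevalences, latent_corr, n_samples=200_000)

# Monte Carlo estimate of P(A and B and C and D).
p_abcd = samples.all(axis=1).mean()
print(samples.mean(axis=0), p_abcd)
```

The marginal frequencies in `samples` should match the prevalences up to sampling noise, and any higher-order intersection can be estimated the same way by selecting the relevant columns.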