Conditional probability and independence question

Problem:

Scientists are developing testing methods for a certain type of disease. They discovered that a certain genetic marker is associated with the disease: 0.5% of the general population (including those with and without the genetic marker) are afflicted with the disease, 0.1% of the general population have the genetic marker, and 20% of those with the genetic marker will eventually contract the disease. The researchers developed a test that is 95% accurate: the chance that an individual with the marker tests positive for the marker is 95%, and the chance that an individual without the marker tests negative for the marker is 95%.

Question - Suppose that whether someone will contract the disease is independent of whether the genetic test gives the correct result - the factors that affect whether the test is accurate for a given individual are different from those that affect whether or not the individual will contract the disease. Calculate the probability that someone will eventually contract the disease, given that he or she tests positive for the genetic marker.


This is supposed to be a standard conditional probability question, but I am stumped because I don't understand what the independence assumption in the question is actually saying.

So if I let:

D - event where one gets the disease,

M - event where one has the genetic marker,

P - event that the test is positive

then exactly what events are independent here? The question says that:

"Suppose that whether someone will contract the disease is independent of whether the genetic test gives the correct result - the factors that affect whether the test is accurate for a given individual are different from those that affect whether or not the individual will contract the disease"

Does this mean that D is independent of M? Or is it that D is independent of P?

I first assumed that D is independent of P, but I didn't really understand how to solve for P( D | P ) given the assumption.

I would appreciate it if someone could clarify how to approach this problem using the 'independence' provided by the question.

Best answer:

Let the sample space $X$ be the given population.

Define events $D,M,T$ by

    $D$ is the subset of $X$ consisting of those people who will eventually contract the disease.

    $M$ is the subset of $X$ consisting of those people who have the marker.

    $T$ is the subset of $X$ consisting of those people who test positive for the marker.

and let $D',M',T'$ denote the complements of $D,M,T$, respectively.

Our goal is to compute $P(D|T)$.

A Venn diagram will be helpful:

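[Figure: Venn diagram of the events $D$, $M$, $T$, with the seven regions inside the circles labeled $d$, $m$, $t$, $dm$, $dt$, $mt$, $dmt$.]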

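In the above diagram, the $7$ variables $d,m,t,dm,dt,mt,dmt$ represent the probabilities of the corresponding regions. Each variable is named after exactly the events its region lies inside, so, for example, $d=P(D\cap M'\cap T')$ and $dm=P(D\cap M\cap T')$.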

Applying the given information, we get
\begin{align*}
&P(D)=.005\\[3pt]
&\implies\;d+dm+dt+dmt=.005\\[3pt]
&\implies\;d+dm+dt+dmt=\frac{1}{200}\qquad(\text{eq}1)\\[12pt]
&P(M)=.001\\[3pt]
&\implies\;m+dm+mt+dmt=.001\\[3pt]
&\implies\;m+dm+mt+dmt=\frac{1}{1000}\qquad(\text{eq}2)\\[12pt]
&P(D|M)=.2\\[3pt]
&\implies\;\frac{P(D\cap M)}{P(M)}=.2\\[3pt]
&\implies\;\frac{dm+dmt}{.001}=.2\\[3pt]
&\implies\;dm+dmt=\frac{1}{5000}\qquad(\text{eq}3)\\[12pt]
&P(T|M)=.95\\[3pt]
&\implies\;\frac{P(T\cap M)}{P(M)}=.95\\[3pt]
&\implies\;\frac{mt+dmt}{.001}=.95\\[3pt]
&\implies\;mt+dmt=\frac{19}{20000}\qquad(\text{eq}4)\\[12pt]
&P(T'|M')=.95\\[3pt]
&\implies\;\frac{P(T'\cap M')}{P(M')}=.95\\[3pt]
&\implies\;\frac{1-(m+t+dm+dt+mt+dmt)}{.999}=.95\\[3pt]
&\implies\;1-(m+t+dm+dt+mt+dmt)=(.95)(.999)\\[3pt]
&\implies\;m+t+dm+dt+mt+dmt=\frac{1019}{20000}\qquad(\text{eq}5)\\[12pt]
&P\bigl(D|(T\cap M)\bigr)=P(D|M)\\[3pt]
&\implies\;\frac{P\bigl(D\cap(T\cap M)\bigr)}{P(T\cap M)}=P(D|M)\\[3pt]
&\implies\;\frac{dmt}{mt+dmt}=.2\\[3pt]
&\implies\;dmt=(.2)(mt+dmt)\\[3pt]
&\implies\;dmt=\Bigl(\frac{1}{5}\Bigr)(mt+dmt)\qquad(\text{eq}6)\\[12pt]
&P\bigl(D|(T'\cap M')\bigr)=P(D|M')\\[3pt]
&\implies\;\frac{P\bigl(D\cap(T'\cap M')\bigr)}{P(T'\cap M')}=\frac{P(D\cap M')}{P(M')}\\[3pt]
&\implies\;\frac{d}{1-(m+t+dm+dt+mt+dmt)}=\frac{d+dt}{.999}\\[3pt]
&\implies\;\frac{d}{(.95)(.999)}=\frac{d+dt}{.999}\\[3pt]
&\implies\;d=(.95)(d+dt)\\[3pt]
&\implies\;d=\Bigl(\frac{19}{20}\Bigr)(d+dt)\qquad(\text{eq}7)
\end{align*}
Here (eq6) and (eq7) encode the independence assumption: once we know whether a person has the marker, the test result carries no further information about $D$. Note that $P(T'\cap M')=1-P(T\cup M)$, and $T\cup M$ consists of the six regions $m,t,dm,dt,mt,dmt$.

Thus we have a system of $7$ linear equations in $7$ unknowns.

Solving the system yields
$$
d=\frac{57}{12500},\;\;\;m=\frac{1}{25000},\;\;\;t=\frac{4971}{100000}\\
dm=\frac{1}{100000},\;\;\;dt=\frac{3}{12500},\;\;\;mt=\frac{19}{25000}\\
dmt=\frac{19}{100000}
$$
hence
$$
P(D|T) = \frac{P(D\cap T)}{P(T)} = \frac{dt+dmt}{t+dt+mt+dmt} = \frac{43}{5090} \approx .008448
$$
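As a cross-check that does not go through the seven-variable system, $P(D|T)$ can also be computed directly from the law of total probability, splitting on marker status. The sketch below is only a minimal illustration: the variable names are made up for this example, and it assumes the same reading of the independence condition as (eq6) and (eq7), namely that given marker status the test outcome is independent of $D$. Exact fractions are used to avoid rounding.

```python
from fractions import Fraction as F

# Given data from the problem statement.
p_d = F(5, 1000)           # P(D): 0.5% of the population contract the disease
p_m = F(1, 1000)           # P(M): 0.1% of the population have the marker
p_d_given_m = F(20, 100)   # P(D | M)
p_t_given_m = F(95, 100)   # P(T | M): marker carriers who test positive
p_nt_given_nm = F(95, 100) # P(T' | M'): non-carriers who test negative

# Split D by marker status: P(D) = P(D and M) + P(D and M').
p_d_and_m = p_d_given_m * p_m
p_d_and_not_m = p_d - p_d_and_m

# Assumption (same as eq6 and eq7): given marker status,
# the test outcome is independent of D.
p_t_given_not_m = 1 - p_nt_given_nm
p_d_and_t = p_t_given_m * p_d_and_m + p_t_given_not_m * p_d_and_not_m

# Law of total probability for P(T).
p_t = p_t_given_m * p_m + p_t_given_not_m * (1 - p_m)

p_d_given_t = p_d_and_t / p_t
print(p_d_given_t, float(p_d_given_t))
```

Because the result is a `Fraction`, it can be compared exactly with the closed-form value obtained from the linear system.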