Bayesian Model with Hierarchical Structure

I asked this on Cross Validated but got no answers, so I will try here. I have a table with the following variables:

  • level of poisonous gas (radon) in a house

  • type of house (with or without a basement)

  • a county in Minnesota where the house is located (84 of those)

  • level of uranium in the soil in each of the counties.

Now I am supposed to do two things:

  • model this as a hierarchical Bayesian network using a directed acyclic graph (DAG)
  • complete the network by specifying all probability distributions.

I am also given the following information:

  • uranium is the source of radon
  • radon comes from the ground

From what I understand, to complete the first task I need to work out what depends on what. So:

  • the level of radon in the house depends on which county the house is in, whether the house has a basement (is built into the ground), and how much uranium is in the soil of that county

  • the level of uranium in the soil depends on which county it is in

Those are the only two I am sure of. Perhaps we also have that the type of house depends on the level of uranium in the soil, and that the type of house depends on which county it is in? I came up with this diagram (forgive my terrible PowerPoint skills): [diagram not shown]

If this is correct (which I am not sure about), how would I specify the probability distributions? I am completely new to Bayesian statistics, so any help is appreciated.

Best answer

Let's write:

  • $R \equiv$ level of poisonous gas (radon) in a house
  • $B \equiv$ type of house (with or without a basement)
  • $C \equiv$ the county in Minnesota where the house is located (84 of those)
  • $U \equiv$ level of uranium in the soil in each of the counties.

Not required, but just for context: note that $R$ is a positive random variable, so you might choose to model $R \sim$ Exponential$(\cdot)$. $B \sim$ Bernoulli$(\cdot)$ since it is binary. $C \sim$ Categorical$(\cdot)$ since it can take one of 84 possible values, and $U$ is another positive variable, so you could use the exponential again (with a different parameter).

Now, we can use a DAG (Directed Acyclic Graph) to represent the joint distribution, i.e. the distribution of $(R, B, C, U)$. In general, let's say we have random variables $(S_1, S_2, \dots, S_n)$ represented as a DAG; then the joint distribution is:

$$ P(S_1, \dots, S_n) = \prod_{i=1}^n P(S_i \mid \text{parents}(S_i)) $$

The parents of $S_i$ are the nodes in the DAG that have an arrow pointing towards $S_i$. Now, back to the question, using the information given:

  • If uranium ($U$) is the source of radon ($R$), then $R$ is dependent on $U$. So we expect the distribution of $R$ to depend on $U$; in other words, we have an arrow from $U$ to $R$.
  • Radon comes from the ground, so I assume that $R$ depends on whether the house has a basement or not, since that determines how close the house is to the ground. So we have an arrow from $B$ to $R$.
  • Uranium in the ground depends on the county, so we have an arrow from $C$ to $U$. One could also draw an arrow from $C$ to $R$, but since $R$ depends on $U$ and $U$ depends on $C$, I think the county's influence on radon is already carried through $U$, so we don't need this extra arrow.
  • We can also assume that house types are similar within the same county, so we have an arrow from $C$ to $B$.
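As a quick sanity check, the arrows listed above can be written down as a parents table and turned into a factorization mechanically. This is only a minimal sketch: the `parents` dictionary and the `factorization` helper are names made up for illustration, and it relies on `graphlib` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter

# Parents of each node, read off the bullets above
# (C: county, U: uranium, B: basement, R: radon).
parents = {
    "C": [],
    "U": ["C"],
    "B": ["C"],
    "R": ["U", "B"],
}

def factorization(parents):
    """Return a parents-first node order and the matching product of factors."""
    # static_order() raises graphlib.CycleError if the graph has a cycle,
    # so this also checks that the structure really is a DAG.
    order = list(TopologicalSorter(parents).static_order())
    factors = []
    for node in order:
        pa = parents[node]
        factors.append(f"P({node}|{','.join(pa)})" if pa else f"P({node})")
    return order, " ".join(factors)

order, joint = factorization(parents)
print(joint)  # one factor per node, each conditioned on its parents
```

Adding or removing an arrow (for example the optional one from $C$ to $R$) only means editing one entry of `parents`; the factorization follows from it.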

Note that there are some arbitrary choices here: the structure of the DAG is largely up to the modeler. In the end we have:

$$ P(C, U, B, R) = P(C) P(U|C) P(B|C) P(R|U,B) $$

[diagram: DAG with arrows $C \to U$, $C \to B$, $U \to R$, $B \to R$]
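To make "specifying all probability distributions" concrete, here is a rough sketch that draws one house from this factorization by sampling each node after its parents, using the Categorical, Bernoulli, and Exponential choices mentioned earlier. Everything numeric here is an assumption for illustration only: the uniform county probabilities, the per-county uranium means and basement probabilities, and in particular the way `radon_mean` combines $U$ and $B$ are all invented, not estimated from the data.

```python
import random

random.seed(0)
N_COUNTIES = 84

# Hypothetical parameters, invented for illustration (not fitted to any data).
county_probs = [1.0 / N_COUNTIES] * N_COUNTIES                         # P(C)
uranium_mean = [random.uniform(0.5, 2.0) for _ in range(N_COUNTIES)]   # E[U | C]
basement_prob = [random.uniform(0.3, 0.9) for _ in range(N_COUNTIES)]  # P(B=1 | C)

def sample_house():
    """Draw (C, U, B, R) in the order C, then U and B given C, then R given U and B."""
    c = random.choices(range(N_COUNTIES), weights=county_probs)[0]  # C ~ Categorical
    u = random.expovariate(1.0 / uranium_mean[c])                   # U | C ~ Exponential
    b = 1 if random.random() < basement_prob[c] else 0              # B | C ~ Bernoulli
    # Assumed link: mean radon grows with uranium and is 50% higher with a basement.
    radon_mean = max(u * (1.0 + 0.5 * b), 1e-12)  # guard against a zero mean
    r = random.expovariate(1.0 / radon_mean)                        # R | U, B ~ Exponential
    return c, u, b, r

print(sample_house())
```

In a full Bayesian treatment the quantities in `uranium_mean` and `basement_prob` would themselves get prior distributions and be inferred from the table, which is where the hierarchical part comes in; the sketch only shows how the conditional distributions plug into the DAG.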