Prior BPN Based on Multiple Linear Regression Output and Monte Carlo Simulations


This question concerns a passage on page 286 of the paper "Prediction of road accidents: A Bayesian hierarchical approach".

The passage describes the construction and parameter learning of Bayesian Probability Networks (BPNs), specifically focusing on the steps involved in creating a prior BPN and updating it to a posterior BPN:

  • Prior BPN Construction: The prior BPN is initially constructed from the outcomes of a regression analysis, and the inference engine of GeNIe 2.0 is used to build the network and calculate the marginal probability distributions. Monte Carlo simulations, based on the estimated distributions of the regression coefficients and the covariance matrices of the error terms, establish predictive probability density functions of the response variables Y. These densities are discretized into 48 states and used to fill the conditional probability tables of the prior BPN. The simulations are also used to extrapolate outcomes to areas without observations.

  • Posterior BPN Update: The prior BPN is updated to the posterior BPN with the EM algorithm. Observations of the risk-indicating variables and the response variables from the development dataset are recorded in contingency tables. Parameter learning combines Bayesian inference with the EM algorithm, using an experience factor that gives almost no weight to the prior information. Only the domains of the prior BPN for which the development dataset contains information are updated.
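As I understand the first bullet, filling the prior CPT of Y could look roughly like the sketch below. This is only my own minimal sketch, not the paper's code: it assumes a single response with a linear-Gaussian model, coefficients drawn from a multivariate normal with the estimated covariance, parent variables represented by representative values (e.g. bin midpoints), and open-ended outer states. The function names, bin edges, coefficients and simulation sizes are all hypothetical.

```python
import itertools
import numpy as np


def prior_cpt_row(x, beta_mean, beta_cov, sigma, bin_edges, n_sims, rng):
    """Monte Carlo estimate of P(Y in state k | parents = x) for one parent configuration.

    Hypothetical helper: assumes Y = x @ beta + eps with beta ~ N(beta_mean, beta_cov)
    and eps ~ N(0, sigma^2); x includes a leading 1 for the intercept.
    """
    betas = rng.multivariate_normal(beta_mean, beta_cov, size=n_sims)  # coefficient uncertainty
    eps = rng.normal(0.0, sigma, size=n_sims)                          # residual error
    y = betas @ x + eps                                                # predictive draws of Y
    # Clip so the first and last states are open-ended and every draw lands in a state.
    y = np.clip(y, bin_edges[0], bin_edges[-1])
    counts, _ = np.histogram(y, bins=bin_edges)
    return counts / n_sims


def prior_cpt(parent_midpoints, beta_mean, beta_cov, sigma, bin_edges, n_sims=20_000, seed=0):
    """Fill one CPT row per combination of parent states, including unobserved ones."""
    rng = np.random.default_rng(seed)
    configs = list(itertools.product(*parent_midpoints))
    table = np.empty((len(configs), len(bin_edges) - 1))
    for i, cfg in enumerate(configs):
        x = np.concatenate(([1.0], cfg))  # intercept + representative parent values
        table[i] = prior_cpt_row(x, beta_mean, beta_cov, sigma, bin_edges, n_sims, rng)
    return configs, table


# Hypothetical example: two parents with three states each, Y discretized into 48 states.
bin_edges = np.linspace(-2.0, 4.0, 49)
configs, cpt = prior_cpt(
    parent_midpoints=[[0.0, 1.0, 2.0], [0.0, 1.0, 2.0]],
    beta_mean=np.array([1.0, 0.5, -0.2]),
    beta_cov=0.01 * np.eye(3),
    sigma=0.3,
    bin_edges=bin_edges,
)
print(cpt.shape)  # (9, 48)
```

Every parent configuration gets a row, including combinations never seen in any data, which is how I read the remark about extrapolating to areas without observations. The paper's actual model (transformation of Y, multivariate error structure, choice of representative parent values) may well differ.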


My question:

Do we first use Monte Carlo simulations to fill the "probabilities" in the conditional probability tables (CPTs) and establish a prior BPN (i.e., find the frequencies of all combinations of the X values under the distribution of the response variable Y generated by the simulations), and then use the dataset as evidence to update and learn further parameters and probabilities, resulting in a posterior BPN?

Or

Do we use the dataset from the beginning to fill the CPTs, learning the parameters and probabilities directly from the frequencies of occurrences in the dataset under the distribution of Y generated by the Monte Carlo simulations (i.e., find the frequencies of the X-value combinations, and hence the conditional probabilities under the simulated Y, to fill the cells of the initial CPTs)?
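To make my first interpretation concrete, this is roughly how I picture the update step. It is my own minimal sketch, not the paper's or GeNIe's implementation: it assumes complete data (in which case EM reduces to a single counting step) and treats the experience factor as a number of equivalent prior cases (Dirichlet pseudo-counts), so a small value lets the observed counts dominate. Rows with no observations keep their Monte Carlo prior, matching "only domains with available information are updated". The helper names and numbers are hypothetical.

```python
import numpy as np


def contingency_table(config_idx, y_state, n_configs, n_states):
    """Count observed (parent configuration, Y state) pairs from the development data."""
    table = np.zeros((n_configs, n_states))
    np.add.at(table, (np.asarray(config_idx), np.asarray(y_state)), 1.0)
    return table


def update_cpt(prior_table, counts, experience):
    """Experience-weighted update of a CPT from complete data (hypothetical helper).

    Assumption: each prior row counts as `experience` equivalent cases, so the
    posterior row is proportional to experience * prior + observed counts.
    """
    prior_table = np.asarray(prior_table, dtype=float)
    counts = np.asarray(counts, dtype=float)
    pseudo = experience * prior_table + counts
    posterior = prior_table.copy()
    has_data = counts.sum(axis=1) > 0  # only configurations seen in the data are updated
    posterior[has_data] = pseudo[has_data] / pseudo[has_data].sum(axis=1, keepdims=True)
    return posterior


# Hypothetical example: 2 parent configurations, 4 states of Y, uniform prior.
prior = np.full((2, 4), 0.25)
counts = contingency_table([0, 0, 0], [1, 1, 2], n_configs=2, n_states=4)
posterior = update_cpt(prior, counts, experience=1e-3)
print(posterior.round(3))
```

With missing values, EM would instead replace the raw counts with expected counts computed from the current network and iterate until convergence.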

Your help is much appreciated!