How to create a probability density function from a set of multivariate data


I am trying to create a simple implementation of the Bayes decision rule with minimum error criterion and I am running into a problem. Specifically, if I have a data set consisting of a number of feature vectors stored in rows, how can I generate a probability density function from this data?
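When every feature is continuous, one common non-parametric answer is a kernel density estimate (a Parzen-window estimate). It places a small Gaussian bump on each stored row and averages the bumps. Below is a minimal pure-Python sketch that assumes an isotropic Gaussian kernel with a single hand-picked `bandwidth`. The function name `gaussian_kde` and the sample values are illustrative only. In practice the bandwidth needs tuning, for example by cross-validation.

```python
import math
from typing import Callable, List, Sequence


def gaussian_kde(data: List[Sequence[float]],
                 bandwidth: float) -> Callable[[Sequence[float]], float]:
    """Return an estimate of the density that generated the rows of `data`.

    Each row contributes one isotropic Gaussian kernel of width `bandwidth`;
    the estimate is the average of those kernels.
    """
    n, d = len(data), len(data[0])
    # Normalising constant of a d-dimensional Gaussian with covariance h^2 I.
    norm = (bandwidth * math.sqrt(2 * math.pi)) ** d

    def pdf(x: Sequence[float]) -> float:
        total = 0.0
        for row in data:
            sq_dist = sum((xi - ri) ** 2 for xi, ri in zip(x, row))
            total += math.exp(-sq_dist / (2 * bandwidth ** 2))
        return total / (n * norm)

    return pdf


# Feature vectors stored in rows, as in the question (values are made up).
samples = [[0.10, 0.20], [0.15, 0.25], [0.80, 0.90]]
p = gaussian_kde(samples, bandwidth=0.1)
print(p([0.12, 0.22]))  # relatively high: close to two samples
print(p([0.50, 0.50]))  # relatively low: far from every sample
```

The returned `pdf` is a proper density: it is non-negative and integrates to 1 over the feature space. When data is scarce, the usual alternative is a parametric model, such as a single multivariate Gaussian fitted by its mean and covariance.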

Also, how can I do this if some of the data is discrete, some is continuous, and some is missing? For example, let us assume each feature vector, x, has three elements.

x = [a, b, c]

where:

  • a is categorical data and will be an element of the set {0, 1, 2, 3}
  • b is continuous data and will be in the range [0, 1]
  • c is also continuous data in the range [0, 1], but may be missing for some feature vectors
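Here is a minimal sketch for this particular mix. It rests on an assumption the question does not make: that a, b and c are independent, so p(x) factorises as P(a) p(b) p(c). This is the naive Bayes assumption. Under it, a gets a Laplace-smoothed frequency table, and b and c each get a 1-D Gaussian KDE. A missing c, represented here as `None` (a hypothetical encoding), is marginalised out, which simply drops its factor. The names `fit_mixed`, `kde_1d`, `h` and `alpha` are illustrative. The code also assumes at least one row has an observed c.

```python
import math
from collections import Counter
from typing import Callable, List, Optional, Sequence, Tuple

CATEGORIES = (0, 1, 2, 3)  # the possible values of feature a

Row = Tuple[int, float, Optional[float]]  # (a, b, c); c is None when missing


def kde_1d(values: Sequence[float], h: float) -> Callable[[float], float]:
    """1-D Gaussian kernel density estimate with bandwidth h."""
    scale = 1.0 / (len(values) * h * math.sqrt(2 * math.pi))

    def pdf(v: float) -> float:
        return scale * sum(math.exp(-0.5 * ((v - s) / h) ** 2) for s in values)

    return pdf


def fit_mixed(rows: List[Row], h: float = 0.1, alpha: float = 1.0):
    """Fit p(a, b, c) = P(a) * p(b) * p(c), assuming independent features.

    alpha is a Laplace-smoothing count so unseen categories keep a small,
    non-zero probability. Assumes at least one row has an observed c.
    """
    counts = Counter(a for a, _, _ in rows)
    total = len(rows) + alpha * len(CATEGORIES)
    p_a = {k: (counts[k] + alpha) / total for k in CATEGORIES}
    p_b = kde_1d([b for _, b, _ in rows], h)
    # The density of c is estimated only from rows where c was observed.
    p_c = kde_1d([c for _, _, c in rows if c is not None], h)

    def likelihood(a: int, b: float, c: Optional[float]) -> float:
        value = p_a[a] * p_b(b)
        # A missing c is marginalised out: p(c) integrates to 1, so under
        # the independence assumption its factor simply drops.
        if c is not None:
            value *= p_c(c)
        return value

    return likelihood


rows = [(0, 0.20, 0.30), (1, 0.50, None), (0, 0.25, 0.35), (2, 0.90, 0.80)]
lik = fit_mixed(rows)
print(lik(0, 0.22, 0.31))  # all three features observed
print(lik(0, 0.22, None))  # c missing: only P(a) * p(b)
```

Dropping the c factor is safe for a Bayes decision on a single vector, because the same factor is dropped for every class. Likelihoods of vectors with different missing patterns are not directly comparable, though, since one is a density over fewer dimensions. A Gaussian kernel also puts some mass outside [0, 1]. For b and c, a Beta fit or a boundary-corrected kernel could be swapped in.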

I want to be able to calculate the likelihood of a feature vector, x, based on the total data set or given that x is from a subset, w, of the total data set.

p(x) = ? and p(x|w) = ?
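Both quantities can be sketched as follows. For illustration only, the sketch assumes that within each subset w the features are independent and normally distributed (a diagonal Gaussian). Under that model:

  • p(x|w) is estimated only from the rows labelled w.
  • The prior P(w) is the fraction of rows in w.
  • p(x) follows from the law of total probability: p(x) = sum over w of P(w) p(x|w).

The minimum-error Bayes rule then picks the w with the largest posterior, P(w|x) = p(x|w) P(w) / p(x). The labels `w1`/`w2`, the sample values and the function names are hypothetical. Any other class-conditional density estimator can replace the diagonal Gaussian.

```python
from collections import Counter
from statistics import NormalDist
from typing import Dict, Hashable, List, Sequence


def fit_classes(xs: List[Sequence[float]], labels: List[Hashable]):
    """Fit P(w) and a diagonal-Gaussian p(x|w) for every label w.

    Assumes each feature is normal and independent within a class, and that
    every class has at least two rows with non-zero spread in each feature.
    """
    counts = Counter(labels)
    priors = {w: counts[w] / len(labels) for w in counts}
    cond: Dict[Hashable, List[NormalDist]] = {}
    for w in counts:
        rows = [x for x, lab in zip(xs, labels) if lab == w]
        # One 1-D normal per feature, estimated from this subset only.
        cond[w] = [NormalDist.from_samples(col) for col in zip(*rows)]
    return priors, cond


def likelihood(x: Sequence[float], dists: List[NormalDist]) -> float:
    """p(x|w) as the product of the per-feature normal densities."""
    p = 1.0
    for xi, dist in zip(x, dists):
        p *= dist.pdf(xi)
    return p


def evidence(x, priors, cond) -> float:
    """p(x) = sum over w of P(w) * p(x|w) (law of total probability)."""
    return sum(priors[w] * likelihood(x, cond[w]) for w in priors)


def posterior(x, priors, cond) -> Dict[Hashable, float]:
    """P(w|x) = P(w) * p(x|w) / p(x) for every class w."""
    px = evidence(x, priors, cond)
    return {w: priors[w] * likelihood(x, cond[w]) / px for w in priors}


def decide(x, priors, cond) -> Hashable:
    """Minimum-error Bayes rule: pick the class with the largest P(w|x)."""
    return max(priors, key=lambda w: priors[w] * likelihood(x, cond[w]))


xs = [[0.10, 0.20], [0.15, 0.25], [0.12, 0.18],
      [0.80, 0.90], [0.85, 0.80], [0.90, 0.85]]
labels = ["w1", "w1", "w1", "w2", "w2", "w2"]
priors, cond = fit_classes(xs, labels)
print(decide([0.12, 0.20], priors, cond))  # w1
```

p(x) is the same for every class, so `decide` compares only p(x|w) P(w). p(x) is needed only when actual posterior probabilities are wanted. `NormalDist.from_samples` needs at least two rows per class. A feature with zero spread within a class makes `pdf` raise `statistics.StatisticsError`.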