I am having difficulty applying Bayes' rule to estimate the class in a classification task when the distribution of the feature vector (and its parameters) is known:
Let $Y$ be a random variable that can take 3 possible values (classes) $i=A, B, C$. Let $X$ be a feature vector with 3 components $x=[x_1, x_2, x_3]$. We know that $X$ is drawn from a multivariate beta distribution (a Dirichlet distribution), which also means that each vector component is a continuous variable.
I know that, under the Bayesian classification rule, we need to find the class $y_i$ for which $P(y_i|x)$ is largest. From Bayes' theorem we get: $P(y_i|x) = \frac{p(x|y_i)\,P(y_i)}{p(x)}, \quad i=A,B,C$. Since we are looking for the maximum over $i$, we can ignore the denominator, because it does not depend on $i$. $P(y_i)$ is equal for each $i$ (our prior is uniform), so it can be ignored as well. What I can't understand is how to find $p(x|y_i)$ using the information that $x$ has been drawn from a Dirichlet distribution (we assume that we know the parameters of the distribution). I don't understand how to carry out the calculation of $P(y_i|x)$ using the probability density function (pdf) given here: https://en.wikipedia.org/wiki/Dirichlet_distribution
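
To make the computation concrete, here is a minimal sketch. It assumes that each class $i$ has its own known Dirichlet parameter vector $\alpha_i$, so that $p(x|y_i) = \frac{1}{B(\alpha_i)} \prod_{k=1}^{3} x_k^{\alpha_{i,k}-1}$. The parameter values, the test point, and the helper name `posterior` are made up for illustration and are not part of the problem statement. It uses `scipy.stats.dirichlet` to evaluate the density.

```python
import numpy as np
from scipy.stats import dirichlet

# ASSUMPTION: hypothetical per-class Dirichlet parameters, chosen only for illustration.
alphas = {
    "A": np.array([5.0, 2.0, 1.0]),
    "B": np.array([1.0, 5.0, 2.0]),
    "C": np.array([2.0, 1.0, 5.0]),
}

# Uniform prior P(y_i) = 1/3, as stated in the question.
priors = {c: 1.0 / len(alphas) for c in alphas}


def posterior(x):
    """Return P(y_i | x) for every class via Bayes' rule (hypothetical helper)."""
    # log p(x | y_i) + log P(y_i), computed in log space for numerical stability
    log_joint = {
        c: dirichlet.logpdf(x, alphas[c]) + np.log(priors[c]) for c in alphas
    }
    # Normalise: dividing by the sum over classes plays the role of p(x)
    m = max(log_joint.values())
    unnorm = {c: np.exp(v - m) for c, v in log_joint.items()}
    z = sum(unnorm.values())
    return {c: u / z for c, u in unnorm.items()}


# A Dirichlet sample lies on the simplex: components are positive and sum to 1.
x = np.array([0.6, 0.3, 0.1])
post = posterior(x)
predicted = max(post, key=post.get)
print(post, predicted)
```

With a uniform prior, the predicted class is simply the one whose Dirichlet density is largest at $x$; the normalisation step is only needed if the actual posterior probabilities are wanted.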