I have trouble understanding an equation in a book I'm reading. Basically,
Consider a decision rule that divides the input space into regions $R_k$ called decision regions, one for each class, such that all points in $R_k$ are assigned to class $C_k$. To explain how to find the optimal decision rule, the book considers a simple case with two classes. A mistake occurs when an input vector belonging to class $C_1$ is assigned to class $C_2$ or vice versa.
Now, the book says:
The probability of this occurring is given by:
$$ \begin{align*} p(mistake) &= p(x \in R_1,C_2) + p(x \in R_2, C_1) \\ &= \int_{R_1} p(x, C_2)dx + \int_{R_2} p(x, C_1) dx \end{align*} $$
I'm extremely confused and would appreciate any help; my questions:
- How can you integrate probabilities and assign the sum to a probability? (Doesn't that violate the primary condition that a probability value has to fall between 0 and 1?)
- Even if I assume that the individual terms are probability density functions, and that integrating them is what yields a probability, how can you guarantee that the sum of the two terms will fall between 0 and 1? [Since $0 \le a \le 1$, $0 \le b \le 1$ does not guarantee $0 \le a + b \le 1$]
- If they are probability density functions (the two functions being integrated), how does integrating help in getting the $p(mistake)$?
The book goes further by saying:
We are free to choose the decision rule that assigns each point $x$ to one of the two classes. Clearly to minimize $p(mistake)$ we should arrange that each $x$ is assigned to whichever class has the smaller value of the integrand in the above equation. Thus, if $p(x, C_1) > p(x, C_2)$ for a given value of $x$, then we should assign that $x$ to class $C_1$.
Thank you so very much!
Notice that the regions $R_1$ and $R_2$ are disjoint. I assume that $p(x, C_k)$ represents a joint density of mixed type: continuous in $x$ and discrete in the class label. In that case, summing over the classes and integrating over $x$ must give total probability 1: $$\int_{-\infty}^\infty \big(p(x, C_1) + p(x, C_2)\big)\, dx = 1.$$
If $R_1 \cup R_2 = \mathbb R$, that is, the regions are disjoint and together make up the whole of $\mathbb R$, then, because both integrands are non-negative, enlarging the domain of integration cannot decrease either integral, so $$\int_{R_1} p(x, C_2)\, dx + \int_{R_2} p(x, C_1)\, dx \leq \int_{-\infty}^\infty p(x, C_2)\, dx + \int_{-\infty}^\infty p(x, C_1)\, dx = 1.$$
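It may help to see the bound term by term. The prior class probabilities $P(C_1)$ and $P(C_2)$ are notation I am introducing here (they do not appear in the quoted passage); they are what you get by integrating each joint density over all of $x$:

$$ \begin{align*} \int_{R_1} p(x, C_2)\, dx &\le \int_{-\infty}^\infty p(x, C_2)\, dx = P(C_2), \\ \int_{R_2} p(x, C_1)\, dx &\le \int_{-\infty}^\infty p(x, C_1)\, dx = P(C_1), \\ P(C_1) + P(C_2) &= 1. \end{align*} $$

So each term is bounded by its own class prior rather than merely by 1, which is why the sum cannot exceed 1.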
In this context, the function $f(x) = p(x, C_1) + p(x, C_2)$ is the marginal density of $x$.
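A quick numerical check can make this concrete. The sketch below is only an illustration under assumptions of my own: the priors 0.6 and 0.4, the two Gaussian class-conditional densities, and the function names are all hypothetical and are not taken from the book. It approximates $p(mistake)$ with a midpoint Riemann sum for a few threshold rules and for the rule that assigns $x$ to $C_1$ whenever $p(x, C_1) > p(x, C_2)$.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


# Hypothetical model (assumption): priors and class-conditional densities.
PRIOR_C1, PRIOR_C2 = 0.6, 0.4


def joint_c1(x):
    """p(x, C1) = P(C1) * p(x | C1)."""
    return PRIOR_C1 * normal_pdf(x, -1.0, 1.0)


def joint_c2(x):
    """p(x, C2) = P(C2) * p(x | C2)."""
    return PRIOR_C2 * normal_pdf(x, 2.0, 1.5)


def p_mistake(in_r1, lo=-15.0, hi=15.0, n=20000):
    """Midpoint Riemann sum of p(x, C2) over R1 plus p(x, C1) over R2.

    `in_r1(x)` returns True when x is assigned to class C1 (x is in R1);
    every other x is treated as belonging to R2.
    """
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        # A mistake in R1 is a C2 point; a mistake in R2 is a C1 point.
        total += (joint_c2(x) if in_r1(x) else joint_c1(x)) * dx
    return total


def threshold_rule(t):
    """Assign x to C1 when x < t (one arbitrary family of decision rules)."""
    return lambda x: x < t


def optimal_rule(x):
    """Assign x to the class with the larger joint density."""
    return joint_c1(x) > joint_c2(x)


threshold_errors = {t: p_mistake(threshold_rule(t)) for t in (-2.0, 0.0, 1.0, 3.0)}
optimal_error = p_mistake(optimal_rule)
all_to_c1_error = p_mistake(lambda x: True)  # R1 is the whole line

for t, err in threshold_errors.items():
    print(f"threshold {t:+.1f}: p(mistake) ~ {err:.4f}")
print(f"optimal rule:    p(mistake) ~ {optimal_error:.4f}")
print(f"everything -> C1: p(mistake) ~ {all_to_c1_error:.4f}")
```

Every value printed lies between 0 and 1, and assigning every point to $C_1$ gives an error equal to the prior of $C_2$, as the bound predicts. On this grid the pointwise rule never does worse than any of the threshold rules, because at each grid point it adds the smaller of the two integrands.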