Bayesian decision theory - Minimax risk in minimum-error-rate classification


Looking at the condition below for the minimax Bayes risk in minimum-error-rate classification (assuming the simple scenario with only two states of nature $\omega_{1}$ and $\omega_{2}$, and $x \in \mathbb{R^{d}}$), what could intuitively explain the symmetry (with regard to the conditional distributions/decision regions)?

$\int_{R_{1}}{p(x|\omega_{2})}dx = \int_{R_{2}}{p(x|\omega_{1})}dx$ where $R_{1}$ (resp. $R_{2}$) is the region of $\mathbb{R^{d}}$ consisting of the points $x$ classified as $\omega_{1}$ (resp. $\omega_{2}$).

Additionally, the author assumes that the overall risk, viewed as a function of the prior $p(\omega_{1})$, is differentiable in order to justify the existence of the minimax solution. This assumption does not look at all "obvious" to me...

N.B.: I'm reading Pattern Classification by Duda, Hart, and Stork (2nd Edition). It looks like the least mediocre treatise I could find on this topic, although there are clearly quite a few hidden mathematical "approximations / shortcuts / assumptions" in the text. Is there a better, more rigorous way to become familiar with this topic, e.g. through a deep dive into "raw" mathematical statistics (for which rigorous books seem much easier to find)?

Accepted answer:

Although I'm not sure what is happening in the book, I'll try to fill in the gaps. An important first fact is that:

If a Bayes classifier has constant risk, then it is minimax.

This fact is easy to prove. For each classifier $\delta$, let $R_{\delta}(\omega)$ denote its risk under state of nature $\omega$. Also, let $\delta^{*}$ be a Bayes classifier with respect to the prior $p(\omega_1)$ that has constant risk. For every classifier $\delta$:

\begin{align*} \sup_{\omega}R_{\delta^*}(\omega) &\leq R_{\delta^*}(\omega_1)p(\omega_1) +R_{\delta^*}(\omega_2)p(\omega_2) & \text{constant risk} \\ &\leq R_{\delta}(\omega_1)p(\omega_1) +R_{\delta}(\omega_2)p(\omega_2) & \delta^* \text{ is Bayes} \\ &\leq \sup_{\omega}R_{\delta}(\omega) \end{align*} Since $\delta$ was arbitrary, $\delta^* $ is minimax.

Next, some guesswork is required. Consider the loss function $L(\delta,\omega)=\mathbb{I}(\delta \neq \omega)$. This loss function is also called the $0/1$ loss, since it is $0$ if the classifier outputs the correct label and $1$ otherwise.

If $L(\delta,\omega)=\mathbb{I}(\delta \neq \omega)$, then $R_{\delta}(\omega_i) = \mathbb{P}(\delta \neq \omega_i|\omega_i)$

Using your notation, \begin{align*} R_{\delta}(\omega_1) &= \mathbb{P}(\delta \neq \omega_1|\omega_1) \\ &= \mathbb{P}(R_2|\omega_1) \\ &= \int_{R_2}p(x|\omega_1)dx \end{align*} Similarly, $R_{\delta}(\omega_2)=\int_{R_1}p(x|\omega_2)dx$.
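To make these two integrals concrete, here is a minimal numerical sketch (my own example, not from the book). It assumes two hypothetical 1-D Gaussian class conditionals, $p(x|\omega_1)=\mathcal{N}(0,1)$ and $p(x|\omega_2)=\mathcal{N}(2,1)$, and a threshold rule with $R_1=(-\infty,t)$ and $R_2=[t,\infty)$, so both risks reduce to normal CDF evaluations:

```python
import math

def norm_cdf(z, mu=0.0, sigma=1.0):
    # Standard normal CDF, via the error function
    return 0.5 * (1.0 + math.erf((z - mu) / (sigma * math.sqrt(2.0))))

# Hypothetical class conditionals: p(x|w1) = N(0,1), p(x|w2) = N(2,1).
# Decision rule: R1 = (-inf, t), R2 = [t, inf) for a threshold t.
def risks(t):
    r1 = 1.0 - norm_cdf(t, mu=0.0)  # integral of p(x|w1) over R2
    r2 = norm_cdf(t, mu=2.0)        # integral of p(x|w2) over R1
    return r1, r2

print(risks(0.5))  # unequal conditional risks away from the equalizer threshold
r1, r2 = risks(1.0)
print(r1, r2)      # equal: t = 1 balances the two error integrals here
```

For these equal-variance Gaussians the two error integrals coincide at $t=1$ by symmetry; any other threshold trades one conditional risk against the other.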

Now, you can obtain the symmetry by putting the facts together. If $\delta^*$ is a Bayes estimator, you can prove that it is minimax by showing that it has constant risk, i.e. that $R_{\delta^*}(\omega_1)=R_{\delta^*}(\omega_2)$. Under the $0/1$ loss, this is exactly the condition $\int_{R_2}p(x|\omega_1)dx=\int_{R_1}p(x|\omega_2)dx$.
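As a numerical check of the constant-risk-implies-minimax argument, the sketch below (my own hypothetical example: $p(x|\omega_1)=\mathcal{N}(0,1)$, $p(x|\omega_2)=\mathcal{N}(2,1)$, threshold rules) finds the equalizer threshold by bisection and verifies on a grid that it also minimizes the worst-case conditional risk:

```python
import math

def norm_cdf(z, mu=0.0):
    # Standard normal CDF, via the error function
    return 0.5 * (1.0 + math.erf((z - mu) / math.sqrt(2.0)))

# Hypothetical setup: p(x|w1) = N(0,1), p(x|w2) = N(2,1), R1 = (-inf, t).
def risk_gap(t):
    # R(w1) - R(w2); strictly decreasing in t, so it has a unique root
    return (1.0 - norm_cdf(t)) - norm_cdf(t, mu=2.0)

# Bisection for the equalizer threshold where the two error integrals match
lo, hi = -10.0, 10.0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if risk_gap(mid) > 0:
        lo = mid
    else:
        hi = mid
t_star = 0.5 * (lo + hi)

def worst_risk(t):
    # sup over the two states of nature of the conditional risk
    return max(1.0 - norm_cdf(t), norm_cdf(t, mu=2.0))

# The equalizer threshold should minimize the worst-case risk over a grid
grid = [i / 100.0 for i in range(-300, 500)]
assert all(worst_risk(t_star) <= worst_risk(t) + 1e-9 for t in grid)
print(t_star)  # ~1.0 by symmetry of the two Gaussians
```

This is the one-dimensional picture behind the answer: the Bayes rule whose two conditional risks are equal is precisely the one whose worst-case risk cannot be improved.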