Minimax risk algebra in Pattern Classification by Duda, et. al.

104 Views Asked by At

I'm currently working through Pattern Classification by Duda, Stork, and Hart, and I've reached a small roadblock I was hoping to find some help with. In Section 2.3, page 11, the authors convert the standard total risk equation into the minimax risk equation. I think I understand the motivation (since the minimax risk must by definition be independent of the prior probabilities, we write the total risk in terms of one prior using the fact that $P(\omega_1) = 1 - P(\omega_2)$, then set the coefficient of the prior equal to zero), but despite spending some time and notebook paper on it, I can't follow the algebra of this following transformation. Note that $R_1, R_2$ are the regions for which the classifier in question predicts class $1$ and class $2$ respectively, denoted as $\omega_1, \omega_2$. $$R = \int\limits_{R_1} [\lambda_{11}P(\omega_1)p(\textbf{x}|\omega_1) + \lambda_{12}P(\omega_2)p(\textbf{x}|\omega_2)] \ \mathrm{d}\textbf{x} \quad + \\ \int\limits_{R_2} [\lambda_{21}P(\omega_1)p(\textbf{x}|\omega_1) + \lambda_{22}P(\omega_2)p(\textbf{x}|\omega_2)] \ \mathrm{d}\textbf{x}$$

becomes, with the book's provided identities of $P(\omega_2) = 1 - P(\omega_1)$ and $\int\limits_{R_1} p(\textbf{x}|\omega_1) \ \mathrm{d}\textbf{x} = 1 - \int\limits_{R_2} p(\textbf{x}|\omega_1) \ \mathrm{d}\textbf{x} $

$$R(P(\omega_1)) = \lambda_{22} + (\lambda_{12} - \lambda_{22})\int\limits_{R_1} p(\textbf{x}|\omega_2) \ \mathrm{d}\textbf{x}$$

$$+ \quad P(\omega_1)\bigg[(\lambda_{11} - \lambda_{22}) \ - (\lambda_{21} - \lambda_{11})\int\limits_{R_2} p(\textbf{x}|\omega_1) \ \mathrm{d}\textbf{x} \ - (\lambda_{12} - \lambda_{22})\int\limits_{R_1} p(\textbf{x}|\omega_2) \ \mathrm{d}\textbf{x} \bigg] $$

1

There are 1 best solutions below

2
On

The goal is to express the risk as a function of $P(w_1)$. We work term by term :

First term : Pull $P(\omega_1)$ out of the integral since it is independent of $\mathbf{x}$ so that it equals $P(\omega_1) \int\limits_{R_1} \lambda_{11}p(\textbf{x}|\omega_1) \ \mathrm{d}\textbf{x}$

Then use the fact $\int \limits_{R_1} p(\textbf {x}|\omega_1)\ \mathrm{d}\textbf{x} = 1 - \int \limits_{R_2} p(\textbf {x}|\omega_1)\ \mathrm{d}\textbf{x}$ so that it equals

$\lambda_{11} P(\omega_1) - P(\omega_1)\int \limits_{R_2} \lambda_{11} p(\textbf {x}|\omega_1)\ \mathrm{d}\textbf{x}$

Second Term : Use $P(\omega_2) = 1 - P(\omega_1)$ to get $\int \limits_{R_1}\lambda_{12}p(\textbf{x}|\omega_2)\ \mathrm{d}\textbf{x} - P(\omega_1)\int \limits_{R_1}\lambda_{12}p(\textbf{x}|\omega_2)\ \mathrm{d}\textbf{x}$

Third Term : Leave as is, so equal to $\int\limits_{R_2} \lambda_{21}P(\omega_1)p(\textbf{x}|\omega_1) \ \mathrm{d}\mathbb{x}$

Fourth Term : Same as the first term with 1 and 2 swapped. Then use $ P(\omega_2) = 1-P(\omega_1)$ to get :

$\lambda_{22} -\lambda_{22} P(\omega_1) - \int \limits_{R_1} \lambda_{22} p(\textbf {x}|\omega_2)\ \mathrm{d}\textbf{x} + P(\omega_1) \int \limits_{R_1} \lambda_{22} p(\textbf {x}|\omega_2)\ \mathrm{d}\textbf{x}$

Finally add everything up and pool together all constant terms and terms linear in $P(\omega_1)$ respectively to get :

$R = \lambda_{22} + (\lambda_{12} - \lambda_{22})\int\limits_{R_1} p(\textbf{x}|\omega_2) \ \mathrm{d}\textbf{x}+ \quad P(\omega_1)\bigg[(\lambda_{11} - \lambda_{22}) \ + (\lambda_{21} - \lambda_{11})\int\limits_{R_2} p(\textbf{x}|\omega_1) \ \mathrm{d}\textbf{x} \ - (\lambda_{12} - \lambda_{22})\int\limits_{R_1} p(\textbf{x}|\omega_2) \ \mathrm{d}\textbf{x} \bigg] $

Note : The OP has made a typo in the final form of $R$ in the second term, which should read as it is above in my answer. This is in agreement with the reference cited.