Bayesian expected loss (Risk)


I'm reading the book "Pattern Classification" by Duda, Hart & Stork. In chapter 2, they discuss Bayesian expected loss (risk), but they never mention what happens when there's a rejection cost. Here is what I think is correct, but I'm not sure:

$\omega_1, \omega_2, \dots, \omega_c$: states of nature.

$\alpha_1, \alpha_2, \dots, \alpha_a$: possible actions.

$\lambda(\alpha_i|\omega_j)=\lambda_{ij}$: loss when I decide $\alpha_i$ and the true state of nature is $\omega_j$

So, I observe a particular $x$ and choose action $\alpha_i$. If the true state of nature is $\omega_j$, I have a loss $\lambda(\alpha_i|\omega_j)$. The probability that the true state of nature is $\omega_j$ is $P(\omega_j|x)$.

So, the expected loss (risk) when I decide $\alpha_i$ is:

$$R(\alpha_i|x) = \sum_{j=1}^{c} \lambda(\alpha_i|\omega_j) P(\omega_j|x)$$

And the overall risk is:

$$R = \int R(\alpha(x)|x) p(x) dx$$

To minimize $R$, I have to choose the $\alpha(x)$ that minimizes $R(\alpha(x)|x)$ for all $x$.
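As a concrete sketch of this rule, here is a minimal Python example that computes the conditional risks $R(\alpha_i|x) = \sum_j \lambda_{ij} P(\omega_j|x)$ and picks the minimizing action. The three-class zero-one loss matrix is an illustrative assumption, not a specific example from the book.

```python
import numpy as np

# Hypothetical loss matrix: lam[i, j] = lambda(alpha_i | omega_j).
# Zero-one loss for three classes: no loss for a correct decision,
# unit loss for any mistake.
lam = np.array([[0.0, 1.0, 1.0],
                [1.0, 0.0, 1.0],
                [1.0, 1.0, 0.0]])

def conditional_risks(posteriors, lam):
    """R(alpha_i | x) = sum_j lambda_ij * P(omega_j | x), for every i."""
    return lam @ posteriors

def bayes_decision(posteriors, lam):
    """Choose the action alpha_i with the smallest conditional risk."""
    return int(np.argmin(conditional_risks(posteriors, lam)))
```

With posteriors $(0.1, 0.7, 0.2)$ the risks are $(0.9, 0.3, 0.8)$, so the rule picks $\alpha_2$ (index 1), which under zero-one loss is simply the class with the largest posterior.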

In the case that I have $c$ classes, I have:

$$R(\alpha_1|x) = \lambda_{11}P(\omega_1|x)+\lambda_{12}P(\omega_2|x)+\cdots+\lambda_{1c}P(\omega_c|x) \\ R(\alpha_2|x) = \lambda_{21}P(\omega_1|x)+\lambda_{22}P(\omega_2|x)+\cdots+\lambda_{2c}P(\omega_c|x) \\ \vdots \\ R(\alpha_c|x) = \lambda_{c1}P(\omega_1|x)+\lambda_{c2}P(\omega_2|x)+\cdots+\lambda_{cc}P(\omega_c|x) $$

I have a cost of rejection $\lambda_r$. So now I say (tell me if this is correct):

I decide $\omega_1$ if $R(\alpha_1|x) < \min\{R(\alpha_2|x), \dots, R(\alpha_c|x)\} + \lambda_r$

I decide $\omega_2$ if $R(\alpha_2|x) < \min\{R(\alpha_1|x), R(\alpha_3|x), \dots, R(\alpha_c|x)\} + \lambda_r$

$\vdots$

I decide $\omega_c$ if $R(\alpha_c|x) < \min\{R(\alpha_1|x), \dots, R(\alpha_{c-1}|x)\} + \lambda_r$

Is it OK to put the $\lambda_r$ just like that?


There are 2 best solutions below

Answer 1

The short answer is no. The long answer follows.

Bayesian expected risk minimization

The first remark is the opposite of my short answer: you may add the $\lambda_r$, but then it is no longer a risk-minimization problem (it becomes a non-Bayesian problem).

If I've understood you properly, you've found that risk minimization is not enough for a variety of applications, and you want to give your system the ability to say "I don't know what this is".

Loss units

You should keep the physics in mind: you cannot add meters to kilograms, so the loss should be expressed in the same units as the rejection cost.

Sometimes that's impossible: suppose you've decided to transport chickens, but the road may be bad, and if the truck is destroyed you will lose the lives of innocent chickens and maybe the driver.

Another example is Neo's choice in the movie "The Matrix Reloaded": if you remember, the Architect proposed that he save either his beloved Trinity or humanity (several people, but nonetheless). We can say that "one Trinity is worth less than 7 citizens of Zion", but he decided to save Trinity and as many citizens as he could. His choice may be expressed as "Trinity's rejection cost is $\infty$ [love points]" in your terminology, but what if we add, say, Morpheus? He may have a lower rejection cost than Trinity, but a higher rejection cost than all the other citizens put together ($\frac{\infty}{2}$?).

Further reading

In the bibliographical and historical remarks to the second chapter of the book you can find valuable pointers related to your question, namely where to read about non-Bayesian methods.

There you will find the reference to C. K. Chow and a mention of the reject rate; those references should help you.

One of the approaches I know is min-max. For example, you want to create an antivirus system and say "I want to catch as many viruses as I can, but I don't want it to recognize regular files as infected". Let's use three subscripts: $v1$ for a virus, $v2$ for another virus (a little generalization) and $r$ for regular files. You may create something like $$ \begin{cases} \sum\limits_{x \in X} q\left( \alpha_r \; \middle| \; x \right) \cdot \left( \mathbb{P}\left( x \; \middle| \; \omega_{v1} \right) + \mathbb{P}\left( x \; \middle| \; \omega_{v2} \right) \right) \to \min \\ \sum\limits_{x \in X} \left( q\left( \alpha_{v1} \; \middle| \; x \right) + q\left( \alpha_{v2} \; \middle| \; x \right) \right) \cdot \mathbb{P}\left( x \; \middle| \; \omega_r \right) \le \varepsilon \\ q\left( \alpha_{v1} \; \middle| \; x \right) + q\left( \alpha_{v2} \; \middle| \; x \right) + q\left( \alpha_r \; \middle| \; x \right) = 1 \\ q\left( \alpha_{v1} \; \middle| \; x \right) \ge 0 \\ q\left( \alpha_{v2} \; \middle| \; x \right) \ge 0 \\ q\left( \alpha_r \; \middle| \; x \right) \ge 0 \end{cases} $$ Here $q(\alpha \mid x)$ is a (possibly randomized) decision strategy. This example says

  • Let's minimize occurrences of viruses which are classified as regular files
  • We allow at most $\varepsilon$ regular files to be recognized as viruses

The last four lines of the system lead to a deterministic classification strategy, which outputs $1$ for the variant it "thinks" is right and $0$ for the wrong ones.
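A system of this shape is a linear program, so it can be solved directly. Below is a small numeric sketch using `scipy.optimize.linprog`; the class-conditional distributions and the bound `eps` are made-up illustration values, not from the book, and the decision variables are the $q(\alpha \mid x)$ flattened into one vector.

```python
import numpy as np
from scipy.optimize import linprog

X = 3                              # three observable values of x
p_v1 = np.array([0.7, 0.2, 0.1])   # P(x | omega_v1), illustrative
p_v2 = np.array([0.1, 0.2, 0.7])   # P(x | omega_v2), illustrative
p_r  = np.array([0.2, 0.6, 0.2])   # P(x | omega_r),  illustrative
eps = 0.25                         # allowed mass of false positives

# Variables per x: [q(a_v1|x), q(a_v2|x), q(a_r|x)], flattened.
c = np.zeros(3 * X)
c[2::3] = p_v1 + p_v2              # minimize viruses labeled "regular"

A_ub = np.zeros((1, 3 * X))        # regular files labeled as any virus
A_ub[0, 0::3] = p_r
A_ub[0, 1::3] = p_r
b_ub = [eps]

A_eq = np.zeros((X, 3 * X))        # the q's sum to 1 at each x
for i in range(X):
    A_eq[i, 3 * i:3 * i + 3] = 1.0
b_eq = np.ones(X)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=(0, 1))
```

Note that when the inequality constraint is active, the optimal strategy may randomize at one value of $x$, trading off missed viruses against the false-positive budget.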

Conclusion

You can pose any problem you want, but first of all you should understand your aim, and only then search for the tools that fit the problem.

Answer 2

If I understand correctly, your rejection cost is a fixed value $\lambda_r$. In a Bayesian setting, what you can do is treat rejection as a new action, call it $\alpha_{c+1}$, whose conditional risk is the constant $$R(\alpha_{c+1}|x)=\lambda_r$$ whatever the posterior distribution over the states. You then just compare this value to all the others, and choose rejection if it is the lowest.
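A minimal sketch of this rule in Python: compute the $c$ conditional risks, compare the best of them against the constant $\lambda_r$, and reject when even the best action costs more. The zero-one loss matrix is an illustrative assumption.

```python
import numpy as np

def decide_with_rejection(posteriors, lam, lambda_r):
    """Bayes rule with rejection as an extra action of constant risk."""
    risks = lam @ posteriors         # R(alpha_i | x) for i = 1..c
    best = int(np.argmin(risks))
    # R(alpha_{c+1} | x) = lambda_r; reject when it beats every class.
    if lambda_r < risks[best]:
        return "reject"
    return best

lam01 = 1.0 - np.eye(3)              # zero-one loss for three classes
```

For posteriors $(0.4, 0.35, 0.25)$ under zero-one loss the risks are $(0.6, 0.65, 0.75)$: with $\lambda_r = 0.5$ the rule rejects, while with $\lambda_r = 0.7$ it picks class 1 (index 0), recovering Chow's classic reject rule.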