Simplifying risk equation for 0/1 loss in machine learning


I'm currently studying machine learning using the textbook Introduction to Machine Learning 3e (Ethem Alpaydin, 2015) and have a question about the derivation of a particular equation. For those curious, this equation appears on page 53, in Section 3.3: Losses and Risk.

More specifically, this excerpt concerns classification and how we can measure loss and risk.

Let us define action $\alpha_i$ as the decision to assign the input to class $C_i$ and $\lambda_{ik}$ as the loss incurred for taking action $\alpha_i$ when the input actually belongs to class $C_k$. The expected risk for taking action $\alpha_i$ is:

$$ R(\alpha_i\ |\ \pmb{x}) = \sum_{k = 1}^K \lambda_{ik} P(C_k\ |\pmb{x})$$

and we choose the action with minimum risk:

$$\text{choose}\; \alpha_i\; \text{if}\; R(\alpha_i\ |\ \pmb{x}) = \min_k R(\alpha_k\ |\ \pmb{x})$$

Let us define $K$ actions $\alpha_i$ ($i = 1, \dots, K$). In the special case of the $0/1$ loss where

$$\lambda_{ik} = \begin{cases} 0 & \text{if}\; i = k \\ 1 & \text{if}\; i \ne k \end{cases}$$

the risk of taking action $\alpha_i$ is:

$$ \begin{align} R(\alpha_i\ |\ \pmb{x}) & = \sum_{k = 1}^K \lambda_{ik}P(C_k\ |\ \pmb{x}) \\ & = \sum_{k \ne i} P(C_k\ |\ \pmb{x}) \\ & = 1 - P(C_i\ |\ \pmb{x}) \end{align} $$

because $\sum_{k}P(C_k\ |\ \pmb{x}) = 1$.
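To see this reduction numerically, here is a small sanity check (the posterior probabilities are made up for illustration) that computes the expected risk of every action under the 0/1 loss matrix and confirms it equals $1 - P(C_i\ |\ \pmb{x})$, so that minimizing risk is the same as maximizing the posterior:

```python
import numpy as np

# Hypothetical posterior probabilities P(C_k | x) for K = 3 classes.
posteriors = np.array([0.2, 0.5, 0.3])
K = len(posteriors)

# 0/1 loss matrix: lambda_ik = 0 if i == k, else 1.
loss = np.ones((K, K)) - np.eye(K)

# Expected risk of each action: R(alpha_i | x) = sum_k lambda_ik * P(C_k | x).
risks = loss @ posteriors

print(risks)           # [0.8 0.5 0.7]
print(1 - posteriors)  # [0.8 0.5 0.7]  -- same values

# Choosing the minimum-risk action = choosing the maximum-posterior class.
print(np.argmin(risks) == np.argmax(posteriors))  # True
```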

In the last portion of the excerpt, I'm having trouble understanding how we get from $\sum_{k \ne i}P(C_k\ |\ \pmb{x})$ to $1 - P(C_i\ |\ \pmb{x})$.

My understanding of the step from the first line (with $\lambda$) to the second line is that we drop the terms where $k = i$, since $\lambda_{ik} = 0$ in those cases, leaving only the terms with $k \ne i$.

Also, as is stated in the very last line, since

$$\sum_k P(C_k\ |\ \pmb{x}) = \sum_{k = i}P(C_k\ |\ \pmb{x}) + \sum_{k \ne i}P(C_k\ |\ \pmb{x}) = 1$$

we can obtain

$$\sum_{k \ne i}P(C_k\ |\ \pmb{x}) = 1 - \sum_{k = i}P(C_k\ |\ \pmb{x})$$

Assuming that my understanding is correct up until here, I'm having trouble understanding how

$$\sum_{k = i}P(C_k\ |\ \pmb{x}) = P(C_i\ |\ \pmb{x})$$

How was this derivation made? Any tips or advice are appreciated. Thank you.


There is 1 answer below.


For any real numbers $a_1, \dots, a_K$, if $$a_1 + \dots + a_{i-1} + a_i + a_{i+1} + \dots + a_K = 1$$ then, by subtracting $a_i$ from both sides, we have $$\sum_{\substack{1 \le k \le K \\ k \ne i}} a_k = 1 - a_i.$$ Apply this with $a_k = P(C_k\ |\ \pmb{x})$.
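To make the point in the question concrete: the sum over $k = i$ ranges over a single index, so it is just the one term $P(C_i\ |\ \pmb{x})$. A tiny numeric check with made-up posteriors:

```python
import math

# Made-up posteriors that sum to 1; check sum_{k != i} P(C_k|x) = 1 - P(C_i|x).
posteriors = [0.2, 0.5, 0.3]
i = 1  # the chosen action/class index

# The "sum over k = i" has exactly one term: posteriors[i].
lhs = sum(p for k, p in enumerate(posteriors) if k != i)
rhs = 1 - posteriors[i]

print(lhs, rhs)  # 0.5 0.5
print(math.isclose(lhs, rhs))  # True
```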