Computing an expectation based on accuracy of future data as a random variable for a binary classifier.

Question

64 Views Asked by Bumbble Comm At 30 Mar 2026 - 12:15

I am reading Peter Flach's introduction to machine learning and came across the following paragraph on p. 345 (top of the page), about measuring the performance of a binary classifier.

In machine learning the situation is usually more concrete, and our experimental objective – accuracy, say – is something we can measure in principle, or at least estimate (since we’re generally interested in accuracy on unseen data). However, there may be unknown factors we have to account for. For example, the model may need to operate in different operating contexts with different class distributions. In such a case we can treat accuracy on future data as a random variable and take its expectation, assuming some probability distribution over the proportion of positives pos. Since $acc = pos \cdot tpr + (1 - pos) \cdot tnr$ (where tpr = true positive rate, tnr = true negative rate), and assuming we can measure true positive and negative rates independently of the class distribution, we have (assuming a uniform distribution over pos)

$\mathbb{E}[acc] = \mathbb{E}[pos \cdot tpr + (1 - pos) \cdot tnr] = \mathbb{E}[pos] \cdot tpr + \mathbb{E}[1 - pos] \cdot tnr = tpr/2 + tnr/2$.

I do not understand this statement. What is the relationship between the proportion of positives and the random variable (the accuracy of the model)? Any insights appreciated.

There are 2 best solutions below

Bumbble Comm On 31 Oct 2019 - 6:46

They are using the law of total probability:

\begin{align}\mathbb{E}[acc] &=\mathbb{E}[acc|pos]Pr(pos)+ \mathbb{E}[acc|neg]Pr(neg)\\ &=tpr\cdot \frac12 + tnr\cdot \frac12\end{align}

They take the proportions of positive and negative data to both be $\frac12$ (the mean of a uniform distribution over $pos$).

If we know the data comes from the positive class, the accuracy on that class is the tpr. Similarly, the accuracy on the negative class is the tnr.

**Bumbble Comm** · Accepted Answer

I think this is what the author is saying:

The formula $acc = pos \cdot tpr + (1 - pos) \cdot tnr$ describes how $acc$ varies as a function of $3$ parameters: $pos, tpr, tnr$.
- The formula itself is based on the law of total probability, but that's irrelevant here. Here's a useful example:
  - $acc = P(Test=Disease)$
  - $pos = P(Disease=1)$
  - $tpr = P(Test=1\mid Disease=1)$
  - $1-pos = P(Disease=0)$
  - $tnr = P(Test=0 \mid Disease=0)$
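To make the disease example concrete, here is a minimal Python sketch. The confusion-matrix counts are made up purely for illustration (they are not from the book); the point is only that accuracy computed directly from the counts agrees with $pos \cdot tpr + (1 - pos) \cdot tnr$.

```python
# Hypothetical confusion-matrix counts, chosen only for illustration.
tp, fn = 40, 10    # diseased people: test positive / test negative
tn, fp = 180, 20   # healthy people: test negative / test positive

n = tp + fn + tn + fp
pos = (tp + fn) / n    # proportion of positives, P(Disease=1)
tpr = tp / (tp + fn)   # P(Test=1 | Disease=1)
tnr = tn / (tn + fp)   # P(Test=0 | Disease=0)

# Accuracy computed directly: fraction of people whose test matches their status.
acc_direct = (tp + tn) / n

# Accuracy computed from the decomposition used in the book.
acc_formula = pos * tpr + (1 - pos) * tnr

print(acc_direct, acc_formula)  # the two agree up to floating-point rounding
```

With these counts, $pos = 0.2$, $tpr = 0.8$ and $tnr = 0.9$, so both routes give an accuracy of about $0.88$.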

The parameters $tpr = P(Test=1\mid Disease=1)$ and $tnr = P(Test=0 \mid Disease=0)$ are treated as constants, per "assuming we can measure true positive and negative rates independently of the class distribution".

The parameter $pos = P(Disease=1)$ is unknown, per "the model may need to operate in different operating contexts with different class distributions", so we model $pos$ itself as a random variable, representing "some probability distribution over the proportion of positives pos" (presumably, varying due to "different operating contexts").

Then by linearity of expectation we have:

$$\mathbb{E}[acc] = \mathbb{E}[pos \cdot tpr + (1 - pos) \cdot tnr] = \mathbb{E}[pos] \cdot tpr + \mathbb{E}[1 - pos] \cdot tnr$$
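If it helps, the linearity step can be checked numerically. The sketch below assumes three hypothetical operating contexts, each with its own probability and its own proportion of positives, plus made-up values for $tpr$ and $tnr$. None of these numbers come from the book. It uses exact fractions so that the two sides can be compared with `==`.

```python
from fractions import Fraction as F

# Assumed (hypothetical) classifier rates, held fixed across contexts.
tpr, tnr = F(4, 5), F(9, 10)

# Hypothetical operating contexts: (probability of the context, proportion of positives).
contexts = [(F(1, 2), F(1, 10)), (F(3, 10), F(1, 2)), (F(1, 5), F(9, 10))]

def acc(pos):
    """Accuracy in a context whose proportion of positives is pos."""
    return pos * tpr + (1 - pos) * tnr

# Left-hand side: average the accuracy over the contexts.
expected_acc = sum(w * acc(p) for w, p in contexts)

# Right-hand side: plug the mean of pos into the same formula.
mean_pos = sum(w * p for w, p in contexts)
via_linearity = mean_pos * tpr + (1 - mean_pos) * tnr

print(expected_acc, via_linearity, expected_acc == via_linearity)
```

Only the mean of $pos$ matters here, not the shape of its distribution, because $acc$ is linear in $pos$.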

IMHO everything so far is non-controversial. However, the author then made one more assumption:

By further "assuming a uniform distribution over pos" we have $\mathbb{E}[pos] = 1/2$
- This is because $pos \in [0,1]$ obviously, so a uniform $pos \in [0,1]$ will have mean $1/2$.
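Written out as integrals, using the fact that the density of $Unif(0,1)$ is $1$ on $[0,1]$ (with $p$ as a dummy variable for the value of $pos$):

$$\mathbb{E}[pos] = \int_0^1 p \, dp = \frac12, \qquad \mathbb{E}[acc] = \int_0^1 \big(p \cdot tpr + (1 - p) \cdot tnr\big) \, dp = \frac{tpr}{2} + \frac{tnr}{2}$$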

IMHO this last assumption might be completely unrealistic (in real life). E.g. if positive means a person having a rare-ish disease, then it's hard to imagine a real-life scenario where $pos \sim Unif(0,1)$ is a good assumption - instead $pos$ will usually be a r.v. highly concentrated near $0$.
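To get a feel for how much the choice of distribution matters, here is a small sketch that swaps the uniform for a Beta distribution. The Beta family is my own choice for illustration, not something the book uses. It is convenient because $Beta(1,1)$ is exactly $Unif(0,1)$ and the mean of $Beta(a,b)$ is $a/(a+b)$. The values of $tpr$ and $tnr$ are also made up.

```python
# Beta(a, b) has mean a / (a + b); Beta(1, 1) is the uniform distribution on [0, 1].
def expected_accuracy(tpr, tnr, a, b):
    """E[acc] when pos ~ Beta(a, b), using linearity of expectation."""
    mean_pos = a / (a + b)
    return mean_pos * tpr + (1 - mean_pos) * tnr

tpr, tnr = 0.6, 0.95  # hypothetical rates, for illustration only

uniform = expected_accuracy(tpr, tnr, 1, 1)   # E[pos] = 0.5
rare = expected_accuracy(tpr, tnr, 1, 19)     # E[pos] = 0.05, mass concentrated near 0

print(uniform, rare)
```

With these numbers the uniform assumption gives roughly $0.775$, while the rare-disease prior gives roughly $0.93$, since the expectation is then dominated by $tnr$.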

Hope this makes sense?
