In statistical learning theory, to pose a regression or classification problem, one starts by selecting a set of points $\{X_1, \dots, X_n\}$ and then labels them to obtain a dataset $S=\{(X_1,Y_1), \dots, (X_n, Y_n)\}$. One then solves the inference problem by minimising an empirical loss function over $S$. This approach is usually referred to as Passive Learning (PL). However, PL suffers from two problems: sampling bias and data redundancy.
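To make the PL setup concrete, here is a minimal sketch of empirical loss minimisation on a fixed labelled sample, using scikit-learn's logistic regression as the (assumed) loss minimiser; the dataset is synthetic and purely illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Passive learning: draw and label the whole sample S once, up front.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Minimise the empirical (logistic) loss over S.
clf = LogisticRegression(max_iter=1000).fit(X, y)
train_acc = clf.score(X, y)  # empirical accuracy on the training sample
```

Note that every point in $S$ is labelled regardless of how informative it is, which is exactly where the redundancy criticism of PL comes from.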
Hence, the Active Learning (AL) paradigm has been suggested. In pool-based sampling (one of the AL strategies), one starts with a large pool of unlabelled data, say $\{X_1, \dots, X_{100n}\}$, labels only a few examples (say 100), and then iteratively selects the next batch of data to label. To this end, many algorithms have been suggested in the literature (e.g. uncertainty sampling and Query by Committee). My questions are:
1- Are there any theoretical results on how to compare two AL algorithms, e.g. something of the form "Algorithm A is better than Algorithm B if such-and-such holds"? If yes, have they been applied to compare any concrete algorithms? If not, why not?
2- Are there theoretical results on how to compare the AL paradigm to the PL paradigm? That is, how do we tell that AL is better, and under what conditions?
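For concreteness, here is a sketch of the pool-based loop I have in mind, with uncertainty sampling as the query strategy (querying the points whose predicted probability is closest to 0.5). The pool, seed size, and batch size are arbitrary choices for illustration; the "oracle" labels are just the hidden ground truth of a synthetic dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)

# Pool of unlabelled points; y_pool plays the role of the labelling oracle.
X_pool, y_pool = make_classification(n_samples=1000, n_features=10, random_state=0)

# Seed: label only a handful of randomly chosen examples.
labelled = list(rng.choice(len(X_pool), size=10, replace=False))
unlabelled = [i for i in range(len(X_pool)) if i not in labelled]

batch_size = 10
for _ in range(5):  # five query rounds
    model = LogisticRegression(max_iter=1000)
    model.fit(X_pool[labelled], y_pool[labelled])

    # Uncertainty sampling: pick the batch whose predicted class
    # probability is closest to 0.5 (least confident predictions).
    probs = model.predict_proba(X_pool[unlabelled])[:, 1]
    uncertainty = -np.abs(probs - 0.5)
    query = np.argsort(uncertainty)[-batch_size:]

    # "Send to the oracle": move the queried indices into the labelled set.
    for q in sorted(query, reverse=True):
        labelled.append(unlabelled.pop(q))

print(len(labelled))  # → 60 labelled points after 5 rounds of 10 queries
```

Question 1 would then ask, e.g., for a theorem distinguishing this uncertainty-sampling rule from a Query-by-Committee rule, and question 2 for a guarantee that this loop beats labelling 60 points uniformly at random.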