In statistical learning theory, to pose a regression or classification problem, one starts by selecting a set of points $\{X_1, \dots, X_n\}$ and then labels them to obtain a dataset $S=\{(X_1,Y_1), \dots, (X_n, Y_n)\}$. One then solves the inference problem by minimising an empirical loss function over $S$. This approach is usually referred to as Passive Learning (PL). However, PL suffers from two problems: sampling bias and data redundancy.
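To make the PL setup concrete, here is a minimal sketch of empirical loss minimisation on a fixed labelled sample, using scikit-learn's logistic regression as the (assumed) loss minimiser; the dataset is synthetic and purely illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Passive learning: draw and label the whole sample S once, up front.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Minimise the empirical (logistic) loss over S.
clf = LogisticRegression(max_iter=1000).fit(X, y)
train_acc = clf.score(X, y)  # empirical accuracy on the training sample
```

Note that every point in $S$ is labelled regardless of how informative it is, which is exactly where the redundancy criticism of PL comes from.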
Hence, the Active Learning (AL) paradigm has been suggested. In pool-based sampling (one of the AL strategies), one starts with a large pool of unlabelled data, say $\{X_1, \dots, X_{100n}\}$, labels only a few examples (say 100), and then iteratively selects the next batch of data to label. To this end, many algorithms have been suggested in the literature (e.g. uncertainty sampling and Query by Committee). My questions are:
1- Are there any theoretical results on how to compare two AL algorithms, e.g. something of the form "Algorithm A is better than Algorithm B if such-and-such holds"? If yes, have they been applied to compare any concrete algorithms? If not, why not?
2- Are there theoretical results on how to compare the AL paradigm to the PL paradigm? That is, how do we tell that AL is better, and under what conditions?
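For concreteness, here is a sketch of the pool-based loop I have in mind, with uncertainty sampling as the query strategy (querying the points whose predicted probability is closest to 0.5). The pool, seed size, and batch size are arbitrary choices for illustration; the "oracle" labels are just the hidden ground truth of a synthetic dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)

# Pool of unlabelled points; y_pool plays the role of the labelling oracle.
X_pool, y_pool = make_classification(n_samples=1000, n_features=10, random_state=0)

# Seed: label only a handful of randomly chosen examples.
labelled = list(rng.choice(len(X_pool), size=10, replace=False))
unlabelled = [i for i in range(len(X_pool)) if i not in labelled]

batch_size = 10
for _ in range(5):  # five query rounds
    model = LogisticRegression(max_iter=1000)
    model.fit(X_pool[labelled], y_pool[labelled])

    # Uncertainty sampling: pick the batch whose predicted class
    # probability is closest to 0.5 (least confident predictions).
    probs = model.predict_proba(X_pool[unlabelled])[:, 1]
    uncertainty = -np.abs(probs - 0.5)
    query = np.argsort(uncertainty)[-batch_size:]

    # "Send to the oracle": move the queried indices into the labelled set.
    for q in sorted(query, reverse=True):
        labelled.append(unlabelled.pop(q))

print(len(labelled))  # → 60 labelled points after 5 rounds of 10 queries
```

Question 1 would then ask, e.g., for a theorem distinguishing this uncertainty-sampling rule from a Query-by-Committee rule, and question 2 for a guarantee that this loop beats labelling 60 points uniformly at random.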