Maximizing correct (or incorrect) labeling and proving its bound


Suppose there are $M$ workers labeling $N$ items (each item has a true label $+1$ or $-1$). Worker $j$ recovers the true label of any given item with probability $\frac{1+p_j}{2}$, where $-1\le p_j\le 1$.

Given this setup, we have an $N\times M$ random matrix $F$ where $F_{ij}$ is the label of item $i$ by worker $j$. My goal is to find an algorithm that returns $N$ labels such that, with 90% probability, we have $O(\frac{\ln(MN)}{M})$ correct labels or $O(\frac{\ln(MN)}{M})$ incorrect labels.

My idea is to find the column with the largest $|p_j|$, so that this worker's labels are at least consistently right or consistently wrong.

My current strategy is to compute each column's sum, square it, subtract $N$, take the absolute value, and use the column with the highest resulting score.
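As a concrete sketch of this scoring rule (the function name and the small test matrix are my own; this only implements the strategy described above):

```python
import numpy as np

def pick_best_column(F):
    """Score each column j by |(sum_p F_{pj})^2 - N| and return the labels
    of the highest-scoring column. (Sketch of the strategy above; the
    returned labels may be globally flipped if that worker is adversarial.)"""
    N, _ = F.shape
    col_sums = F.sum(axis=0)            # sum_p F_{pj} for every worker j
    scores = np.abs(col_sums ** 2 - N)  # large when |p_j| is large
    best = int(np.argmax(scores))
    return F[:, best]
```

For example, with $N=3$, $M=2$ and column sums $(3,-1)$, the scores are $|9-3|=6$ and $|1-3|=2$, so the first worker's labels are returned.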

Some basic reasoning as a starting point, for column $j$ (using $l_p$ to denote the true label of item $p$):

$$E\Big(\big(\sum_{p=1}^N F_{pj}\big)^2\Big)=E\Big(\sum_{p,q=1}^N F_{pj}F_{qj}\Big)=E\Big(N+\sum_{p\neq q} F_{pj}F_{qj}\Big)=N+p_j^2\sum_{p\neq q} l_pl_q$$

The last step holds because, for $p\neq q$, the two workers' errors are independent, so $E(F_{pj}F_{qj})=l_pl_q\left(\left(\tfrac{1+p_j}{2}\right)^2+\left(\tfrac{1-p_j}{2}\right)^2\right)-2l_pl_q\left(\tfrac{1+p_j}{2}\right)\left(\tfrac{1-p_j}{2}\right)=l_pl_q\,p_j^2$.
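A quick Monte Carlo sanity check of this identity (the values $N=10$, $p_j=0.6$, the seed, and the random label vector are arbitrary choices of mine, not from the problem):

```python
import numpy as np

rng = np.random.default_rng(0)
N, p_j, T = 10, 0.6, 200_000          # items, worker parameter, trials

l = rng.choice([-1, 1], size=N)       # true labels l_1..l_N (arbitrary)
# Each entry equals l_p with prob (1+p_j)/2 and -l_p otherwise, independently.
signs = np.where(rng.random((T, N)) < (1 + p_j) / 2, 1, -1)
F_col = signs * l                     # T independent copies of column j

empirical = (F_col.sum(axis=1) ** 2).mean()
off_diag = l.sum() ** 2 - N           # sum_{p != q} l_p l_q
theory = N + p_j**2 * off_diag
# empirical and theory should agree up to Monte Carlo error
```

Here I used $\sum_{p\neq q} l_pl_q=(\sum_p l_p)^2-N$, which follows since $l_p^2=1$.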

This looks promising: after subtracting $N$, the expectation reveals the magnitude of $p_j^2$. However, I have struggled for a few weeks to prove that this algorithm achieves the desired bound.

Any hints or help would be much appreciated.