Multi Label Classification: Union of two Binary Sets

I have started learning Data Science on my own and was looking at evaluation metrics. I came across this measure of accuracy: $$ \frac{1}{p}\sum_{i=1}^{p}\frac{\vert Y_i \cap Z_i \vert}{\vert Y_i \cup Z_i \vert} $$ In particular, I want to know how to evaluate it when $Y= \{0,0,1,1,0\}$ and $Z=\{1,0,1,0,0\}$. As a second case, where the two sets do not have the same number of elements, for example $Y= \{0,0,1,1,0\}$ and $Z=\{1,0,1\}$, what would $Y_i \cap Z_i$, $Y_i \cup Z_i$, $\vert Y_i \cap Z_i \vert$ and $\vert Y_i \cup Z_i \vert$ be? I know the set operations of union and intersection, but I am confused about how they apply in the context of multi-label classification. Thanks.

There are 1 best solutions below

The confusion seems to come from mixing two common ways of representing a classification in multi-label classification.

Your example seems to deal with five labels (also called classes), so let's name the labels $A,B,C,D$ and $E$. Let's say that a particular object $x$ is classified with only the labels $C$ and $D$. The two common representations for a classification are:

  • The set of relevant labels representation. For $x$ it will be the set $\{C,D\}$.
  • The one-hot vector representation (sometimes called a binary indicator or multi-hot vector, since several entries can be $1$). For $x$ it will be the vector $(0,0,1,1,0)$. The $0$ in the first position means that $x$ is not classified with $A$, and the $1$ in the third position means that $x$ is classified with $C$.

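You provided the Example-Based Accuracy equation in the set representation, which for a single sample is defined as: $$ \frac{|Y\cap Z|}{|Y\cup Z|}, $$ except that your definition averages this ratio over $p$ samples, with $Y_i$ and $Z_i$ being the two label sets of the $i$-th sample. But your $Y$ and $Z$ are written in the one-hot representation, which is incompatible with this accuracy definition: the set operations are meant to be applied to sets of labels, not to the $0$/$1$ entries of the vectors.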
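As a worked example, assuming the five positions of the vectors in the question correspond to the labels $A$ to $E$ named above, the vectors translate to the sets $Y=\{C,D\}$ and $Z=\{A,C\}$, so that $$ Y\cap Z=\{C\},\qquad Y\cup Z=\{A,C,D\},\qquad \frac{|Y\cap Z|}{|Y\cup Z|}=\frac{1}{3}. $$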

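There is also a definition of Example-Based Accuracy in the one-hot representation: $$ \frac{\sum_{i=1}^n Y_i\cdot Z_i}{\sum_{i=1}^n Y_i + \sum_{i=1}^n Z_i - \sum_{i=1}^n Y_i\cdot Z_i}, $$ where $n$ is the number of possible labels (5 in your example) and $Y_i$, $Z_i$ here denote the $i$-th entries of the vectors $Y$ and $Z$, not the label sets of the $i$-th sample as in your formula. The numerator counts the labels present in both vectors (the size of the intersection) and the denominator counts the labels present in at least one of them (the size of the union).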
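To check that the two definitions agree, here is a minimal Python sketch that computes the accuracy both ways for the vectors in the question. The function names and the label letters are only illustrative, and returning 1.0 when both label sets are empty is a convention assumed here, not something the formulas above specify. The sketch also rejects vectors of different lengths, on the assumption that in the one-hot representation both vectors always have one entry per possible label.

```python
def accuracy_from_sets(y_true, y_pred):
    """Example-based accuracy for one sample, given two sets of labels."""
    union = y_true | y_pred
    if not union:
        # Assumed convention: two empty label sets count as a perfect match.
        return 1.0
    return len(y_true & y_pred) / len(union)


def accuracy_from_vectors(y_true, y_pred):
    """Example-based accuracy for one sample, given two 0/1 vectors."""
    if len(y_true) != len(y_pred):
        # Both vectors need one entry per possible label.
        raise ValueError("vectors must have the same length")
    intersection = sum(a * b for a, b in zip(y_true, y_pred))
    union = sum(y_true) + sum(y_pred) - intersection
    if union == 0:
        # Same assumed convention as in the set version.
        return 1.0
    return intersection / union


def mean_accuracy(pairs):
    """Average of the per-sample accuracy over p samples of 0/1 vectors."""
    return sum(accuracy_from_vectors(y, z) for y, z in pairs) / len(pairs)


# Set representation of the example: Y = {C, D}, Z = {A, C}.
set_score = accuracy_from_sets({"C", "D"}, {"A", "C"})

# One-hot representation of the same example.
vector_score = accuracy_from_vectors([0, 0, 1, 1, 0], [1, 0, 1, 0, 0])

# Both scores equal 1/3.
print(set_score, vector_score)
```

With these definitions, the second case in the question, a vector of length 5 against a vector of length 3, raises a `ValueError` rather than returning a score.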