Help with a proof (relation between classifier accuracy and $L_{0}$ pseudo-norm)


I am currently working on a proof to relate a metric with the $L_{0}$ pseudo-norm.

The metric is classification accuracy:

$$\mathrm{Acc} = \dfrac{\mathrm{TP}+\mathrm{TN}}{\mathrm{TP}+\mathrm{FN}+\mathrm{FP}+\mathrm{TN}}$$

Reading the denominator from left to right, the terms are the numbers of true positives, false negatives, false positives and true negatives, respectively.

Let there be $M$ examples. $\mathcal{Y}$ is the set of class indicators $y_{m}$, where $y_{m}$ is the class of the $m$-th example. There are only two classes in this problem.

__

What I have done so far:

The denominator is a fixed value, assuming that the number of examples does not change, so we can replace it by $M$.

To achieve $\mathrm{Acc} = 1$, we need $\hat{y}_{m} = y_{m}$ for all $m$. Let the vector $\mathbf{g}$ contain the evaluations of a $0\text{-}1$ gain function, where $g_{m} = 1$ when the prediction matches the true class and $g_{m} = 0$ otherwise. Then we can write the following:

$$\mathrm{Acc} = \dfrac{1}{M} \sum^{M}_{m=1}g_{m}$$

Because each $g_{m}$ is either $0$ or $1$, the sum counts the non-zero entries of $\mathbf{g}$, so this is exactly:

$$\mathrm{Acc} = \dfrac{1}{M} \lVert \mathbf{g} \rVert_{0}$$
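To sanity-check the two expressions for accuracy, here is a minimal Python sketch. The labels and predictions are made-up toy values, and the helper names (`gain_vector`, `l0_pseudo_norm`, `accuracy_from_counts`) are hypothetical, chosen only for illustration. It computes the accuracy from the confusion-matrix counts, from the sum of the gains, and from the count of non-zero gains, and the three values agree.

```python
# Toy data (an assumption for illustration): 1 = positive class, 0 = negative class.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]


def gain_vector(y_true, y_pred):
    """0-1 gain: 1 where the prediction matches the true class, else 0."""
    return [1 if yt == yp else 0 for yt, yp in zip(y_true, y_pred)]


def l0_pseudo_norm(v):
    """Number of non-zero entries of v."""
    return sum(1 for x in v if x != 0)


def accuracy_from_counts(y_true, y_pred):
    """Accuracy from the confusion-matrix counts TP, FN, FP, TN."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for yt, yp in pairs if yt == 1 and yp == 1)
    fn = sum(1 for yt, yp in pairs if yt == 1 and yp == 0)
    fp = sum(1 for yt, yp in pairs if yt == 0 and yp == 1)
    tn = sum(1 for yt, yp in pairs if yt == 0 and yp == 0)
    return (tp + tn) / (tp + fn + fp + tn)


g = gain_vector(y_true, y_pred)
M = len(g)

acc_counts = accuracy_from_counts(y_true, y_pred)  # (TP + TN) / M
acc_sum = sum(g) / M                               # (1/M) * sum of g_m
acc_l0 = l0_pseudo_norm(g) / M                     # (1/M) * ||g||_0

print(acc_counts, acc_sum, acc_l0)
```

The agreement relies on $\mathbf{g}$ being binary; for a gain vector with values other than $0$ and $1$, the sum and the count of non-zero entries would generally differ.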

The number of non-zero elements in $\mathbf{g}$ equals $\#\mathcal{Y}_{=}$, where $\mathcal{Y}_{=}$ is the set of all examples satisfying $\hat{y}_{m} = y_{m}$. A set of parameters $\bf{\Theta}$ acts upon the examples, so the probability of a correct prediction is conditional on them: $\mathrm{Pr}\{\hat{Y} = Y \vert \bf{\Theta} \}$. As the parameter values change, this probability will (or at least is expected to) change, and so will $\mathcal{Y}_{=}$.

__

What I'm attempting to do is to prove that accuracy is a non-convex, non-smooth objective function with respect to a set of model parameters that act on the examples (not on the classifier). Although I have not stated explicitly how the parameters and the accuracy are related, I assume this is straightforward, since changing the model parameters changes the example distribution.

P.S. I'm not much of a maths person, so any maths input would be very appreciated.
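To make the claim concrete, here is a small numerical sketch in Python. Everything in it is an assumption made for illustration and not part of the setup above: four one-dimensional examples, a fixed threshold classifier `classify`, and a single scalar parameter `theta` that shifts the examples. It is a counterexample on toy data, not a general proof, but it shows accuracy as a step function of `theta` that violates the midpoint inequalities for both convexity and concavity and jumps across an arbitrarily small change in the parameter.

```python
# Hypothetical toy setup: four 1-D examples with binary labels.
X = [-3.0, -1.0, 1.0, 3.0]
Y = [0, 1, 0, 1]


def classify(x):
    """Fixed classifier (assumed): predict class 1 when x > 0, else class 0."""
    return 1 if x > 0 else 0


def accuracy(theta):
    """Accuracy when the parameter theta shifts every example to x + theta."""
    g = [1 if classify(x + theta) == y else 0 for x, y in zip(X, Y)]
    return sum(g) / len(g)


# Convexity would require f(midpoint) <= average of the endpoint values.
# Endpoints -4 and 0, midpoint -2.
convex_violated = accuracy(-2.0) > (accuracy(-4.0) + accuracy(0.0)) / 2

# Concavity would require f(midpoint) >= average of the endpoint values.
# Endpoints -2 and 2, midpoint 0.
concave_violated = accuracy(0.0) < (accuracy(-2.0) + accuracy(2.0)) / 2

# Non-smoothness: a tiny change in theta around 1 flips one prediction,
# so the accuracy jumps by 1/M.
eps = 1e-9
jump = accuracy(1.0 + eps) - accuracy(1.0 - eps)

# Away from the jumps the function is flat, so its derivative is zero there.
flat = accuracy(0.2) == accuracy(0.3)

print(convex_violated, concave_violated, jump, flat)
```

In this sketch the accuracy takes only the values $k/M$, so it is piecewise constant in `theta`: flat between jumps and discontinuous at them. A gradient-based optimiser would therefore receive no useful signal from it.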

P.S.2 Sorry if the short proof undermines anyone or the concept of mathematical proof itself!