For a Bayes classifier, can we prove that adding noise to data does not increase its accuracy?


For a Bayes classifier, it is intuitive that adding noise to the data CANNOT increase its accuracy.

Take a binary classification problem as an example. The data distribution is $(x,y)\sim D$ with $x\mid y\sim\mathcal{N}(y,1)$, where $y\in\{-1,1\}$ and $p(y=-1)=p(y=1)$. The Bayes classifier is clearly: $$ \hat{y}=\begin{cases} -1, & \text{if } x < 0\\ 1, & \text{otherwise} \end{cases} $$ Its accuracy is denoted $B(D)$ and is calculated by: $$ B(D)=\int\max_y p(y|x)\,p(x)\,\text{d}x $$ Next, let $\epsilon\sim\mathcal{N}(0,1)$ denote the noise. Adding noise to the original data gives $z=x+\epsilon$, and the noise-added distribution is denoted $D_N$; it turns out that $z\mid y\sim\mathcal{N}(y,2)$. The Bayes classifier for $D_N$ is the same as the one above. Its corresponding accuracy is denoted $B(D_N)$, and clearly $B(D_N)\leq B(D)$.
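This Gaussian example can be checked numerically. The sketch below (assuming `numpy` is available; variable names are my own) draws samples from $D$, applies the sign rule before and after adding $\mathcal{N}(0,1)$ noise, and compares the empirical accuracies to the closed-form values $B(D)=\Phi(1)$ and $B(D_N)=\Phi(1/\sqrt{2})$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Labels are uniform on {-1, +1}; features are x | y ~ N(y, 1).
y = rng.choice([-1, 1], size=n)
x = rng.normal(loc=y, scale=1.0)

# Bayes classifier for D: predict the sign of x.
acc_clean = np.mean(np.where(x < 0, -1, 1) == y)

# Add independent N(0, 1) noise, so z | y ~ N(y, 2);
# by symmetry the Bayes classifier for D_N is still the sign of z.
z = x + rng.normal(0.0, 1.0, size=n)
acc_noisy = np.mean(np.where(z < 0, -1, 1) == y)

# Theory: B(D) = Phi(1) ~= 0.8413, B(D_N) = Phi(1/sqrt(2)) ~= 0.7602.
print(acc_clean, acc_noisy)
```

With this sample size the empirical accuracies land within about $10^{-2}$ of the theoretical values, and the noisy accuracy is visibly lower.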

The problem is: how can one prove that adding noise to the data cannot increase the accuracy?

The problem is mathematically defined as follows. Let $(x,y) \sim D$ be the distribution of the original data, where $x$ is the data and $y$ is the label. Let $\epsilon \sim N$, where $\epsilon$ has the same dimension as $x$ and $N$ is the distribution of some noise (not limited to Gaussian noise). Note that $\epsilon$ is independent of $x$ and $y$. Let $z=x+\epsilon$ be the noise-added data, and denote the noise-added distribution by $(z,y) \sim D_N$. Prove that $B(D_N)\leq B(D)$.
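As a sanity check that the conjectured inequality is not specific to Gaussian noise, the sketch below repeats the experiment with uniform noise $\epsilon\sim\mathrm{Uniform}(-1,1)$. (This is only an empirical illustration, not a proof; I am also assuming that, because the noise density is symmetric and the class-conditional densities are symmetric around $\pm 1$, the sign rule remains the Bayes classifier for $D_N$.)

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Same data model as before: y uniform on {-1, +1}, x | y ~ N(y, 1).
y = rng.choice([-1, 1], size=n)
x = rng.normal(loc=y, scale=1.0)
acc_clean = np.mean(np.where(x < 0, -1, 1) == y)

# Non-Gaussian noise: epsilon ~ Uniform(-1, 1), independent of (x, y).
z = x + rng.uniform(-1.0, 1.0, size=n)
# Assumption: with symmetric noise, sign(z) is still the Bayes rule for D_N.
acc_noisy = np.mean(np.where(z < 0, -1, 1) == y)

# The noisy accuracy should again be no larger than the clean one.
print(acc_clean, acc_noisy)
```

Here the noisy accuracy works out to $\tfrac12\int_0^2\Phi(t)\,\text{d}t\approx 0.805$, again strictly below $\Phi(1)\approx 0.8413$, consistent with $B(D_N)\leq B(D)$.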