As far as I understand, for a binary classifier whose outputs are either 0 or 1, accuracy is the same as 1 - MSE, where MSE stands for mean squared error. MSE is a smooth function, so it is differentiable everywhere. Can we therefore conclude that accuracy (1 - MSE) is also a smooth function? I have read many times that accuracy is not differentiable and that, because of this, it cannot be used as a loss function in gradient-based optimization algorithms. Could someone please clarify this for me?
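To make the identity I am referring to concrete, here is a minimal sketch in plain Python. The data, the threshold classifier, and the helper names (`accuracy`, `mse`, `hard_predict`) are all made up for illustration; they are assumptions, not taken from any particular library.

```python
# Toy check of "accuracy == 1 - MSE" when predictions are hard 0/1 labels.
# All names and numbers here are hypothetical, chosen only for illustration.

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error between labels and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def hard_predict(xs, threshold):
    """Assumed classifier: output 1 if the input reaches the threshold, else 0."""
    return [1 if x >= threshold else 0 for x in xs]

xs = [0.1, 0.35, 0.4, 0.8, 0.9]   # made-up inputs
y_true = [0, 0, 1, 1, 1]          # made-up labels

preds = hard_predict(xs, 0.5)
acc = accuracy(y_true, preds)
err = mse(y_true, preds)

# With 0/1 labels and 0/1 predictions, (t - p)**2 is 1 exactly when t != p,
# so the two quantities agree up to floating-point rounding.
print(acc, 1 - err)

# Accuracy evaluated at a few nearby thresholds (the classifier's parameter).
flat = [accuracy(y_true, hard_predict(xs, t)) for t in (0.45, 0.5, 0.55)]
print(flat)
```

On this toy data the two numbers agree, and the accuracy is identical at the three nearby thresholds, i.e. small changes in the classifier's parameter do not change it here.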
Thanks a lot.