Pardon if this seems off topic: I was reading a recent theory paper in machine learning by Kenji Kawaguchi and Leslie Pack Kaelbling,
https://arxiv.org/pdf/1901.00279.pdf
and the authors seem to suggest in section 2.2 that the cross-entropy loss for classification is not twice differentiable. This seems wrong; I thought it was $C^\infty$.
What am I missing?
If I'm reading the paper correctly, I think the problem is with assumption PA2 rather than with the $C^2$ requirement (although the loss does have problems at zero, which they may also have meant). PA2 requires that the loss can be written as $L(f,y) = \ell(-yf)$ for some scalar function $\ell$, which doesn't seem doable for the CE loss.
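To make the contrast concrete, here's a quick numerical sanity check (my own sketch, not from the paper). A loss in the PA2 margin form, e.g. the logistic loss $\ell(z) = \log(1+e^z)$ with $z = -yf$, has second derivative $\sigma(z)(1-\sigma(z))$, which is bounded and smooth everywhere, so smoothness itself is not the obstacle. By contrast, CE written directly on a predicted probability $p$, i.e. $-\log p$, has a gradient $-1/p$ that blows up as $p \to 0$, which is the "problems at zero" issue:

```python
import numpy as np

# Logistic loss ell(z) = log(1 + exp(z)) in the PA2 margin form z = -y*f.
def ell(z):
    return np.logaddexp(0.0, z)  # numerically stable log(1 + e^z)

# Its analytic second derivative is sigma(z) * (1 - sigma(z)),
# bounded by 1/4 and smooth everywhere.
def ell_second_derivative(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)

# Central-difference check that the analytic second derivative matches.
z = np.linspace(-10.0, 10.0, 201)
h = 1e-4
numeric = (ell(z + h) - 2.0 * ell(z) + ell(z - h)) / h**2
assert np.allclose(numeric, ell_second_derivative(z), atol=1e-4)

# By contrast, CE on a raw probability p, L(p) = -log(p),
# has gradient -1/p, which diverges as p -> 0:
p = np.array([1e-1, 1e-3, 1e-6])
print(-1.0 / p)  # gradient magnitude grows without bound near p = 0
```

So the binary CE composed with a sigmoid *can* be rewritten in the PA2 form (it is exactly the logistic loss above), but CE taken as a function of the probability output, or the multiclass softmax CE where $f$ is a vector, cannot.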