Deep Learning: First Loss in Epoch Should be $-\log(1/C)$?


I found the following introductory slides on deep learning https://www2.cisl.ucar.edu/sites/default/files/0900%20June%2023%20Kashinath.pdf

On slide 61, there is a practical tip that says:

[...] if you are using a negative log-likelihood for a $10$-class classification problem you expect your first loss to be $\sim -\log\left(1/C\right) = -\log\left(1/10\right) \approx 2.3$

(Note that $\log$ denotes the natural logarithm.)

My question is:

Assuming that we are starting from randomly initialized weights (an assumption that is also made on the slide), how can we prove that we expect the first loss to be around $-\ln(C^{-1})$, where $C$ denotes the number of classes?
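To illustrate the claim numerically, here is a minimal sketch (my own, not from the slides, using NumPy with an assumed small-scale Gaussian initialization): with small random weights the logits are close to zero, the softmax output is close to uniform over the $C$ classes, so the average negative log-likelihood comes out near $\ln C$.

```python
import numpy as np

rng = np.random.default_rng(0)
C = 10    # number of classes
n = 1000  # number of samples
d = 50    # input dimension (arbitrary choice for this sketch)

# Random inputs and small random weights (typical initialization),
# so the logits X @ W are all close to zero.
X = rng.normal(size=(n, d))
W = rng.normal(scale=0.01, size=(d, C))
logits = X @ W

# Softmax probabilities (numerically stabilized); near-zero logits
# give near-uniform probabilities of roughly 1/C per class.
p = np.exp(logits - logits.max(axis=1, keepdims=True))
p /= p.sum(axis=1, keepdims=True)

# Negative log-likelihood against arbitrary labels
y = rng.integers(0, C, size=n)
loss = -np.log(p[np.arange(n), y]).mean()

print(loss, np.log(C))  # loss is close to ln(10) ≈ 2.303
```

This only demonstrates the behavior; the analytical argument would be that at initialization each class probability is approximately $1/C$, so the loss per sample is approximately $-\log(1/C) = \log C$.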