What temperature of Softmax layer should I use during neural network training?


I've written a GRU (gated recurrent unit) implementation in C#, and it works fine. However, my Softmax layer has no temperature parameter (effectively T = 1). I want to implement softmax with temperature: $$ P_{i} = \frac{e^{y_{i}/T}}{\sum_{k=1}^{n}e^{y_{k}/T}} $$ but I cannot find an answer to my question: should I train my neural network with T = 1 (my current default), or should I use some specific value related to the temperature I intend to use during sampling?
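For reference, here is a minimal sketch of the temperature-scaled softmax I have in mind (C#; the class and method names are my own, not from my GRU code). It subtracts the maximum logit before exponentiating for numerical stability, which does not change the resulting distribution:

```csharp
using System;
using System.Linq;

class SoftmaxDemo
{
    // Softmax with temperature T: P_i = exp(y_i / T) / sum_k exp(y_k / T).
    // Subtracting the max logit keeps the exponentials from overflowing.
    static double[] Softmax(double[] logits, double temperature)
    {
        double max = logits.Max();
        double[] exps = logits
            .Select(y => Math.Exp((y - max) / temperature))
            .ToArray();
        double sum = exps.Sum();
        return exps.Select(e => e / sum).ToArray();
    }

    static void Main()
    {
        double[] logits = { 2.0, 1.0, 0.1 };

        // T = 1 reproduces the standard softmax used during training.
        var standard = Softmax(logits, 1.0);

        // T > 1 flattens the distribution (more diverse sampling);
        // T < 1 sharpens it (closer to argmax).
        var flattened = Softmax(logits, 2.0);
        var sharpened = Softmax(logits, 0.5);

        Console.WriteLine(string.Join(", ", standard.Select(p => p.ToString("F4"))));
        Console.WriteLine(string.Join(", ", flattened.Select(p => p.ToString("F4"))));
        Console.WriteLine(string.Join(", ", sharpened.Select(p => p.ToString("F4"))));
    }
}
```

With this, switching T at sampling time is a one-argument change, which is why I am unsure whether T also needs to match during training.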