Strange peaks in error graph during gradient descent


I am training a neural-network language model (2 GRU layers of 512 neurons each, a softmax output layer, cross-entropy loss; about 2.5M parameters in total) on J.R. Martin's book: the network tries to predict the next letter from the previous steps. My training set is 1.3MB of text, divided into 50 chunks (batches), with a sequence length of 50. The network's memory is reset at the start of each epoch, i.e. when I begin feeding the training data from the beginning again. With these settings, one epoch is 267 iterations (each iteration processes one sequence of 50 steps). I use RMSProp with momentum and weight decay for optimization.
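A minimal, hypothetical sketch of this setup in PyTorch (sizes, hyperparameters, and the random stand-in data are assumptions, not the actual code; the key point is resetting the hidden state at each epoch boundary while detaching it between iterations):

```python
import torch
import torch.nn as nn

VOCAB = 64      # assumed character-vocabulary size
HIDDEN = 512    # 512 units per layer, as in the post
SEQ_LEN = 50
BATCHES = 5     # the post uses 50 chunks; shrunk here to keep the sketch fast

class CharGRU(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        self.gru = nn.GRU(HIDDEN, HIDDEN, num_layers=2, batch_first=True)
        self.out = nn.Linear(HIDDEN, VOCAB)

    def forward(self, x, h):
        e = self.embed(x)
        y, h = self.gru(e, h)
        return self.out(y), h

model = CharGRU()
loss_fn = nn.CrossEntropyLoss()
# RMSProp with momentum and weight decay, as in the post (values assumed)
opt = torch.optim.RMSprop(model.parameters(), lr=1e-3,
                          momentum=0.9, weight_decay=1e-5)

for epoch in range(2):
    h = None                      # reset network memory at the start of each epoch
    for it in range(BATCHES):
        x = torch.randint(0, VOCAB, (1, SEQ_LEN))   # stand-in for a text chunk
        t = torch.randint(0, VOCAB, (1, SEQ_LEN))   # next-character targets
        logits, h = model(x, h)
        h = h.detach()            # truncate BPTT: carry state across chunks, not gradients
        loss = loss_fn(logits.reshape(-1, VOCAB), t.reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
```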

So I got this graph of the error (not smoothed by any filter, for debugging reasons): [Cross entropy vs iterations] You can clearly see peaks occurring at exactly the rate of epoch restarts. Is this normal? Why is there a peak in the error when I reset the memory and the data set starts over? Does it affect training performance?

Thanks.

P.S. Another thing bothers me: the very first iterations always look like this (in any task, with any training data): [Start err peak] Is it OK to have this very high peak at the very beginning, after ~10-15 iterations? It does not appear later in training, only at the very start.
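One sanity check worth having on hand here (not from the post): with a softmax over V characters and near-uniform initial predictions, cross-entropy should start near ln(V), so an early spike can be compared against that baseline:

```python
import math

# Expected initial cross-entropy for a softmax over V classes with
# roughly uniform predictions: -log(1/V) = ln(V).
for vocab_size in (26, 64, 100):
    print(vocab_size, round(math.log(vocab_size), 2))
```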

1 Answer
OK, I found a bug in the training algorithm which led to the data-set position being reset a few iterations before the network memory reset. So the network started to read the same text again, but with some old, unrelated context, and this caused the prediction errors. It's all fine now.
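The failure mode can be sketched like this (hypothetical code; 267 is the epoch length from the post, while the early wrap at 265 is an illustrative stand-in for the bug's off-by-a-few offset):

```python
N_CHUNKS = 267  # iterations per epoch, as in the post

def buggy_schedule(iteration):
    """Bug: data position wraps at 265, but memory resets at 267,
    so the text restarts while stale context is still in the hidden state."""
    data_pos = iteration % 265          # wraps too early
    reset_memory = (iteration % 267 == 0)
    return data_pos, reset_memory

def fixed_schedule(iteration):
    """Fix: both wrap together at the epoch boundary."""
    data_pos = iteration % N_CHUNKS
    reset_memory = (data_pos == 0)      # reset exactly when the data restarts
    return data_pos, reset_memory
```

At iteration 265 the buggy schedule re-reads the start of the text without clearing the memory, which is exactly when the error peaks appeared.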

But the question about the second graph still stands: that peak is still there (it does not seem to cause any trouble later in training, though).