Neural network predicts one class only


I am implementing a neural network on the MNIST dataset, and I am having trouble classifying the $10$ classes. I am fairly sure the feed-forward and back-propagation implementations are correct, and I use Xavier initialization for the weights. Running mini-batch gradient descent for a single epoch, the lowest loss I have obtained (with the standard-deviation form of Xavier initialization) is $0.3288$, as shown in the graph below:

*(plot omitted)*
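For reference, a minimal sketch of uniform Xavier initialization is shown below. The network shape `784 -> 10 -> 10 -> 10` matches the architecture described later in the question; the function name and the omission of bias rows are my own assumptions.

```python
import numpy as np

def xavier_init(fan_in, fan_out, rng):
    """Uniform Xavier/Glorot initialization: keeps activation variance roughly
    constant across layers by scaling the range with fan-in and fan-out."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

rng = np.random.default_rng(0)
# Weight shapes for a hypothetical 784 -> 10 -> 10 -> 10 network (biases omitted).
shapes = [(784, 10), (10, 10), (10, 10)]
thetas = [xavier_init(fi, fo, rng) for fi, fo in shapes]
```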

Complete graph of $J(\theta)$:

*(plot omitted)*

Note that I'm using a zero-indexed implementation, which means $\frac{\partial{J(\theta)}}{\partial{\Theta^{(l)}}} = a^{(l)}\delta^{(l - 1)}$, and $a^{(l)} = \sigma((\Theta^{(l - 1)})^Ta^{(l - 1)})$.

The model has $2$ hidden layers with $10$ nodes each (excluding the bias term). The activation function for every layer $a^{(l)}$ with $l > 0$ is the sigmoid $\sigma$.
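The forward recursion above can be sketched as follows, under the question's convention $a^{(l)} = \sigma((\Theta^{(l-1)})^T a^{(l-1)})$ with $a^{(0)}$ the input. The weight values and the omission of bias terms are my own simplifications.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(thetas, x):
    """Forward pass under the convention a^(l) = sigma((Theta^(l-1))^T a^(l-1)),
    with a^(0) = x. Bias terms are omitted to keep the sketch short."""
    activations = [x]
    for theta in thetas:
        activations.append(sigmoid(theta.T @ activations[-1]))
    return activations

rng = np.random.default_rng(0)
# Small random weights stand in for the trained parameters.
thetas = [rng.normal(0.0, 0.1, size=s) for s in [(784, 10), (10, 10), (10, 10)]]
acts = forward(thetas, rng.normal(size=784))
# acts[0] is the input; acts[3] is the 10-way output layer.
```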

Since my training data is normalized, I normalize the test data the same way before measuring accuracy. Despite the low cost, my model predicts only one class.
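For what it's worth, "normalized the same way" should mean reusing the *training-set* statistics on the test set; the sketch below illustrates this with random stand-in data (the arrays are hypothetical, not actual MNIST).

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.uniform(0, 255, size=(1000, 784))  # stand-in for MNIST pixels
X_test = rng.uniform(0, 255, size=(200, 784))

# Compute mean/std on the TRAINING set only and reuse them for the test set;
# normalizing the test set with its own statistics shifts the input distribution.
mu = X_train.mean(axis=0)
sd = X_train.std(axis=0) + 1e-8   # guard against constant (zero-variance) pixels
X_train_n = (X_train - mu) / sd
X_test_n = (X_test - mu) / sd
```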

You can find the source code here.

To summarize, I have two questions:

1. What does a negative cost indicate?
2. Why is my model only predicting one class?
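One quick way to confirm the single-class behaviour described in question 2 is to histogram the argmax predictions over the test set. The outputs below are hypothetical, simulating a collapsed model rather than reproducing the question's network.

```python
import numpy as np

# Hypothetical collapsed outputs: every example's largest activation falls on
# the same class, so the prediction histogram piles onto a single bin.
outputs = np.zeros((200, 10))
outputs[:, 3] = 0.9            # simulated collapse onto class 3
preds = outputs.argmax(axis=1)
counts = np.bincount(preds, minlength=10)
```

A healthy classifier would spread `counts` across all ten bins; a collapsed one concentrates it in a single bin.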