I am computing the softmax function over a matrix of random float values in three ways:
- row-wise (each row sums to 1)
- column-wise (each column sums to 1)
- over the whole matrix (all entries together sum to 1)
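The three variants can be sketched with NumPy as below. This is a minimal illustration, not the notebook's code. It assumes the matrix holds uniform random values in [0, 1), and the helper name `softmax` and the 5×4 shape are hypothetical choices. The only difference between the variants is the `axis` over which the exponentials are normalized.

```python
import numpy as np

def softmax(x, axis=None):
    """Hypothetical helper: softmax of x along `axis`.

    axis=1 -> row-wise, axis=0 -> column-wise, axis=None -> whole matrix.
    """
    # Subtracting the max along the same axis avoids overflow in exp;
    # it cancels out in the division, so the result is unchanged.
    shifted = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=axis, keepdims=True)

# Assumption: uniform random values in [0, 1), as produced by rng.random.
rng = np.random.default_rng(0)
m = rng.random((5, 4))

row_wise = softmax(m, axis=1)   # each row sums to 1
col_wise = softmax(m, axis=0)   # each column sums to 1
whole = softmax(m, axis=None)   # all entries together sum to 1
```

Each result has the same shape as `m` and can be drawn as a heatmap, for example with `matplotlib.pyplot.imshow`.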
After computing the results, I drew a heatmap of each resulting matrix. The patterns in the heatmaps (the relative sizes of the cells) look the same for all three methods. I repeated this for several random matrices and saw the same thing every time.
What is the reason for this?
If you want to experiment, I have created a Google Colab notebook with the Python code.