I'm currently reading the paper 'An alternative Softmax Operator for Deep Reinforcement Learning', but not having a solid foundation in maths notation is making it quite difficult.
I tried applying the softmax operator (MellowMax) to the vector [1, 2, 3, 4] with w = 2, but I'm going wrong somewhere along the way. I'd imagine it would be fairly straightforward for someone with a degree in maths. I know from the video 'An Alternative Softmax Operator for Reinforcement Learning' that I should be getting 3.3794 for [1, 2, 3, 4] and w = 2, but I'm getting 1.46. My workings are below. I think I'm just applying the above MellowMax equation incorrectly because I'm misreading the notation. Any help would be appreciated. Thanks
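For reference, here is a minimal Python sketch of the calculation. It assumes the definition mm_w(x) = log((1/n) * sum(exp(w * x_i))) / w, where log is the natural logarithm and n is the number of values. The function name `mellowmax` and the variable names are only illustrative. The base-10 line at the end is a guess at where a figure close to 1.46 could come from, not a confirmed diagnosis of the workings.

```python
import math

def mellowmax(values, w):
    """Mellowmax of a list of numbers, using the natural logarithm."""
    n = len(values)
    # Average of exp(w * x_i) over all n values.
    mean_exp = sum(math.exp(w * x) for x in values) / n
    # Natural log of that average, scaled by 1 / w.
    return math.log(mean_exp) / w

values = [1, 2, 3, 4]
w = 2

result = mellowmax(values, w)
print(round(result, 4))  # 3.3794

# Assumption for comparison only: the same steps with a base-10 log
# (the "log" key on many calculators) give a much smaller number.
mean_exp = sum(math.exp(w * x) for x in values) / len(values)
base10_result = math.log10(mean_exp) / w
print(round(base10_result, 4))  # 1.4677
```

If the base-10 figure matches the workings, the only change needed is to use ln in place of log10; the rest of the steps stay the same.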