In deep learning, equivariance is defined as the property of a function $f$ that permuting its input permutes its output in the same way, i.e.
$$f(PAP^T) = Pf(A)P^T$$
for any permutation matrix $P \in \{0,1\}^{n \times n}$ and $A \in \mathbb{R}^{n \times n}$. But I have also found the following (I guess equivalent) statement,
$$f(P^TAP) = P^Tf(A)P.$$
I know that applying $P$ on the left ($PA$) permutes the rows of $A$, while applying it on the right ($AP$) permutes the columns. Why should I use $P^T$ in the second form? I don't fully understand the definition.
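For what it's worth, here is a minimal NumPy sketch I used to convince myself that both statements hold for an equivariant function. The function $f(A) = A + A^T$ is just an assumed example of a permutation-equivariant map; since $P^T$ is itself a permutation matrix, both forms check out numerically:

```python
import numpy as np

def f(A):
    # Example of a permutation-equivariant function: f(A) = A + A^T
    return A + A.T

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))

# Build a random permutation matrix P from a permutation of the rows of I
perm = rng.permutation(n)
P = np.eye(n)[perm]

# First form: f(P A P^T) = P f(A) P^T
assert np.allclose(f(P @ A @ P.T), P @ f(A) @ P.T)

# Second form: f(P^T A P) = P^T f(A) P
# (holds because P^T is also a permutation matrix)
assert np.allclose(f(P.T @ A @ P), P.T @ f(A) @ P)
```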