I have a programming assignment that asks me to do mini-batch training. In particular, we are working with the MNIST dataset, which contains 60000 training samples. I would like to figure out the most efficient way to shuffle these images. The idea is to find a bijective hash (or permutation) $H$ on $\{0, 1, \cdots, 59999\}$ such that $X_{\text{shuffled}}[H[i]] = X_{\text{original}}[i]$ effectively shuffles the dataset. In other words, $H$ moves the $i$-th element of the original dataset to position $H[i]$ in the shuffled dataset. Additionally, the permutation $H$ should have a long cycle, so that I don't get the same order back after only a few shuffles. To clarify, each shuffle is applied to the result of the previous one, e.g.
$$
\begin{aligned}
X_{\text{shuffled}}[H[i]] &= X_{\text{original}}[i] \\
X_{\text{doubly shuffled}}[H[i]] &= X_{\text{shuffled}}[i] \\
X_{\text{triply shuffled}}[H[i]] &= X_{\text{doubly shuffled}}[i] \\
&\vdots
\end{aligned}
$$
By "$H$ should have a long cycle", I mean I don't want to see something like $X_{\text{triply shuffled}} = X_{\text{original}}$.
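A minimal NumPy sketch of the successive-shuffle scheme above, assuming $H$ is stored as an integer array and `X` is any array whose first axis indexes samples. The helper names `apply_shuffle` and `period` are hypothetical. The number of shuffles before the original order returns is the least common multiple of the cycle lengths of $H$, which is what `period` computes.

```python
import numpy as np
from math import gcd

def apply_shuffle(X, H):
    """Return a new array Y with Y[H[i]] = X[i] for every i."""
    Y = np.empty_like(X)
    Y[H] = X  # scatter along the first axis: sample i moves to slot H[i]
    return Y

def period(H):
    """Number of successive shuffles with H before the original order returns.

    Computed as the least common multiple of the cycle lengths of H.
    """
    n = len(H)
    seen = [False] * n
    result = 1
    for start in range(n):
        length, j = 0, start
        while not seen[j]:  # walk the cycle containing `start`
            seen[j] = True
            j = int(H[j])
            length += 1
        if length:  # length is 0 when `start` was already visited
            result = result * length // gcd(result, length)
    return result

# Toy permutation with cycles (0 1 2) and (3 4): the order repeats every lcm(3, 2) = 6 shuffles.
H = np.array([1, 2, 0, 4, 3])
X = np.arange(5) * 10
print(apply_shuffle(X, H).tolist())  # [20, 0, 10, 40, 30]
print(period(H))                     # 6
```

For MNIST, `X` would be the 60000-row image array and `H` an integer array of length 60000; the scatter `Y[H] = X` moves whole rows at once.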
I heard that I can let $H[i] = (A \times i) \bmod 60000$, where $A$ is an integer coprime with 60000. I picked $A = 999999000001$, hoping that such a large prime would give me some randomness, but it just maps every index to itself:
```
>>> np.all(np.array([(999999000001 * i) % 60000 for i in range(60000)]) == np.arange(60000))
True
```
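Only the residue of $A$ modulo 60000 matters, since $(A \times i) \bmod 60000 = ((A \bmod 60000) \times i) \bmod 60000$. For the chosen $A$, $999999000000 = 60000 \times 16666650$, so $A \equiv 1 \pmod{60000}$ and the map is the identity regardless of how large or prime $A$ is. A quick check:

```python
A = 999999000001
N = 60000

# Only A % N affects (A * i) % N.
print(A % N)  # 1, because 999999000000 == 60000 * 16666650

# Reducing A first gives exactly the same map on 0..N-1.
assert all((A * i) % N == ((A % N) * i) % N for i in range(N))
```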
On the other hand, a small $A$ gives a more promising result, but it still does not look very random:
```
>>> np.sum(np.array([(11 * i) % 60000 for i in range(60000)]) == np.arange(60000))
10
>>> [(11 * i) % 60000 for i in range(60000)][-10:]
[59890, 59901, 59912, 59923, 59934, 59945, 59956, 59967, 59978, 59989]
```
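Both observations follow from the map being linear. A fixed point satisfies $(A - 1)\, i \equiv 0 \pmod{60000}$, which has exactly $\gcd(A - 1, 60000)$ solutions; for $A = 11$ that is $\gcd(10, 60000) = 10$, namely the multiples of 6000. Consecutive outputs differ by $A$ modulo 60000, which explains the evenly spaced tail. Applying the map $k$ times sends $i$ to $(A^k \times i) \bmod 60000$, so the period of the repeated shuffle is the multiplicative order of $A$ modulo 60000. A small sketch; `multiplicative_order` is a hypothetical helper name:

```python
from math import gcd

N = 60000

def multiplicative_order(A, N):
    """Smallest k >= 1 with A**k % N == 1 (requires gcd(A, N) == 1).

    Applying H[i] = (A * i) % N k times maps i to (A**k * i) % N,
    so this k is exactly the period of the repeated shuffle.
    """
    assert gcd(A, N) == 1
    k, x = 1, A % N
    while x != 1:
        x = (x * A) % N
        k += 1
    return k

# Fixed points satisfy (A - 1) * i ≡ 0 (mod N); there are gcd(A - 1, N) of them.
fixed = [i for i in range(N) if (11 * i) % N == i]
print(len(fixed), gcd(11 - 1, N))   # 10 10
print(multiplicative_order(11, N))  # 1000
```

Since $60000 = 2^5 \times 3 \times 5^4$, the Carmichael function gives $\lambda(60000) = \operatorname{lcm}(8, 2, 500) = 1000$, so no multiplier $A$ coprime with 60000 yields a period longer than 1000 for this scheme.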
There are other methods, such as running a Fisher-Yates shuffle on the indices $0, 1, \cdots, 59999$ and using the result as $H$, but I am not sure how well that works, since I would essentially be calling Fisher-Yates once and reusing its result for every successive shuffle.
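For reference, a sketch of Fisher-Yates (in Durstenfeld's in-place form) producing a permutation $H$ of $\{0, \cdots, 59999\}$, using Python's `random` module with an arbitrarily chosen seed. The function name `fisher_yates_permutation` is hypothetical. Reusing one such $H$ for every shuffle still means iterating a single fixed permutation, whose period is the least common multiple of its cycle lengths; this sketch only builds $H$.

```python
import random

def fisher_yates_permutation(n, rng):
    """Return a uniformly random permutation of range(n) (Durstenfeld's in-place variant)."""
    H = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i + 1)  # uniform index in 0..i inclusive
        H[i], H[j] = H[j], H[i]
    return H

rng = random.Random(0)  # seed chosen arbitrarily for reproducibility
H = fisher_yates_permutation(60000, rng)
assert sorted(H) == list(range(60000))  # H is a bijection on 0..59999
```

NumPy's `Generator.permutation` also returns a random permutation of `range(n)` when given an integer, e.g. `np.random.default_rng(0).permutation(60000)`.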
How can I improve the randomness and period of $H$?
As a bonus, why is $(A \times i) \bmod 60000$ guaranteed to be bijective when $\gcd(A, 60000) = 1$? I know nothing about number theory, but I am curious.
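If $\gcd(A, 60000) = 1$, there is an integer $B$ (a modular inverse, which exists by Bézout's identity) with $A B \equiv 1 \pmod{60000}$. Then $j \mapsto (B \times j) \bmod 60000$ undoes $H$, because $B (A i) \equiv i \pmod{60000}$, and a map on a finite set that has an inverse is a bijection. Conversely, if $d = \gcd(A, 60000) > 1$, then $i = 0$ and $i = 60000/d$ both map to 0, so $H$ is not injective. A sketch checking both directions numerically; `inverse_map` is a hypothetical helper, and `pow(A, -1, N)` requires Python 3.8 or later:

```python
from math import gcd

N = 60000

def inverse_map(A, N):
    """Return H_inv with H_inv[H[i]] == i for H[i] = (A * i) % N.

    Requires gcd(A, N) == 1; pow(A, -1, N) then returns the modular
    inverse B with (A * B) % N == 1.
    """
    B = pow(A, -1, N)
    return [(B * j) % N for j in range(N)]

A = 11
H = [(A * i) % N for i in range(N)]
H_inv = inverse_map(A, N)
assert all(H_inv[H[i]] == i for i in range(N))  # H is invertible, hence bijective

# With a common factor d = gcd(A, N) > 1, i = 0 and i = N // d collide at 0.
A_bad = 12
d = gcd(A_bad, N)
assert (A_bad * 0) % N == (A_bad * (N // d)) % N == 0
```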