Assume I have $N$ datapoints with $d$ features each, and I want to draw $n$ samples from them through a sampling function $s$, where $n \ll N$ and $d$ is the same for every datapoint.
Further assume that $s$ selects datapoints such that each new sample has the maximum distance, in feature space, from the previous sample. If my goal is to maximally reduce entropy, i.e. to learn the most about the $N$ datapoints from $n$ samples, is the sampling function $s$ the best way to guarantee that? Or are there cases where choosing the sample farthest from the previous one in feature space is not a good idea?
If my question is not well-defined, I appreciate any suggestion for corrections.
As you described it, the sampling function can end up being deterministic. For example, suppose your $N$ datapoints form a subset of $\{0, 1\}^d$ containing the vectors $u=(0, 0, \dots, 0)^T$ and $v=(1, 1, \dots, 1)^T$. If your first sample is $u$ and the distance is the Hamming distance, then $s$ will sample $u, v, u, v, \dots$: since $u$ and $v$ are each other's farthest points, the procedure is deterministic and never explores the rest of the data.
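A minimal sketch of this failure mode (the sampler `farthest_from_last` and the test setup are hypothetical, just for illustration): each pick maximizes Hamming distance to the previous pick only, so with $u$ and $v$ in the pool the greedy choice oscillates between them no matter what the other points look like.

```python
import numpy as np

def farthest_from_last(points, first_idx, n):
    """Greedy sampler: each new pick maximizes Hamming distance
    to the *previous* pick only (no memory of earlier picks)."""
    chosen = [first_idx]
    for _ in range(n - 1):
        last = points[chosen[-1]]
        dists = (points != last).sum(axis=1)  # Hamming distance to last sample
        chosen.append(int(np.argmax(dists)))
    return chosen

d = 8
rng = np.random.default_rng(0)
pts = rng.integers(0, 2, size=(20, d))
pts[0] = 0  # u = (0, ..., 0)
pts[1] = 1  # v = (1, ..., 1)

print(farthest_from_last(pts, first_idx=0, n=6))  # → [0, 1, 0, 1, 0, 1]
```

Because $u$ and $v$ attain the maximum possible distance $d$ from each other, the sampler alternates between indices 0 and 1 forever; the 18 random points are never visited.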
Also, I am not exactly sure what you are trying to do. Are you sampling to estimate something? If not, what are you trying to learn from the $N$ points? If you have no prior information about the $N$ elements, sampling uniformly at random may well be the best choice, but I am not entirely sure that matches your goal.