In the excellent paper Wu, Chao-Yuan; Manmatha, R.; Smola, Alexander J.; Krähenbühl, Philipp (2017): Sampling Matters in Deep Embedding Learning (available online at https://arxiv.org/pdf/1706.07567), the authors describe a distance-based sampling strategy for the triplet loss. They show empirically and mathematically that random sampling, hard triplet mining, and semi-hard triplet mining all yield sub-optimal results, which is why they present their distance weighted sampling strategy. They say that they "sample uniformly according to distance, i.e. sampling with weights $q(d)^{-1}$", and more formally: "given an anchor example $a$, distance weighted sampling samples negative pair $(a, n^*)$ with $\Pr(n^* = n \mid a) \propto \min(\lambda, q^{-1}(D_{an}))$."
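To make my reading of that formula concrete: if $q^{-1}$ meant the multiplicative inverse (just one possible interpretation, and part of what I am asking), I would imagine the weights being computed roughly like this (NumPy sketch; the names `q` and `lam` are placeholders I made up, not names from the paper):

```python
import numpy as np

def distance_weighted_probs(d_an, q, lam=0.5):
    """One possible reading of the quoted formula: per-negative weights
    w(n) = min(lam, 1 / q(D_an)), normalized into a categorical
    distribution over the candidate negatives. `q` stands for whatever
    the paper's q(d) denotes; here it is just a caller-supplied function,
    and lam=0.5 is an arbitrary placeholder for the paper's lambda."""
    d_an = np.asarray(d_an, dtype=float)
    # reciprocal weights, clipped from above at lam
    weights = np.minimum(lam, 1.0 / q(d_an))
    # normalize so the weights form a probability distribution
    return weights / weights.sum()
```

With a constant `q`, every candidate would get the same clipped weight, reducing this to plain uniform sampling over the negatives.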
Two questions arise for me here: the first is of a "did I get it right" nature, while the second is more technical.
First: Do I understand correctly that in this context $q^{-1}(D_{an})$ means they return the smallest probability $p$ for which $q(p) > D_{an}$, where $q$ is a uniform distribution, and hence $q(D_{an})$ is, by the definition of the uniform distribution, always the same?
Second: How do they technically / algorithmically obtain the $D_{an}$ values from which they then sample (here I assume that I understand the procedure correctly)? My assumption is the following: for each batch they compute all distances $D_{an}$ between the anchor and the negatives in the batch, then order the distances from smallest to largest, and finally sample from them uniformly.
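In code, my assumed procedure would look roughly like this (NumPy sketch; all names are mine, not from the paper, and the final uniform draw reflects my assumption above):

```python
import numpy as np

def sample_negative(embeddings, labels, anchor_idx, rng):
    """My assumed per-batch procedure: compute every anchor-negative
    distance D_an in the batch, order them from smallest to largest,
    then draw one negative uniformly at random from the candidates."""
    anchor = embeddings[anchor_idx]
    # Euclidean distances from the anchor to every example in the batch
    dists = np.linalg.norm(embeddings - anchor, axis=1)
    # negatives are the examples with a different label than the anchor
    neg_indices = np.flatnonzero(labels != labels[anchor_idx])
    # order the candidate negatives from smallest to largest distance
    order = np.argsort(dists[neg_indices])
    sorted_negatives = neg_indices[order]
    # sample uniformly from the ordered candidates
    return rng.choice(sorted_negatives)
```

Is this, ignoring the question of what the weights $\min(\lambda, q^{-1}(D_{an}))$ actually are, the right overall shape of the algorithm?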
I hope somebody can help me here.