I have $n$ data points, where $n$ is in the hundreds of millions. Ideally, I would connect them to each other (based on some condition), run random walks on the resulting interaction network, and make inferences from those walks. At this scale, that is computationally intractable.
Naturally, I would like to subsample the data instead, connect the sampled points to each other, and run the random walks on the smaller network. How large should the subsample be so that inferences made on the small network carry over to the original network with some stated level of confidence?
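To make the setup concrete, here is a minimal sketch of the subsample, connect, and walk pipeline. Everything specific in it is an assumption for illustration only: the connection condition (Euclidean distance below a hypothetical `radius`), the function names, the restart rule for isolated nodes, and the use of per-node visit counts as the walk statistic.

```python
import numpy as np


def build_adjacency(points, radius):
    """Connect two points when their Euclidean distance is below `radius`.

    The distance threshold is a stand-in for the real (unspecified) condition.
    """
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    adj = dist < radius
    np.fill_diagonal(adj, False)  # no self-loops
    return adj


def random_walk_visits(adj, n_steps, rng):
    """Run one simple random walk and return the visit count per node."""
    n = adj.shape[0]
    visits = np.zeros(n, dtype=int)
    node = rng.integers(n)
    for _ in range(n_steps):
        visits[node] += 1
        neighbors = np.flatnonzero(adj[node])
        if neighbors.size == 0:
            # Assumption: restart uniformly at random from an isolated node.
            node = rng.integers(n)
        else:
            node = rng.choice(neighbors)
    return visits


def subsample_and_walk(data, m, radius, n_steps, seed=0):
    """Draw m points without replacement, build the network, run one walk."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(data.shape[0], size=m, replace=False)
    adj = build_adjacency(data[idx], radius)
    return idx, random_walk_visits(adj, n_steps, rng)
```

Calling `subsample_and_walk(data, m, radius, n_steps)` for increasing values of `m` would show how the walk statistic changes with the subsample size. Note that the dense pairwise distance matrix uses $O(m^2)$ memory, so this sketch only works for small `m`.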