My task is to determine whether two given datasets are the same. This has to be done programmatically, in C++.
Since the data could be huge, I don't want to read every element of one set and compare it against the other. It would be acceptable to be told, for example, that there is a 95% probability that the two datasets are the same, or that one is a subset of the other. I am looking for a mathematical, probabilistic, or statistical method of comparison that avoids a brute-force approach.
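
To make the kind of check I have in mind concrete, here is a minimal sketch of a sampling-based containment test. It is only an illustration under assumptions not stated above: the elements are `int`s, the first dataset supports random access, and the second is already held in a structure with fast membership lookup. The name `sampledContainment` and its parameters are hypothetical.

```cpp
#include <cstddef>
#include <random>
#include <unordered_set>
#include <vector>

// Hypothetical helper (not from any library): draws `samples` random
// elements from `a`, with replacement, and returns the fraction of
// those draws that are present in `b`.
// Assumption: `b` is already indexed for fast lookup.
double sampledContainment(const std::vector<int>& a,
                          const std::unordered_set<int>& b,
                          std::size_t samples,
                          unsigned seed = 42) {
    // Nothing to sample: treat an empty `a` as trivially contained in `b`.
    if (a.empty() || samples == 0) return 1.0;

    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> pick(0, a.size() - 1);

    std::size_t hits = 0;
    for (std::size_t i = 0; i < samples; ++i) {
        // Count the draw as a hit if the sampled element also occurs in `b`.
        if (b.count(a[pick(rng)]) != 0) ++hits;
    }
    return static_cast<double>(hits) / static_cast<double>(samples);
}
```

If all k draws are hits, then a first dataset with at least a fraction p of its elements missing from the second would have produced that outcome with probability at most (1 - p)^k. For example, 59 draws that all hit give roughly 95% confidence that fewer than 5% of the elements are missing, since 0.95^59 is about 0.049. This bounds the mismatch rate rather than proving equality, and checking sameness rather than containment would need the test run in both directions.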