Need a probabilistic approach to determine if a data-set A includes all the elements of data-set B


My task is to determine whether two given datasets are the same. This has to be done programmatically, in C++.

Since the data could be huge, I don't want to read every element of one set and compare it against the other. It would be acceptable to get an answer such as "there is a 95% probability that the two datasets are the same" or "that one is a subset of the other". I am looking for a mathematical, probabilistic, or statistical method of comparison that avoids a brute-force element-by-element check.
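To make the "95% probability" idea concrete, here is a minimal sketch of one possible direction: random sampling. It assumes (and these assumptions are not part of the problem as stated) that elements can be represented as `long long`, that A is already available in a structure with fast membership lookup, and that B allows random access. The names `probes_needed` and `probably_subset` are hypothetical. If at least a fraction `epsilon` of B is missing from A, then `k` independent random probes all succeed with probability at most `(1 - epsilon)^k`, so `k` is chosen to push that below `1 - confidence`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <unordered_set>
#include <vector>

// Number of random probes k such that (1 - epsilon)^k <= 1 - confidence.
// epsilon: smallest fraction of missing elements we want to detect.
std::size_t probes_needed(double epsilon, double confidence) {
    assert(epsilon > 0.0 && epsilon < 1.0);
    assert(confidence > 0.0 && confidence < 1.0);
    return static_cast<std::size_t>(
        std::ceil(std::log(1.0 - confidence) / std::log(1.0 - epsilon)));
}

// Samples elements of b uniformly at random (with replacement) and looks
// each one up in a. Returns false as soon as a sampled element is missing,
// in which case b is certainly not a subset of a. Returns true if every
// probe succeeds, which only bounds the missing fraction; it is not proof.
bool probably_subset(const std::unordered_set<long long>& a,
                     const std::vector<long long>& b,
                     double epsilon, double confidence,
                     std::mt19937_64& rng) {
    if (b.empty()) return true;  // the empty set is a subset of anything
    std::uniform_int_distribution<std::size_t> pick(0, b.size() - 1);
    const std::size_t k = probes_needed(epsilon, confidence);
    for (std::size_t i = 0; i < k; ++i) {
        if (a.find(b[pick(rng)]) == a.end()) return false;
    }
    return true;
}
```

With `epsilon = 0.05` and `confidence = 0.95` this needs 59 probes regardless of how large B is. The guarantee is one-sided: a `true` result means that, with 95% confidence, fewer than 5% of B's elements are missing from A. It cannot rule out a handful of missing elements. For an equality check under the same assumptions, one could compare the sizes of the two sets and run the test in both directions.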