I understand what a probability distribution is.
I also have a personal interpretation of the concept of the distribution of a dataset. Whenever I see this expression, I imagine a graph with frequency on the y-axis and the members of the dataset on the x-axis, with a point plotted for each member at its frequency.
Is this the correct interpretation? Is "distribution of a dataset" = "probability distribution"? To me the two concepts do not look like the same thing (probably subtly related, but not the same).
I was unable to find a standard definition of this concept. Can you point me to a resource that defines it?
When authors say "two data sets drawn from the same underlying distribution", what exactly do they mean by "underlying distribution"? Do they mean the same thing I described above, i.e. a graph of frequency versus each member of the dataset?
I think what you call the distribution of a dataset actually refers to the distribution of the data instances. A general dataset has no fixed length, so if you wanted to define a probability density (assuming the continuous case) over all possible datasets, you would have to work in an infinite-dimensional space. In practice, however, every instance in the dataset has a fixed-length representation, namely a $d$-dimensional vector, and it is easy to assign a probability density to every such $d$-dimensional point.
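To make "a density over $d$-dimensional points" concrete, here is a minimal sketch. It assumes, purely for illustration, that the instances follow a standard normal distribution with independent coordinates; the function name `standard_normal_density` and the sample points are hypothetical and not taken from the question.

```python
import math


def standard_normal_density(point):
    """Density of a d-dimensional standard normal evaluated at `point`.

    Assumption for illustration only: coordinates are independent,
    each with mean 0 and variance 1.
    """
    d = len(point)                            # dimension of the instance
    squared_norm = sum(x * x for x in point)  # ||x||^2
    return math.exp(-0.5 * squared_norm) / (2.0 * math.pi) ** (d / 2.0)


# Each instance is a fixed-length (here d = 2) vector.
instances = [(0.0, 0.0), (1.0, -0.5), (2.0, 2.0)]

# The density assigns one non-negative number to every instance.
densities = [standard_normal_density(p) for p in instances]

for p, value in zip(instances, densities):
    print(p, value)
```

The point is only that the density is a function of a single instance, not of the whole dataset, so its definition does not depend on how many instances the dataset contains.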
As for your questions:
1) I think "distribution of a dataset" = "distribution of all instances".
2) I don't think a formal definition exists.
3) I think you are correct.
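The difference between the frequency graph of one dataset and the "underlying distribution" can be sketched with a small simulation. It assumes, as an example of my own choosing, that the underlying distribution is a fair six-sided die; the helper names `draw_dataset` and `empirical_distribution` are hypothetical.

```python
import random
from collections import Counter

FACES = [1, 2, 3, 4, 5, 6]  # assumed underlying distribution: each face has probability 1/6


def draw_dataset(size, seed):
    """Draw `size` independent instances from the underlying distribution."""
    rng = random.Random(seed)
    return [rng.choice(FACES) for _ in range(size)]


def empirical_distribution(dataset):
    """Relative frequency of each face in one particular dataset."""
    counts = Counter(dataset)
    return {face: counts[face] / len(dataset) for face in FACES}


# Two different datasets drawn from the same underlying distribution.
dataset_a = draw_dataset(20000, seed=0)
dataset_b = draw_dataset(20000, seed=1)

freq_a = empirical_distribution(dataset_a)
freq_b = empirical_distribution(dataset_b)

for face in FACES:
    print(face, round(freq_a[face], 3), round(freq_b[face], 3))
```

The two frequency tables are what the graph in the question describes, one per dataset. They generally differ from each other, yet both approximate the same probabilities of 1/6, which is the underlying distribution the instances were drawn from.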