Fixed distribution


The literature on statistical learning theory states that the training data $(\mathbf{x}_1, y_1),\dots,(\mathbf{x}_n, y_n)$ are generated from an unknown but fixed joint distribution $P(\mathbf{x},y)$, with $\mathbf{x} \in \mathbb{R}^k$ and $y \in \mathbb{R}$. What does it mean for the distribution to be fixed?

Best answer

It means that there exists a single, unknown distribution $\mathbf{P}$ which generates the samples. More pragmatically, that means it is assumed that all $(\mathbf{x}_i,y_i)$ are identically distributed: the distribution $\mathbf{P}$ does not "change over time."
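Written out in symbols, as a sketch of the assumption rather than a quotation from any particular textbook, "fixed" says that the same law applies to every index $i$:

$$(\mathbf{x}_i, y_i) \sim \mathbf{P} \quad \text{for every } i = 1, \dots, n.$$

A distribution that was not fixed would instead look like $(\mathbf{x}_i, y_i) \sim \mathbf{P}_i$, with $\mathbf{P}_i$ allowed to differ from one sample to the next. The notation $\mathbf{P}_i$ is introduced here only for contrast.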

This assumption is crucial for many results in statistical learning theory: you assume that the examples you learn from come from one distribution; then, after learning, you evaluate how well your hypothesis does with respect to that same distribution. (All points in the training set and the test set come from the same distribution $\mathbf{P}$. You do not know $\mathbf{P}$, but it exists and does not change.)
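To see why this matters in practice, here is a small simulation sketch. Everything in it is an assumption made for illustration: the toy distribution ($x$ uniform on $[-1, 1]$, $y$ a noisy linear function of $x$), the helper names `sample`, `fit_line` and `mse`, and the choice of a least-squares line as the hypothesis. It fits a line on training data, then measures the error on fresh data from the same distribution and on data from a changed one.

```python
import numpy as np


def sample(n, rng, slope=2.0, noise=0.1):
    """Draw n pairs (x, y) from a toy joint distribution P(x, y):
    x ~ Uniform(-1, 1), y = slope * x + Gaussian noise."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = slope * x + rng.normal(0.0, noise, size=n)
    return x, y


def fit_line(x, y):
    """Least-squares fit of y ~ a * x + b; returns (a, b)."""
    a, b = np.polyfit(x, y, deg=1)  # coefficients, highest degree first
    return a, b


def mse(a, b, x, y):
    """Mean squared error of the line a * x + b on the pairs (x, y)."""
    return float(np.mean((a * x + b - y) ** 2))


rng = np.random.default_rng(0)

# Learn from samples drawn from P.
x_train, y_train = sample(2000, rng)
a, b = fit_line(x_train, y_train)

# Fresh test samples from the SAME distribution P.
x_same, y_same = sample(2000, rng)

# Test samples from a DIFFERENT distribution (the slope has changed),
# i.e. the "fixed distribution" assumption is violated.
x_shift, y_shift = sample(2000, rng, slope=-2.0)

err_same = mse(a, b, x_same, y_same)
err_shift = mse(a, b, x_shift, y_shift)

print(f"test error, same distribution:    {err_same:.4f}")
print(f"test error, changed distribution: {err_shift:.4f}")
```

In this toy setup the error on data from the same distribution stays close to the noise level, while the error on data from the changed distribution is far larger: what was learned from the training set says little about a distribution that has moved.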