Lecture notes with unclear assumption about statistical learning theory


Consider the following introduction to a framework for statistical learning theory, taken from here:

[Two screenshots of the lecture notes; not reproduced here.]

These paragraphs contain a number of unclear points (in decreasing order of importance for me):

1) Is the domain of the random variables $(X_i,Y_i)$ also $\mathcal{X}\times\mathcal{Y}$, like its codomain?

2) What are the "usual topologies" on $\mathcal{X},\mathcal{Y}$? The notes make no assumptions about what the sets $\mathcal{X},\mathcal{Y}$ might be, so it seems very awkward to suddenly assume they carry some topology. Furthermore, it is stated that every function is assumed to be a Borel function, which implicitly assumes that the domain and codomain of every function carry a topology. Weird.

3) Why do we have two notations for the probability of an event, $P(A)$ and $\mathbb{P}[A]$? It doesn't make any sense to me.

On BEST ANSWER

Here's a way to frame statistical learning into the usual probability theory setting.

Consider $(\Omega,\mathcal A, Q)$ an arbitrary probability space. Let $(\mathcal X,\mathcal B(\mathcal X))$ and $(\mathcal Y,\mathcal B(\mathcal Y))$ be the input and output spaces equipped with their Borel sigma-algebra. Popular choices are $\mathcal X=\mathbb R^d$ and $\mathcal Y=\mathbb R$.

Let $P$ be a probability measure over $(\mathcal X \times \mathcal Y, \mathcal B(\mathcal X) \otimes \mathcal B(\mathcal Y))$ and $(X,Y):\Omega \to \mathcal X \times \mathcal Y$ be a random element. $(X,Y)$ is distributed according to $P$ if the pushforward measure of $Q$ by $(X,Y)$ is $P$: $$\forall C\in \mathcal B(\mathcal X) \otimes \mathcal B(\mathcal Y), Q((X,Y)\in C) = P(C)$$
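The pushforward identity $Q((X,Y)\in C) = P(C)$ can be illustrated numerically: sampling $(X,Y)$ repeatedly and counting how often the sample lands in an event $C$ approximates $P(C)$. Here is a minimal sketch (not from the original post), assuming the common choice $\mathcal X=\mathbb R^2$, $\mathcal Y=\mathbb R$, with a linear-regression-style joint distribution chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 100_000
# Assumed joint distribution P: X ~ N(0, I_2) and Y = <w, X> + Gaussian noise.
w = np.array([1.0, -2.0])
X = rng.standard_normal((n, 2))
Y = X @ w + rng.standard_normal(n)

# Event C = {(x, y) : y > 0}. Since Y is a centered Gaussian, P(C) = 1/2.
# The empirical frequency of C under repeated draws of (X, Y) should be
# close to P(C), which is exactly the pushforward statement above.
empirical = np.mean(Y > 0)
print(empirical)  # close to 0.5
```

Nothing here depends on the specific choice of $P$; any joint law for which $P(C)$ is computable in closed form would serve equally well to check the identity.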