First of all, I am not asking this on Cross Validated, since I am speaking here of mathematical statistics, which explicitly belongs to mathematics.
I am currently working on rank tests in order to write an introduction to this theory. My main reference is Theory of Rank Tests (Hájek, Šidák and Sen, 1999, 2nd edition).
They begin, naturally, by defining the terms and the general framework they will use, as follows:
2.1.1 Probability space and observations From a probabilistic point of view, a random experiment is represented by a probability space $(\Omega,\mathcal{A},P)$, where $\Omega$ is the space of all possible outcomes of the experiment, $\mathcal{A}$ is a $\sigma$-field determining how finely the outcomes are distinguished, and $P(\cdot)$ is a probability measure. In this setup, observations are defined as $\mathcal{A}$-measurable functions.
Up to here, I understand perfectly: it is the classical framework.
From a statistical point of view, an experiment is described by an indexed set of observations $X=(X_{1},X_{2},\dots,X_{N})$, each observation $X_{i}$ taking its values $x_{i}$ on the real line or on a proper subset of the real line. The set of observations is determined by its (joint) distribution function $$\begin{align}F(x_{1},\dots,x_{N})&=P(X_{1}\le x_{1},\dots,X_{N}\le x_{N}) \tag{1}\\ &-\infty<x_{1},\dots,x_{N}<+\infty \end{align}$$
Here, I understand that $P$ in $(1)$ is the probability measure on $(E^N,\mathcal{E}_{N})$ induced by $\hat{P}$, where each $X_{i}:(\Omega,\mathcal{A},\hat{P})\to(E,\mathcal{E})$ is an $(\mathcal{A},\mathcal{E})$-measurable function, $i=1,\dots,N$ ($\mathcal{E}$ and $\mathcal{E}_{N}$ being appropriate $\sigma$-fields).
Is that correct?
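Spelled out in symbols (my own notation, with $X=(X_{1},\dots,X_{N})$ viewed as a single map $\Omega\to E^{N}$), the pushforward I have in mind is $$P(B)=\hat{P}\bigl(X^{-1}(B)\bigr)=\hat{P}\bigl(\{\omega\in\Omega:X(\omega)\in B\}\bigr),\qquad B\in\mathcal{E}_{N},$$ so that $(1)$ is simply the special case $B=(-\infty,x_{1}]\times\dots\times(-\infty,x_{N}]$.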
The book continues with:
2.1.2 Statistics, $\sigma$-fields and $\lambda$-fields A set of observations $X$ together with a distribution function $F(x)$ generates a probability space $(\mathcal{X},\mathcal{A},P)$ as follows: $\mathcal{X}$ is the space of all possible values $x=(x_{1},\dots,x_{N})$ of $X$, i.e. it is an N-dimensional Euclidean space; $\mathcal{A}$ is the $\sigma$-field of Borel subsets of $\mathcal{X}$, i.e. the smallest $\sigma$-field containing all events appearing on the right side of $(1)$; finally, $P(\cdot)$ is the probability measure uniquely determined by $(1)$. The set of observations may be considered as an identity map $X(x)=x,\,x\in\mathcal{X}$.
[...] As a rule, the probability distribution $P(\cdot)$ will be determined by a density $p$, $$P(A)=\int_{A}p(x)\text{d}x\tag{$A\in\mathcal{A}$}$$
This is where I do not fully understand what they mean. They consider $X$ as being defined as follows: $$X:(\mathcal{X},\mathcal{A},P)\to(\mathcal{X},\mathcal{A},P):x\mapsto X(x)=x$$
And then, while usually $P(X\in A)$ was a notation for $P(X^{-1}(A))$, we now have $P(X^{-1}(A))=P(A)$ since $X$ is the identity. But what motivates such a framework? I mean, why would someone consider $X$ as an identity map? I must admit that I am more used to statistics viewed from a probabilistic point of view (actually, it does not even make sense to me to distinguish a "probabilistic point of view" from a "statistical point of view"). What motivates someone to consider the observation set (whose components are usually considered as random variables, as far as I know) as an identity map? Why would we need to consider it that way? Why can't we stay in a general "probabilistic point of view", with the $X_{i}$ being general random variables?
I understand this may be a very basic question, but I am quite confused by this choice of framework.
This is about the representation of probability spaces. The original definition is: for a random experiment we need some sample probability space, and the observations are measurable maps from that space to the range space. These maps induce the joint probability measure. Now, usually you know the range space and the joint distribution, so 2.1.2 tells you how to build a sample probability space out of them: just take the range space itself (of course, there are other ways to do this, but this one always works). Since a random experiment also requires observation maps, you need to define them as well, in such a way that they push the given joint distribution forward to itself. The identity map certainly works in this case. This construction appears frequently in other areas of probability, such as the theory of stochastic processes, where it is often called the canonical probability space of the process.
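The construction can be checked on a toy finite example (the sample space, observation map, and measure below are my own, purely illustrative): starting from an arbitrary sample space with an observation map, build the canonical space on the range with the pushforward measure and the identity map, and observe that both spaces give the observation exactly the same law.

```python
from fractions import Fraction

# Hypothetical finite example: a sample space Omega = {a, b, c} with
# measure P_hat, and an observation X: Omega -> E = {0, 1}.
P_hat = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
X = {"a": 0, "b": 0, "c": 1}

# Pushforward measure P on E: P({e}) = P_hat(X^{-1}({e})).
P = {}
for omega, mass in P_hat.items():
    P[X[omega]] = P.get(X[omega], Fraction(0)) + mass

# Canonical construction: take E itself as the sample space, equipped
# with the measure P, and the identity map as the observation.
identity = {e: e for e in P}

# The identity map pushes P forward to P itself, so the canonical space
# reproduces exactly the same distribution for the observation.
P_canonical = {}
for e, mass in P.items():
    P_canonical[identity[e]] = P_canonical.get(identity[e], Fraction(0)) + mass

print(P == P_canonical)  # True: both spaces give the same law for X
```

The point is precisely the one made above: the original sample space $(\Omega,\mathcal{A},\hat{P})$ is discarded, and only the induced distribution on the range survives, with the identity as the observation map.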