I have a question concerning the inverse mapping in the image (text extracted from Casella's Statistical Inference):
$g^{-1}(A) = \{ x \in \chi : g(x) \in A\}$
I understand the idea: it denotes all the $x$ in $\chi$ that $g$ maps to values in the set $A$. But is the inverse mapping $g^{-1}(A)$ a proper function mapping $y \in A$ back to $x \in \chi$, given that many $x$ may give the same $g(x)$? Later we will see that
$$ \begin{aligned} P(Y \in A) &=P(g(X) \in A) \\ &=P(\{x \in \mathcal{X}: g(x) \in A\}) \\ &=P\left(X \in g^{-1}(A)\right) . \end{aligned} $$
which directly expresses the probability in terms of $g^{-1}(A)$. Doesn't this cause a problem, since $g^{-1}(A)$ gives only one output, whereas there may be many $x$ that satisfy the condition $\{ x \in \chi : g(x) \in A\}$?
Thanks
The output of $g^{-1}(A)$ is a set: $g^{-1}(A) = \{x \in \mathcal{X}: \ g(x) \in A\}$. This generalization to a mapping on sets is built so that it does not contradict the "ordinary" function, via the identification $g(x)\leftrightarrow g(\{x\})$. For example, we can say that
$$g(\{x_1,x_2\})= \{g(x_1),g(x_2)\} = \{y_1,y_2\}$$ and $$g^{-1}(\{y_1,y_2\}) = \{x_1,x_2,x_3\}, $$ as there may exist an $x_3$ such that $g(x_3)=y_1$. For $g(x)=\sin(x)$, the set $g^{-1}(\{a\})$ consists of all points $x$ for which $\sin(x)=a$. The notation does not refer to an inverse function of $g$ (which may not exist); it simply collects all $x \in \mathcal{X}$ for which $g(x) \in A$.
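To make the set-valued reading concrete, here is a minimal Python sketch on a finite domain. The function $g(x)=x^2$, the domain $\{-2,\dots,2\}$, and the helper names `image` and `preimage` are all illustrative assumptions, not part of the text above.

```python
def image(g, S):
    """Return g(S) = {g(x) : x in S}."""
    return {g(x) for x in S}


def preimage(g, domain, A):
    """Return g^{-1}(A) = {x in domain : g(x) in A}; always a set."""
    return {x for x in domain if g(x) in A}


# Hypothetical example: g(x) = x^2 on the finite domain {-2, -1, 0, 1, 2}.
domain = range(-2, 3)
g = lambda x: x * x

print(sorted(image(g, {1, 2})))             # [1, 4]
print(sorted(preimage(g, domain, {1, 4})))  # [-2, -1, 1, 2]
print(sorted(preimage(g, domain, {3})))     # [] (no x maps to 3)
```

Here $g$ is not invertible, yet `preimage` is well defined for every target set: it may return several points, one point, or none.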
Now, there is really one step that was omitted: the definition of a random variable.
$\Omega$ is the set of elementary outcomes, and an event is a subset of elementary outcomes, $A \subset \Omega$, that belongs to the sigma-algebra: $A \in \mathcal{F}$. A random variable is a function $X: \Omega \to \mathbb{R}$.
That means that we can work with events (subsets of $\Omega$ that belong to the sigma-algebra $\mathcal{F}$): $$(X \in B) \leftrightarrow A = X^{-1}(B) = \{\omega \in \Omega: \ X(\omega) \in B\}.$$
This allows us to find the preimage in $(\Omega, \mathcal{F})$ of every Borel set of reals (the event that corresponds to a set of values of the random variable). One can prove that $X^{-1}(\sigma(\mathcal{C}))=\sigma(X^{-1}(\mathcal{C}))$ for any class of sets $\mathcal{C}$, and that this operation preserves algebras. So now we can work with events that correspond to values of a random variable: $A \leftrightarrow (X \in B) \leftrightarrow X^{-1}(B)$. Borel sets even give us the cumulative distribution function:
$X$ is measurable $\Leftrightarrow \ \forall x \ \ A_x = \{\omega \in \Omega: \ X(\omega) \leq x\}\in \mathcal{F}$
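The statement that taking preimages preserves set operations can be checked on a small finite case. The sketch below is only an illustration under assumed data: $\Omega$ is the six faces of a die and $X(\omega)=\omega \bmod 3$ is a hypothetical random variable.

```python
def preimage(X, omega, B):
    """Return X^{-1}(B) = {w in omega : X(w) in B}."""
    return {w for w in omega if X(w) in B}


# Hypothetical setup: six die faces and X(w) = w mod 3.
omega = set(range(1, 7))
X = lambda w: w % 3
values = {X(w) for w in omega}  # all values X can take: {0, 1, 2}

B1, B2 = {0, 1}, {1, 2}

# Preimages commute with union, intersection, and complement.
union_ok = preimage(X, omega, B1 | B2) == preimage(X, omega, B1) | preimage(X, omega, B2)
inter_ok = preimage(X, omega, B1 & B2) == preimage(X, omega, B1) & preimage(X, omega, B2)
compl_ok = preimage(X, omega, values - B1) == omega - preimage(X, omega, B1)

print(union_ok, inter_ok, compl_ok)  # True True True
```

This is a check on one example, not a proof, but it shows why the preimages of a sigma-algebra again form a sigma-algebra.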
Now we are prepared to read this notation: the probability $P(\cdot)$ is the measure of some set. We can express it in different ways:
$$P(A) = P(X \in B) = P(\{\omega \in \Omega: \ X(\omega) \in B\}) = P(X^{-1}(B)).$$
If we know the values of the random variable (a set $B$, which can be a single point, e.g. $X=b$), we can find the corresponding subset (event) in $(\Omega, \mathcal{F})$ and calculate its measure.
Conversely, for an event in $(\Omega, \mathcal{F})$ we can calculate the values of the random variable (and find its expectation, for example). Each $\omega \in \Omega$ corresponds to a single value $X(\omega)$ of the random variable.
The example I gave at the beginning can be interpreted as follows: the event $(X \in B)$ can result from 3 outcomes, so its probability is $P(\{x_1,x_2,x_3\})$. If we know that the outcome $x_1$ or $x_2$ has occurred, we can find the corresponding value of the random variable $X$. Sometimes we work in a probability space where the random variable does not separate some outcomes, e.g. it takes the same value on several elementary outcomes: $X(\omega_1)=X(\omega_2)=X(\omega_3)=b$. Then we cannot tell them apart through $X$: the event occurs under any of these outcomes, and the probability of the event $(X=b)$ is the measure of $X^{-1}(\{b\})$. For example, suppose we have $36$ cards in total, $4$ of which are kings. The event "a king is drawn" corresponds to the value $X=1$ of the random variable. So, if we have drawn a king, then $X=1$, but the probability of $X=1$ is $4/36$.
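The card example can be reproduced as a short sketch. The deck composition (ranks 6 through ace in four suits) is an assumption made only to obtain 36 equally likely cards; the names `X` and `prob` are likewise illustrative.

```python
from fractions import Fraction
from itertools import product

# Assumed 36-card deck: nine ranks in four suits, all equally likely.
ranks = ["6", "7", "8", "9", "10", "J", "Q", "K", "A"]
suits = ["clubs", "diamonds", "hearts", "spades"]
omega = list(product(ranks, suits))


def X(card):
    """Random variable: 1 if the card is a king, 0 otherwise."""
    rank, _suit = card
    return 1 if rank == "K" else 0


def prob(X, omega, B):
    """P(X in B) as the uniform measure of the preimage X^{-1}(B)."""
    event = [w for w in omega if X(w) in B]
    return Fraction(len(event), len(omega))


kings = [w for w in omega if X(w) == 1]  # X^{-1}({1}): four distinct outcomes
print(len(kings))           # 4
print(prob(X, omega, {1}))  # 1/9, i.e. 4/36
```

The four kings are different elementary outcomes, but $X$ assigns the same value to all of them, so the probability of $X=1$ is the measure of the whole four-element preimage.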