Can someone please provide a useful reference on the definition of probability distribution?
A very popular site (top of Google search) states:
A probability distribution is a table or an equation that links each outcome of a statistical experiment with its probability of occurrence.
https://stattrek.com/probability-distributions/probability-distribution.aspx
I feel that this definition is very unsatisfactory. I need a better one with a reference.
Thank you!

To introduce the definition of probability distribution formally, one first needs an appropriate notion of probability. Following the axioms of probability laid down by Kolmogorov, let's start with a probability space $(\Omega,\mathscr{F},\mu)$, where $\Omega$ is a set (the sample space), $\mathscr{F}$ is a $\sigma$-algebra of subsets of $\Omega$ (the events), and $\mu$ is a probability measure on $(\Omega,\mathscr{F})$, that is, a countably additive measure with $\mu(\Omega)=1$.
Given another measurable space $(R,\mathscr{R})$, a random variable on $\Omega$ taking values in $R$ is a function $X:\Omega\rightarrow R$ such that $X^{-1}(A):=\{\omega\in\Omega: X(\omega)\in A\}\in\mathscr{F}$ for all $A\in\mathscr{R}$. $X$ is also said to be $(\Omega,\mathscr{F})$-$(R,\mathscr{R})$ measurable.
Definition 1. The distribution of $X$ (which we may denote as $\mu_X$) is defined as the measure on $(R,\mathscr{R})$ induced by $X$, that is $$\begin{align} \mu_X(A):=\mu\big(X^{-1}(A)\big), \quad A\in\mathscr{R}\tag{1}\label{one} \end{align} $$
Note (to address one of the concerns of the bounty sponsor): in the literature (mathematical physics, probability theory, economics, etc.) the probability measure $\mu$ in the triplet $(\Omega,\mathscr{F},\mu)$ is often itself referred to as a probability distribution. This apparent ambiguity (there is no random variable to speak of) is resolved by definition (1). To see this, consider the identity map $X:\Omega\rightarrow\Omega$, $\omega\mapsto\omega$. Then $X$ can be viewed as a random variable taking values in $(\Omega,\mathscr{F})$, and since $X^{-1}(A)=A$ for all $A\in\mathscr{F}$, $$\mu_X(A)=\mu(X^{-1}(A))=\mu(A),\quad\forall A\in\mathscr{F}.$$
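Definition (1) is easy to compute by hand on a finite space. The following sketch (the die and the parity map are my own illustrative choices, not from the text above) pushes the uniform measure on a fair six-sided die forward through $X(\omega)=\omega\bmod 2$:

```python
from fractions import Fraction

# Finite probability space: a fair six-sided die.
omega = [1, 2, 3, 4, 5, 6]
mu = {w: Fraction(1, 6) for w in omega}  # uniform probability measure on omega

# Random variable X: parity of the outcome (0 = even, 1 = odd).
def X(w):
    return w % 2

# Pushforward measure mu_X(A) = mu(X^{-1}(A)), computed on singletons.
def mu_X(a):
    return sum(mu[w] for w in omega if X(w) == a)

print(mu_X(0), mu_X(1))  # each preimage contains three outcomes -> 1/2 1/2
```

Each value of $\mu_X$ is obtained exactly as in (1): sum $\mu$ over the preimage $X^{-1}(\{a\})$.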
A few examples:
To fix ideas, consider $(\Omega,\mathscr{F},\mu)=((0,1),\mathscr{B}((0,1)),\lambda_1)$, the Steinhaus space: $\Omega$ is the open unit interval, $\mathscr{F}$ is the Borel $\sigma$-algebra on $(0,1)$, and $\mu$ is the Lebesgue measure $\lambda_1$.
The identity map $X:(0,1)\rightarrow(0,1)$, $t\mapsto t$, considered as a random variable from $((0,1),\mathscr{B}(0,1))$ to $((0,1),\mathscr{B}(0,1))$, has the uniform distribution on $(0,1)$, that is, $\mu_X((a,b])=\lambda_1((a,b])=b-a$ for all $0\leq a<b<1$.
The function $Y(t)=-\log(t)$, considered as a random variable from $((0,1),\mathscr{B}(0,1))$ to $(\mathbb{R},\mathscr{B}(\mathbb{R}))$, has the exponential distribution (with intensity $1$), i.e. $\mu_Y\big((0,x]\big)=1-e^{-x}$ for all $x>0$.
$Z(t)=\mathbb{1}_{(0,1/2)}(t)$, viewed as a random variable from $((0,1),\mathscr{B}(0,1))$ to $(\{0,1\},2^{\{0,1\}})$ has the Bernoulli distribution (with parameter $1/2$), that is $$ \mu_Z(\{0\})=\mu_Z(\{1\})=\frac12 $$
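The three examples above can be checked numerically: drawing $t$ uniformly from $(0,1)$ simulates a point of the Steinhaus space, and the empirical frequencies of the events below should approach the stated distributions. A minimal sketch (sample size and test points are arbitrary choices):

```python
import math
import random

random.seed(0)
N = 100_000

# Draw omega uniformly from (0,1): this samples the Steinhaus space.
samples = [random.random() for _ in range(N)]

# X(t) = t is uniform on (0,1): P(X <= 1/4) should be close to 1/4.
p_uniform = sum(t <= 0.25 for t in samples) / N

# Y(t) = -log(t) is Exponential(1): P(Y <= 1) should be close to 1 - e^{-1}.
p_exp = sum(-math.log(t) <= 1.0 for t in samples) / N

# Z(t) = 1_{(0,1/2)}(t) is Bernoulli(1/2): P(Z = 1) should be close to 1/2.
p_bern = sum(t < 0.5 for t in samples) / N

print(p_uniform, p_exp, p_bern)
```

With $N=100{,}000$ draws the three frequencies land within a few thousandths of $1/4$, $1-e^{-1}\approx 0.632$, and $1/2$ respectively.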
Any $t\in(0,1)$ admits a unique binary expansion $t=\sum^\infty_{n=1}\frac{r_n(t)}{2^n}$ where $r_n(t)\in\{0,1\}$ and $\sum_nr_n(t)=\infty$ (the last condition rules out expansions that terminate in all zeros, which makes the expansion unique). It can be shown that each map $X_n(t)=r_n(t)$ is a Bernoulli random variable (as in example 3). Furthermore, the distribution of $X:(0,1)\rightarrow\{0,1\}^\mathbb{N}$, as a random variable from $((0,1),\mathscr{B}(0,1))$ to the space of sequences of $0$-$1$'s, the latter equipped with the product $\sigma$-algebra (the $\sigma$-algebra generated by sets $\{\mathbf{x}\in\{0,1\}^\mathbb{N}:x(1)=r_1,\ldots,x(m)=r_m\}$, where $m\in\mathbb{N}$ and $r_1,\ldots,r_m\in\{0,1\}$), is such that $\{X_n:n\in\mathbb{N}\}$ becomes an independent identically distributed (i.i.d.) sequence of Bernoulli (parameter $1/2$) random variables.
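The i.i.d. claim can also be checked by simulation: extract the first few binary digits of uniform draws and verify that each digit is Bernoulli($1/2$) and that every digit pattern appears with the product probability. A sketch (using greedy digit extraction, which agrees with the expansion above for almost every $t$, since dyadic rationals have probability zero):

```python
import random
from collections import Counter

random.seed(1)

def digits(t, m):
    """First m binary digits r_1(t), ..., r_m(t) of t in (0,1)."""
    out = []
    for _ in range(m):
        t *= 2
        d = int(t)   # next binary digit
        out.append(d)
        t -= d
    return out

N = 50_000
sample = [digits(random.random(), 3) for _ in range(N)]

# Each digit X_n should have empirical frequency of 1's near 1/2 ...
freqs = [sum(s[n] for s in sample) / N for n in range(3)]

# ... and the digits should be independent: every pattern (r1, r2, r3)
# should occur with frequency near 1/8 = 0.125.
pattern_freq = Counter(tuple(s) for s in sample)
p_min = min(pattern_freq.values()) / N
p_max = max(pattern_freq.values()) / N

print(freqs, p_min, p_max)
```

All three digit frequencies come out near $0.5$, and all eight patterns near $0.125$, consistent with independence.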
Cumulative distribution function
In many applications of Probability, the random variables of interest take values on the real line $\mathbb{R}$. The real line has a natural measurable structure given by the $\sigma$-algebra $\mathscr{B}(\mathbb{R})$ generated by the open intervals in $\mathbb{R}$. This $\sigma$-algebra is known as the Borel $\sigma$-algebra.
It turns out that $X$ is a (real-valued) random variable if and only if $\{X\leq a\}:=X^{-1}((-\infty,a])\in\mathscr{F}$ for all $a\in\mathbb{R}$.
The distribution $\mu_X$ of $X$ can be encoded by the function $$F_X(x):=\mu_X((-\infty,x])=\mu(\{X\leq x\})$$
$F_X$ has the following properties: it is monotone non-decreasing and right-continuous, $\lim_{x\rightarrow-\infty}F_X(x)=0$, and $\lim_{x\rightarrow\infty}F_X(x)=1$.
It turns out that any function $F$ that has the properties listed above gives rise to a probability measure $\nu$ on the real line. This is based on basic facts of measure theory, namely the Lebesgue-Stieltjes theorem.
For that reason, $F_X$ is commonly known as the cumulative distribution function of $X$, and very often it is simply referred to as the distribution function of $X$.
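One concrete face of the Lebesgue–Stieltjes direction is inverse transform sampling: given a function $F$ with the four properties above, composing its (generalized) inverse with a uniform variable on $(0,1)$ produces a random variable whose distribution function is exactly $F$. A sketch with the exponential distribution function $F(x)=1-e^{-x}$ (the choice of $F$ and of the test point $x=1$ are mine, for illustration):

```python
import math
import random

random.seed(2)

# F(x) = 1 - e^{-x} satisfies the four CDF properties listed above.
def F(x):
    return 1.0 - math.exp(-x)

# Its inverse F^{-1}(u) = -log(1 - u) turns a uniform draw on (0,1)
# into a sample whose distribution function is F.
def F_inv(u):
    return -math.log(1.0 - u)

N = 100_000
sample = [F_inv(random.random()) for _ in range(N)]

# The empirical CDF at x = 1 should be close to F(1) = 1 - e^{-1}.
ecdf_1 = sum(x <= 1.0 for x in sample) / N
print(ecdf_1, F(1.0))
```

Note the connection to example 2 above: $F^{-1}(U)=-\log(1-U)$ has the same distribution as $Y(t)=-\log(t)$, since $1-U$ is again uniform on $(0,1)$.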
Final Comments:
All of this is now discussed in courses on probability. At the basic level, by no means trivial (Feller, An Introduction to Probability Theory and Its Applications, Vol. I), one works mainly with cumulative distribution functions of random variables; at the more advanced level (Feller, Vol. II), one works with more general random variables, and so the "general" notion of distribution (as in $\eqref{one}$) is discussed.