The mean of a random variable $X$, also called the expectation value of $X$, is defined as $$ \operatorname E(X)=\mu=\sum_{i=1}^n x_i p_i=x_1 p_1+x_2 p_2+\cdots+x_n p_n $$
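As a concrete instance of this definition (a hypothetical example, not from the question itself), the expectation of a fair six-sided die can be computed directly as a probability-weighted sum:

```python
# Expectation of a fair six-sided die: E(X) = sum of x_i * p_i.
values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6          # uniform probabilities; they sum to 1

mu = sum(x * p for x, p in zip(values, probs))
print(mu)  # 3.5
```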
Can I have a derivation of this formula from the statistical definition of the mean?
My Understanding
The mean of statistical data is defined as $$ \bar{x}=\frac{\sum_{i=1}^n x_if_i}{\sum_{i=1}^n f_i} = \frac{\sum_{i=1}^n x_i f_i}{n} $$ The first expression defines the mean of data points that could be obtained if we perform a random experiment, i.e., the mean obtained from theoretical predictions using probability theory, and the latter expresses the mean of the data points obtained after performing a random experiment, i.e., the mean obtained after the random experiment is performed. Please correct me if I am wrong!
My Attempt
Consider a random experiment in which a random variable $X$ takes values $\{x_1, x_2, \ldots, x_n\}$ with probabilities $p_i=P(x_i)$, and suppose we conduct $n(S)=N$ independent trials whose outcomes are $\{ x^1, x^2, \ldots, x^N\}$. For large $N$, $$ p_i \approx \frac{n(x_i)}{n(S)}=\frac{n(x_i)}{N}\implies n(x_i)\approx N p_i $$ So the frequency of occurrence of $x_i$ is $f_i\approx N p_i$, and $$ \bar{x}=\frac{\sum_{j=1}^N x^j}{N} = \frac{\sum_{i=1}^n f_i x_i}{N} \approx \frac{\sum_{i=1}^n N p_i x_i}{N} = \sum_{i=1}^n p_i x_i = \operatorname E(X) $$
Is there a better explanation/derivation of the expression for the expectation value in probability theory, or how else can one show that both expressions mean the same thing?
You can't exactly show how to derive the population mean $\mu$ of a discrete random variable $X$ from the sample mean $\bar X$ of $n$ realizations of the random variable because $\mu$ and $\bar X$ are different things. However, a modification of the displayed equation at the end of your Question can show that the principle behind the two is the same. My approach is to give a slightly better version of your last equation.
First, if the observations $X_1, X_2, \dots, X_n$ are put into 'frequency-value' format, you have a frequency $f_j$ for each value $v_j.$ For example, if the $n = 10$ observations are $X = (1, 3, 2, 3, 4, 5, 2, 1, 1, 5),$ then you have $k = 5$ values $(v_j)$ 1, 2, 3, 4, 5 with respective frequencies $(f_j)$ 3, 2, 2, 1, 2. Because there are 10 observations, the $f_j$'s must sum to 10. [Notice that I use different symbols for $X_i$'s and $v_j$'s in order to avoid confusion. And for convenience we assume that our sample of ten values has taken on all possible population values; otherwise we would have to include extra $v_j$'s with frequencies $f_j = 0$.]
There are two ways to compute $\bar X.$ First,
$$\bar X = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{10}(1+3+2+3+4+5+2+1+1+5) = 27/10 = 2.7.$$
Second,
$$\bar X = \frac{\sum_{j=1}^k f_jv_j}{\sum_{j=1}^k f_j} = \frac{3(1)+2(2)+2(3)+1(4)+2(5)}{3+2+2+1+2} = 27/10 = 2.7.$$
Using the second expression for the sample mean, one can write
$$\bar X = \frac{\sum_{j=1}^k f_jv_j}{\sum_{j=1}^k f_j} = \frac{\sum_{j=1}^k f_jv_j}{n} = \sum_{j=1}^k (f_j/n)v_j = \sum_{j=1}^k r_jv_j,$$ where the last sum uses the notation $r_j = f_j/n$ for relative frequencies.
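As a quick numerical check of this identity, here is a sketch using Python's `collections.Counter` on the ten observations above; all three forms of the sample mean agree:

```python
from collections import Counter

obs = [1, 3, 2, 3, 4, 5, 2, 1, 1, 5]
n = len(obs)

# Direct mean: (1/n) * sum of the x_i.
direct = sum(obs) / n

# Frequency-value format: f_j counts how often each value v_j occurs.
freq = Counter(obs)                               # {1: 3, 2: 2, 3: 2, 4: 1, 5: 2}
weighted = sum(f * v for v, f in freq.items()) / sum(freq.values())

# Relative-frequency form: sum of r_j * v_j with r_j = f_j / n.
rel = sum((f / n) * v for v, f in freq.items())

print(direct, weighted, rel)  # all three equal 2.7
```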
Now imagine larger and larger samples as the sample size $n$ approaches the population size $N.$ Then the values $v_j$ remain the same and are called $x_j$'s again to represent the various values the random variable $X$ can assume. Also, by the Law of Large Numbers, the $r_j$'s converge to probabilities $p_j= P(X = x_j).$ Then we can say that the sample mean $\bar X_n$ of $n$ observations converges ("in probability") to the population mean $\mu.$ We might write this as $$\text{plim}_{n\rightarrow\infty}\bar X_n = \mu = E(X) = \sum_j p_jx_j = \sum_j x_jP(X=x_j),$$ where the sums are taken over all the possible values in the distribution.
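This convergence of relative frequencies to probabilities can be illustrated numerically. Below is a sketch (the three-valued distribution is an assumption chosen for illustration; exact printed values vary with the random seed) showing the $r_j$'s and the sample mean approaching the $p_j$'s and $\mu$ as $n$ grows:

```python
import random

random.seed(1)
values = [1, 2, 3]
probs = [0.2, 0.5, 0.3]                 # p_j = P(X = x_j)
mu = sum(x * p for x, p in zip(values, probs))   # population mean = 2.1

for n in [100, 10_000, 1_000_000]:
    sample = random.choices(values, weights=probs, k=n)
    # Relative frequencies r_j = f_j / n approach the p_j as n grows.
    r = [sample.count(v) / n for v in values]
    print(n, r, sum(sample) / n)        # sample means approach mu = 2.1
```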
As a practical example, consider $X \sim \mathsf{Binom}(n=4, p = 1/2),$ the distribution of the number of Heads when four fair coins are tossed. A sample of size $m = 10$ might have values $x = (3, 0, 4, 3, 1, 0, 2, 3, 2, 1).$ The frequencies of the five possible values are 2, 2, 2, 3, 1, respectively. The sample mean is $\bar X = 19/10 = 1.9.$ We know that the mean of the binomial distribution is $E(X) = \mu = np = 4(1/2) = 2.$ So $\bar X_{10}$ is close to $E(X),$ but not exactly equal. (Simulation in R statistical software.)
Now suppose we take a very large sample of size $m = 100{,}000$ (that's like playing $m$ four-toss games). In one simulation, the result was $\bar X_{100000} = 2.00035 \approx E(X) = 2.$
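The original simulation was done in R; an analogous sketch in Python (simulating each four-toss game by counting Heads among four fair coin flips; the exact sample mean varies with the seed) behaves the same way:

```python
import random

random.seed(42)
m = 100_000

# Each game: toss four fair coins and count Heads, so X ~ Binom(4, 1/2).
games = [sum(random.random() < 0.5 for _ in range(4)) for _ in range(m)]

xbar = sum(games) / m
print(xbar)  # close to E(X) = n*p = 2, typically within about 0.01
```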
A histogram of the results is shown below [blue bars] along with the exact probabilities from $\mathsf{Binom}(4, .5)$ [red dots]. The relative frequencies of the sample of size 100,000 are not exactly equal to the binomial probabilities, but the differences are too small to see at the resolution of the figure. (In this case the Density scale of the histogram is the same as a Relative Frequency scale, because the bars have width $1.$)