How can I tell the expected value of a random variable looking at its density function's graph?


There's an intuition in me that whenever I look at the graph of a random variable's density, its expected value should be the specific $x$ at which the function attains its maximum. That was the case for the normal distribution. But I encountered other distributions, like the gamma distribution, where this wasn't always so. I read on Wikipedia that it has expected value $k\theta$, so for, say, $k = 2$ and $\theta = 5$ the expected value would be $10$, but the function attains its maximum at around $5$.

I thought the density function says how probable certain values are; the value itself isn't a probability, of course, but where the function takes bigger values, the probability should also be higher. For example, sampling a random variable and then looking at the sample's histogram clearly shows that more numbers were generated near points where the function had bigger values. I also thought the expected value names the value that is most likely to be drawn from a random variable. So my general intuition was that wherever the function attains its maximum, that's the expected value.
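The sampling experiment described above can be sketched in a few lines of NumPy (this is an illustration, not code from the question; the shape and scale values $k=2$, $\theta=5$ are the ones mentioned above):

```python
import numpy as np

# Gamma distribution with k = 2, theta = 5 (the values from the question).
rng = np.random.default_rng(0)
k, theta = 2.0, 5.0
samples = rng.gamma(shape=k, scale=theta, size=100_000)

# The sample mean approximates E[X] = k * theta = 10 ...
print(samples.mean())   # close to 10

# ... but the histogram peaks near the mode (k - 1) * theta = 5.
counts, edges = np.histogram(samples, bins=200)
i = np.argmax(counts)
peak = 0.5 * (edges[i] + edges[i + 1])
print(peak)             # close to 5, not 10
```

So the histogram intuition is right about where samples cluster, but that clustering point (the mode) and the mean are different numbers for a skewed density.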

There are 5 best solutions below


The expected value is the center of mass of the graph. Imagine that your density is not just a curve, but a flat solid figure made of steel, standing on a needle. The expected value is the position of the needle at which the figure neither tips to the left nor to the right.
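The balance-point picture can be checked numerically (a sketch using the gamma density with $k=2$, $\theta=5$ from the question as the concrete example):

```python
import numpy as np
from math import gamma as gamma_fn

# Gamma(k=2, theta=5) density, written out explicitly.
k, theta = 2.0, 5.0
def f(x):
    return x**(k - 1) * np.exp(-x / theta) / (gamma_fn(k) * theta**k)

# "Center of mass" of the density: the integral of x * f(x) dx,
# approximated with a midpoint Riemann sum on [0, 200].
dx = 0.001
x = np.arange(dx / 2, 200.0, dx)
balance_point = float(np.sum(x * f(x)) * dx)
print(balance_point)   # about 10.0 = k * theta, though f(x) peaks at x = 5
```

The figure balances at $x = 10$ even though most of its "steel" sits near $x = 5$: the long right tail pulls the balance point to the right of the peak.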


Okay, first of all: these are basically only intuitions about what the expected value means. There are two classic results that rigorously put the expected value into context:

The first is the Markov inequality, stating that if $X$ has finite expectation, then $\mathbb{P}(|X|>c)\leq \frac{\mathbb{E}|X|}{c}$ for any $c>0$, so the absolute first moment of $X$ controls the likelihood of getting big observations. If $X$ has finite variance, this implies the Chebyshev inequality: $\mathbb{P}(|X-\mathbb{E}X|>c)\leq \frac{\operatorname{Var}(X)}{c^2}$. Thus, given that $X$ has finite variance, it will, with high probability, land close to its mean, in a way controlled by the variance.

The second fundamental result is the Law of Large Numbers (which is very much related to the above), stating that if you sample independent $X_n$'s with the same distribution (which has a first moment), then the averages $\frac{1}{n}\sum_{k=1}^n X_k$ converge to $\mathbb{E}(X_1)$.
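The Law of Large Numbers is easy to watch in action (a sketch, not code from this answer; the exponential distribution with rate $2$ and mean $1/2$ is just an example choice):

```python
import random

# Law of large numbers sketch: the average of many independent
# exponential(rate=2) samples approaches the expected value 1/2.
random.seed(0)
rate, n = 2.0, 100_000
total = 0.0
for _ in range(n):
    total += random.expovariate(rate)
avg = total / n
print(avg)   # close to 0.5
```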

Now, none of this necessarily supports your intuitions.

a) $\mathbb{E}(X)$ has nothing to do with the most likely single observation. Take for instance $X$ to have density $$ f(x)=\begin{cases} \frac{1}{2} & x\in[-(N+1),-N]\cup [N,N+1] \\ 0 & \text{else}\end{cases} $$

Then, clearly, no matter $N$, we have $\mathbb{E}(X)=0$, but $X$ will never come anywhere close to $0$ (comparing to the above, this implies that the variance of $X$ grows very large as $N$ does).
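A quick simulation makes the point concrete (a sketch; the sampler simply flips a fair coin for the sign and draws uniformly from $[N, N+1]$, which realizes the density above):

```python
import random

# Sampler for f(x) = 1/2 on [-(N+1), -N] and on [N, N+1]:
# fair coin for the sign, then a uniform draw from [N, N+1].
random.seed(0)
N = 100
samples = [(1 if random.random() < 0.5 else -1) * random.uniform(N, N + 1)
           for _ in range(50_000)]

mean = sum(samples) / len(samples)
print(mean)                          # near 0, the expected value
print(min(abs(s) for s in samples))  # yet every sample has |X| >= N = 100
```

The sample mean hovers near $0$, yet not a single observation lands within distance $N$ of it.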

b) $\mathbb{E}(X)$ cannot reasonably be expected to be read off from the density function. To see this, consider the family of densities of the exponential distribution, $f_{\lambda}(x)=\lambda \exp(-\lambda x)$ for $\lambda>0$. As it turns out, if $X$ has density $f_{\lambda}$, then $\mathbb{E}X=\frac{1}{\lambda}$, yet this grows large as the maximum value of the density grows small, and all the graphs look sort of similar.
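This inverse relationship between the density's peak and its mean can be verified numerically (a sketch; the integral $\mathbb{E}X = \int_0^\infty x f_\lambda(x)\,dx$ is approximated with a midpoint rule):

```python
import math

# Exponential densities f_lam(x) = lam * exp(-lam * x) on [0, infinity):
# the maximum of the density is f_lam(0) = lam, while the mean is 1/lam,
# so the mean grows exactly as the peak shrinks.
for lam in (0.5, 1.0, 2.0):
    peak = lam  # density maximum, attained at x = 0
    # midpoint-rule approximation of E[X] = integral of x * f_lam(x) dx
    dx, upper = 0.001, 60.0
    mean = sum(x * lam * math.exp(-lam * x) * dx
               for x in (dx * (i + 0.5) for i in range(int(upper / dx))))
    print(lam, peak, round(mean, 3))  # mean is about 1 / lam
```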

So what's happening in your cases? Well, the point is that there is a clear graphical interpretation of the mean value when the density function is suitably symmetric, but as soon as you're looking at something without any symmetries, there isn't much hope to be able to, at a glance, tell the mean value from the graph. This is no different from the fact that the integral of a symmetric function might be easy to guess by drawing it, but the integral of a general function isn't.


While hunting for peaks in the density function of a unimodal random variable can get you close to the expected value, as soon as you have multiple modes (or even a highly non-symmetric unimodal distribution like the gamma distribution, as you observed), this "closeness" falls apart.

Fundamentally, the problem is that you are relying too heavily on intuition. The most probable outcome is not the same thing as a probability-weighted sum of outcomes. For symmetric unimodal distributions this weighted sum happens to coincide with the mode, but you can quite easily construct discrete distributions that highlight how strange things can get.

Consider the random variable $X$, which takes on the value of $0$ with probability $p$ and the value $\frac{1}{(1-p)^2}$ with probability $(1-p)$. As you let $p$ get closer and closer to 1, you would only reasonably expect to see $X$ take on the value $0$, but the expected value becomes arbitrarily large! Even worse, the expected value is not realizable by the random variable itself!
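The computation behind this two-point example is a one-liner (a sketch spelling out the weighted sum described above):

```python
# The two-point random variable from above: X = 0 with probability p,
# and X = 1/(1-p)^2 with probability 1-p.
for p in (0.9, 0.99, 0.999):
    big_value = 1.0 / (1.0 - p) ** 2
    expectation = p * 0.0 + (1.0 - p) * big_value   # = 1 / (1 - p)
    print(p, expectation)   # the expectation blows up as p -> 1
```

Note that the expectation $1/(1-p)$ is strictly between the two values $X$ can actually take, so it is never realized by any single draw.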


At least in my opinion, "expected value" is actually a misnomer; a better name is "mean value". Your intuition is right in one sense: if you do just one trial, the most probable value you are going to get is the one with the highest probability. But in practice the average value over many trials usually matters more than a single trial, and that is what the expected value captures: if you make many trials and average the results, you get what is called the expected value (ideally over an infinite number of trials).


As others have said, you cannot generally read the expected value off the PDF, but with the CDF you can: the expected value is the point $m$ on the $x$-axis where the area under the CDF to the left of $m$ equals the area between the CDF and the horizontal line at height $1$ to the right of $m$.


To see why, note that when the expectation is finite, $\lim_{x\rightarrow -\infty}xF(x)=0$ and $\lim_{x\rightarrow +\infty}x(1-F(x))=0$. Then, integrating by parts, $$ \int_m^{+\infty} (1-F(x))\,dx-\int_{-\infty}^m F(x)\,dx = x(1-F(x))\Big]_m^{+\infty} +\int_m^{+\infty}x f(x)\,dx - xF(x)\Big]_{-\infty}^m+\int_{-\infty}^m x f(x)\,dx $$ $$ = -m+\int_{-\infty}^{+\infty} x f(x)\,dx, $$ and this becomes zero exactly when $m$ is the expected value.
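The identity is easy to confirm numerically for a concrete case (a sketch; the exponential distribution with rate $1$ is chosen here as an example, with CDF $F(x)=1-e^{-x}$ for $x\geq 0$ and mean $m=1$):

```python
import math

# CDF of the exponential distribution with rate 1.
def F(x):
    return 1.0 - math.exp(-x) if x >= 0 else 0.0

m = 1.0                       # the expected value of this distribution
dx, upper = 0.0005, 50.0
# area between the CDF and the line y = 1, to the right of m
right = sum((1.0 - F(m + (i + 0.5) * dx)) * dx
            for i in range(int((upper - m) / dx)))
# area under the CDF to the left of m (F vanishes for x < 0)
left = sum(F((i + 0.5) * dx) * dx for i in range(int(m / dx)))
print(right, left)            # both are approximately exp(-1) ~ 0.3679
```

Both areas come out to $e^{-1}$, and they only agree when the vertical line is placed at $m = \mathbb{E}X = 1$.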