Why are there some constraints on linearity of expectation?


I'm quoting from here:

Expectations are linear, for example, $ E[αf(x) +βg(x)] =αE[f(x)] +βE[g(x)],$ when α and β are fixed (not random and not depending on x).

My question is about the constraints on this property. It is clear to me why $ α $ and $ β$ should not depend on $ x$; however, I don't understand why they cannot be random variables drawn from a distribution that is independent of the distribution of $ x$. Is there a special property of integration that doesn't allow such random variables to be taken out of the integral without changing the rest of it?

BEST ANSWER

Let's start with the core idea. Linearity of expectation can be stated more generally as follows:

Let $X_1, X_2,\ldots,X_n$ be random variables and let $a_1, a_2,\ldots,a_n$ be scalars. Then $E[a_1X_1+a_2X_2+\cdots+a_nX_n] = a_1E[X_1]+a_2E[X_2]+\cdots+a_nE[X_n]$.

From this definition of linearity we have two main points:

  • The $X_i$'s don't have to be independent (one constraint fewer!)
  • The $a_i$'s are scalars, not random variables.
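Both points can be checked numerically. A minimal sketch (using numpy, with arbitrary example distributions of my own choosing): the coefficients are fixed scalars, and $X_2$ is deliberately made a function of $X_1$ to show that dependence between the $X_i$'s is no obstacle.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# X1 and X2 are deliberately dependent: X2 = X1**2 + noise.
x1 = rng.normal(loc=1.0, scale=2.0, size=n)
x2 = x1**2 + rng.normal(size=n)

a1, a2 = 3.0, -0.5  # fixed scalar coefficients

lhs = np.mean(a1 * x1 + a2 * x2)           # estimate of E[a1*X1 + a2*X2]
rhs = a1 * np.mean(x1) + a2 * np.mean(x2)  # a1*E[X1] + a2*E[X2]

print(lhs, rhs)  # the two estimates agree up to floating-point rounding
```

Note that the agreement here is not just approximate: with scalar coefficients, linearity already holds sample-by-sample, so the two sample means are the same quantity rearranged.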

Linearity of expectation comes from the linearity of integration (or summation), and this in turn is tied to the important concept of a linear combination, whose coefficients are scalars.

Now, suppose we insist on using independent random variables instead of scalars as the coefficients of the linear combination. Let's explore which properties we actually end up using with your definition.

Let me write $X_1$ for $f(X)$ and $X_2$ for $g(X)$. Let $A$ and $B$ be random variables independent of $X$, and set $Y = AX_1+BX_2$. Writing $W = AX_1$ and $Z=BX_2$, we have $Y = W + Z$. How do we compute $E[Y]$? Let's see:

\begin{align} E[Y] &= \iint (w+z)f_{W,Z}(w,z)dwdz\\ &= \iint wf_{W,Z}(w,z)dwdz + \iint zf_{W,Z}(w,z)dwdz\\ &= \int w \left(\int f_{W,Z}(w,z)dz\right) dw + \int z \left(\int f_{W,Z}(w,z)dw\right) dz\\ &= \int w f_{W}(w) dw + \int z f_{Z}(z) dz\\ &= E[W] + E[Z], \end{align}

where in the second equality we applied the linearity of the integral, in the fourth we computed the marginal PDF of each variable by integrating the joint PDF over the other, and in the last we used the definition of the expected value.
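The same chain of steps can be carried out exactly on a small discrete example, where the integrals become sums. A sketch with a made-up joint PMF for $(W, Z)$, chosen so that $W$ and $Z$ are dependent:

```python
import numpy as np

# A small discrete joint PMF for (W, Z), chosen so W and Z are dependent.
w_vals = np.array([0.0, 1.0])
z_vals = np.array([0.0, 1.0, 2.0])
joint = np.array([[0.10, 0.20, 0.10],
                  [0.25, 0.05, 0.30]])  # rows: w, cols: z; entries sum to 1

# Marginals come from summing the joint over the other variable,
# just as the inner integrals do in the continuous derivation.
p_w = joint.sum(axis=1)
p_z = joint.sum(axis=0)

e_w = np.dot(w_vals, p_w)  # E[W]
e_z = np.dot(z_vals, p_z)  # E[Z]

# E[W + Z] computed directly from the joint PMF:
e_sum = sum(joint[i, j] * (w_vals[i] + z_vals[j])
            for i in range(len(w_vals)) for j in range(len(z_vals)))

print(e_sum, e_w + e_z)  # identical: linearity never needed independence
```

Here `joint` is not the product of its marginals, yet $E[W+Z] = E[W]+E[Z]$ holds exactly.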

Now, how do we compute $E[W]$?

\begin{align} E[W] = E[AX_1] &= \iint \alpha x_1f_{A,X_1}(\alpha,x_1)d\alpha dx_1\\ &= \int \alpha f_{A}(\alpha) d\alpha \int x_1f_{X_1}(x_1) dx_1 = E[A]E[X_1] \end{align} but here we do not use the linearity of the integral! What we used in the second line to factor the double integral is the independence of the random variables together with a corollary of Fubini's theorem!
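This step, and why independence is essential to it, can be sketched numerically (again with numpy and example distributions of my own choosing): when the coefficient $A$ is independent of $X_1$, the sample estimates of $E[AX_1]$ and $E[A]E[X_1]$ agree; when it is not, the factorization fails.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

x1 = rng.normal(loc=2.0, scale=1.0, size=n)  # X1
a = rng.uniform(0.0, 1.0, size=n)            # A, drawn independently of X1

# Independence lets the joint PDF factor, so E[A*X1] = E[A]*E[X1].
lhs_ind = np.mean(a * x1)
rhs_ind = np.mean(a) * np.mean(x1)
print(lhs_ind, rhs_ind)  # both estimates near 0.5 * 2.0 = 1.0

# If the coefficient is NOT independent of X1 (here A = X1 itself),
# the factorization fails: E[X1^2] = 1 + 4 = 5, while E[X1]^2 = 4.
lhs_dep = np.mean(x1 * x1)
rhs_dep = np.mean(x1) ** 2
print(lhs_dep, rhs_dep)
```

The gap in the dependent case is exactly $\operatorname{Var}(X_1)$, which is why uncorrelatedness (and in particular independence) is what makes the product of expectations work.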

So, to get the result

$$E[Y] = E[A]E[f(X)]+E[B]E[g(X)],$$

what we are actually applying is two properties of the expected value: linearity, and the property that the expected value of the product of independent random variables factors into the product of their expected values.
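The final identity can also be checked end-to-end by simulation. A sketch, with arbitrary choices of $f$, $g$, and the distributions of $X$, $A$, and $B$ (the only requirement being that $A$ and $B$ are drawn independently of $X$):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

x = rng.normal(size=n)
f_x = np.sin(x)   # f(X), an arbitrary choice
g_x = x**2        # g(X), an arbitrary choice

# A and B are random, but drawn independently of X.
a = rng.uniform(0.0, 2.0, size=n)
b = rng.exponential(scale=1.0, size=n)

lhs = np.mean(a * f_x + b * g_x)                           # E[A f(X) + B g(X)]
rhs = np.mean(a) * np.mean(f_x) + np.mean(b) * np.mean(g_x)  # E[A]E[f(X)] + E[B]E[g(X)]
print(lhs, rhs)  # agree up to Monte Carlo error
```

Unlike the scalar-coefficient case, the two sides here are distinct sample statistics, so they only agree up to sampling noise; the agreement relies on the independence of the coefficients from $X$, not on linearity alone.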