So $\mathrm{Var}(X) = \mathrm{E}((X-\mu)^2)$, but how can you subtract a value ($\mu$) from a function ($X$)? And does it make sense to square a function?
Definition of a random variable $\mathrm{Var}(X)$
138 views. Asked by Bumbble Comm (https://math.techqa.club/user/bumbble-comm/detail). There are 2 answers below.
The other answer is excellent; however, I'll offer a slightly simpler perspective.
Subtraction, "$-$", is what's called a binary operation on numbers. The subtraction operation "combines" two numbers and outputs a single number (technicalities and generalities aside).
Subtraction of functions is defined in terms of how subtraction of numbers works. We define a new function $h$ by $h(x)=f(x)-g(x)$ for all fixed $x$ values, and then we use the notation $f-g$ to represent $h$.
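A minimal sketch of this pointwise definition (the particular `f` and `g` below are just illustrative choices):

```python
# Subtraction of functions is defined pointwise: h(x) = f(x) - g(x).
def subtract(f, g):
    """Return the function h = f - g, defined by h(x) = f(x) - g(x)."""
    return lambda x: f(x) - g(x)

f = lambda x: x ** 2
g = lambda x: 3.0   # a constant function, playing the role of mu
h = subtract(f, g)

print(h(2))  # f(2) - g(2) = 4 - 3.0 = 1.0
print(h(0))  # f(0) - g(0) = 0 - 3.0 = -3.0
```

Note that `h` only ever subtracts the two *numbers* `f(x)` and `g(x)`; the "subtraction of functions" is defined entirely in terms of that.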
We have to understand what the formula for variance means: $$ E((X-\mu)^2)= \int_{-\infty}^\infty (x-\mu)^2 f_X(x) \, dx $$ where both $x$ and $\mu$ are real numbers, and $f_X(x)$ is the probability density function of the random variable $X$ (evaluated at $x$, of course). So the actual subtraction only happens between numbers.
Similarly, only numbers are being squared.
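To make this concrete, here is a rough numerical sketch of that integral for one assumed example, the standard normal density (so $\mu = 0$ and the true variance is $1$), using a plain Riemann sum:

```python
import math

# Var(X) = E((X - mu)^2) as an integral over numbers, not functions.
# Example density: the standard normal, for which mu = 0 and Var(X) = 1.
def f_X(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

mu = 0.0
dx = 0.001
# Riemann sum approximating the integral of (x - mu)^2 * f_X(x) over [-10, 10]
var = sum((i * dx - mu) ** 2 * f_X(i * dx) * dx for i in range(-10000, 10000))
print(var)  # close to 1.0
```

Every term in the sum involves only real numbers: a point `x`, the number `mu`, and the number `f_X(x)`.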
This is a very valid question, and one that I didn't really touch until my second time learning undergraduate probability. I am by no means an expert; this is merely the intuition I've developed.
$X$ is a function. What they don't tell you in undergraduate-level probability is that a "random variable" is a function $$X: \Omega \to \mathbb{R},$$ where $\Omega$ is the sample space. You're not particularly interested in the function itself, but in the values that elements of $\Omega$ map to, i.e., the range of $X$. So technically speaking, it is more correct to write $X(\omega)$ where undergraduate-level probability writes $X$.
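A small sketch of this view, with an assumed toy sample space (two coin flips) and $X$ counting heads:

```python
# A random variable is literally a function X: Omega -> R.
# Toy sample space: the outcomes of two coin flips.
Omega = [("H", "H"), ("H", "T"), ("T", "H"), ("T", "T")]

def X(omega):
    """X(omega) counts the heads in the outcome omega."""
    return sum(1 for flip in omega if flip == "H")

print([X(omega) for omega in Omega])      # values on each outcome: [2, 1, 1, 0]
print(set(X(omega) for omega in Omega))   # the range of X: {0, 1, 2}
```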
But this opens up a huge can of worms that, quite frankly, there isn't time to cover in undergraduate-level probability. If we know what $X$ means now, what does $\mu$ actually mean? In the undergraduate sense, you're told some form of $$\mu = \mathbb{E}[X] = \sum_{k \in \mathbb{R}}kf_{X}(k)$$ for $X$ a discrete random variable, but then the following question arises: what is $f_{X}$, anyway?
Well, you're told that $f_{X}(k) = \mathbb{P}(X = k)$. But what does THIS mean now? We have now seen that writing $X = k$ is problematic, since $X$ should really be $X(\omega)$. And what is $\mathbb{P}$?
It turns out that you need a function $$\mathbb{P}: \mathcal{P}(\Omega) \to [0, 1]$$ with $\mathcal{P}$ denoting the power set. Basically, you are assigning subsets of $\Omega$ (hence why the domain is the power set) numbers in $[0, 1]$. In probability in particular, you're told you can take probabilities of unions of sets, intersections, etc.
It turns out that not every subset of $\Omega$ can be assigned a probability in a consistent way, so you need to choose a collection of subsets of $\Omega$ (i.e., a subset of $\mathcal{P}(\Omega)$) for which this does make sense: a collection containing $\Omega$ and closed under complements and countable unions. Such a collection is called a $\sigma$-algebra (say $\mathcal{B}$), and we call the triplet $(\Omega, \mathcal{B}, \mathbb{P})$ a probability space. So, as Ian has commented, $$\mathbb{P}: \mathcal{B} \to [0, 1]\text{.}$$ In particular, $\mathbb{P}$ is called a probability measure.
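For a finite $\Omega$ the $\sigma$-algebra axioms reduce to containing $\Omega$ and closure under complement and (finite) union, which is easy to check by brute force. A sketch, with assumed toy collections:

```python
from itertools import combinations

# On a finite Omega, a sigma-algebra is a collection of subsets that
# contains Omega and is closed under complement and union.
def is_sigma_algebra(Omega, B):
    B = set(frozenset(s) for s in B)
    if frozenset(Omega) not in B:
        return False
    for A in B:
        if frozenset(Omega) - A not in B:   # closed under complement
            return False
    for A1, A2 in combinations(B, 2):
        if A1 | A2 not in B:                # closed under union
            return False
    return True

Omega = {1, 2, 3, 4}
trivial = [set(), Omega]                    # the smallest sigma-algebra
split = [set(), {1, 2}, {3, 4}, Omega]      # generated by the event {1, 2}
bad = [set(), {1}, Omega]                   # missing the complement {2, 3, 4}

print(is_sigma_algebra(Omega, trivial))     # True
print(is_sigma_algebra(Omega, split))       # True
print(is_sigma_algebra(Omega, bad))         # False
```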
Now back to the original story: what I've revealed to you is that the statement $\mathbb{P}(X = k)$ has many things wrong with it. $X$ should be a function, and furthermore, $X = k$ should be a set. BUT the values in the range of $X$ are not what $\mathbb{P}$ actually acts on: NOTICE that the range of $X$ lies in $\mathbb{R}$, while the domain of $\mathbb{P}$ is $\mathcal{B}$, some collection of subsets of $\Omega$. So what we actually need to do is pull subsets of $\mathbb{R}$ back to subsets of $\Omega$, and THEN apply the probability measure $\mathbb{P}$. So in fact $$\mathbb{P}(X = k) = \mathbb{P}\left(X^{-1}(\{k\})\right)$$ where $X^{-1}$ denotes the inverse image under $X$ (NOT the inverse function). And then you have to make sure that $X^{-1}(\{k\})$ actually lies in the $\sigma$-algebra $\mathcal{B}$.
Finally, we now know that $$\mu = \mathbb{E}[X] = \sum_{k \in \mathbb{R}}k \cdot \mathbb{P}\left(X^{-1}(\{k\})\right)\text{.}$$
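Here is a sketch pulling all the pieces together for one assumed example, a fair die: $\Omega = \{1, \dots, 6\}$, $\mathcal{B}$ the full power set, $\mathbb{P}(A) = |A|/6$, and $X(\omega) = \omega$:

```python
from fractions import Fraction

# Probability space for a fair die: Omega = {1,...,6}, B = power set,
# P(A) = |A| / 6, and the random variable X(omega) = omega.
Omega = {1, 2, 3, 4, 5, 6}

def P(A):
    """Probability measure: uniform on subsets of Omega."""
    return Fraction(len(A), len(Omega))

def X(omega):
    return omega  # the face shown

def preimage(k):
    """X^{-1}({k}): the set of outcomes that X maps to the value k."""
    return {omega for omega in Omega if X(omega) == k}

# "P(X = 3)" really means P applied to the preimage X^{-1}({3})
print(P(preimage(3)))  # 1/6

# mu = E[X] = sum over the values k of k * P(X^{-1}({k}))
mu = sum(k * P(preimage(k)) for k in set(X(w) for w in Omega))
print(mu)  # 7/2
```

Note that `P` is only ever applied to subsets of `Omega`, never to values of `X` directly, which is exactly the point of the preimage.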
Note that I haven't even touched continuous random variables. I know virtually none of the theory involved there.
This material, and much more, is covered in a graduate-level treatment of probability with measure theory.