Random variables: How would you explain it to a beginner?


Different types of random variables: discrete (binomial, hypergeometric, geometric, Poisson) and continuous (uniform, normal, exponential).

Random variables are very useful tools for solving both simple and complex probability problems. They're used in many different situations and take many different forms, so how would you describe, in very general terms, what they are to a student who is just starting to learn mathematics?

I'm not really looking for a formal definition here, but more of a "here is how it's relevant to your studies and your life" kind of 101 deal: something that even a middle school or high school student could understand.



"Let $X$ be the number of times you can a sum of $7$ when you throw three dice. Then the probability distribution of $X$ is given by $\Pr(X=0)=\text{whatever}$, $\Pr(X=1)=\text{whatever}$, $\Pr(X=2)=\text{whatever}$, $\Pr(X=3)=\text{whatever}$." Etc. Then $X$ is an example of a "random variable". For continuous distributions, speak of the probability that $X$ is between two numbers, rather than the probability that $X$ is equal to some number. In other words, I would not start by stating a precise definition of the concept of "random variable".


It seems to me that an intuitive application where random variables play an important role may help motivate the concept.

Consider the manager of a customer support center who has to decide how many support staff to hire to answer the telephone lines. The number of staff needed depends on the number of calls that come in, a number that is likely to vary with the time of day, the day of the week, and so on.

Thus it seems reasonable to treat the number of calls in any given time period as uncertain, with a range of plausible values (say, the number of calls arriving at the center per minute can be anywhere from 5 to 10). One way to capture this scenario is to let $N$ be the number of calls that arrive per unit time period.

In this scenario we would call $N$ a random variable, because we do not know in advance (a priori) which value we will observe; that is, we do not know how many calls will come in per minute. We can then (depending on the situation) assume that $P(N=5) = 0.2$, $P(N=6) = 0.3$, and so on, to capture our uncertainty.
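A small Python sketch of the call-center model. Only $P(N=5)=0.2$ and $P(N=6)=0.3$ come from the text; the remaining probabilities, the helper names, and the staffing rule ("be able to handle the load in at least 90% of minutes") are hypothetical choices for illustration.

```python
from fractions import Fraction

# P(N=5) and P(N=6) are from the text; the other values are hypothetical,
# chosen only so that the probabilities sum to 1.
CALLS_PER_MINUTE = {
    5: Fraction(2, 10),
    6: Fraction(3, 10),
    7: Fraction(2, 10),
    8: Fraction(15, 100),
    9: Fraction(10, 100),
    10: Fraction(5, 100),
}
assert sum(CALLS_PER_MINUTE.values()) == 1


def expected_value(dist):
    """Average number of calls per minute, weighted by probability."""
    return sum(n * p for n, p in dist.items())


def prob_at_least(dist, k):
    """P(N >= k)."""
    return sum(p for n, p in dist.items() if n >= k)


def smallest_capacity(dist, target):
    """Smallest c with P(N <= c) >= target (a hypothetical staffing rule)."""
    cumulative = Fraction(0)
    for n in sorted(dist):
        cumulative += dist[n]
        if cumulative >= target:
            return n
    raise ValueError("target exceeds total probability")


print(expected_value(CALLS_PER_MINUTE))
print(prob_at_least(CALLS_PER_MINUTE, 8))
print(smallest_capacity(CALLS_PER_MINUTE, Fraction(9, 10)))
```

With these made-up numbers the expected load is $6.8$ calls per minute, $P(N \ge 8) = 0.3$, and a capacity of $9$ calls per minute is enough in at least 90% of minutes. Changing the assumed probabilities changes the staffing decision, which is exactly the uncertainty the random variable $N$ captures.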

The advantage of this approach is that a student immediately appreciates the practical application and usefulness of random variables. The disadvantage is that it requires a more elaborate explanation, since the application needs to be sufficiently realistic.


Suppose you have a space $\Omega$ on which a probability is defined, for example the possible outcomes of tossing a coin. Now imagine you are gambling: for each possible outcome, the amount of money you will win (or lose) is fixed in advance. This is a random variable!

The catch is that with a function $f: \Omega \to \mathbb{R}$ you can transport the probability defined on $\Omega$ to a probability on $\mathbb{R}$. So if, for instance, you get $10$ bucks for heads but lose $5$ for tails, then you can talk about the probability of losing $5$ bucks when you toss the coin.

Notice that it does not make much sense to ask for the expected "value" of heads or tails. If instead of heads and tails the coin had faces coloured green and blue, it would be meaningless to say that, on average, the expected colour is cyan. On the other hand, if you are talking about winning and losing money, it makes sense to talk about the expected amount of money you will win or lose. That is the expectation.
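The coin gamble can be written out directly. This is a minimal Python sketch, assuming a fair coin; `distribution` and `expectation` are illustrative helper names, not a library API. Note that `f` is an ordinary dictionary: the same outcome always maps to the same amount.

```python
from collections import defaultdict
from fractions import Fraction

# Probability on the sample space Omega: a fair coin (fairness is assumed).
P_OMEGA = {"heads": Fraction(1, 2), "tails": Fraction(1, 2)}

# The random variable f: Omega -> R, a fixed rule from outcomes to amounts.
# Win 10 bucks on heads, lose 5 on tails.
f = {"heads": 10, "tails": -5}


def distribution(prob, rv):
    """Transport the probability on Omega to the values taken by rv."""
    dist = defaultdict(Fraction)  # Fraction() == 0
    for outcome, p in prob.items():
        dist[rv[outcome]] += p
    return dict(dist)


def expectation(prob, rv):
    """Probability-weighted average of the amounts rv assigns to outcomes."""
    return sum(p * rv[outcome] for outcome, p in prob.items())


print(distribution(P_OMEGA, f))  # {10: Fraction(1, 2), -5: Fraction(1, 2)}
print(expectation(P_OMEGA, f))   # 5/2
```

Here the probability of losing 5 bucks is $1/2$, and the expectation is $\tfrac12\cdot 10 + \tfrac12\cdot(-5) = 2.5$. There is no analogous computation for a green/blue coin, because colours are not numbers that can be averaged.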

It is important to emphasise that a "random variable" is NOT a function that gives randomly different values for the same "input". The amount you get for heads or tails is always the same for a fixed $f$! It is just a way to transport the probability in $\Omega$ to a probability in "amounts" (real numbers), so you can talk about expectation.

Usually one is not interested in the random variable itself: people talk about a random variable when they really want to talk about the probability it induces on $\mathbb{R}$. This induced probability is the distribution of the random variable.

Two random variables $f$ and $g$ are independent, for example, when knowing the outcome of $f$ (in terms of events: whether $f \in A$ for some $A \subset \mathbb{R}$) makes no difference to the probability of $g$'s outcome; that is, $P(f \in A,\ g \in B) = P(f \in A)\,P(g \in B)$ for all such sets $A, B \subset \mathbb{R}$.
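For random variables with finitely many values, it is enough to check the product rule on single values, $P(f=a,\ g=b)=P(f=a)\,P(g=b)$, since any events $f \in A$ and $g \in B$ are unions of such single-value events. Here is a hedged Python sketch on two tosses of a coin (assumed fair, with the tosses not influencing each other), reusing the payoff of $10$ for heads and $-5$ for tails; all names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space: two tosses of a fair coin; each of the four outcomes
# ("H","H"), ("H","T"), ("T","H"), ("T","T") has probability 1/4 (assumed).
OMEGA = {w: Fraction(1, 4) for w in product("HT", repeat=2)}


def payoff(side):
    """Stakes from the coin example: win 10 on heads, lose 5 on tails."""
    return 10 if side == "H" else -5


def f(w):
    """Winnings on the first toss."""
    return payoff(w[0])


def g(w):
    """Winnings on the second toss."""
    return payoff(w[1])


def h(w):
    """Total winnings over both tosses (depends on the first toss too)."""
    return payoff(w[0]) + payoff(w[1])


def prob(event):
    """Probability of the set of outcomes w for which event(w) is true."""
    return sum(p for w, p in OMEGA.items() if event(w))


def independent(x, y):
    """True if P(x=a, y=b) == P(x=a) * P(y=b) for every value pair (a, b)."""
    x_values = {x(w) for w in OMEGA}
    y_values = {y(w) for w in OMEGA}
    return all(
        prob(lambda w: x(w) == a and y(w) == b)
        == prob(lambda w: x(w) == a) * prob(lambda w: y(w) == b)
        for a in x_values
        for b in y_values
    )


print(independent(f, g))  # True
print(independent(f, h))  # False
```

The first toss tells you nothing about the second, so `f` and `g` pass the check; it does tell you something about the total, so `f` and `h` fail it (for instance $P(f=10,\ h=20)=1/4$, but $P(f=10)\,P(h=20)=1/8$).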