Deriving P(X=x) of unknown distribution


I have been having difficulty deriving the probability function of unknown distributions. Once I look at the solutions I understand the procedure; however, I cannot come up with the initial step myself.

Are there tricks which can be used or is it only intuition?

Here are a couple of examples, no need to solve them as I have the solutions, they are just for reference:

  1. Suppose a fair die is tossed three times. Let X be the largest of the three faces which appears. Find the probability function of X as a formula.
  2. Consider a sequence of independent tosses of a fair coin. Let the random variable X denote the number of tosses needed to obtain the first head. Determine the probability function of X and verify it satisfies the necessary conditions for a valid probability function.

Thanks to anyone who responds :) and I wish you an early Merry Christmas!


There are 2 best solutions below


For example 1 it's easier to find the cumulative probabilities and then difference them at the end. The maximum is at most $n$ exactly when every roll is at most $n$, so for $n = 1, \dots, 6$ $$ \Pr(X \le n) = \Pr(\text{each roll} \le n) = \left({n\over6}\right)^3 $$ Then $$ \Pr(X=n) = \Pr(X \le n) - \Pr(X \le n-1) = \left({n\over6}\right)^3 - \left({n-1\over6}\right)^3 $$ $$ = {3n^2 - 3n + 1\over 216} $$ The same approach works for example 2. Try it and see!
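As a sketch of how the same idea might go for example 2 (this worked version is an addition, not part of the original answer): the first head takes more than $n$ tosses exactly when the first $n$ tosses are all tails, so it is convenient to work with the tail probability $\Pr(X > n) = 1 - \Pr(X \le n)$. $$ \Pr(X > n) = \left({1\over2}\right)^n, \qquad n = 0, 1, 2, \dots $$ Differencing gives $$ \Pr(X = n) = \Pr(X > n-1) - \Pr(X > n) = \left({1\over2}\right)^{n-1} - \left({1\over2}\right)^n = \left({1\over2}\right)^n, \qquad n = 1, 2, \dots $$ Each value is nonnegative, and the geometric series gives $$ \sum_{n=1}^{\infty} \left({1\over2}\right)^n = {1/2 \over 1 - 1/2} = 1, $$ so the conditions for a valid probability function hold.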


Strategies: Try to match the problem to a specific distribution you know about. Both problems are based on repeated independent trials, and the second random variable is geometric. Whenever the max or min of several random variables is mentioned (as in the first problem), consider finding the CDF and then getting the PDF (or, for a discrete variable, the probability function) from the CDF.

That goes for continuous distributions as well. For example, the minimum of several exponential random variables is again exponential (with rate equal to the sum of rates of the constituent random variables).
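To illustrate the exponential claim, here is a short derivation, assuming the variables $X_1, \dots, X_k$ are independent with rates $\lambda_1, \dots, \lambda_k$ (the independence assumption and the symbols are introduced here for the sketch). The minimum exceeds $t$ exactly when every $X_i$ exceeds $t$, so for $t \ge 0$ $$ \Pr(\min_i X_i > t) = \prod_{i=1}^{k} \Pr(X_i > t) = \prod_{i=1}^{k} e^{-\lambda_i t} = e^{-(\lambda_1 + \cdots + \lambda_k)\,t} $$ Hence the CDF is $1 - e^{-\lambda t}$ with $\lambda = \lambda_1 + \cdots + \lambda_k$, and differentiating gives the density $\lambda e^{-\lambda t}$, which is exponential with rate $\lambda$.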

You might want to learn to use R or some other kind of statistical software, especially if there are messy probability computations for well-known distributions such as normal, binomial, Poisson, exponential, and so on. (R is very good free software available at www.r-project.org, and there are blogs online about its use.) Maybe you have access to Matlab or Mathematica at school, but they are too expensive for most students to buy.

Software will not give you a proof or a general formula, but it can be useful for checking results if you're unsure of the answer.

Here is a simple simulation in R based on a million of your 3-die experiments, checking the formula in @ScottBurns's nice Answer (+1); the simulated values agree with it to two or three decimal places.

m = 10^6;  n = 3;  die = 1:6        # m experiments, n dice per experiment
x = sample(die, m*n, replace=TRUE)  # m*n independent rolls of a fair die
DTA = matrix(x, nrow=m)   # each row of this m x n matrix has 3 faces
w = apply(DTA, 1, max)    # vector of m maximums
table(w)/m                # approx dist'n table--used to make histogram
w
       1        2        3        4        5        6 
0.004643 0.032405 0.088087 0.170723 0.283062 0.421080 

hist(w, breaks=(0:6)+.5, probability=TRUE, col="wheat", main="Sim. Dist'n of Max on 3 Dice")
i = 1:6;  pmf = (i/6)^3-((i-1)/6)^3  # exact values--dots atop histogram bars
points(i, pmf, col="blue")

rbind(i, pmf)  # print exact probability function
          [,1]       [,2]       [,3]      [,4]      [,5]      [,6]
i   1.00000000 2.00000000 3.00000000 4.0000000 5.0000000 6.0000000
pmf 0.00462963 0.03240741 0.08796296 0.1712963 0.2824074 0.4212963

The histogram shows the relative frequencies from the simulation of your distribution; the open dots atop the histogram bars are the exact values. The agreement is to about two decimal places (roughly the resolution of the graphic image).

[Figure: histogram "Sim. Dist'n of Max on 3 Dice", with the exact probabilities shown as open dots atop the bars]
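As a complement to the simulation, small discrete problems like these can also be checked exactly. The following is a minimal sketch, not part of the original answer: it enumerates all 216 equally likely outcomes of three dice and compares the counts with the numerator 3n^2 - 3n + 1 of the closed form, then compares the geometric probabilities (1/2)^k for example 2 with R's built-in `dgeom()`. The variable names are illustrative only.

```r
# Example 1: enumerate all 6^3 = 216 equally likely outcomes of three dice.
rolls <- expand.grid(d1 = 1:6, d2 = 1:6, d3 = 1:6)
mx <- pmax(rolls$d1, rolls$d2, rolls$d3)   # largest face in each outcome
counts <- tabulate(mx, nbins = 6)          # number of outcomes with max = 1..6
n <- 1:6
formula_counts <- 3 * n^2 - 3 * n + 1      # numerator of the closed form
print(rbind(n, counts, formula_counts))
stopifnot(all(counts == formula_counts), sum(counts) == 216)

# Example 2: dgeom() counts failures before the first success, so
# "X tosses until the first head" corresponds to X - 1 failures.
k <- 1:20
geom_ok <- isTRUE(all.equal(dgeom(k - 1, prob = 0.5), 0.5^k))
stopifnot(geom_ok)
```

Enumeration only scales to small sample spaces, so simulation remains the more general tool; when enumeration is feasible, though, it confirms a formula exactly rather than to a couple of decimal places.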