Understanding probability: the memoryless property

  • edited, I'll make a separate question..

With the exponential distribution, $Pr(X>s+t | X>s) = Pr(X>t)$

where X is a waiting time for some event.
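
A quick simulation can make this identity concrete. This is only a sketch: the rate `0.5` and the values of `s` and `t` are arbitrary choices for illustration and are not part of the question.

```python
import math
import random

def conditional_survival(rate, s, t, n_samples=200_000, seed=0):
    """Estimate Pr(X > s + t | X > s) for X ~ Exponential(rate) by simulation."""
    rng = random.Random(seed)
    samples = [rng.expovariate(rate) for _ in range(n_samples)]
    # Keep only the runs where the event has not happened by time s.
    survived_s = [x for x in samples if x > s]
    # Among those, count the runs that also last beyond s + t.
    survived_both = [x for x in survived_s if x > s + t]
    return len(survived_both) / len(survived_s)

rate, s, t = 0.5, 1.0, 2.0  # assumed values, chosen only for illustration
estimate = conditional_survival(rate, s, t)
exact = math.exp(-rate * t)  # Pr(X > t) for the exponential distribution
print(f"estimated Pr(X > s+t | X > s) = {estimate:.4f}")
print(f"exact     Pr(X > t)           = {exact:.4f}")
```

The two printed numbers should agree up to sampling noise, whatever value of `s` is used.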

Now you estimate $p1 = Pr(X>s)$ at time $t0$. When you reach time $s$ and the event has not happened yet, you update your estimate to $p2 = Pr(X>s)$, since X is memoryless.

Now your past self could have foreseen your future self's reasoning.

If I don't see X happening for a period $s$, then I'd have to wait for X for another period $s$ with equal probability p1: $p1 = Pr(X>s+s|X>s)$

Oh, then, I realize I can apply this reasoning more times. $Pr(X<s+t|X>s) = Pr(X<t) $

Since the events $0<X<dt$, $dt<X<2dt$, $2dt<X<3dt$ are disjoint, I can add the probabilities to get the probability of the union $0<X<3dt$

$p2 = Pr(X<dt)+Pr(X<2dt|X>dt)+Pr(X<3dt|X>2dt) = 3*Pr(X<dt) = Pr(X<3dt)=Pr(X<s)$ where $3dt=s$.

Since pdf is thicker in front, I guess $Pr(X>s) =1-p2 < p1$

(This can't be.. The reasoning must be flawed somewhere...)

I guess the assumption that your time horizon is infinite is contradictory to my usual understanding of the world or my usual way of reasoning.

Once you have used up some time, you have less time left, but $\int_0^{\infty}$ assumes otherwise.
And it is stating that your world time is freshly reborn with the same probabilistic structure as far as the event is concerned.. (ok, this doesn't sound like a mathematical expression..)

I feel I must be able to construct some contradiction out of it.
I can see the practicality of reasoning memorylessly, and it could also be a non-contradictory assumption. Still, I feel there must be at least an alternative perspective on the assumption, or on the interpretation of what assumptions we are making. I want to know if there is indeed another perspective, or whether my thinking can be proven false.

** this part was actually on the front of original post. but I moved here because this part has been solved

I’m trying to understand the memoryless property.

I think I can think about the same concept using binomial distribution.  
Whether the concept I'm contemplating is the same as memorylessness is not so important; I just want to understand the probabilistic reasoning.


Suppose you are doing 10 coin tosses with p=1/2.

> experiment1)
> 
>  You tossed 8 times and saw 8 heads,

 what’s the probability of head at 9th toss?
It’s ½ 

I think this is essentially the same thing as the ‘memoryless property’ (your future event has nothing to do with the past, so you can forget it (memoryless) and you had better construct your cdf anew)
Please correct me if I’m wrong. 

What I’m getting at is, it seems like ‘memoryless property’ is the perspective that you want to see the world from. 

To clarify, suppose another experiment. 



> experiment2)
> 
> Coin is already tossed 10 times, but you just don’t know the outcome
> yet. 
> 
> You get to peek at what the outcome was. You see the first 8
> tosses and they are 8 heads,

now you want to guess what the remaining tosses are. 
You guess it’s one of HH, HT, TH, TT equally likely. 
The same reasoning as above. 

> experiment3)
> 
> Each of the 10 coin toss outcomes is written on a ball and put inside a box,
> and you get to choose 8 balls randomly from the box.  
> 
> You pick 8 balls
> and all say heads; now you want to guess what the remaining tosses
> are.

 

I think you should guess it’s one of HH, HT, TH, TT equally likely. 
Because it’s essentially same experiment as before, we just complicated the mechanism of seeing the random event.


But then I see some other perspective keep bothering me.
Namely,

You draw a dartboard where the board area is divided up in proportion to Pr(X=k), k=0 to 10. 
You can actually list all 2^10 possibilities and divide the dartboard accordingly. 

Your picking 10 balls is just like throwing a dart, and each of the 2^10 areas is equally likely to be hit. 
You know there is more area with 8 Hs than with 10 Hs. 

So when the box says you have at least 8 Hs, and asks which area you hit with your dart, 8s or 10s? I think I would answer 8. 
And this answer is different from the conclusion drawn from the reasoning 8 + (HH, HT, TH, TT equally likely), i.e. that 8 and 10 are equally likely.

So where did I fail? Or where did I switch the perspective?

Or is it just our belief that we should adopt memoryless thinking because we believe the world acts that way?

I don’t know much physics, but I’m bringing it up as an analogy in the hope of making my question clear.
If you adopt the block universe perspective, where everything is already laid out, would the alternative reasoning be correct? 

So this is my question: 
I feel the memoryless property stems from how you view the world.  
And I wonder if it’s my delusion and there’s an obvious flaw in my reasoning. 


* edit

I happened to find the flaw(?), or a way to align the seemingly incompatible reasoning.  

When you draw 8 Hs from the random box, concentrate on the fact that those are the first 8 drawn out of the 10 in the box, even though they may not be the first 8 coin tosses.  

The space to consider is the set of 2^10 possible outcomes of picking the 10 balls, and which of those outcomes you are in. 
Again that becomes HH, HT, TH, TT.
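
The two ways of conditioning can be compared by brute-force enumeration. In this sketch the 8 observed balls are assumed to occupy positions 0 to 7 of the sequence, a labelling chosen only for convenience, and the helper name `count_heads` is hypothetical.

```python
from itertools import product

outcomes = list(product("HT", repeat=10))  # all 2**10 equally likely sequences

# Condition 1: the 8 observed results are all heads (assumed to be positions 0..7).
first_eight_heads = [o for o in outcomes if all(c == "H" for c in o[:8])]
# Condition 2: the whole sequence contains at least 8 heads (the dartboard question).
at_least_eight = [o for o in outcomes if o.count("H") >= 8]

def count_heads(seqs, k):
    """Number of sequences in seqs with exactly k heads."""
    return sum(1 for o in seqs if o.count("H") == k)

# size of the conditioned space, sequences with exactly 8 heads, with exactly 10 heads
print(len(first_eight_heads), count_heads(first_eight_heads, 8), count_heads(first_eight_heads, 10))
print(len(at_least_eight), count_heads(at_least_eight, 8), count_heads(at_least_eight, 10))
```

Under the first condition only four sequences remain, and exactly 8 heads is as likely as exactly 10. Under the second condition 56 sequences remain and 45 of them have exactly 8 heads. The two answers differ because they condition on different events.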

So I thought I had constructed a seemingly contradictory example, but it wasn't so. 

So I try again..

There are 2 best solutions below

1
On BEST ANSWER

What you describe with the binomial model is not memorylessness, but rather, independence of events. An event $B$ is said to be independent of another event $A$ if the probability of observing event $B$ does not depend on having observed $A$; i.e., $$\Pr[B \mid A] = \Pr[B]. \tag{1}$$

A simple example of independent events is the flipping of a fair coin. Suppose event $A$ is that the first flip of the coin shows heads, and event $B$ is that the second flip of the coin shows heads. Then because the outcome of the second flip does not depend on the first, the probability of observing heads on the second coin toss is not influenced by the outcome of the first toss. This is essentially what you are describing in your binomial model.

A pair of events that are not independent might be if $A$ was defined as above, but now $B$ is the event that, among both the first and the second coin tosses, at least one heads is observed. Then clearly $B$ is not independent of $A$: if $A$ occurs, then the probability of $B$ occurring is $1$, since event $A$ means we have already seen one head in the first two coin tosses. Yet if $A$ does not occur (first toss is tails), then the probability of $B$ occurring is $1/2$, since we now require that the second coin toss is heads for $B$ to occur. In mathematical notation, $$\Pr[B \mid A] = 1, \quad \Pr[B \mid \bar A] = 1/2. \tag{2}$$ With a little more thought, we can also see that $\Pr[B] = 3/4$. So this clearly violates $(1)$.
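
The three probabilities in this example can be checked by enumerating the four equally likely outcomes of two fair tosses. This is a minimal sketch, and the helper names (`prob`, `first_is_heads`, and so on) are made up for illustration.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=2))  # (first toss, second toss), each with probability 1/4

def prob(event, given=None):
    """Pr[event | given] under the uniform distribution on two fair tosses."""
    conditioned = [o for o in outcomes if given is None or given(o)]
    return Fraction(sum(1 for o in conditioned if event(o)), len(conditioned))

def first_is_heads(o):        # event A
    return o[0] == "H"

def first_is_tails(o):        # complement of A
    return o[0] == "T"

def at_least_one_heads(o):    # event B
    return "H" in o

print(prob(at_least_one_heads, first_is_heads))   # Pr[B | A]     -> 1
print(prob(at_least_one_heads, first_is_tails))   # Pr[B | not A] -> 1/2
print(prob(at_least_one_heads))                   # Pr[B]         -> 3/4
```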

Then if this is not memorylessness, what is? Memorylessness does require independence of events, but it is in fact a much stronger property: informally, it means that the probability distribution of an event does not depend on a previous outcome or observation. A motivating example is in order.

Since we were discussing coin tosses, let's look at a modification of the binomial experiment. Suppose we are not interested in the number of heads we observe in a fixed number of coin tosses (trials), but rather, the number of trials we need to perform until we see a desired outcome of the coin; for instance, the number of times we need to toss the coin in order to see the first occurrence of heads.

So unlike the binomial model, the sequence of outcomes for such an experiment must always look something like this: $$\{T, \ldots, T, H\},$$ where the number of $T$s in the sequence could be $0$ (in which case the sequence is just $\{H\}$); but the common property is that the only occurrence of $H$ is in the last place of the sequence, at which point we stop tossing the coin. Then we count up the number of tosses in the sequence, and call this number $X$. So the sequence $\{T, T, T, T, H\}$ has $X = 5$, and $\{H\}$ corresponds to $X = 1$, and $\{T, T, T, T, T, T, T, T, H\}$ corresponds to $X = 9$.

Then a natural question to ask is, what is $\Pr[X = x]$ for each $x \in \{1, 2, 3, 4, \ldots\}$? In other words, what is the probability of each possible outcome of the number of trials needed to observe the first occurrence of heads? If the coin is fair, then $\Pr[X = 1] = 1/2$, because with probability $1/2$, the coin is heads on the first toss. And it is easy to see that $\Pr[X = 2] = 1/4$, $\Pr[X = 3] = 1/8$, and so forth; indeed, $$\Pr[X = x] = 1/2^x, \quad x \in \{1, 2, 3, \ldots \}. \tag{3}$$
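
As a rough empirical check of $(3)$, one can simulate the experiment many times and compare the observed frequencies with $1/2^x$. The run count and the seed below are arbitrary choices, and the function name is hypothetical.

```python
import random
from collections import Counter

def tosses_until_first_head(rng):
    """Toss a fair coin until heads appears; return the total number of tosses."""
    count = 1
    while rng.random() >= 0.5:  # a draw below 0.5 is treated as heads
        count += 1
    return count

rng = random.Random(0)   # fixed seed so the run is repeatable
n_runs = 100_000
counts = Counter(tosses_until_first_head(rng) for _ in range(n_runs))

# columns: x, observed frequency of X = x, predicted probability 1/2^x
for x in range(1, 6):
    print(x, counts[x] / n_runs, 1 / 2**x)
```

The observed frequencies should sit close to $1/2, 1/4, 1/8, \ldots$, with small deviations due to sampling noise.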

Now, suppose you tossed the coin once, and it was tails. What is the probability that the next coin toss is heads? As we discussed earlier, the outcome of the second coin toss is independent of the first, so it is still $1/2$ (for a fair coin). And if you had tossed the coin $10$ times and gotten tails each time, the next coin toss still has a $1/2$ probability of being heads. This is the independence of events at work. But the consequence of this property is a seemingly counterintuitive one: that, if you have tossed the coin some number of times and not yet seen heads, the random number of additional tosses you need to make in order to see the first occurrence of heads is the same as if you had not performed those previous coin tosses!

This is the memorylessness property. In probability notation,

$$\Pr[X > x+y \mid X > y] = \Pr[X > x]. \tag{4}$$

Let's take a moment to unpack what this equation $(4)$ means. It means that, given you have tossed the coin $y$ times and not yet observed heads (event $X > y$), then the probability that you need a total of more than $x+y$ tosses to observe the first heads is equal to the probability that you need more than $x$ tosses of a coin that you had not yet tossed.

This of course is phrased a little differently, but it is equivalent, since the event $X > x+y$ is equivalent to $X - y > x$; that is to say, the event that you need more than $x+y$ total coin tosses, is the same as saying that you need more than $x$ additional coin tosses after having observed $y$ failures.
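
For the fair coin this can be verified exactly. The sketch below uses the tail probability $\Pr[X > k] = 1/2^k$ (the first $k$ tosses are all tails), which follows from summing $(3)$ over $x > k$, and conditions on no heads in the first $y$ tosses, i.e. on $X > y$. The function names are made up for illustration.

```python
from fractions import Fraction

def tail(k):
    """Pr[X > k]: the first k tosses of a fair coin are all tails."""
    return Fraction(1, 2**k)

def conditional_tail(x, y):
    """Pr[X > x + y | X > y], from the definition of conditional probability."""
    return tail(x + y) / tail(y)

# The conditional tail never depends on y, the number of failures already seen.
for x in range(5):
    for y in range(5):
        assert conditional_tail(x, y) == tail(x)

print(conditional_tail(3, 10), tail(3))  # 1/8 1/8
```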

So what we can see here is that memorylessness is a sort of consequence of a series of independent and identically distributed events as it relates to a kind of time-to-event random process. It is because the coin does not "remember" what happened before, that allows us to assert that the incremental waiting time (or in this case, the additional number of trials) does not depend on the amount of time we have already waited to observe the stopping event. This is a very strong property: so strong, in fact, that there is a unique discrete probability distribution that obeys it: the geometric distribution. And correspondingly, there is a unique continuous probability distribution that obeys it: the exponential distribution.
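
For the exponential case the same calculation can be written in one line. Here $\lambda > 0$ denotes the rate parameter, a symbol introduced for this sketch, so that $\Pr[X > t] = e^{-\lambda t}$ for $t \ge 0$. Then for $s, t \ge 0$,

$$\Pr[X > s+t \mid X > s] = \frac{\Pr[X > s+t]}{\Pr[X > s]} = \frac{e^{-\lambda(s+t)}}{e^{-\lambda s}} = e^{-\lambda t} = \Pr[X > t].$$

The first equality uses the fact that the event $X > s+t$ is contained in the event $X > s$.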

0
On

So, talking about tossing a coin, or any indefinite sequence of independent and identical Bernoulli trials until a success is obtained...

The count for tosses until the coin lands tails up has a geometric distribution over support $\{1,2,...\}$, and this is a memoryless random variable.

$$X\sim\mathcal{Geo}_1(p)\implies \mathsf P(X > m+n\mid X > m)=\mathsf P(X > n)$$

Given that at least $m$ consecutive heads have occurred, the conditional probability that at least a further $n$ consecutive heads follow is equal to the unconditional probability that at least $n$ consecutive heads occur from the start.
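
A short derivation of this identity, assuming $p$ is the probability that a single toss lands tails up (the success that stops the count), so that $\mathsf P(X > k) = (1-p)^k$ is the probability that the first $k$ tosses are all heads:

$$\mathsf P(X > m+n \mid X > m) = \frac{\mathsf P(X > m+n)}{\mathsf P(X > m)} = \frac{(1-p)^{m+n}}{(1-p)^{m}} = (1-p)^{n} = \mathsf P(X > n).$$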

The intuition behind this is exactly as you outlined (though it is not based on a binomial distribution, rather a geometric one). It is precisely because the individual trials have independent and identically distributed success rates.