Expected number of items looked at

2.4k Views Asked by At

Currently failing at basic probability…

Given a sequence of items, linear search means looking at each in turn and seeing if it's the one we're looking for. (I.e. in the worst case, it's the last item in the sequence or not in the sequence at all and we'll have to look at all items to find it/determine it's not there.)

How many elements of the input sequence need to be checked on average, assuming that the element being searched for is equally likely to be any element in the array?

Ok, we know two things:

  • the element we're looking for is in the array
  • P[the i-th item is the one we're looking for] = $\frac{1}{n}$ for all $i\in\{1,..,n\}$

And I think I'm supposed to get $\frac{n+1}{2}$ or $\frac{n}{2}$ or something thereabout.

Let $X$ be the number of items the linear search algorithm is looking at.

$$\begin{align} P[X=1] &= \frac{1}{n} \\ P[X=2] &= \left(1-\frac{1}{n}\right)\frac{1}{n} \\ P[X=3] &= \left(1-\frac{1}{n}\right)^2\frac{1}{n} \\ & \ \ \vdots \\ \end{align}$$

So

$$\begin{align} \mathbb{E}(X) &= \sum_{i=1}^n i\left(1-\frac{1}{n}\right)^{i-1}\frac{1}{n} \\ &= \frac{1}{n}\sum_{i=1}^n i\left(1-\frac{1}{n}\right)^{i-1} \\ \end{align}$$

That's not something I want to solve myself, so I gave it to Wolfram Alpha and got

$$\left(1-2\left(\frac{n-1}{n}\right)^n\right)n$$

This doesn't look particularly close.

Where did I go wrong?

5

There are 5 best solutions below

10
On BEST ANSWER

Assuming that there is exactly one item of interest: $$P[X=i]=\frac1n\text{ for }1\le i\le n$$ $$E[X]=\sum_{i=1}^n\frac in=\frac1n\sum_{i=1}^ni=\frac{n(n+1)}{2n}=\frac{n+1}2$$ If $P[X=i]=\left(1-\frac1n\right)^{i-1}\frac1n$ instead then there is a geometric distribution, which can be realised as (say) there are $n$ rolls of an $n$-sided die and we are searching for the first 1 rolled. Here there is an extra $\left(1-\frac1n\right)^n$ probability of doing $n$ comparisons and not finding the desired item, leading to the expectation in this case as $$\left(1-2\left(\frac{n-1}n\right)^n+\left(1-\frac1n\right)^n\right)n$$ $$=\left(1-\left(1-\frac1n\right)^n\right)n$$ which is different from $\frac{n+1}2$ because the desired element may appear more than once in the array.

0
On

Your calculations for $P(X=2), P(X=3)$, etc are incorrect.

For instance you should have, $P(X=2)=P(X=2\mid X\ne 1)\cdot P(X\ne 1)=\frac{1}{n-1}\cdot\frac{n-1}{n}=\frac1n$.

In fact as noted by others $P(X=k)=\frac1n$ for $k=1, 2,\cdots, n$.

0
On

If you wish to include the possibility that the desired object isn't in the array:

Denote by $\psi$ the probability that the desired object isn't in the array. Let $p_i$ denote the probability that it is in slot $i$. Then, since all the $p_i$ are equal, we must have $$\sum_{i=1}^np_i=1-\psi\implies p_i=\frac {1-\psi}n$$

For every $i<n$ the probability that you look at exactly $i$ objects in your search is $p_i$. For $i=n$ the probability is $p_i+\psi$. Thus in this case the expected value is $$\sum_{i=1}^n i\times \frac {1-\psi}n+n\times \psi=\frac {(n+1)(1-\psi)}2+n\psi=\frac {n+1+n\psi-\psi}2$$

Sanity checks: Note that if $\psi=0$ this reduces to the formula we know. Then again, note that if $\psi =1 $ this reduces to $n$, as it should.

0
On

Let us assume that the key is present in the list. The expectations of the "cost" of finding the key equiprobably in every position $k$ are $\dfrac kn$, the sum of which is

$$\frac{n(n+1)}2\frac1n=\frac{n+1}2.$$

The correction for the possibility of the key being absent with probability $q=1-p$ yields

$$p\frac{n+1}2+qn.$$

0
On

This is a matter of conditionality.

In your calculation, you took $1/n$ to be the probability that the element is in the $i$th position under the condition that it was not in any of the positions before $n$. If this were the case, then your calculation of the probabilities for $i\geq2$ would be correct. Unfortunately though, the probabilities don't sum up to one.

Instead, the probability to find the element in the $i$th position is precisely $1/n$ for every $i\in\{1,\dots,n\}$. The probabilities are given, and there is no need to find them. The probabilities are unconditional; events for $i<j$ have no bearing on $P[X=j]$. If you want, you can use this information to find the conditional probabilities. They are not needed in this problem, but you will find that they are not $1/n$.

There is two alarming things in your calculations, hinting at a problem:

  1. You assumed $P[X=2]=\frac1n$ and the calculated that $P[X=2]=(1-\frac1n)\frac1n$.
  2. The probabilities don't sum up to one although they should.