Currently failing at basic probability…
Given a sequence of items, linear search means looking at each in turn and seeing if it's the one we're looking for. (I.e. in the worst case, it's the last item in the sequence or not in the sequence at all and we'll have to look at all items to find it/determine it's not there.)
How many elements of the input sequence need to be checked on average, assuming that the element being searched for is equally likely to be any element in the array?
Ok, we know two things:
- the element we're looking for is in the array
- P[the i-th item is the one we're looking for] = $\frac{1}{n}$ for all $i\in\{1,..,n\}$
And I think I'm supposed to get $\frac{n+1}{2}$ or $\frac{n}{2}$ or something thereabout.
Let $X$ be the number of items the linear search algorithm is looking at.
$$\begin{align} P[X=1] &= \frac{1}{n} \\ P[X=2] &= \left(1-\frac{1}{n}\right)\frac{1}{n} \\ P[X=3] &= \left(1-\frac{1}{n}\right)^2\frac{1}{n} \\ & \ \ \vdots \\ \end{align}$$
So
$$\begin{align} \mathbb{E}(X) &= \sum_{i=1}^n i\left(1-\frac{1}{n}\right)^{i-1}\frac{1}{n} \\ &= \frac{1}{n}\sum_{i=1}^n i\left(1-\frac{1}{n}\right)^{i-1} \\ \end{align}$$
That's not something I want to solve myself, so I gave it to Wolfram Alpha and got
$$\left(1-2\left(\frac{n-1}{n}\right)^n\right)n$$
This doesn't look particularly close.
Where did I go wrong?
Assuming that there is exactly one item of interest: $$P[X=i]=\frac1n\text{ for }1\le i\le n$$ $$E[X]=\sum_{i=1}^n\frac in=\frac1n\sum_{i=1}^ni=\frac{n(n+1)}{2n}=\frac{n+1}2$$ If $P[X=i]=\left(1-\frac1n\right)^{i-1}\frac1n$ instead then there is a geometric distribution, which can be realised as (say) there are $n$ rolls of an $n$-sided die and we are searching for the first 1 rolled. Here there is an extra $\left(1-\frac1n\right)^n$ probability of doing $n$ comparisons and not finding the desired item, leading to the expectation in this case as $$\left(1-2\left(\frac{n-1}n\right)^n+\left(1-\frac1n\right)^n\right)n$$ $$=\left(1-\left(1-\frac1n\right)^n\right)n$$ which is different from $\frac{n+1}2$ because the desired element may appear more than once in the array.