probability of number of comparisons of randomized quicksort

208 Views Asked by At

Let's assume we have an array of length $5$ which contains pairwise different integers. The subcript denotes the order of the respective integer, so $i_1<i_2<i_3<i_4<i_5$.

We apply the randomized quicksort algorithm to the unordered array and would like to know the probability $P(A_{24})$ that two integers $s_2$ and $s_4$ are being compared. (Note that every item in the array is chosen as pivot with equal probability)


In our lecture the professor simply says that if we think about it for a while it becomes trivial that $P(A_{24})=\frac{2}{4-2+1}=\frac{2}{3}$. However it didn't....


I would like to have a mathematical frame to argue that $P(A_{24})=\frac{2}{3}$.

My attempt:

So far I have only assumed that we already have a discrete probability space $(\Omega,P)$ where $\Omega=\{i_1,i_2,i_3,i_4,i_5\}$ and $B_i$ denotes the event that $s_i$ has been chosen as pivot element. So I simply collect all the events where $s_2$ or $s_4$ are the pivot at some stage during the algorithm and get \begin{align*} &P(A_{24})=\\ &P(B_2)+P(B_4)+P(B_1)P(B_2\mid B_1)+P(B_5)P(B_2\mid B_5)+P(B_5)P(B_4\mid B_5)+P(B_1)P(B_4\mid B_1)\\ &+P(B_1)P(B_5\mid B_1)P(B_2\mid B_1\cap B_5)+P(B_5)P(B_1\mid B_5)P(B_2\mid B_5\cap B_1)\\ &+P(B_1)P(B_5\mid B_1)P(B_4\mid B_1\cap B_5)+P(B_5)P(B_1\mid B_5)P(B_4\mid B_5\cap B_1)\dots \end{align*} If we use some intuition and the fact that every element has the same probability to be chosen as pivot we get (conditional) probabilities \begin{align*} &\dots=\frac{1}{5}+\frac{1}{5}+\frac{1}{5}\cdot\frac{1}{4}+\frac{1}{5}\cdot\frac{1}{4}+\frac{1}{5}\cdot\frac{1}{4}+\frac{1}{5}\cdot\frac{1}{4}\\ &+\frac{1}{5}\cdot\frac{1}{4}\cdot\frac{1}{3}+\frac{1}{5}\cdot\frac{1}{4}\cdot\frac{1}{3}+\frac{1}{5}\cdot\frac{1}{4}\cdot\frac{1}{3}+\frac{1}{5}\cdot\frac{1}{4}\cdot\frac{1}{3}\\ &=\frac{1}{5}+\frac{1}{5}+\frac{1}{5}+\frac{1}{15}=\frac{2}{3}. \end{align*}

Though this seems much clearer to me why we get $\frac{2}{3}$ I am not sure about the actual definition of the probability space. How would you set up an appropriate probability space in order to derive $P(A_{24})$? Or is there a simpler approach to show why $P(A_{24})=\frac{2}{3}$?

1

There are 1 best solutions below

7
On BEST ANSWER

Between $i_2$ and $i_4$ there is only one number, $i_3$. If it is chosen as a pivot before $i_2$ or $i_4$ are, then $i_2$ and $i_4$ won't be compared with each other (they will be compared with $i_3$ and placed on the opposite sides of the pivot). So the probability of $i_2$ and $i_4$ not being compared is the probability of $i_3$ being chosen as pivot first out of the set $\{i_2, i_3, i_4\}$, which is $\frac{1}{3}$. By complement, the probability of them being compared is $\frac{2}{3}$.


Edit: here is a more formal version of the argument:

Let $T= \{i_1, i_2 \dots i_5\}$, $\;S=\{i_2,i_3,i_4\}$.

Let $p_1$ be the first pivot chosen.

Denote $P_T(A_{24})$ to be the probability that the comparison $A_{24}$ occurs in the quicksort of a random ordering of $T$. Ie. we consider the quicksort of a random ordering of $T$ as an implicit condition for our events to happen at all.

We will take an inductive approach to the general formula. We know that $P_S(A_{24}) = \frac{2}{3}$ (quicksort of a random ordering of $S$). Additionally, assume we proved $P_M(A_{24}) = \frac{2}{3}$ for any $M$ which has $S\subset M$ and $|M| < |T|$.

We have:

$$P_T(A_{24}) = P_T(p_1 \in S) \; P_T( A_{24} \;|\; p_1 \in S ) + P_T(p_1 \notin S) \;P_T( A_{24} \;|\; p_1 \notin S) $$

Let us compute the terms of this formula.

$P_T(p_1 \in S) = \frac{3}{5}$

$P_T(p_1 \notin S) = (1 - P_T(p_1 \in S)) = \frac{2}{5}$

These only important thing about the above branch condition probabilities is that they always sum to 1. Our main aim is to show that: $$ P_T( A_{24} \;|\; p_1 \in S ) = P_T( A_{24} \;|\; p_1 \notin S) = \frac{2}{3}$$

If we show that these two are equal, the branch conditions $P_T(p_1 \in S)$ and $P_T(p_1 \notin S)$ sum up by common factor.

From the initial argument we can see that $P_T( A_{24} \;|\; p_1 \in S ) = \frac{2}{3}$. (the first pivot being in $\{i_2,i_3,i_4\}$ makes it easy to compute). What remains is to show that $P_T( A_{24} \;|\; p_1 \notin S ) = \frac{2}{3}$ .

We have:

$$P_T( A_{24} \;|\; p_1 \notin S ) =\\ P_T(p_1 < i_2 \;|\; p_1 \notin S ) \;P_T( A_{24} \;|\; p_1< i_2 ) + \\+\;P_T( i_4 < p1 \;|\; p_1 \notin S ) \;P_T( A_{24} \;|\; i_4 < p_1 )$$

As before, $P_T( p_1 < i_2 \;|\; p_1 \notin S)$ and $P_T(i_4 < p1 \;|\; p_1 \notin S)$ are complementary events that sum to 1, we are only interested in the other factors.

$$P_T(A_{24}\;|\; p_1<i_2) = \sum_{j < i_2} P_T( p_1 = j\;|\; p_1 < i_2 ) P_T( A_{24} \;|\; p_1 = j)$$

Same argument, $P_T( p_1 = j\;|\; p_1 < i_2 )$ sum to 1, we are interested in $P_T( A_{24}\;|\; p_1 = j)$. Given $j < i_2$ we have:

$$P_T( A_{24} \;|\; p_1 = j) = P_{\{x\in T \;|\; x > j\}}(A_{24})$$

Ie. if the first pivot is chosen such that $S$ will fall on the right side of the pivot, what remains is the probability of $A_{24}$ appearing in a quicksort of said right side. However, the set $ L = \{x\in T | x > j\} $ obeys $S \subset L$ and $|L| < |T|$ . Therefore, by inductive hypothesis we have $P_{\{x\in T | x > j\}}(A_{24}) = \frac{2}{3}$ for $j< i_{2}$. We can do a similar thing for $$P_T( A_{24} \;|\; p_1 = k) = P_{\{x\in T \;|\; x < k\}}(A_{24})$$ where $k > i_4$ .

These correspond to the cases where all elements of $S$ fall on the right or the left side of the first pivot, respectively. They all reduce to $\frac{2}{3}$, except for the branch condition probabilities, which always sum to one. So the final probability is $\frac{2}{3}$.