Using Markov - Chain to find average and probability

1.4k Views Asked by At

Suppose a computer generate a random vector of n positions where each position appears on of the numbers from 1 to n. The generation is performed uniformly on the $n!$ possibilities. In the problem we must guess each position by following these rules

a) You write you guess vector and then discovers how many correct answers.
b) As you guess, the value is informed in position. So whether or not you know exactly what went wrong and occupies one position.

Let $N_{a}, N_{b}$ the number of hits you get in each of the above rules. How do I calculate the averages of $N_{a}$ and $N_{b}$ and the probability of hitting at least 50% of their guesses for $n = 10$

In the case 1) I suppose that if the Markov - Chain is not necessary. Because I think would only need to calculate the probability of 1 hit, 2 hits ... and so on and take the average, however how can I calculates this probability not sure. I appreciate any help. Thanks

2

There are 2 best solutions below

7
On BEST ANSWER

For (a), the average value $N_a$ can be found using linearity of expectation. \begin{align*} \mathbb{E}_a(n) &= n\cdot \frac{1}{n} = 1 \end{align*}

and for $n=10$, the probability of atleast half of them to be correct is: \begin{align*} \mathbb{P}_a\left(\text{at least 5 correct positions}\right) &= \sum_{i=5}^{10}\frac{\binom{10}{i}\cdot !(10-i)}{10!} \\ &= \frac{4421}{1209600} \approx 0.00365492724867725 \end{align*}

where the notation $!i$ indicates the number of derangements.

For (b), I'm going according to Michael's interpretation that we guess numbers one after another, and our guess depends on previous decisions.

Expectation seems to have the recurrence:

\begin{align*} \color{#c00000}{ \mathbb{E}_b(n)} &= \color{#c00000}{\mathbb{E}_b(n-1) + \frac{1}{n}} \\ \color{#c00000}{\mathbb{E}_b(1)} &= \color{#c00000}{1} \end{align*} or \begin{align*} \color{#c00000}{\mathbb{E}_b(n)} &= \color{#c00000}{\sum_{i=1}^n\frac{1}{i}=H_n} \\ \color{#c00000}{\mathbb{E}_b(10)} &= \color{#c00000}{\frac{7381}{2520}\approx 2.93} \end{align*}

Update

My previous attempt solving case (b) is incorrect, the corrected version follows:

My simulation was not agreeing with part (b), and I was thinking for a while about the optimal strategy and finally found a paper.

Exactly the same problem and much more has been discussed in this paper by Persi Diaconis and Ronald Graham.

Hence, on using the optimal strategy discussed in theorem 5, i.e. Keep guessing 1 till it's correct, then guess 2 till it's correct and so on,

\begin{align*} \mathbb{E}_b(n) &= \sum_{k=1}^n \frac{1}{k!} \\ \mathbb{E}_b(10) &= \frac{6235301}{3628800}\approx 1.71828180114638 \end{align*}

and

\begin{align*} \mathbb{P}_b(G\ge k) &= \frac{1}{k!} \\ \mathbb{P}_b(G\ge 5) &= \frac{1}{5!} \approx 0.00833 \end{align*}

and this time everything agrees with the simulation.

19
On

One of the problems here is you would like to just treat this as iid random variables such as using bernoulli trials for each answer but the problem they are not independent. Once I answered 3 for question one we then know that question two can not be 3 (thus not independent it gives us information when conditioned) But we could run a quasi simulation in R to get a rough estimate of expected number of correct answers. I will look at case n=10

For Rule A) This is my R code

#This will be the correct answers
#the position is the question while number is what is correct #answer

ans=sample(1:10,10,replace=FALSE)

#I am assuming that our guess vectors are also just uniformly from
#the n! possibilities 
numCorrect=NULL
for(i in 1:1000)
{
guess=sample(1:10,10,replace=FALSE)
correct=which(guess==ans)
numCorrect[i]=length(correct)
}

#By Law of Large Numbers this should be close to true mean
mean(numCorrect)

#This will count how many simulations have 5 or more right answers
geq5=NULL
for(i in 1:length(numCorrect))
{
if(numCorrect[i]>=5)
{
    geq5[i]=1
}
else
{
    geq5[i]=0
}
}

#Again By Law of Large Numbers this should be close to probability
mean(geq5)

From this we get about a average of 1 answered right with a 0.005 probability of answering 50% right

**NOTE I am pretty rusty with R so if anyone sees anything wrong do comment about

Also just quick question what is easiest way copy paste code I tried using pre formatted but I had to indent each line which was tedious