I stumbled on a relationship between ln(x) and estimated probability. Can someone help me locate or generate a proof?

85 Views Asked by At

Yesterday, I personally stumbled on the following relationship of ln(x):

Say you have x number of checkboxes, and you randomly pick a position (p) between 1 and x. If the checkbox at position P is empty, check it. If it's already checked, do nothing. How many attempts (a) does it take to check all checkboxes.?

I found that the average number of attempts it takes to check every single checkbox appears to be equal to ln(x) * x

$$a=x \cdot ln(x)$$ $$or$$ $$a=x \cdot (ln(x) + 1)$$

I can't quite tell which. It at least seems to be very close to either after a few iterations using software. I'm assuming this is a known relationship, and was wondering if anyone can point me to a proof.

Here's an image of the observed relationship: Image

The X axis value, multiplied by the Y axis value determines the number of attempts found to check all of the boxes.

  • The blue line indicates the code-generated coefficient to check all of the boxes
  • The blue dotted line indicates the trend line for the random picks, generated by Excel
  • The orange line is a perfect ln(x) of the X axis
  • The green line is a perfect (ln(x) + 1) of the X axis

Here is the code I made to generate random results, using Excel VBA: http://pastebin.com/TYxnPDQz

  • Column 1 will print the number of "checkboxes" (there's no reference of "checkboxes" in code, just using that word in this explanation)
  • Column 2 will print the total number of attempts to check all "checkboxes"
  • The code currently does 64 iterations.
  • The number of "checkboxes" used for each iteration is equal to 2^Iter
1

There are 1 best solutions below

1
On BEST ANSWER

The exact expectation for any $x$ is $$E_x = x H_x = x \sum_{n=1}^x \frac1n$$

$H_x$ is called the $x$-th harmonic number.

Your expressions are both wrong, but really close; the reality lies between them: For large $x$,

$$E_x = x ((\ln x )+ \gamma) + \frac12 - \frac1{12x} + \frac1{120x^3}+ O\left( \frac1{x^5}\right)$$

where $\gamma$ is Euler's constant, which is about $0.58$. Pretty much between 0 and 1.