Yesterday, I personally stumbled on the following relationship of ln(x):
Say you have x number of checkboxes, and you randomly pick a position (p) between 1 and x. If the checkbox at position P is empty, check it. If it's already checked, do nothing. How many attempts (a) does it take to check all checkboxes.?
I found that the average number of attempts it takes to check every single checkbox appears to be equal to ln(x) * x
$$a=x \cdot ln(x)$$ $$or$$ $$a=x \cdot (ln(x) + 1)$$
I can't quite tell which. It at least seems to be very close to either after a few iterations using software. I'm assuming this is a known relationship, and was wondering if anyone can point me to a proof.
Here's an image of the observed relationship: Image
The X axis value, multiplied by the Y axis value determines the number of attempts found to check all of the boxes.
- The blue line indicates the code-generated coefficient to check all of the boxes
- The blue dotted line indicates the trend line for the random picks, generated by Excel
- The orange line is a perfect ln(x) of the X axis
- The green line is a perfect (ln(x) + 1) of the X axis
Here is the code I made to generate random results, using Excel VBA: http://pastebin.com/TYxnPDQz
- Column 1 will print the number of "checkboxes" (there's no reference of "checkboxes" in code, just using that word in this explanation)
- Column 2 will print the total number of attempts to check all "checkboxes"
- The code currently does 64 iterations.
- The number of "checkboxes" used for each iteration is equal to 2^Iter
The exact expectation for any $x$ is $$E_x = x H_x = x \sum_{n=1}^x \frac1n$$
$H_x$ is called the $x$-th harmonic number.
Your expressions are both wrong, but really close; the reality lies between them: For large $x$,
$$E_x = x ((\ln x )+ \gamma) + \frac12 - \frac1{12x} + \frac1{120x^3}+ O\left( \frac1{x^5}\right)$$
where $\gamma$ is Euler's constant, which is about $0.58$. Pretty much between 0 and 1.