Maximizing the number of $n$-tuples in $\{1,...,m\}^n$ such that . . .

179 Views Asked by At

This question is motivated by the pattern of data obtained in my answer to the question

$\qquad\;\;$The best $n$-digit password?

I'll restate some definitions . . .

For positive integers $m,n$ with $m\le n$, let $P(m,n)$ be the number of $n$-tuples with each component in $\{1,...,m\}$ such that each of the values$\;1,...,m\;$occurs at least once.

For example, for $n=4$, we have $P(1,4)=1,\,P(2,4)=14,\,P(3,4)=36,\,P(4,4)=24$.

For each positive integer $n$, let $f(n)$ be the least positive integer $m\le n$ such that $P(m,n)$ is as large as possible.

The data obtained so far supports the following conjecture:

    Claim:$\;$If $n$ is a multiple of $4$, then $f(n)={\large{\frac{3}{4}}}n$.

Can anyone prove or disprove this claim?

    Update #$1$:$\;$The claim fails for $n=32$.

Which suggests that I should ask for asymptotic results rather than exact values.

Here's the revised question:

    Question:$\;$Does ${\displaystyle{\lim_{n\to\infty}}}{\large{\frac{f(n)}{n}}}$ exist? If so, what is its value?
    Update #$2$:$\;$Based on @Phicar's valuable observation, I was able to quickly generate a lot more data. The new data makes it evident that the limit in the above question doesn't exist.

Since no answers are yet posted, I'll make one final revision (last one, I promise).

    Revised Question:$\;$What are the values of$\;\,{\displaystyle{\liminf_{n\to\infty}}}\,{\large{\frac{f(n)}{n}}}\;\,$and$\;\,{\displaystyle{\limsup_{n\to\infty}}}\,{\large{\frac{f(n)}{n}}}\;$?
1

There are 1 best solutions below

4
On BEST ANSWER

I don't know what the data you generated looks like, but the limit does exist. In fact we have

$$\boxed{ f(n) \approx \frac{n}{2 \ln 2} } \approx (0.7213 \dots )n$$

which, as a sanity check, is consistent with some quick WolframAlpha plots. See, for example, this plot for $n = 50$, which gives $f(50) = 36$:

ordered Stirling number plot

This plot correctly suggests a much stronger result, namely that the distribution of the numbers $P(k, n), 0 \le k \le n$ is asymptotically Gaussian as $n \to \infty$ (with mean asymptotic to $\frac{n}{2 \ln 2}$, hence the claim about $f(n)$ above). This follows from Theorem IX.8 (the "quasi-powers" theorem) in Flajolet and Sedgewick's Analytic Combinatorics; a full discussion of how this works would take awhile (read the book, it's long but incredible) but let me at least attempt to sketch it.

As mentioned in the comments, we have $P(k, n) = S(n, k) k!$ where $S(n, k)$ is the Stirling numbers of the second kind. The numbers $P$ don't seem to have a common name, but they are A019538 on the OEIS, and $\sum_{k=0}^n P(k, n)$ is known as the ordered Bell numbers.

Our starting point for analysis is the following two-variable generating function

$$\sum_{n \ge 0} \left( \sum_{k=0}^n P(k, n) t^k \right) \frac{x^n}{n!} = \frac{1}{1 - t(e^x - 1)}.$$

This follows from thinking about $P(k, n)$ as counting surjections from a set of size $n$ to a set of size $k$, and in turn thinking of these as "lists of nonempty sets"; see Chapters I, II, and III, but particularly III, of Flajolet and Sedgewick for a longer discussion of how to write down these kinds of generating functions.

Write

$$P_n(t) = \sum_{k=0}^n P(k, n) t^k$$

for the coefficient of $\frac{x^n}{n!}$ above. We can think of this function as an unnormalized probability generating function (the real PGF is $\frac{P_n(t)}{P_n(1)}$) for a certain random variable. The cleanest way to describe this random variable that I can see involves reinterpreting $P(k, n)$ slightly: it can be thought of as counting weak orderings of $n$ elements in which there are $k$ "places," for example different ways that $n$ horses can place in a race that allows ties (so some of the horses are in first place, some in second, etc). Then $P_n(t)$ is the unnormalized PGF of the random variable given by counting the number of places in a random weak ordering of $n$ elements, and we want to study the asymptotic behavior of this random variable as $n \to \infty$.

This can be done using what Flajolet and Sedgewick call meromorphic singularity analysis. For $t \approx 1$ (which turns out to be the only values of $t$ we need to consider) the generating function $\frac{1}{1 - t(e^x - 1)}$ has a unique simple pole nearest to the origin at

$$x = r(t) = \ln \left( 1 + \frac{1}{t} \right)$$

which lets us approximate the generating function above by some multiple of $\frac{1}{1 - \frac{x}{r(t)}}$, and in turn lets us approximate $P_n(t)$ as $n \to \infty$ as (after a little calculation)

$$P_n(t) \approx \frac{n!}{(1 + t) \left( \ln \left( 1 + \frac{1}{t} \right) \right)^{n+1} }.$$

Substituting $t = 1$ tells us the asymptotic behavior of the ordered Bell numbers, namely

$$P_n(1) \approx \frac{n!}{2 (\ln 2)^{n+1}}$$

and dividing out by this asymptotic to get a PGF gives

$$\frac{P_n(t)}{P_n(1)} \approx \frac{2}{(1 + t)} \left( \frac{\ln 2}{\ln \left( 1 + \frac{1}{t} \right)} \right)^{n+1}.$$

This puts us in the "quasi-powers" framework: this PGF looks like the PGF of a sum of i.i.d. copies of the same random variable $X$ (which is given by the $n^{th}$ power of the PGF of $X$), plus a little extra which won't matter asymptotically, and it turns out that one can prove a generalization of the central limit theorem that tells us that as $n \to \infty$ the result is asymptotically Gaussian as in the usual CLT.

A Gaussian peaks at its mean, so it remains only to compute the mean. This can be done by differentiating the above approximate PGF with respect to $t$, then substituting $t = 1$; we get a mean asymptotic to $\frac{n}{2 \ln 2}$ as desired. If you want you can even compute the variance.