Expectation of sampling integers from a reciprocal distribution without replacement

Question

80 Views Asked by Bumbble Comm At 01 Apr 2026 - 6:06

I have a reciprocal distribution with PDF $$\frac{1}{x \ln N}$$ I sample $k$ integers from this distribution in the range $[1,N]$ without replacement. I need to determine the expected number of times each integer in $[1,N]$ appears in my sample.

I came across the multivariate Wallenius' noncentral hypergeometric distribution, which deals with drawing balls of weighted colours from an urn, one at a time and without replacement. The distribution describes the expected number of balls of each colour $i$ as $\mu_i$ in a vector $\boldsymbol\mu$, which can be found by solving the following system of non-linear equations $$\left(1-\frac{\mu_1}{m_1}\right)^\frac{1}{\omega_1} = \left(1-\frac{\mu_2}{m_2}\right)^\frac{1}{\omega_2} = \cdots = \left(1-\frac{\mu_c}{m_c}\right)^\frac{1}{\omega_c}$$ $$\sum_{i=1}^c \mu_i=n$$

For my use case, $c=N$, $m_i=1$, $n=k$ and $\omega_i=\frac{1}{i \ln N}$, so the equations become $$\left(1-\mu_1\right)^{\ln N} = \left(1-\mu_2\right)^{2 \ln N} = \cdots = \left(1-\mu_N\right)^{N \ln N} \tag{1}$$ $$\sum_{i=1}^N \mu_i=k \tag{2}$$ The more general Wallenius mean is normally approximated numerically, e.g. with Newton-Raphson, so I'm hoping that this special case makes the equations directly solvable. My work so far is as follows:

We can rewrite $(1)$ to put $\mu_i$ in terms of $\mu_j$ $$(1-\mu_i)^{i \ln N}=(1-\mu_j)^{j \ln N}$$ Using the identity $a^b=e^{b \ln a}$ $$e^{i \ln N \ln(1-\mu_i)}=e^{j \ln N \ln(1-\mu_j)}$$ $$i \ln N \ln(1-\mu_i)=j \ln N \ln(1-\mu_j)$$ $$\ln(1-\mu_i)=\frac{j}{i} \ln(1-\mu_j),\quad N>1$$ $$\mu_i=1-(1-\mu_j)^{j/i}\tag{3}$$

We can then repeatedly substitute $(3)$ into the summation $(2)$ to obtain a formula only in terms of $\mu_i$ $$ \begin{aligned} k&=\mu_1+\mu_2+\cdots+\mu_N\\ &=(1-(1-\mu_i)^{i/1})+(1-(1-\mu_i)^{i/2})+\cdots+(1-(1-\mu_i)^{i/N})\\ &=N-\sum_{j=1}^N (1-\mu_i)^{i/j} \end{aligned} $$ Therefore $$N-k=\sum_{j=1}^N (1-\mu_i)^{i/j}$$

However, I do not know how to proceed. Can this be rearranged for $\mu_i$? Am I overcomplicating things?
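For what it is worth, the last equation can at least be solved numerically in one dimension. The sketch below takes $i=1$, writes $t=1-\mu_1$, finds the root of $\sum_{j=1}^N t^{1/j}=N-k$ with R's `uniroot`, and recovers $\mu_j=1-t^{1/j}$ from $(3)$. The function name `wallenius_mean_approx` is made up for illustration, and as far as I know the system $(1)$, $(2)$ is itself only an approximation to the Wallenius mean, so the output should be read as approximate.

```r
# Hypothetical helper: solve N - k = sum_j t^(1/j) for t = 1 - mu_1,
# then recover every mu_j from equation (3).
wallenius_mean_approx <- function(N, k) {
  stopifnot(N >= 2, k >= 0, k <= N)
  j <- seq_len(N)
  # g is increasing in t on [0, 1], with g(0) = k - N <= 0 and g(1) = k >= 0,
  # so there is a single root in the interval.
  g <- function(t) sum(t^(1 / j)) - (N - k)
  t <- uniroot(g, interval = c(0, 1), tol = 1e-12)$root
  1 - t^(1 / j)
}

mu <- wallenius_mean_approx(10, 3)
sum(mu)   # equals k = 3 up to the root-finding tolerance
```

Because the left-hand side is monotone in $t$, a bracketing root finder is enough here and no starting guess is needed.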

**Bumbble Comm** · Accepted Answer

Since a ball can be drawn at most once when sampling without replacement, the number of times it appears in the $k$ draws is either 0 or 1, so its expected count is simply the probability that it is drawn.
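For small $N$ that probability can be computed exactly rather than estimated. The following is a minimal sketch, assuming draws are made one at a time with probability proportional to the weights of the integers still available; it accumulates the probability of every possible set of drawn integers and then sums over the sets of size $k$ that contain each integer. The function name `exact_inclusion` is hypothetical, and the approach enumerates all $2^N$ subsets, so it is only practical for small $N$.

```r
# Hypothetical helper: exact inclusion probabilities for sequential weighted
# sampling without replacement. Returns a matrix with rows k = 1..n and
# columns i = 1..n, where entry [k, i] is P(integer i is among the first k draws).
exact_inclusion <- function(w) {
  n <- length(w)
  masks <- 0:(2^n - 1)                       # each bitmask encodes a set of drawn integers
  member <- sapply(seq_len(n), function(i) bitwAnd(masks, bitwShiftL(1L, i - 1L)) > 0)
  size <- rowSums(member)                    # number of integers in each set
  wsum <- as.vector(member %*% w)            # total weight already drawn in each set
  p <- numeric(length(masks))
  p[1] <- 1                                  # start from the empty set
  for (s in masks) {                         # supersets always have larger masks
    ps <- p[s + 1]
    if (ps == 0 || size[s + 1] == n) next
    remaining <- sum(w) - wsum[s + 1]
    for (i in which(!member[s + 1, ])) {
      nxt <- s + 2^(i - 1)
      p[nxt + 1] <- p[nxt + 1] + ps * w[i] / remaining
    }
  }
  t(sapply(seq_len(n), function(k) {
    sel <- size == k
    colSums(member[sel, , drop = FALSE] * p[sel])
  }))
}

incl <- exact_inclusion(1 / (1:10))
round(incl, 5)
```

Each row of the result sums to $k$, which is a quick sanity check on the output.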

You said in comments that you did not want to use simulation. Just to show that it is possible, and to let you check any other calculations you make, here is an example in R with $N=10$ (`n` in the code) for every possible $k$:

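```r
n <- 10          # N in the question
cases <- 10^5    # number of simulated orderings
set.seed(2020)
# Draw with replacement using weights 1/i and keep only first appearances;
# this yields a full ordering equivalent to weighted draws without replacement.
orderedsample <- function(n) {
  unique(sample(n, 250 * n, prob = 1 / (1:n), replace = TRUE))
}
simdat <- replicate(cases, orderedsample(n))   # n x cases matrix, one ordering per column
count <- matrix(numeric(n^2), ncol = n, nrow = n)
# Row k counts how often each integer appears among the first k draws.
for (i in 1:n) {
  count[i, ] <- table(factor(simdat[1:i, ], levels = 1:n))
}
colnames(count) <- 1:n
rownames(count) <- paste("k=", 1:n, sep = "")
count / cases    # estimated inclusion probabilities
```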

to give

           1       2       3       4       5       6       7       8       9      10
k=1  0.34305 0.17065 0.11250 0.08565 0.06820 0.05702 0.04878 0.04241 0.03788 0.03386
k=2  0.59220 0.34732 0.24146 0.18280 0.14841 0.12595 0.10735 0.09438 0.08364 0.07649
k=3  0.76348 0.51624 0.37823 0.29484 0.24177 0.20549 0.17734 0.15635 0.13899 0.12727
k=4  0.87436 0.66553 0.51715 0.41541 0.34651 0.30076 0.25883 0.22924 0.20442 0.18779
k=5  0.94059 0.78772 0.65120 0.54302 0.46255 0.40323 0.35356 0.31437 0.28348 0.26028
k=6  0.97631 0.88015 0.77102 0.66755 0.58399 0.51750 0.46126 0.41608 0.37727 0.34887
k=7  0.99167 0.94177 0.86763 0.78511 0.70843 0.64429 0.58418 0.53412 0.48830 0.45450
k=8  0.99803 0.97812 0.94039 0.88480 0.82844 0.77341 0.71887 0.67036 0.62274 0.58484
k=9  0.99976 0.99516 0.98292 0.96018 0.93124 0.89962 0.86237 0.82643 0.78881 0.75351
k=10 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000

These are simulated frequencies, so the later decimal places should not be trusted. To illustrate how well the simulation does, the first row is easy to calculate exactly: with $k=1$ the probability of drawing $i$ is $\frac{1/i}{\sum_{j=1}^{10} 1/j}$, which gives

           1       2       3       4       5       6       7       8       9      10
k=1  0.34142 0.17071 0.11381 0.08535 0.06828 0.05690 0.04877 0.04268 0.03794 0.03414
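As a small check, the theoretical first row can be reproduced by normalising the weights $1/i$ so that they sum to 1. The variable names below are only for illustration.

```r
n <- 10
w <- 1 / (1:n)               # unnormalised weights 1/i
first_row <- w / sum(w)      # probability that a single draw (k = 1) returns i
round(first_row, 5)
```

The rounded values agree with the theoretical row shown above and differ from the simulated `k=1` row only from the third decimal place onwards.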
