How can I simulate in R the total variation distance between the law of the number of fixed points and Pois(1)?


I am trying to make a graph of the total variation distance between the law of the number of fixed points of a random permutation and the Poisson($\lambda = 1$) distribution. I know that the total variation distance is given by $\| \mu - \nu \|_{TV} = \frac{1}{2}\sum_{x \in \Omega} |\mu(x)-\nu(x)|$ or, equivalently, by $\max_{A \subseteq \Omega}|\mu(A)-\nu(A)|$.
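As a minimal sketch (the function name is my own, not from any package), the first formula translates directly into R for two pmfs on the same finite support:

```r
# Total variation distance between two probability vectors p and q
# defined on the same support: half the L1 distance between them.
tv_dist <- function(p, q) {
  sum(abs(p - q)) / 2
}

# Example: TV distance between a fair coin and a biased coin
tv_dist(c(0.5, 0.5), c(0.7, 0.3))  # 0.2
```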

Actually I should make a graph by sampling something (I do not yet understand what) where it is possible to see that the TV distance decreases as $N$ goes to infinity and that the TV curve stays below a bound given by $\frac{2^{N+1}}{(N+1)!}$. Here is the formula for the law of the number of fixed points in a random permutation: $\pi(x) = {\displaystyle \frac{1}{x!} \sum _{k=0}^{n-x} \frac{{(-1)}^{k}}{k!}}$
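That formula can be coded up directly; here is a hedged sketch (the function name is my own choice) that also checks the probabilities sum to one:

```r
# P(X = x) for the number of fixed points X of a uniform random
# permutation of n elements: (1/x!) * sum_{k=0}^{n-x} (-1)^k / k!
fixedpointprob <- function(x, n) {
  k <- 0:(n - x)
  sum((-1)^k / factorial(k)) / factorial(x)
}

n <- 5
probs <- sapply(0:n, fixedpointprob, n = n)
sum(probs)  # the probabilities over x = 0, ..., n sum to 1
```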

I tried to plot something in R but I was told that it was very wrong, because I put on the x axis the number of elements to permute ($n$) and on the y axis the distance between $\pi$ and Pois(1). I understand that I was wrong because I plotted a single difference for each $n$ and not the sum over all sets, but I do not know how I can find all these sets by sampling.

Well, I am very confused because I do not understand how and what I should sample to estimate the TV distance, so any suggestions, comments or corrections that help me understand it better (even if only related to the theory) would be very useful. I hope I explained myself decently. Thank you very much.
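For concreteness, here is one possible sketch of "what to sample" (names and settings are my own, not a definitive recipe): generate many uniform random permutations, count the fixed points of each, tabulate the empirical frequencies, and compare them against the Poisson(1) pmf.

```r
set.seed(1)
n <- 10          # size of each permutation
cases <- 10000   # number of simulated permutations

# sample(n) draws a uniform random permutation of 1:n;
# a fixed point is a position i with permutation[i] == i
fixedpoints <- replicate(cases, sum(sample(n) == 1:n))

# empirical pmf of the fixed-point count on 0:n
empirical <- table(factor(fixedpoints, levels = 0:n)) / cases
poisson   <- dpois(0:n, lambda = 1)

# estimated TV distance (ignoring the small Poisson mass above n)
tv <- sum(abs(empirical - poisson)) / 2
tv
```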

Best answer:

Here is an attempt to find these total variation distances empirically through simulation ($100{,}000$ cases each time) and theoretically, for permutations of $n$ numbers with $n$ from $1$ to $20$.

Various notes:

  • as you might expect, simulation noise can make a difference: even when permuting only $n\ge 8$ items, it seems to dominate the calculation with this sample size of $10^5$, and much larger samples do not take things much further
  • the Poisson distribution is unbounded (i.e. can exceed $n$), and you need to account for this extra tail mass in the total variation distance calculation
  • I used a slightly different calculation for the theoretical probabilities (these are rencontres numbers), but it gives the same probabilities
  • The suggested bound of $\frac{2^{n+1}}{(n+1)!}$ on the total variation distance looks as if it works for the theoretical probabilities and (if my calculations are correct - not guaranteed) may be a little high

Some useful functions:

# Count the fixed points of one uniform random permutation of 1:n
# (named nfixed to avoid masking base R's match())
nfixed <- function(n){
  sum(1:n == sample(n))
  }
# Empirical pmf of the fixed-point count from `cases` simulations
empiricalprobs <- function(n, cases){
  sims <- replicate(cases, nfixed(n))
  table(factor(sims, levels = 0:n)) / cases
  }
# Exact pmf via rencontres numbers: P(X = k) = D_{n-k} / (k! (n-k)!),
# where D_m = round(m!/e) is the number of derangements of m items
theoreticalprobs <- function(n){
  k <- 0:n
  rencontresprobs <- round(factorial(n-k)/exp(1)) / 
                           factorial(k) / factorial(n-k)
  rencontresprobs[n+1] <- 1/factorial(n)  # k = n: round(0!/e) = 0, but D_0 = 1
  names(rencontresprobs) <- k
  rencontresprobs
  }
poissonprobs <- function(n){
  # Poisson(1) pmf on 0:n, plus the mass above n (the Poisson support is unbounded)
  list(poisprob=dpois(0:n, 1), excesspois=1-ppois(n, 1))
  }
empiricaldist <- function(n, cases){
  # TV distance between the empirical pmf and Poisson(1); the Poisson tail
  # beyond n contributes |0 - tail| = excesspois to the sum
  pp <- poissonprobs(n)
  ep <- empiricalprobs(n, cases)
  (1/2)*(pp$excesspois + sum(abs(pp$poisprob - ep)))
  }
theoreticaldist <- function(n){
  # TV distance between the exact pmf and Poisson(1)
  pp <- poissonprobs(n)
  tp <- theoreticalprobs(n)
  (1/2)*(pp$excesspois + sum(abs(pp$poisprob - tp)))
  }
suggestedbound <- function(n){
  # The questioner's suggested bound 2^(n+1)/(n+1)!
  2^(n+1) / factorial(n+1)
  }

The main run with a seed for replication:

maxn <- 20
cases <- 10^5
set.seed(2023)
results <- data.frame(n=1:maxn, empiricaldist=NA, 
                      theoreticaldist=NA, suggestedbound=NA)  
for (n in 1:maxn){
  results[n, 2:4] <-  c(empiricaldist(n,cases), theoreticaldist(n),
                        suggestedbound(n))
  }

results

#    n empiricaldist theoreticaldist suggestedbound
#1   1  0.6321205588    6.321206e-01   2.000000e+00
#2   2  0.4481808382    4.481808e-01   1.333333e+00
#3   3  0.2349373186    2.374740e-01   6.666667e-01
#4   4  0.1000125282    9.951919e-02   2.666667e-01
#5   5  0.0340116566    3.440832e-02   8.888889e-02
#6   6  0.0110115845    1.011936e-02   2.539683e-02
#7   7  0.0047592235    2.589300e-03   6.349206e-03
#8   8  0.0040507706    5.864288e-04   1.410935e-03
#9   9  0.0016283853    1.191500e-04   2.821869e-04
#10 10  0.0008730603    2.195351e-05   5.130672e-05
#11 11  0.0023224029    3.700680e-06   8.551120e-06
#12 12  0.0015428120    5.749430e-07   1.315557e-06
#13 13  0.0014124637    8.283992e-08   1.879367e-07
#14 14  0.0020798946    1.112873e-08   2.505823e-08
#15 15  0.0007483375    1.400403e-09   3.132278e-09
#16 16  0.0014065191    1.657376e-10   3.685033e-10
#17 17  0.0016062019    1.851414e-11   4.094481e-11
#18 18  0.0044949087    1.958317e-12   4.309980e-12
#19 19  0.0027784741    1.967063e-13   4.309980e-13
#20 20  0.0014732789    1.884827e-14   4.104743e-14

Visualising these results (empirical with black o, theoretical with red +, and the suggested bound with a blue line), with a log scale for the total variation distance:

plot(results[,1], results[,2], log="y", ylim=c(10^-14,1), pch="o",
     xlab="terms in permutation", ylab="total variation distance")  #sample
points(results[,1], results[,3], col="red", pch="+")                #theory
lines(results[,1], results[,4], col="blue")                         #bound 
