This question is a follow up to the question that I asked in Stack Overflow. I now have a hint on the origin of the problem, but I still can't model it, and I was wondering if someone knows a way to do so.
The problem:
I generate 100,000 random matrices $11\times 11$ and I want to know which part of the random matrices has greater numbers. To do so I use a sliding window of $3\times 3$ that moves one cell at a time (overlapping windows, that are always inside the random matrix).
For each of the 100K random matrices I store the position of the sub-matrix with highest sum. I finally plot the cumulative result of the 100K highest sums.
This is the script I use, and the result I obtain:
from random import random
from matplotlib import pyplot as plt
import numpy as np
size = 11
positions = np.zeros((size, size))
for _ in range(100000):
matrix = [[random() for _ in range(size)] for _ in range(size)]
max_value = 0
max_coord = 0, 0
for beg in range(1, size - 1):
for end in range(1, size - 1):
suma = sum(matrix[i][j]
for i in range(beg - 1, beg + 2)
for j in range(end - 1, end + 2))
if suma >= max_value:
max_value = suma
max_coord = beg, end
positions[max_coord] += 1
plt.imshow(positions[1:10,1:10], origin='lower')
plt.colorbar()
The explanation I got was that the cells on the edges and, in particular in the corners visited less visits from the sliding window, and thus may have a more directed contribution to their corresponding sub-matrices.
this bellow shows how many times a cell is visited by a sliding window:

And this matches some how the previous result.
BUT, in order to contrast better the result I continued with my simulation and went beyond the 100K (>300M), this is the result:
It seems that there are more categories, even in regions where the sliding window passes the same number of times.
If we plot the distribution of counts by cell:
plt.figure(figsize=(12,8))
vals, icoord, jcoord = zip(*sorted([(v, i, j) for i, l in enumerate(positions[0:9,0:9]) for j, v in enumerate(l)]))
vals = [i / sum(vals) for i in vals]
color_list = [tuple(c) for c in plt.cm.tab10(np.linspace(0,1, 10))]
colors = [color_list[0]]
ncolor = 0
plt.plot(vals, color='grey', alpha=0.5, lw=1)
for n, (a, b) in enumerate(zip(vals[:-1], vals[1::])):
if b - a > 0.00002:
ncolor += 1
colors.append(color_list[ncolor])
plt.scatter(range(len(vals)), vals, color=colors, alpha=0.75)
plt.axhline(1 / 81, color='black', ls='--', alpha=0.7)
plt.text(0, 1 / 81, 'expected', color='black', alpha=0.75)
plt.xlabel('sorted cells')
plt.ylabel('count per cell')
plt.gca().yaxis.grid()
we obtain this:
I highlighted in different colours what seems to be clearly defined categories (I think that with more simulation I can find more categories).
As final dumb check I coloured previous matrix with these categories and check if the result was symmetric:
plt.figure(figsize=(5.28, 5.3))
plt.scatter(icoord, jcoord, c=colors, s=1000, marker='s')
I have now done enough simulations to correct for this effect in my analysis, but I am still very curious about if there could be some analytical solution to this problem.




