Approximating $\sum_{i,j=1}^n (h_i-h_j)^2$ where $h_i=\left(1 - \frac{1}{i}\right)^{2 s}\frac{1}{i}$


I'm looking to approximate the behavior of the following quantity as a function of $n$ and $s$:

$$f(s,n)=\frac{1}{n^2 (n + 2)}\sum_{i,j=1}^n (h_i-h_j)^2$$ where $$h_i=\left(1 - \frac{1}{i}\right)^{2 s}\frac{1}{i}$$

For $s=0$, this is well approximated by $\frac{\pi^2}{3n^2}$, derived here. How does this behave for $s>0$?
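As a quick sanity check, $f(s,n)$ can be evaluated by brute force and compared with $\frac{\pi^2}{3n^2}$ at $s=0$ (a minimal sketch; the double sum is expanded algebraically to keep the cost $O(n)$):

```python
import math

def f(s, n):
    """f(s,n) = sum_{i,j} (h_i - h_j)^2 / (n^2 (n+2)); the double sum is
    expanded as 2*n*sum(h_i^2) - 2*(sum h_i)^2 to keep the cost O(n)."""
    h = [(1 - 1/i)**(2*s) / i for i in range(1, n + 1)]
    return (2*n*sum(x*x for x in h) - 2*sum(h)**2) / (n*n*(n + 2))

n = 2000
print(f(0, n), math.pi**2 / (3*n**2))  # agree to within a few percent at this n
```

The remaining gap at finite $n$ comes from the $\log^2 n$ correction, which the answers below make explicit.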


notebook

Background: This is the variance of the loss after applying $s$ steps of gradient descent with learning rate $1$ to minimize a quadratic form with eigenvalues $1,\frac{1}{2},\frac{1}{3},\ldots$, starting from a random initial point on the unit sphere.
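A Monte Carlo sketch of this connection, under the assumptions that the initial point is uniform on the unit sphere in $\mathbb{R}^n$, the descent map is $x \mapsto (I-A)x$ with $A=\operatorname{diag}(1,\frac12,\ldots,\frac1n)$, and the reported loss is $x^\top A x$ without the conventional factor $\frac12$ (with that factor the variance would be $f/4$):

```python
import random

def f(s, n):
    """f(s,n) = sum_{i,j} (h_i - h_j)^2 / (n^2 (n+2)), via the expanded square."""
    h = [(1 - 1/i)**(2*s) / i for i in range(1, n + 1)]
    return (2*n*sum(x*x for x in h) - 2*sum(h)**2) / (n*n*(n + 2))

def mc_loss_variance(s, n, samples=200_000, seed=0):
    """Monte Carlo variance of the loss sum_i (1/i)(1-1/i)^(2s) x_i^2:
    after s steps of x <- (I - A)x, coordinate i is scaled by (1-1/i)^s,
    so the loss x^T A x becomes sum_i h_i x_i^2."""
    rng = random.Random(seed)
    h = [(1 - 1/i)**(2*s) / i for i in range(1, n + 1)]
    vals = []
    for _ in range(samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]  # normalised Gaussian = uniform on sphere
        r2 = sum(xi*xi for xi in x)
        vals.append(sum(hi*xi*xi for hi, xi in zip(h, x)) / r2)
    m = sum(vals) / samples
    return sum((v - m)**2 for v in vals) / samples

print(f(1, 4), mc_loss_variance(1, 4))  # should agree to Monte Carlo accuracy
```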



There are 2 answers below.


One approach is to write \begin{align}g(n,s)&=\frac12n^2(n+2)f(n,s)=\frac12\sum_{i,j=1}^n (h_i-h_j)^2\\&=n\sum_{k=1}^n\left(1-\frac1k\right)^{4s}\frac1{k^2}-\left(\sum_{k=1}^n\left(1-\frac1k\right)^{2s}\frac1k\right)^2\\&=n\sum_{r=0}^{4s}\binom{4s}r(-1)^rH_n^{(2+r)}-\left(\sum_{t=0}^{2s}\binom{2s}t(-1)^tH_n^{(1+t)}\right)^2\end{align} where $H_n^{(m)}$ is the generalised harmonic number.

Use Euler-Maclaurin to derive the result $$H_n^{(m)}\sim\zeta(m)+\frac{n^{1-m}}{1-m}+\frac{n^{-m}}2-\sum_{p=1}^\infty\binom{m+2p-1}{2p}\frac{B_{2p}n^{-m-2p+1}}{m+2p-1}$$ from which we can easily cut off higher-order terms.

Up to ${\cal O}(n^{-3})$, we have \begin{align}g(n,s)&\sim\small n\left(\sum_{r=0}^{4s}\binom{4s}r(-1)^r\zeta(2+r)-n^{-1}+\frac{n^{-2}}2-\frac{n^{-3}}6-4s\left(-\frac{n^{-2}}2+\frac{n^{-3}}2\right)+\frac{4s(4s-1)}2\left(-\frac{n^{-3}}3\right)\right)\\&\quad\small-\left(\log n+\gamma+\frac{n^{-1}}2-\frac{n^{-2}}{12}+\sum_{t=1}^{2s}\binom{2s}t(-1)^t\zeta(1+t)-2s\left(-n^{-1}+\frac{n^{-2}}2\right)+\frac{2s(2s-1)}2\left(-\frac{n^{-2}}2\right)\right)^2\\&\sim \small n\sum_{r=0}^{4s}\binom{4s}r(-1)^r\zeta(2+r)-1+\frac{(4s+1)n^{-1}}2-\frac{(4s+1)^2n^{-2}}6\\&\quad\small-\left(\log n+\gamma+\sum_{t=1}^{2s}\binom{2s}t(-1)^t\zeta(1+t)+\frac{(4s+1)n^{-1}}2-\frac{(12s^2-6s+1)n^{-2}}{12}\right)^2.\end{align}

Thus \begin{align}g(n,s)&\sim a_1n-(1+a_2^2)+(b_1-2a_2b_2)n^{-1}+(c_1-2a_2c_2-b_2^2)n^{-2}\\&\quad-\log^2n-2(a_2+b_2n^{-1}+c_2n^{-2})\log n\end{align} where \begin{align}a_1=\sum_{r=0}^{4s}\binom{4s}r(-1)^r\zeta(2+r)&\quad b_1=\frac{4s+1}2&c_1=-\frac{(4s+1)^2}6\\a_2=\gamma+\sum_{t=1}^{2s}\binom{2s}t(-1)^t\zeta(1+t)&\quad b_2=\frac{4s+1}2&c_2=-\frac{12s^2-6s+1}{12}\end{align} so that $f(n,s)\sim2n^{-3}g(n,s)$.
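The first rewriting, which expresses $g$ through generalised harmonic numbers via the binomial expansion of $(1-\frac1k)^{4s}$ and $(1-\frac1k)^{2s}$, can be verified numerically (a sketch; `H`, `g_direct` and `g_harmonic` are helper names introduced here):

```python
import math

def H(n, m):
    """Generalised harmonic number H_n^(m) = sum_{k=1}^n k^(-m)."""
    return sum(1 / k**m for k in range(1, n + 1))

def g_direct(n, s):
    """g(n,s) = n * sum h_k^2 - (sum h_k)^2 with h_k = (1-1/k)^(2s)/k."""
    h = [(1 - 1/k)**(2*s) / k for k in range(1, n + 1)]
    return n * sum(x*x for x in h) - sum(h)**2

def g_harmonic(n, s):
    """Same quantity via the binomial expansion into H_n^(m)."""
    a = sum(math.comb(4*s, r) * (-1)**r * H(n, 2 + r) for r in range(4*s + 1))
    b = sum(math.comb(2*s, t) * (-1)**t * H(n, 1 + t) for t in range(2*s + 1))
    return n * a - b**2

print(g_direct(50, 2), g_harmonic(50, 2))  # identical up to floating-point rounding
```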


This is an outline of an asymptotic expansion for 'large' $s$ that has some advantage over TheSimpliFire's. I'll also work with a better-normalized quantity, $$T(s,n):=\frac{1}{n} \sum_{i,j=1}^n ((1-1/i)^{2s}/i - (1-1/j)^{2s}/j )^2 $$

The key approximation I'll use, good for large $s$ and $j$, is $$ (1-1/j)^{2s} \sim e^{-2s/j}(1-s/j^2) $$ Then, with elementary series manipulations, $$\frac{1}{2} T(s,n)\sim \sum_{j=1}^n e^{-4s/j}/j^2(1-s/j^2)^2 - \frac{1}{n} \Big(\sum_{j=1}^n e^{-2s/j}/j(1-s/j^2) \Big)^2 $$

Since $n$ is large, interpret each sum as a Riemann sum: $$\frac{1}{2} T(s,n)\sim I_1(s,n) - \frac{1}{n}\big(I_2(s,n)\big)^2$$ $$ I_1(s,n) = \frac{1}{n} \int_0^1 e^{-4s/(nx)}(1-s/(nx)^2)^2\frac{dx}{x^2} $$ $$ I_2(s,n) = \int_0^1 e^{-2s/(nx)}(1-s/(nx)^2)\frac{dx}{x} $$

The integrals can be done in closed form: $$ I_1(s,n) = e^{-4s/n}\ \frac{n^4(32s^2-8s+3) + n^3(-32s^2+12s)+...}{128n^4 \ s^3} $$ $$ I_2(s,n) = \Gamma(0,2s/n)-e^{-2s/n} \ \frac{n+2s}{4ns} $$

There are more terms in $I_1$, but I kept only those up to $\cal{O}(1/n)$. In $I_2$, the incomplete gamma function appears; it can be written in terms of the exponential integral $\operatorname{Ei}$.
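The closed form for $I_2$ can be checked against direct quadrature (a sketch; `E1` evaluates $\Gamma(0,a)=E_1(a)$ by its small-argument series, and a composite Simpson rule handles the integral):

```python
import math

def E1(a, terms=60):
    """Exponential integral E_1(a) = Gamma(0, a) via the series
    E_1(a) = -gamma - ln(a) - sum_{k>=1} (-a)^k / (k * k!), good for small a."""
    gamma = 0.5772156649015329
    return -gamma - math.log(a) - sum((-a)**k / (k * math.factorial(k))
                                      for k in range(1, terms))

def I2_closed(s, n):
    a = 2*s / n
    return E1(a) - math.exp(-a) * (n + 2*s) / (4*n*s)

def I2_quadrature(s, n, intervals=20_000):
    """Composite Simpson rule for int_0^1 e^{-2s/(nx)} (1 - s/(nx)^2) dx/x."""
    def integrand(x):
        if x == 0.0:
            return 0.0  # e^{-2s/(nx)} vanishes faster than any power of 1/x
        return math.exp(-2*s / (n*x)) * (1 - s / (n*x)**2) / x
    h = 1.0 / intervals
    total = integrand(0.0) + integrand(1.0)
    for k in range(1, intervals):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return total * h / 3

print(I2_closed(3, 200), I2_quadrature(3, 200))  # agree to quadrature accuracy
```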

As an example, let the approximation to $T(s,n)$ be the truncation above, $$ \hat{T}(s,n)/2 = e^{-4s/n}\ \frac{n^4(32s^2-8s+3) + n^3(-32s^2+12s)}{128n^4 \ s^3} - \frac{1}{n} \Big(\Gamma(0,2s/n)-e^{-2s/n} \ \frac{n+2s}{4ns} \Big)^2 $$

Then $T(3, 200)=0.06200$ while $\hat{T}(3,200)=0.06212$, and $T(2, 200)=0.11099$ while $\hat{T}(2,200)=0.110515$. Both agree very well for such a crude approximation, and $s$ is not even that large! The scaling in $s/n$ is convenient. Note also that TheSimpliFire's approximation has a $\log^2(n)$ term, which should come from my $\Gamma(0, 2s/n)$ when its asymptotic expansion is used.

I have not proven that the approximation in the second equation is optimal for this use, and I'm surprised it works as well as it does. I suggest plenty of numerical verification to see whether this approximation suits your purposes.