I've posted the question on the physics site too.
It is said that

> …the largest Lyapunov exponent, which measures the average exponential rate of divergence or convergence of nearby network states.
Lyapunov exponents (LEs) measure how fast nearby trajectories or flows diverge from each other.
Q1: Why does the largest LE measure the average divergence rate, instead of the mean LE?
My thought is that the LEs are somehow eigenvalues of a matrix involved in solving the ODEs $$\tau\frac{dh_i}{dt} = -h_i + \sum_{j=1}^N J_{ij} \phi(h_j),$$ so the solutions might look like a linear combination of terms $e^{\lambda_i t}$. Since $e^{a t}\gg e^{b t}$ for $a > b$ as $t\to\infty$, the term with the largest LE becomes much larger than the others regardless of the coefficients, and therefore dominates.
I have not solved the ODEs so I am not sure whether my thought is correct.
How can I solve the equation rigorously and answer my question?
This seems to confirm my guess, but the details of the calculation are still unclear. These are non-linear ODEs, and the solution is possibly more complex than that of linear ODEs, for which the monotonicity of $e^{\lambda_i t}$ straightforwardly gives the result.
So perhaps the question can be restated as: how do solutions of non-linear ODEs differ from those of linear ODEs, and can we still use the linear-algebra machinery of eigenvalues and eigenvectors to solve the former?
One idea is to linearize the ODEs near a fixed point, with the Jacobian (of which the LEs are, roughly speaking, the eigenvalues).
Even if we can do so, the conclusion seems valid only near the fixed point; but here we need to consider $t\to\infty$ for chaos (instability), so the trajectory is almost certain to move away from the fixed point.
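Lacking a closed-form solution, one can at least integrate the ODEs numerically and observe the behavior. A minimal sketch, assuming $\phi = \tanh$ and simple Euler stepping; the values of $N$, $g$, $\tau$, and the step size are my own illustrative choices, not taken from the paper:

```python
import numpy as np

# Minimal sketch: Euler integration of tau * dh/dt = -h + J @ phi(h),
# with phi = tanh. N, g, tau, dt, and the number of steps are
# illustrative choices, not taken from the paper.
rng = np.random.default_rng(0)
N, g, tau, dt, steps = 200, 1.5, 1.0, 0.01, 5000
J = rng.normal(0.0, g / np.sqrt(N), size=(N, N))  # J_ij ~ N(0, g^2/N)
h = rng.normal(size=N)

for _ in range(steps):
    h = h + (dt / tau) * (-h + J @ np.tanh(h))

print(np.std(h))  # for g > 1 the activity typically does not decay to zero
```

This does not answer the question rigorously, but it lets one watch whether the activity decays or stays irregular for a given $g$.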
Q2: Why do the other LEs matter for characterizing chaos?
Q3:
We know $g$ is proportional to the standard deviation of each entry of $J$ (that is, $J_{ij} \sim \mathcal{N}(0, g^2/N)$).
Why is it also positively related to the variance of $h_i$? In other words, why does stronger coupling result in stronger neuronal signals, from a mathematical perspective?
(This question is posted here too, with additional references.)
Here is the original paper:


Q1: "Average" here is a time average. The orthogonal frame that brings the linearization of one time step into eigenvalue or principal-component/SVD normal form is not transported "parallel" with the flow of the ODE; the frame of one time step, transported forward, is slightly askew relative to the frame of the next. Practically this means that if you start with two close solutions whose difference points along the most contracting basis vector, then after the next time step the difference will have small components along all the other basis vectors. On average, over a sufficiently long time span, the fastest-growing direction therefore comes to dominate. I'm not sure how they incorporated convergence into this picture.
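This time-averaging picture can be made concrete with a Benettin-style estimate of the largest LE: evolve a reference trajectory and a nearby one, renormalize their separation after every step, and average the log growth factors over time. A sketch assuming $\phi = \tanh$, Euler stepping, and illustrative parameters:

```python
import numpy as np

# Sketch of the Benettin method for the largest LE: evolve a reference
# trajectory and a nearby one, renormalize the separation each step,
# and time-average the logarithmic growth factors.
# All parameters here are illustrative choices.
rng = np.random.default_rng(1)
N, g, tau, dt, steps, eps = 100, 1.5, 1.0, 0.01, 20000, 1e-8
J = rng.normal(0.0, g / np.sqrt(N), size=(N, N))

def step(h):
    return h + (dt / tau) * (-h + J @ np.tanh(h))

h = rng.normal(size=N)
for _ in range(2000):          # discard the transient
    h = step(h)

d = rng.normal(size=N)
d *= eps / np.linalg.norm(d)   # start in an arbitrary direction
log_growth = 0.0
for _ in range(steps):
    h2 = step(h + d)
    h = step(h)
    d = h2 - h
    r = np.linalg.norm(d) / eps
    log_growth += np.log(r)
    d *= 1.0 / r               # renormalize the separation back to eps
lyap = log_growth / (steps * dt)
print(lyap)                    # time-averaged exponential growth rate
```

Whatever direction the initial perturbation points in, the repeated stretching rotates it toward the fastest-growing direction, which is why this time average converges to the largest LE rather than to the mean of all LEs.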
On the interpretation of the LE computation see also https://scicomp.stackexchange.com/questions/36013/numerical-computation-of-lyapunov-exponent and the discussions there.
Q2: Pure expansion is not chaotic. To stay within, or return to, some fixed volume you also need compression, which is provided by the negative LEs. Another ingredient of chaos is folding, meaning the flow is not "order preserving" over longer distances as it is in a linear system; see the horseshoe dynamics.
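The stretch-and-fold mechanism can be seen already in one dimension with the logistic map at parameter 4, whose Lyapunov exponent (the time average of $\ln|f'(x)|$ along an orbit) is known analytically to be $\ln 2$:

```python
import numpy as np

# The logistic map x -> 4x(1-x) is a classic stretch-and-fold system:
# it expands locally almost everywhere, yet the fold keeps the orbit
# inside [0, 1]. Its Lyapunov exponent, the time average of ln|f'(x)|
# along the orbit, is known analytically to equal ln 2.
x = 0.1234
n = 100_000
total = 0.0
for _ in range(n):
    total += np.log(abs(4.0 - 8.0 * x))  # f'(x) = 4 - 8x
    x = 4.0 * x * (1.0 - x)
lam = total / n
print(lam)  # ≈ ln 2 ≈ 0.693
```

Without the fold the orbit would escape any bounded set; with it, expansion and reinjection coexist, which is the essence of the horseshoe picture.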
Q3: At the setup of the system, $J_{ij}=\frac{g}{\sqrt{N}}z_{ij}$ is chosen randomly, with the $z_{ij}$ independent standard normal. If $g$, and thus $J$, is small (with large probability), then the first term dominates the dynamics and the system converges to zero at rate $\tau^{-1}$. As long as $h$ stays small, the system is essentially linear, $\tau\dot h=-h+Jh$. The coefficient matrix $-I+J$ can have positive eigenvalues if $J$ is large enough; for instance, $\pmatrix{-1&a\\a&-1}$ has eigenvalues $-1\pm a$. This opens the possibility of states that do not decay to almost zero in the long run. With more dimensions and the non-linearity of $\phi$, oscillations also become possible. One would have to check whether "variation" refers more to amplitude or to rate of change.
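The eigenvalue picture can be checked numerically: by the circular law the spectrum of $gZ/\sqrt{N}$ fills a disk of radius about $g$, so the real parts of the eigenvalues of $-I+J$ reach up to roughly $g-1$ and cross zero near $g=1$. A sketch with illustrative values of $N$ and $g$:

```python
import numpy as np

# By the circular law the eigenvalues of g*Z/sqrt(N) (Z with i.i.d.
# standard normal entries) fill a disk of radius ~ g, so the spectrum
# of -I + J reaches real parts up to about g - 1. The zero state of
# the linearization tau*dh/dt = (-I + J) h thus destabilizes near
# g = 1. N and the two g values below are illustrative.
rng = np.random.default_rng(2)
N = 500
Z = rng.normal(size=(N, N)) / np.sqrt(N)
max_real = {g: np.linalg.eigvals(-np.eye(N) + g * Z).real.max()
            for g in (0.5, 1.5)}
print(max_real)  # roughly g - 1: negative for g < 1, positive for g > 1
```

This matches the answer's point: for small $g$ every mode decays, while for $g$ large enough some modes of $-I+J$ become unstable, allowing activity that does not die out.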