The numerical methods for finding (the largest) eigenvalues and (the largest) Lyapunov exponents (LEs) look similar.
The power method applies the matrix $B$ repeatedly to grow a vector $z$, and the expansion/shrinking rate of the vector is the largest eigenvalue: $\frac {u^T B^{m+1}z}{u^T B^{m}z} \approx \lambda_1$, where $u$ is a more or less arbitrarily chosen test vector.
The largest LE is computed similarly by finding the growth rate of a vector under the transform of certain Jacobian matrices $D_t$.
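For concreteness, the power method can be sketched in a few lines; the matrix $B$ and the vectors $z$, $u$ below are my own illustrative choices, not from the references:

```python
import numpy as np

# Power method sketch: repeatedly apply B to a vector z and read off the
# growth rate through an (arbitrary) test vector u.
# B, z, u are illustrative choices; B has eigenvalues 5 and 2.
B = np.array([[4.0, 1.0],
              [2.0, 3.0]])
z = np.array([1.0, 0.0])     # arbitrary start vector
u = np.array([1.0, 1.0])     # arbitrary test vector

for _ in range(50):
    z = B @ z
    z /= np.linalg.norm(z)   # rescale so the iterates do not overflow

# the rescalings cancel in the ratio u^T B z / u^T z
lam1 = (u @ (B @ z)) / (u @ z)
print(lam1)   # → approximately 5.0, the largest eigenvalue
```

The rescaling inside the loop is the finite-precision analogue of dividing $u^T B^{m+1}z$ by $u^T B^m z$: it changes nothing in the ratio but keeps the iterates bounded.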
Q1:
Is QR decomposition similar to the power method?
i.e. is QR as follows: when we use QR to find the eigenvalues of a matrix $B$, we let $Q_{s+1}=B Q_s$, where $Q_s=[ q_1 \dots q_n]$ and the $q_i$'s are tangent vectors which form a basis for the tangent space;
then we compute $Q_m = B^m Q_0$ and decompose it as $Q_m = QR$ (where $Q$'s columns are unit vectors); the diagonal entries of $R$ will then be the growth rates of the tangent vectors under the transformation of $B$, and therefore the eigenvalues? $^{1}$
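If it helps, the scheme just described, with the frame re-orthonormalized at every step instead of decomposing $B^m Q_0$ at the end, is subspace (orthogonal) iteration; a minimal sketch, with $B$ my own illustrative choice (eigenvalues $5$ and $2$):

```python
import numpy as np

# Subspace (orthogonal) iteration: multiply an orthonormal frame by B and
# re-orthonormalize with QR at every step. B is an illustrative choice
# with eigenvalues 5 and 2.
B = np.array([[4.0, 1.0],
              [2.0, 3.0]])
Q = np.eye(2)

for _ in range(100):
    Q, R = np.linalg.qr(B @ Q)   # Q spans the iterated subspaces

# |diag(R)| converges to the eigenvalue magnitudes
# (a QR routine fixes no signs on the diagonal of R)
print(np.abs(np.diag(R)))   # → approximately [5. 2.]
```
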
What confuses me is that QR seems to be more complicated than stated above, involving Hessenberg matrices and Householder reflections. I am not sure what role the two play in QR.
Q2:
If my understanding of QR in [1] is correct, then I am confused by the way we use QR to compute LEs.
Since LEs are the eigenvalues of $\Lambda=(T_t^T T_t)^{1/2t}$, and $T_t = D_{t-1}\dots D_0$, it seems we should compute the eigenvalues of $D_{t-1},\dots, D_0$ separately and then multiply them; i.e. we use QR to compute $Q_{m,t}=(D_t)^m Q_0$ for each $t$ (i.e. we use the same matrix $D_t$ to transform the tangents $Q_0$ repeatedly), then $Q_{m,t}=Q_t R_t$, and the diagonal entries of $R_t$ will be the eigenvalues of $D_t$. Then we multiply the eigenvalues to get the eigenvalues of $T_t^T T_t$, i.e. $(\prod_t R_{t,ii})^2, \forall i \in \{1,\dots, n\}$.$^2$
But what we actually do is use different $D_t$'s ($D_0,D_1, \dots, D_{t-1}$) to transform the tangents $Q_0$, i.e. $Q_{t}=(D_{t-1})^1 (D_{t-2})^1\dots (D_{0})^1 Q_0$. Why can we do this? Without applying each $D_t$ sufficiently many times, it seems we cannot get the eigenvalues of $D_t$; then how can we get the LEs (which in my eyes are the products of the eigenvalues of the $D_t$'s, as in [2])? $^3$
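For reference, the procedure in [3] (one application of each $D_t$, followed by QR re-orthonormalization and averaging of $\ln|R_{ii}|$) can be sketched as below. To make the result checkable I take every $D_t$ to be the same illustrative matrix, with eigenvalues $5$ and $2$, so the exponents must come out as $\ln 5$ and $\ln 2$; this constant-Jacobian case is my own simplification:

```python
import numpy as np

# Discrete QR method for Lyapunov exponents: push an orthonormal frame
# through each one-step Jacobian D_t ONCE, re-orthonormalize by QR, and
# average the logs of |diag(R)|. Here every D_t is the same illustrative
# matrix, so the exponents are the logs of its eigenvalue magnitudes.
D = np.array([[4.0, 1.0],
              [2.0, 3.0]])   # eigenvalues 5 and 2
Q = np.eye(2)
T = 1000
le_sum = np.zeros(2)

for t in range(T):
    Q, R = np.linalg.qr(D @ Q)
    le_sum += np.log(np.abs(np.diag(R)))

les = le_sum / T
print(les)   # → approximately [ln 5, ln 2]
```

Note that no power $(D_t)^m$ appears anywhere; the frame $Q$ itself carries the alignment information from one step to the next.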
Q3:
Since (whether we use [2] or [3] to compute LEs) we use $D_t$ ($D_0,D_1, \dots, D_{t-1}$), where $t$ is somewhat arbitrarily defined (it is discrete time, while the actual dynamical system is continuous-time), the choice of the $t$'s possibly matters. Does the time step (the duration between $0, 1, \dots, t-2, t-1$) matter for the computed value of the LEs? If not (as it should be), why?
In short, the three methods look similar, but their relations are as tangled as intertwined threads to me.
My main reference is 4.8 The eigenvalue problem, in Conte, Elementary numerical analysis.
Related questions: What is integrating a variational equation?, How to understand the largest Lyapunov exponent?.
Supplemental:
What is the meaning of the (blue) text, which looks important?

QR decomposition is just the matrix factorization $QR=A$ with orthogonal $Q$ and upper triangular $R$. The sign structure of the diagonal of $R$ is not fixed; making it all positive requires extra effort.
QR algorithm is the most basic idea of the Francis eigenvalue algorithm. The Francis algorithm itself is based on a single-shifted QR decomposition (double-shifted for non-real eigenvalues of non-symmetric matrices). The zero shift algorithm iterates $Q_kR_k=A_k$, $A_{k+1}=R_kQ_k$. $A_0$ is either the input matrix $A$ or its Hessenberg form. In the latter case all $A_k$ and $Q_k$ remain in Hessenberg form.
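The zero-shift iteration $Q_kR_k=A_k$, $A_{k+1}=R_kQ_k$ can be sketched in a few lines; the test matrix is my own illustrative choice:

```python
import numpy as np

# Zero-shift QR algorithm: each step A -> R Q = Q^T A Q is an orthogonal
# similarity, so the eigenvalues are preserved, and for this matrix the
# iterates converge to an upper triangular (Schur) form.
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])   # illustrative matrix, eigenvalues 5 and 2

for _ in range(100):
    Q, R = np.linalg.qr(A)
    A = R @ Q                # similarity transform Q^T A Q

print(np.diag(A))   # → approximately [5. 2.]
```

The shifts of the actual Francis algorithm only accelerate this convergence; the Hessenberg reduction makes each QR step cheap, since a Hessenberg matrix needs only $n-1$ Givens rotations (or short Householder reflections) per sweep.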
Careful observation gives that in the first columns the algorithm essentially follows the power iteration, if one takes the first $k$ columns it corresponds to a subspace iteration, and starting from the last column it is (approximately?) the inverse power iteration.
**Lyapunov exponents** The motivation is to produce something like the eigenvalue analysis around stationary points, or fixed points of a Poincaré map, also for orbits that are not closed but still restricted to some bounded region. In the interesting cases the divergence of trajectories is exponential, so the (state-space) Jacobian $J(t,t_0)$ of the propagator $x(t)=P(t,t_0)(x_0)$ rapidly becomes ill-conditioned. Hence the idea is to compute these values $\sigma_j(t_{k+1},t_k)$ for small segments $[t_k,t_{k+1}]$ and average their logarithms, $\ln(\sigma_j(t_{k+1},t_k))\sim\lambda_j\cdot(t_{k+1}-t_k)$. As a side effect, these exponents no longer belong to global solution curves as in the motivation. For the small local segments one can produce initial points such that the solution difference to the reference solution grows or shrinks approximately according to the singular values of the propagator Jacobian. However, the initial points for the current local segment will be (slightly) shifted from the end points of the last segment, even if only the direction is considered.
Thus the LE over a length of the reference solution that fills the attractor set densely enough, or over several solutions that achieve this, will give numbers that are characteristic of the attractor, but not really connected to the difference of any two particular solutions.
The heuristic for the usual LE computations seems to be that in the progression of SVDs $$U_k\Sigma_kV_k^T=J(t_{k+1},t_k),$$ $t_{k+1}=t_k+h$, the orthogonal bases at $t_k$ are close enough that $U_{k-1}=V_k+O(h^2)$, so that in the QR decomposition $$Q_kR_k=U_{k-1}^TJ(t_{k+1},t_k)U_{k-1}$$ the product $U_{k-1}Q_k$ is close to $U_k$ and the off-diagonal entries of $R_k$ are $O(h^2)$; this abbreviated computation is then good enough as an approximate SVD.
This indeed looks like the QR algorithm, like a gliding eigenvalue computation. The question remains if the $R$ factors do indeed sufficiently approximate the singular values of the segment.
Note that $$J(t_{k+1},t_k)=I+h\nabla_xF(t_k,x_k)+O(h^2),$$ so that the factors in any decomposition are $O(h)$ close to the identity matrix in all non-trivial entries. Demanding that the other entries are approximately zero thus needs a smaller tolerance, like $O(h^2)$.
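As a sanity check on the size of these entries: for one explicit-Euler step of a linear system $F(x)=Gx$ the segment Jacobian is exactly $I+hG$, so every non-trivial entry is $O(h)$. The matrix $G$ and the step size $h$ below are my own illustrative choices:

```python
import numpy as np

# One explicit-Euler step x_{k+1} = x_k + h*F(x_k) of the linear system
# F(x) = G x has segment Jacobian exactly J = I + h*G, so all
# non-trivial entries of J are O(h). G and h are illustrative choices.
G = np.array([[ 0.0,  1.0],
              [-1.0, -0.1]])   # a lightly damped oscillator
h = 1e-3
J = np.eye(2) + h * G

# largest deviation of J from the identity: h * max|G_ij| = 1e-3
print(np.max(np.abs(J - np.eye(2))))
```

Against entries of this size, demanding "approximately zero" off the diagonal indeed only becomes meaningful at a tolerance one order smaller, i.e. $O(h^2)$.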
In consequence one could say that the iteration should start with an SVD of $J(t_1,t_0)$ or at least a good enough approximation in the above sense of error $O(h^2)$ in the off-diagonal entries of the middle factor. But also starting with identity matrices is not catastrophically wrong. Using the transformation to Hessenberg form for initialization is somewhere in-between. Usually one lets the computation of the orthogonal frame run for some time before starting the averaging of the singular values. This reduces the importance of the initialization.