Low storage 3rd order Runge-Kutta scheme


I'm looking for a low-storage Runge-Kutta scheme of order 3 (or higher), meaning that not all intermediate results need to be stored concurrently. I found an old paper that presents exactly this. The paper is A Kutta Third-Order Procedure for Solving Differential Equations Requiring Minimum Storage, by S. Conte and R. Reeves; it appeared back in 1956 and is only 4 pages long.

In section 3 of that article the algorithm is depicted in a way that makes clear that just two arrays are sufficient to store the initial value, the intermediate values and the result (of course these arrays are overwritten multiple times). The algorithm has 6 steps:

  1. Have initial solution in array B : $B \leftarrow y$
  2. Compute k0 : $A \leftarrow h \cdot f(t; B)$
  3. Compute y1 : $A \leftarrow B + a \cdot A$
  4. Compute k1 : $B \leftarrow h \cdot f(t+a; A)$
  5. Compute y2 : $A \leftarrow A + r \cdot B$
  6. Compute y3 : $B \leftarrow A + (b-r) \cdot B + c \cdot k_2$

Now, step 6 cannot be performed before computing $k_2$. To do so with just two arrays, one could update the array A via $A \leftarrow A + c \cdot h \cdot f(t+n; A)$, but this cannot be computed in place if $f$ needs to access entries of the array other than the one it is currently computing (e.g. if it computes some finite-difference derivative of the array), unless the computer has some kind of programmable cache, which certainly did not exist back then.
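To make the failure mode concrete, here is a small sketch (the names and the right-hand side are illustrative, not from the paper) where each $f_i$ is a centered finite difference, so an in-place sweep reads neighbours that were already overwritten:

```python
import numpy as np

def f_component(y, i, dx=1.0):
    # i-th rhs component: centered difference, reads neighbours y[i-1], y[i+1]
    n = len(y)
    return (y[(i + 1) % n] - y[(i - 1) % n]) / (2 * dx)

y = np.array([0.0, 1.0, 4.0, 9.0, 16.0])
c, h = 0.5, 0.1

# Out-of-place: every f_i sees the original array
out_of_place = y + c * h * np.array([f_component(y, i) for i in range(len(y))])

# In-place: by the time f_i reads y[i-1], that entry was already overwritten
in_place = y.copy()
for i in range(len(in_place)):
    in_place[i] += c * h * f_component(in_place, i)

print(np.allclose(out_of_place, in_place))  # False: the in-place sweep is corrupted
```

Only the first component survives intact (its neighbours are still untouched when it is computed); every later component picks up already-updated values.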

My question is then: how do the authors plan to perform step 6 of the integration with just two arrays? Am I missing something obvious?

P.S.: somebody should create the tag "Runge-Kutta" or similar. There are many questions related to this kind of numerical integration, but I cannot create this tag because I don't have enough reputation.

There are 2 answers below.

BEST ANSWER

The problem can probably be solved differently than I initially thought: split step 6 into two sub-steps.

6a. Update B : $B \leftarrow (b-r) \cdot B+c \cdot h \cdot f(t+n; A) \quad = \quad (b-r) \cdot k_1 + c \cdot k_2$

6b. Update B with final solution : $B \leftarrow A + B \quad = \quad y + a \cdot k_0 + r \cdot k_1 + (b-r) \cdot k_1 + c \cdot k_2 ~ = ~ y +a \cdot k_0 + b \cdot k_1 + c \cdot k_2$
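As a sanity check, the split can be verified numerically. The coefficients below are arbitrary placeholders rather than the paper's values, since the algebraic identity holds for any choice of $a, b, c, r$:

```python
import numpy as np

a, r, b, c, h = 0.2, 0.3, 0.4, 0.5, 0.1
f = lambda y: -y                            # illustrative rhs

y = np.array([1.0, 2.0, 3.0])
k0 = h * f(y)
k1 = h * f(y + a * k0)
k2 = h * f(y + a * k0 + r * k1)
reference = y + a * k0 + b * k1 + c * k2    # textbook evaluation, three registers

# Two-array version: A holds y2 = y + a*k0 + r*k1, B holds k1
A = y + a * k0 + r * k1
B = k1.copy()
B = (b - r) * B + c * h * f(A)              # 6a: B <- (b-r)*k1 + c*k2
B = A + B                                   # 6b: B <- y + a*k0 + b*k1 + c*k2

print(np.allclose(B, reference))  # True
```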


In more modern conventions, consider the third-order procedure $$\begin{array}{c|ccc} 0&\\ c_1&a_{10}\\ c_2&a_{20}&a_{21}\\ \hline &b_0&b_1&b_2 \end{array}$$ It is first demanded that $a_{10}=c_1=a_{20}=b_0$, so that the input $Y$ can be replaced with $Y+a_{10}k_0$, where $k_0=hf(Y)$, without needing the original $Y$ later.
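To spell out why this condition suffices (a direct expansion of the stage inputs, not a statement from the paper): the original $Y$ enters the remaining computation only through $$Y + a_{20}k_0 + a_{21}k_1 \quad \text{(input of stage 2)} \qquad \text{and} \qquad Y + b_0k_0 + b_1k_1 + b_2k_2 \quad \text{(the update)},$$ so with $a_{10}=a_{20}=b_0$ one may overwrite $Y$ with $Z = Y + a_{10}k_0$, compute the stage-2 input as $Z + a_{21}k_1$ and the update as $Z + b_1k_1 + b_2k_2$; the original $Y$ is never needed again.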

Now the "trick" of the implementation is to sacrifice time for space, as was implicitly declared at the start of the article, where it is stated that the system is given as $N$ independent functions $f_i$ that can be evaluated separately. One could amend that to one function per particle, so that some intermediate results, such as distances and their powers in molecular-dynamics simulations, need only be computed once (or twice).

While stages 0 and 1 (steps 1 through 5) can be computed on all components in parallel, in step 6 the sequential, component-by-component evaluation of the ODE right-hand-side functions becomes important:

B = Y                                  # step 1: B holds y
A = h*f(Y)                             # step 2: k_0
A = B + a_{10}*A                       # step 3: y_1 = y + a_{10}*k_0
B = h*f(A)                             # step 4: k_1
A = A + a_{21}*B                       # step 5: y_2 = y_1 + a_{21}*k_1
for i in range(N):                     # step 6: overwrite B component by component
    B[i] = A[i] + (b_1-a_{21})*B[i] + b_2*h*f[i](A)
Y = B

Thus there is no need for additional memory beyond the usual temporary registers used to evaluate multi-term arithmetic expressions. Since each $f_i$ reads only $A$, the entry B[i] is not used again in its role of holding $k_{1,i}$ once it is overwritten, so this in-place computation is valid.

The first and last steps just change the variable name, signifying the transition from the outer logic of the time loop into the inner logic of the method step and back again. One could retain the name $Y$ and rename $A$ to $Z$ without otherwise changing the method.

As to the question: indeed, if the ODE system function were evaluated all at once, one would need an additional memory bank to hold its result $k_2$.
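Finally, the pseudocode above can be turned into a runnable sketch. The coefficients are not taken from the Conte-Reeves paper; they are obtained here by solving the third-order conditions for this stage structure, which reduce to a cubic in $a$ (my derivation, so treat the values as illustrative):

```python
import numpy as np

# Third-order conditions for the stage structure
#   y1 = y + a*k0,  y2 = y1 + r*k1,  y_new = y + a*k0 + b*k1 + c*k2
# reduce to (my derivation, not the paper's tabulated values):
#   6a^3 - 6a^2 + 3a - 1 = 0,  r = -6a^3(1-a),  c = 1/(6ar),  b = 1-a-c
a = 0.6
for _ in range(50):  # Newton iteration for the unique real root of the cubic
    a -= (6*a**3 - 6*a**2 + 3*a - 1) / (18*a**2 - 12*a + 3)
r = -6 * a**3 * (1 - a)
c = 1.0 / (6 * a * r)
b = 1.0 - a - c

def f_vec(y):        # whole-array rhs, used in stages 0 and 1
    return -y

def f_comp(y, i):    # per-component rhs, reads only y, used in step 6
    return -y[i]

def step(Y, h):
    """One low-storage step: only the two arrays A and B are live."""
    B = Y                                # step 1: B <- y
    A = h * f_vec(B)                     # step 2: k_0
    A = B + a * A                        # step 3: y_1 = y + a*k_0
    B = h * f_vec(A)                     # step 4: k_1
    A = A + r * B                        # step 5: y_2 = y_1 + r*k_1
    for i in range(len(B)):              # step 6, component by component
        B[i] = A[i] + (b - r) * B[i] + c * h * f_comp(A, i)
    return B

def integrate(h, n_steps):
    Y = np.ones(3)
    for _ in range(n_steps):
        Y = step(Y, h)
    return Y

exact = np.exp(-1.0)                     # y' = -y, y(0) = 1, at t = 1
e1 = abs(integrate(0.1, 10)[0] - exact)
e2 = abs(integrate(0.05, 20)[0] - exact)
print(e1 / e2)                           # close to 2**3 = 8: third order
```

In actual low-storage code the rebindings of A and B would be in-place updates of two fixed buffers; the Python sketch allocates temporaries for readability only.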