(Question at the very bottom)
Def 1. Let $F = \Bbb{Z}_p$ be a finite field. An $F^k$-machine is a machine with $k$ input/output memory slots. All computations are done in the field $F$, using a problem-specific number of arithmetic registers. Algorithms on an $F^k$-machine can therefore be specified by a set $\{P_1, \dots, P_k \}$ of polynomial expressions, where each $P_i$ is a polynomial expression for a polynomial in $F[x_1, \dots, x_k]$. Note that distinct expressions can denote the same polynomial: for instance, $(x + y)(x-y) \not\equiv x^2 - y^2$ as expressions, although the underlying polynomials are equal. So it is the coefficients of the polynomials that make up the "code". The time complexity $T$ of an algorithm on an $F^k$-machine is defined to be the total number of operations needed to compute the polynomial expressions as they are given; e.g. $(x+y)^k$ costs $T = 1 + \log_2 k$ (one addition plus repeated squaring).
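To make Def 1 concrete, here is a minimal Python sketch (with $p = 7$, my own choice, not from the text) of two distinct polynomial expressions that compute the same polynomial function, and of the $O(\log k)$ cost of $(x+y)^k$ via repeated squaring:

```python
p = 7  # a small prime for F = Z_p (my choice for illustration)

def expr_a(x, y):            # expression (x + y)(x - y): 1 add, 1 sub, 1 mul
    return ((x + y) * (x - y)) % p

def expr_b(x, y):            # expression x^2 - y^2: 2 muls, 1 sub
    return (x * x - y * y) % p

# Different expressions, same polynomial, hence the same function on F x F:
assert all(expr_a(x, y) == expr_b(x, y) for x in range(p) for y in range(p))

def pow_k(x, y, k):
    """(x + y)^k mod p by repeated squaring: roughly 1 + log2(k) operations."""
    base, result, ops = (x + y) % p, 1, 1   # the one addition
    while k:
        if k & 1:
            result = result * base % p
        base = base * base % p
        k >>= 1
        ops += 1
    return result, ops

val, ops = pow_k(2, 3, 16)
assert val == pow(5, 16, p)   # agrees with direct modular exponentiation
```

The operation count grows with $\log_2 k$, not $k$, which is the point of charging $(x+y)^k$ only $1 + \log_2 k$ steps.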
Def 2. A deterministic problem consists of two sets $X, Y$ together with a map $f : X \to Y$.
Def 3. A problem is compatible with an $F^k$-machine if there exist algorithms on a standard machine for encoding the input and decoding the output, respectively. That is, $X \xrightarrow{\phi} F^k \xrightarrow{\psi} Y$.
- $\phi, \psi$ must be computable in $O(|x|)$ time, where $|x|$ is the size of the input $x \in X$. For instance, if $X \subset \Bbb{Z}$, then $|x| \approx \log_p x$.
- There must exist a function $f^* : F^k \to F^k$ such that $f = \psi \circ f^* \circ \phi$.
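A minimal sketch of Def 3, with a toy instance of my own (not from the text): take $X$ to be the integers in $[0, p^k)$, encoded as base-$p$ digit vectors in $F^k$, with $f^*$ the identity so that $f = \psi \circ f^* \circ \phi$ is also the identity:

```python
p, k = 5, 3   # toy parameters (my choice)

def phi(x):
    """Encode: extract the k base-p digits of x, an O(|x|) operation."""
    return [(x // p**i) % p for i in range(k)]

def psi(v):
    """Decode: reassemble the integer from its base-p digits, also O(|x|)."""
    return sum(d * p**i for i, d in enumerate(v))

# With f* = identity on F^k, f = psi . f* . phi is the identity on X:
x = 93
assert phi(x) == [3, 3, 3]       # 93 = 3 + 3*5 + 3*25
assert psi(phi(x)) == x
```

Any nontrivial $f^*$ would be slotted in between $\phi$ and $\psi$ in the same way.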
Theorem 1. For any $F^k$-compatible problem $f$, there exists a set of polynomials $\{f^*_1, \dots, f^*_k\} \subset F[x_1, \dots, x_k]$ such that $f(x) = \psi \circ (f^*_1, \dots, f^*_k) \circ \phi(x), \ \forall x \in X$, together with an algorithm for computing them in time $O(p^k k^2 \ln p)$. Proof: see here.
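A sketch of why the polynomials in Theorem 1 exist: over $F = \Bbb{Z}_p$, any function $g : F \to F$ is interpolated exactly by $g(x) = \sum_a g(a)\,(1 - (x-a)^{p-1})$, since by Fermat's little theorem the factor $(1 - (x-a)^{p-1})$ is $1$ at $x = a$ and $0$ elsewhere. Shown here for $k = 1$ and $p = 7$ (parameters and the target function are my own choice):

```python
p = 7
g = lambda x: (3 * x * x + x + 2) % p   # an arbitrary target function on F_p

def interpolate(g):
    """Tabulate {a: g(a)}; these values are the 'code' of the polynomial."""
    return {a: g(a) for a in range(p)}

def evaluate(table, x):
    # 1 - (x - a)^(p-1) mod p is the indicator of x == a (Fermat)
    return sum(ga * (1 - pow(x - a, p - 1, p)) for a, ga in table.items()) % p

table = interpolate(g)
assert all(evaluate(table, x) == g(x) for x in range(p))
```

The same construction with $k$-fold products of indicators handles $F^k \to F^k$, which is where the $p^k$ factor in the stated running time comes from.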
Lemma 1. A problem $f: X \to Y$ is computable in time $O(t(|x|)) \iff$ there exists a chain $X_0 \subset X_1 \subset X_2 \subset \dots$ with union $X$ such that each restriction $f_{X_i} : X_i \to Y$ is a problem computable in time $O(t_i(|x|))$, where each $t_i(|x|)$ is $O(t(|x|))$. Proof: ($\Rightarrow$) Obvious: take $X_0 = X$. ($\Leftarrow$) Given any bound $|x| \leq K$, for some $K \in \Bbb{R}$, there is a smallest set $X_i$ such that $x \in X_i$ for all $|x| \leq K$, and the restriction $f_{X_i} : X_i \to Y$ is solvable in $O(t(|x|))$ time. That says the problem is computable in $O(t(|x|))$ time, by the usual standards.
Let's define a C-like language. We've already shown that an $F^k$-machine can compute any deterministic problem with which it is compatible. We want to show that the running times of algorithms written in the C-like language and run on a standard machine match those of some algorithm on an $F^k$-machine that computes the same problem.
Let the language be a sequence of statements, each of which can be a while-loop, a conditional, an assignment (=), or an arithmetic operation (+, *), where loops and conditionals can contain a sequence of statements (recursively). You can also declare variables and static arrays. All loops are of the form "while (x < y)" and all conditionals are "if (x < y)". Convince yourself that this is a powerful enough language, and also that there is a transformation of algorithms written in it to a standard machine language that preserves time and space complexity.
Intuitively, it seems like this is a standard computer language and can compute anything a Turing machine can, given enough memory. More importantly, it can compute anything that a standard home computer architecture can, and vice versa, given enough memory for code and data. In other words, I'm not proving that here.
Conjecture 1. For any deterministic algorithm on the "C-like machine" with $F$-sized words, whose problem is compatible with an $F^k$-machine, there is a time- and space-complexity-preserving transformation of the algorithm to an $F^k$-machine algorithm. Proof: Since our problem is $F^k$-compatible, there already is a set of polynomial expressions computing it by Theorem 1, but we want to show that there are polynomial expressions whose time and space complexity equal those of the C-code algorithm. Since the algorithm is deterministic, we can precompute the values taken on by all loop-referenced variables, in time and space proportional to the original, so that each while-loop can be transformed into:
    for (i = N; i > 0; i--) {
        // reference all i-dependent data here
    }
where N is a constant that depends on the input $x^* \in F^k$; so set it to $\max_{x^*} \{N(x^*)\}$. Let all $i$-dependent data that is not part of our input $x^* = $ x[1..k] be precomputed. For an algorithm that runs in $O(t(|x|))$ time we will need at most $C\, t(|x_m|)$ memory slots, where $x_m \in X$, to precompute the $i$-dependent data. Clearly that bound is correct. Now suppose that our algorithm is composed of recursive for-loops like the above and assignments to our input variables held in x[1..k]. A typical example would be:
    for (i = N; i > 0; i--) {
        for (j = M; j > 0; j--) {
            x[0] += a[0] + x[i]*b[j] + 2;
            x[1] *= x[0] + x[j]*c[i] - 3;
        }
        x[i] = x[i-2] * x[i-10] + 100;
    }
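The while-to-bounded-for conversion above can be sketched concretely. A minimal Python sketch (the loop body and the bound N = 200 are my own toy choices, not from the text): precompute a trip-count bound N, then replace the data-dependent exit test with an idle guard so the loop runs exactly N times:

```python
def steps_while(x):
    """Original form: a data-dependent while-loop (a Collatz-style example)."""
    steps = 0
    while x > 1:                      # exit condition depends on the data
        x = x // 2 if x % 2 == 0 else 3 * x + 1
        steps += 1
    return steps

def steps_for(x, N):
    """Converted form: a fixed-bound for-loop; iterations past termination idle."""
    steps = 0
    for _ in range(N):
        if x > 1:                     # guard replaces the loop condition
            x = x // 2 if x % 2 == 0 else 3 * x + 1
            steps += 1
    return steps

N = 200   # any N at least the true trip count for the inputs of interest works
assert steps_while(27) == steps_for(27, N)
```

The cost of the for-loop version is $N$ iterations regardless of the input, which is what makes the later collapse into fixed polynomial expressions possible.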
Let's convert conditional statements into statements of the form "x[i] (*= | += | =) Arithmetic". Each conditional assignment
    if (a < b)
        x = c;
can be written $x := c\, p_1(a,b) + x\, (1-p_1(a,b))$, where $p_1(a,b) = \sum_{(u,v) \in F^2} p_2(u,v)(1 - (u - a)^{p-1})(1-(v-b)^{p-1})$ and the coefficients are $p_2(u,v) = 1$ if $u < v$ and $0$ otherwise, so that $p_1$ answers the question "is $a \lt b$?" (by Fermat's little theorem, $1 - (u-a)^{p-1}$ is $1$ iff $u = a$). $p_1$ is calculable on a standard machine in $O(1)$ time (the sum has $p^2$ terms, a constant independent of the input size), so each conditional assignment can be rewritten as a polynomial expression that can be evaluated in $O(c) = O(1)$ time on a standard machine.
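The rewrite above can be checked directly. A sketch with $p = 5$ (my own choice): $p_1(a,b)$ is built from the Fermat indicators and equals $1$ exactly when $a < b$ as residues, and the branchless assignment agrees with the if-statement on all inputs:

```python
p = 5   # toy prime (my choice)

def p1(a, b):
    """Indicator polynomial for a < b over F_p, via Fermat's little theorem:
    1 - (u - a)^(p-1) mod p is 1 iff u == a."""
    return sum((1 - pow(u - a, p - 1, p)) * (1 - pow(v - b, p - 1, p))
               for u in range(p) for v in range(p) if u < v) % p

def branchless_if(a, b, x, c):
    """The statement 'if (a < b) x = c', as a single polynomial assignment."""
    t = p1(a, b)
    return (c * t + x * (1 - t)) % p

# Agrees with the branching version everywhere (x = 4, c = 3 as sample values):
for a in range(p):
    for b in range(p):
        expected = 3 if a < b else 4
        assert branchless_if(a, b, 4, 3) == expected
```

Evaluating $p_1$ this way costs $p^2$ terms, a constant in the input size, matching the $O(1)$ claim (though a large constant for big $p$).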
Thus our code now looks like a recursive tree of for-loops and assignment/arithmetic statements on x[1..k], and the only cost to complexity is an increase in program memory (the constants that make up the $i$-dependent data, plus regular constants and instructions), which is proportional to the running time, i.e. $O(t(|x|)^3)$ in the worst case but usually a lot less. Except this is not the full picture, I just realized: there may be loops inside of conditionals and conditionals inside of loops, and so on, but each assignment/arithmetic operation may be taken out of its conditional as above.
Starting at the innermost loops, say of varying trip count N, we have our first stretch of code that is only a sequence of $n$ assignments/arithmetic ops. Traverse the operations sequentially, and each time you come across x[j] = E, do $P_j := E$ on our $F^k$ side. If you cross x[j] *= E, do $P_j := P_j \cdot E$, and so on. Then you've built up the $P_j$'s for that innermost loop. Now destroy the loop and write x[j] = $P_j$ for each variable affected by the killed loop. Now you should have assignments/ops inside a conditional or in another loop. Keep killing innermost loops until you can completely collapse the code structure into a set of assignments x[j] = $P_j$. Notice that the computational complexity of the resulting $P_j$ is exactly the same as that of the C code if it were interpreted directly, without any optimization.
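The loop-collapse step can be sketched as follows (the loop body, $p = 11$, and $N = 4$ are my own toy choices, not from the text): each assignment x[j] op= E composes a new polynomial $P_j$ out of the current ones, and unrolling the loop $N$ times leaves one expression per slot. Here the $P_j$'s are represented as Python closures over $F_p$:

```python
p = 11   # toy prime (my choice)
N = 4    # fixed trip count, as after the while-to-for conversion

def run_loop(x0, x1):
    """Direct interpretation of: for N times { x0 += x1 + 2; x1 *= x0 }."""
    for _ in range(N):
        x0 = (x0 + x1 + 2) % p
        x1 = (x1 * x0) % p
    return x0, x1

# Build P_0, P_1 by traversing the unrolled body, statement by statement:
P = [lambda a, b: a, lambda a, b: b]      # start with the identity polynomials
for _ in range(N):
    p0, p1_ = P
    # x0 += x1 + 2  becomes  P_0 := P_0 + P_1 + 2
    q0 = lambda a, b, p0=p0, p1_=p1_: (p0(a, b) + p1_(a, b) + 2) % p
    # x1 *= x0      becomes  P_1 := P_1 * P_0  (using the updated P_0)
    q1 = lambda a, b, q0=q0, p1_=p1_: (p1_(a, b) * q0(a, b)) % p
    P = [q0, q1]

# The collapsed expressions agree with the interpreted loop on all inputs:
assert all(run_loop(a, b) == (P[0](a, b), P[1](a, b))
           for a in range(p) for b in range(p))
```

Note that without sharing of subexpressions the collapsed $P_j$'s can repeat work; the claim in the text is only that their cost matches direct, unoptimized interpretation of the C code.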
I think that's the gist of what we're trying to prove and how to prove it.
What are your thoughts on this work?