I'm working through Cutland's Computability. Problem 4.4.5 says:
Show that for each $m$ there is a total ($m+1$)-ary computable function $s^m$ such that for all $n$, $$\phi_{e}^{(m+n)} (\textbf{x},\textbf{y}) \simeq \phi_{s^{m}(e,\textbf{x})}^{(n)}(\textbf{y})$$ where $\textbf{x},\textbf{y}$ are $m,n$ tuples.
It also wants me to show that $s^m$ is primitive recursive.
Now the s-m-n theorem states the existence of a primitive recursive $s^m_n$ such that: $$\phi_{e}^{(m+n)} (\textbf{x},\textbf{y}) \simeq \phi_{s^{m}_{n}(e,\textbf{x})}^{(n)}(\textbf{y})$$ which is significant because for any given $n$ we can write a program that effectively shifts the $n$-tuple $\textbf{y}$ right by $m$, inserts $\textbf{x}$, and runs the program encoded by $e$ on this new input $\textbf{x},\textbf{y}$.
But the problem wants me to find some general $s^m$ that will work regardless of the input size $n$. A hint is given, along the lines of: "Use the fact that $\rho(P_e)$, the maximum space accessed by the $e$-th program, is independent of $n$, and that the output depends only on what's inside the registers $1,2,3, ... , \rho({P_e})$ to begin with."
I think the idea is to simply shift every single block used by the program ($1, 2, ... \rho(P_e)$) up by $m$ and inject $\textbf{n}$ in the style of the proof of s-m-n theorem. But this requires actually finding this value $\rho(P_e)$, and I'm not sure if that's doable, primitive recursively at least. It feels like I would need unbounded search for that, making it not primitive recursive. Am I on the right track here?
The input for $s^m$ is really just a program, $p$. We can examine at $p$ directly to compute $r(p)$, the maximum register number used anywhere inside $p$. Registers that are "used" in this sense need not be used as inputs; they can be used at any stage of the program in any way. So we can determine $r(p)$ in a primitive recursive way by just listing all the register numbers mentioned anywhere in the program.
It is important here that Cutland's register machines, like most register machines (but unlike real CPUs), hard-code the register numbers to be used in each instruction. There is no "indirect addressing" in which an instruction can say something like "look in register 4, and then add 1 to the register whose number is stored there", nor "look in register 8 and then jump to the instruction whose number is stored there". Each instruction directly tells which registers are relevant to evaluating that instruction.
Now, even though not all $r(p)$ registers need to be viewed as inputs of $p$, is certainly true that the result of the program is only dependent on the contents of those registers. So the number of inputs actually used by program $p$ is not more than $r(p)$. Any further inputs are, essentially, ignored by $p$, and so they can be ignored by $s^m(p)$.
So, in the new program $s^m(p)$, we first shift the values of registers $1, 2, \ldots, r(p)$ into registers $m+1, m+2, \ldots, m+r(p)$. Then we copy specific values into registers $1$, ..., $n$, as usual.