I came across the following MO question, https://mathoverflow.net/questions/461426/church-turing-thesis-for-higher-order-functions, and got confused. The author asks for references on the Church–Turing thesis in a higher-order context.
As I understand it, the widely accepted model of computability is the Turing machine, together with its equivalent formulations. Now, if we move to higher-order functions, I fail to see how the C-T thesis could possibly change. Yes, we have a different context, but is there a different notion of computability defined in that context? If so, what is it? If not, how would the higher-order setting change what Turing machines can compute? What would that even mean? For example, we (can) have higher-order functions in Turing-complete languages, which, to my mind, in a way voids the question, doesn't it?
Yes, there are some trivialities, such as choosing the domain in the case of $(\mathbb{N}\to\mathbb{N})\to\mathbb{N}$, but if we are looking for a computable functional, aren't we referring to some formalisation of computability in the first place? And isn't the C-T thesis about exactly that?
What am I missing?
$\newcommand{\Bool}{\mathsf{Bool}}$Higher-order cases are more complicated, because the details of the 'calling convention' for functions can vary. Consider these two ways of computing a functional of type $(\mathbb{N}\to\mathbb{N})\to\mathbb{N}$:

- **The Turing machine convention:** the functional is computed by a Turing machine that receives as input the *code* of a Turing machine computing the argument function. It may inspect that code, simulate it step by step, and so on.
- **The lambda convention:** the functional is a lambda term that is *applied* to a lambda term computing the argument function. The argument can only be used by applying it to (encodings of) numbers.

These two methods of representing functions are clearly inequivalent (at least, when generalized in a way I'll explain later).
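To make the contrast concrete, here is a Python sketch of the two conventions (all the names are mine and purely illustrative): under the Turing machine convention the functional receives program *text* that it can both inspect and run, while under the lambda convention it receives an opaque callable that it can only apply.

```python
# Sketch of the two calling conventions for a functional of type
# (N -> N) -> N. Illustrative only, not a standard API.

def tm_convention(source: str) -> int:
    """Receives the *code* of the argument function and may inspect it."""
    # With the code in hand we can do intensional things, e.g. count
    # occurrences of the letter 'n' in the program text...
    syntactic_info = source.count("n")
    # ...and we can also run it, by compiling and calling it.
    env = {}
    exec(source, env)
    f = env["f"]
    return f(0) + syntactic_info

def lambda_convention(f) -> int:
    """Receives a black box: the only thing we can do is apply it."""
    return f(0)

code = "def f(n):\n    return n + 1"
print(tm_convention(code))                 # uses both the text and the behaviour
print(lambda_convention(lambda n: n + 1))  # can only use the behaviour
```

The point of the toy `syntactic_info` step is just that the TM convention hands you an intensional object; nothing like it is expressible when all you hold is the callable.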
So, the Turing machine convention allows certain operations that are not possible with the lambda convention. For instance, a Turing machine for a functional of type $(\mathbb{N} \to \mathbb{N}) \to (\mathbb{N} \to \mathbb{N}) \to \mathbb{N}$ can interleave the execution of its two arguments. This actually doesn't matter for total functions, but if we consider partial functions $\mathbb{N} \rightharpoonup \mathbb{N}$, then it allows operations like 'racing' two partial functions: run both in alternation and answer as soon as either one halts. There is no operation for racing the reduction of two lambda terms.
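The racing operation above can be sketched in Python (my own toy model, not a standard library): a partial computation is modelled as a generator that yields once per 'step' and finally returns a value, and the racer steps the two computations alternately, just as a Turing machine would dovetail the simulation of two argument machines.

```python
# A sketch of 'racing' two partial computations by interleaving their
# steps. A computation is a generator that yields once per step and
# finally returns a value; a diverging computation never returns.

def race(comp_a, comp_b):
    """Step comp_a and comp_b alternately; return ('a' or 'b', value)
    for whichever halts first."""
    gens = [('a', comp_a), ('b', comp_b)]
    while True:
        for name, gen in gens:
            try:
                next(gen)  # perform one step of this computation
            except StopIteration as stop:
                return name, stop.value  # it halted with this value

def halts_after(n, value):
    """A computation that takes n steps, then returns value."""
    for _ in range(n):
        yield
    return value

def diverges():
    """A computation that never halts."""
    while True:
        yield

# The race finds an answer even though one 'argument' diverges --
# impossible if we could only *apply* the arguments as black boxes.
print(race(halts_after(3, 42), diverges()))  # ('a', 42)
```

Note that `race` never applies its arguments as black boxes; it manipulates their step-by-step executions, which is exactly what the code-passing convention makes available.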
You can do something like the Turing machine convention in the lambda calculus, because (the code for) lambda terms can be lambda encoded. You can then write a term that interleaves the reduction of two encoded terms, just like the interleaved simulation of two Turing machines. But this isn't the most obvious or convenient way to represent higher-order functions in the lambda calculus.
The lambda convention has some nice aspects as well. For instance, I didn't need to add any extensionality condition there: since the lambda convention just gives a 'black box' that can only be applied to get an output, it's impossible for a representative of a higher-order function to fail to be extensional. I've seen some arguments that this makes it possible to compute certain things in the lambda setting that cannot be computed in the TM setting, but I can't think of any examples offhand. Possibly such arguments only apply to non-extensional 'higher-order' TMs (or similar, due to Rice's theorem and the like). That is, some properties of non-extensional, higher-order computations are undecidable, but are easily checkable for extensional computations.
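The extensionality point can be illustrated with another toy Python sketch (again, names and setup are mine): a functional given program *text* can behave non-extensionally, distinguishing two programs that compute the same function, whereas a functional given only a callable cannot, since every observation it makes factors through application.

```python
# A non-extensional functional under the code-passing convention: it
# decides based on the program text, so two programs computing the
# *same* function N -> N can receive different answers.

def non_extensional(source: str) -> int:
    return 1 if "+" in source else 0

code1 = "def f(n):\n    return n + n"
code2 = "def f(n):\n    return 2 * n"   # same function, different text

print(non_extensional(code1), non_extensional(code2))  # 1 0

# Under the lambda convention this is impossible: given only a callable,
# representatives of the same function are indistinguishable.
def black_box(f) -> int:
    return f(3) % 2

print(black_box(lambda n: n + n), black_box(lambda n: 2 * n))  # 0 0
```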