What is the exact definition of a local function? Is a function said to be local if it depends only on the value of its argument, and finitely many derivatives of that argument, at a single point? For example, $$f(y(x),y'(x),\ldots,y^{(n)}(x);x)$$ or is there something else to it?
The reason I ask in particular has to do with defining the meaning of a Lagrangian density in physics. As far as I understand, a Lagrangian density (dependent on some field $\phi(x)$) is local if its value at a given spacetime point $x^{\mu}$ depends only on the value of the field and its derivatives (to finite order), evaluated at that single spacetime point $x^{\mu}$, $$\mathcal{L}(x)=\mathcal{L}(\phi(x),(\partial_{\mu}\phi)(x),\ldots,(\partial^{(n)}_{\mu}\phi)(x))$$ and this definition of local is not physical but purely a mathematical definition of locality (at least according to Matthew Schwartz's book "Quantum Field Theory and the Standard Model", chapter 24, section 4).
Is this definition of locality borrowed from the notion of local coordinate charts and neighbourhoods in differential geometry at all?
(This is probably not the right place to ask, but I assume that this definition is purely mathematical and not physical because, physically, the quantum fields $\phi$ are technically distributions and so cannot be localised to a single point in space. However, I assume such a mathematical definition guarantees desired properties, such as Lorentz invariance, the absence of action at a distance, etc.?)
Yes, your interpretation is correct.
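Let $F_1$ be a fibre bundle over $M_1$ and $F_2$ a fibre bundle over $M_2$, and denote by $\Psi_1$, $\Psi_2$ the corresponding spaces of smooth sections. A map $f:\Psi_1\to \Psi_2$ is said to be local if it factors through a bundle map $\tilde{f}: J^k\to F_2$, where $J^k$ is the $k$-th jet bundle of the fibre bundle $F_1$. In the usual case $M_1=M_2$ this means $f(\psi)=\tilde f\circ j^k\psi$, where $j^k\psi$ is the $k$-jet prolongation of $\psi$: the value of $f(\psi)$ at a point $x$ depends only on $\psi$ and its derivatives up to order $k$ at $x$.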
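As a rough discrete illustration of this definition (an analogy, not the definition itself), replace derivatives by finite differences on a periodic 1D grid: the $k$-jet at a point then corresponds to the field values on a fixed finite stencil around that point. The sketch below uses hypothetical function names, a stand-in density $\tfrac12\phi'^2-\tfrac12 m^2\phi^2$ for the local case, and a hypothetical exponential kernel for the non-local case; it tests locality by perturbing the field far from one site and checking whether the output at that site changes.

```python
import math

def derivative(values, h):
    """Central finite difference on a periodic grid (discrete stand-in for the 1-jet)."""
    n = len(values)
    return [(values[(i + 1) % n] - values[(i - 1) % n]) / (2 * h) for i in range(n)]

def local_density(phi, h, m=1.0):
    """Output at site i uses only phi[i] and the finite-difference derivative at i."""
    dphi = derivative(phi, h)
    return [0.5 * dphi[i] ** 2 - 0.5 * m ** 2 * phi[i] ** 2 for i in range(len(phi))]

def nonlocal_density(phi, h, length=1.0):
    """Hypothetical non-local map: output at site i sums phi over the whole grid."""
    n = len(phi)
    out = []
    for i in range(n):
        s = 0.0
        for j in range(n):
            d = min(abs(i - j), n - abs(i - j)) * h  # periodic distance
            s += math.exp(-d / length) * phi[j] * h
        out.append(phi[i] * s)
    return out

def depends_only_near(density, phi, h, site, radius):
    """Perturb phi at every site farther than `radius` and check the output at `site`."""
    base = density(phi, h)[site]
    n = len(phi)
    for j in range(n):
        dist = min(abs(j - site), n - abs(j - site))
        if dist > radius:
            bumped = list(phi)
            bumped[j] += 1.0
            if abs(density(bumped, h)[site] - base) > 1e-12:
                return False
    return True

n = 32
h = 1.0 / n
phi = [2.0 + math.sin(2 * math.pi * i * h) for i in range(n)]  # nonzero everywhere
print(depends_only_near(local_density, phi, h, site=5, radius=1))     # True
print(depends_only_near(nonlocal_density, phi, h, site=5, radius=1))  # False
```

On a grid even the local map looks at neighbouring sites; what distinguishes it is that its stencil has a fixed finite radius, while the kernel-based map reads the whole grid.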
In your physical interpretation, you should think of the Lagrangian density itself as a "distribution" over your space-time manifold, and that each field is itself a "distribution" over your space-time manifold. And the mapping from field to Lagrangian density is in general a mapping from some set of distributions to some other set of distributions. This is the physical point of view! In other words, the Lagrangian density takes, physically, the entire wave function/field as input, not the "value" of the field at an arbitrary point. And the codomain of this Lagrangian density function is some set of distributions over your manifold, not just some number or some finite dimensional vector.
Imposing the locality condition is just saying that, mathematically, you can compute the "value" of the Lagrangian density at a space-time point $p$ based only on the "values" of the field and its first few derivatives at the point $p$. The main mathematical advantage of this is that the associated equations of motion are then partial differential equations.
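For instance, assuming a Lagrangian density that depends only on $\phi$ and its first derivatives at each point, and assuming the metric signature $(+,-,-,-)$ (a convention not fixed by the discussion above), the Euler-Lagrange equation is a second-order PDE; the free scalar field gives the Klein-Gordon equation:
$$\mathcal{L}=\tfrac12\,\partial_\mu\phi\,\partial^\mu\phi-\tfrac12 m^2\phi^2
\quad\Longrightarrow\quad
\partial_\mu\frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi)}-\frac{\partial\mathcal{L}}{\partial\phi}
=\partial_\mu\partial^\mu\phi+m^2\phi=0.$$
Every term is evaluated at a single point $x$, which is exactly what makes this a differential equation.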
If you allow "non-local" dependences in the Lagrangian density, your equations of motion become delay-differential equations, integro-differential equations, or some combination thereof, and those are in general much harder to solve or analyze than straight-up partial differential equations. Hence imposing the locality condition is mathematically convenient: if you want to actually use QFT to compute things, you might as well assume it, since if you don't, things get really hard.
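As a sketch of how non-locality changes this, take a hypothetical symmetric kernel $K(x,y)=K(y,x)$ coupling the field at different points, and assume the signature $(+,-,-,-)$. Varying $\phi$ gives
$$S[\phi]=\int d^dx\,\tfrac12\,\partial_\mu\phi\,\partial^\mu\phi-\tfrac12\int d^dx\,d^dy\;\phi(x)\,K(x,y)\,\phi(y)
\quad\Longrightarrow\quad
\partial_\mu\partial^\mu\phi(x)+\int d^dy\;K(x,y)\,\phi(y)=0.$$
The local choice $K(x,y)=m^2\,\delta^{(d)}(x-y)$ gives back the Klein-Gordon equation $\partial_\mu\partial^\mu\phi+m^2\phi=0$; a smooth, spread-out $K$ gives an integro-differential equation; and $K(x,y)=\tfrac12 m^2\,[\delta^{(d)}(x-y-a)+\delta^{(d)}(x-y+a)]$ for a fixed shift $a$ gives $\partial_\mu\partial^\mu\phi(x)+\tfrac12 m^2[\phi(x-a)+\phi(x+a)]=0$, a differential equation with shifted (delayed and advanced) arguments.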
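Related: Peetre's theorem, which says that a linear operator $D$ between spaces of smooth sections that does not enlarge supports ($\operatorname{supp} D\psi\subseteq\operatorname{supp}\psi$) is locally a differential operator of finite order.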
If you go back to the basics of all this, namely the calculus of variations and the principle of least action, all you really need is an action functional that returns a number for each quantum field (thought of as an object defined over all of space-time). Originally there is not even a requirement that the action be obtained by integrating a Lagrangian density. In principle you can just think of the action functional as a function defined on some infinite-dimensional vector space (the state space), and your job is to find a critical point of this function.
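To make this concrete, here is a minimal sketch under several assumptions not in the answer: a static, Euclidean energy functional $E[\phi]=\int_0^1\big(\tfrac12\phi'^2+\tfrac12 m^2\phi^2\big)\,dx$ with $\phi(0)=0$, $\phi(1)=1$ stands in for the action (Lorentzian actions usually have saddle points rather than minima, so plain gradient descent would not apply), the field is discretised on $n$ interior grid points, and all names and parameters are hypothetical. The functional is treated purely as a function of $n$ real numbers, and a critical point is found directly by gradient descent, without ever writing down a differential equation.

```python
import math

# Hypothetical static, Euclidean stand-in for an action: a discretised 1D field
# on (0, 1) with fixed end values, energy = sum of h * (1/2 phi'^2 + 1/2 m^2 phi^2).
def energy(phi, h, m, left, right):
    full = [left] + phi + [right]  # attach the fixed boundary values
    kinetic = sum((full[i + 1] - full[i]) ** 2 / (2 * h) for i in range(len(full) - 1))
    potential = sum(0.5 * h * m ** 2 * p ** 2 for p in phi)
    return kinetic + potential

def gradient(phi, h, m, left, right):
    """Exact partial derivatives of `energy` with respect to each interior value."""
    full = [left] + phi + [right]
    return [(2 * full[i] - full[i - 1] - full[i + 1]) / h + h * m ** 2 * full[i]
            for i in range(1, len(full) - 1)]

def find_critical_point(n=20, m=1.0, left=0.0, right=1.0, tol=1e-10, max_iter=100_000):
    """Plain gradient descent on a function of n real variables; no Euler-Lagrange step."""
    h = 1.0 / (n + 1)
    step = 1.0 / (4.0 / h + h * m ** 2)  # inverse of an upper bound on the Hessian's eigenvalues
    phi = [0.0] * n
    for _ in range(max_iter):
        g = gradient(phi, h, m, left, right)
        if max(abs(x) for x in g) < tol:
            break
        phi = [p - step * gi for p, gi in zip(phi, g)]
    return phi, h

phi, h = find_critical_point()
# Continuum minimiser for m = 1 with phi(0) = 0, phi(1) = 1 is sinh(x) / sinh(1).
exact = [math.sinh((i + 1) * h) / math.sinh(1.0) for i in range(len(phi))]
print(max(abs(a - b) for a, b in zip(phi, exact)))  # only discretisation error remains
```

Refining the grid (larger `n`) shrinks the gap to the continuum solution, at the cost of many more descent iterations.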
Mathematically, representing the action by a Lagrangian density and requiring the map from the field to the density to be local are mainly "simplifying assumptions": they simplify the situation because we know how to use the Euler-Lagrange technique to convert the condition "critical point of the functional" into the condition "solution of a PDE". And solving PDEs is sometimes easier than doing calculus in infinite dimensions.
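A minimal sketch of the Euler-Lagrange route, assuming a hypothetical static, Euclidean energy functional $\int_0^1\big(\tfrac12\phi'^2+\tfrac12 m^2\phi^2\big)\,dx$ with fixed end values, discretised on $n$ interior grid points (names and parameters are illustrative only). Instead of searching for a critical point, it solves the discretised PDE $-\phi''+m^2\phi=0$, which here is a tridiagonal linear system handled in one pass by the Thomas algorithm, and then checks that the PDE solution is indeed a critical point: the gradient of the discretised functional vanishes there up to rounding.

```python
import math

def solve_tridiagonal(a, b, c, d):
    """Thomas algorithm for a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] (a[0], c[-1] unused)."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def solve_euler_lagrange(n=20, m=1.0, left=0.0, right=1.0):
    """Discretised Euler-Lagrange equation -phi'' + m^2 phi = 0 with fixed end values."""
    h = 1.0 / (n + 1)
    a = [-1.0] * n
    b = [2.0 + (h * m) ** 2] * n
    c = [-1.0] * n
    d = [0.0] * n
    d[0] += left    # known boundary values move to the right-hand side
    d[-1] += right
    return solve_tridiagonal(a, b, c, d), h

def energy_gradient(phi, h, m, left, right):
    """Gradient of the discretised energy sum of h * (1/2 phi'^2 + 1/2 m^2 phi^2)."""
    full = [left] + phi + [right]
    return [(2 * full[i] - full[i - 1] - full[i + 1]) / h + h * m ** 2 * full[i]
            for i in range(1, len(full) - 1)]

phi, h = solve_euler_lagrange()
# The PDE solution is a critical point: the gradient of the discrete functional vanishes.
print(max(abs(g) for g in energy_gradient(phi, h, 1.0, 0.0, 1.0)))
```

The system is diagonally dominant, so this direct solve is stable and costs $O(n)$ operations, which illustrates why converting "find a critical point" into "solve a PDE" can pay off.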
Physically there may or may not be good axiomatic justification for these requirements. But at the very least, experience has shown that under these assumptions we can still formulate interesting physics that reasonably model the real world. And so there is really no need to consider the more general picture just to make life more difficult when it comes to computations.