For a math paper I need to be able to evaluate $\int_{-a}^{a}\delta^{(n)}(x)\ f(x)\ dx$ for differentiable $f$. I know that it is 'supposed' to equal $(-1)^nf^{(n)}(0)$: $$\int_{-a}^a\delta^{(n)}(x)\ f(x)\, dx=\int_{-a}^a \frac{d^n}{dx^n}\delta(x) f(x)\, dx =\int_{-a}^a(-1)^n\delta(x) \frac{d^nf}{dx^n}\, dx=(-1)^nf^{(n)}(0).$$
This argument seems like a hack, and I have no idea what is actually going on. I want to write something better, but I don't know any distribution theory, which appears to be the necessary framework for the problem.
All I have seen at this point is undergrad analysis and some linear algebra. Is a rigorous definition and proof out of reach?
This makes use of the distributional definition of the derivative. The Dirac delta distribution isn't a function, so it doesn't have a classical derivative. However, using integration-by-parts-like arguments, you can assign meaning to the derivative, or $n$-fold derivative, of the Dirac delta. So the answer is: yes, this is rigorous, because it is quite nearly the definition of how the derivative acts on the Dirac delta. More generally, if $T$ is an operator on the set of test functions (be they compactly supported smooth functions or Schwartz functions or what have you) and $\varphi$ is a distribution, then we can define $T\varphi$ by
$$\langle T\varphi, f\rangle = \langle \varphi, T^*f\rangle$$
For reference, see Rudin's Functional Analysis, chapter 7, I believe. Here $T^*$ is the adjoint of $T$, kind of like what you are used to from linear algebra (with some major caveats). In this case $T$ is the $n$th derivative and the adjoint $T^*$ is nothing more than $(-1)^nT$, so by the above prescription we simply shift the $n$th derivative over to the function $f$ and pick up a factor of $(-1)^n$, just as you did. This requires $f$ to be at least $n$ times differentiable, which is automatic for test functions since they are smooth. Note that the inner-product-like notation can be viewed as integration, but it is more rigorously thought of as the distribution's action on the test function.
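To see where the sign comes from, here is a sketch of the computation that motivates the definition. The assumptions are mine, chosen so that the boundary terms drop out: $g$ is an ordinary $n$ times continuously differentiable function, and $f$ is a smooth test function that vanishes outside a compact subset of $(-a,a)$. Integrating by parts $n$ times gives
$$\int_{-a}^a g^{(n)}(x)\, f(x)\, dx=(-1)^n\int_{-a}^a g(x)\, f^{(n)}(x)\, dx,$$
because every boundary term contains a factor $f^{(k)}(\pm a)=0$. The right-hand side still makes sense when $g$ is replaced by a distribution $\varphi$, so it is taken as the definition of the derivative:
$$\langle \varphi^{(n)}, f\rangle := (-1)^n\langle \varphi, f^{(n)}\rangle.$$
Applying this with $\varphi=\delta$, whose action is $\langle\delta, f\rangle = f(0)$, recovers the formula in the question:
$$\langle \delta^{(n)}, f\rangle=(-1)^n\langle \delta, f^{(n)}\rangle=(-1)^nf^{(n)}(0).$$
In this reading the integral sign in $\int_{-a}^a\delta^{(n)}(x)\, f(x)\, dx$ is shorthand for the pairing $\langle \delta^{(n)}, f\rangle$, so the middle step of the original computation is the definition rather than something that needs a separate proof.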