I wonder whether it is justified, in any sense, to treat the identity operator $I$ on $L^2(\mathbb{R})$ as an integral transform with kernel $K(y,x) = \delta(x-y)$, so that $$f(y) = (I f)(y) = \int_\mathbb{R} K(y,x) f(x) dx = \int_\mathbb{R} \delta(x-y) f(x) dx.$$ This is ubiquitous in the quantum mechanics literature, where "generalized position eigenstates" are used in place of an orthonormal basis in identities like $$\langle\phi|\psi\rangle = \langle\phi|I|\psi\rangle = \int_{\mathbb{R}^2} \langle\phi|y\rangle\langle y|I|x\rangle\langle x|\psi\rangle\,dx\,dy = \int_{\mathbb{R}^2} \phi(y)^\ast \delta(x-y) \psi(x)\,dx\,dy = \int_\mathbb{R} \phi(x)^\ast \psi(x) dx.$$
On the one hand, the Dirac $\delta$ belongs to an entirely different part of mathematics than $L^2$ functions, and the integral sign means something different for generalized functions than it does in Lebesgue spaces. So I'm inclined to say that the above calculation is wrong on many levels. On the other hand, the $\delta$ is not used as a functional here, only some of its properties are needed for the integral kernel, and the resulting integral is perfectly fine, so the reasoning might be made to work, perhaps in some theory I just don't know.
Disregarding the rest, I'm interested in whether I can make the very last “=” work in the above display. Is there a theory in which $\delta(x)$ (and its derivatives) can be used in (formal) integral transforms applied to Lebesgue spaces, or at least to $L^2$?
If you study physics and not pure math, my advice would be "your intuition (and all the formulas you wrote) is fine, don't worry about the rigorous details". But here are some ramblings on the math; maybe they help.
$\delta$ is a distribution, not a function. It is an element of the dual space of a space of "test functions". Typically, you take as the test function space either the Schwartz functions $S$ or the smooth functions with compact support.
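To make this concrete, here is a sketch of how the formal integral from the question is usually read. The notation $\delta_y$ and the pairing bracket $\langle\cdot,\cdot\rangle$ are my own choices for illustration; other texts write this differently. For a test function $f \in S$ and a point $y \in \mathbb{R}$, the shifted delta is the linear functional $$\langle \delta_y, f \rangle := f(y),$$ and the integral notation is shorthand for this pairing: $$\int_\mathbb{R} \delta(x-y) f(x)\,dx := \langle \delta_y, f \rangle = f(y).$$ Note that the right-hand side needs a pointwise value of $f$. This is fine for test functions, but an element of $L^2(\mathbb{R})$ is only an equivalence class of functions that agree almost everywhere, so $f(y)$ is not defined for it at a single point without further argument.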