This might be an easy question, but I'm stuck at it for a while now. Let $p(x,y;t)$ be the heat kernel: $$p(x,y;t) := (4\pi t)^{-n/2}e^{-\frac{|x-y|^{2}}{4t}}.$$ In Reed & Simon's book, the following claim is made.
If $f$ is any measurable function on $\mathbb{R}^{n}$ so that $f(\cdot)p(x,\cdot;t)$ is integrable, we can approximate $f$ by continuous functions.
Here, integrable means integrable with respect to the Lebesgue measure on $\mathbb{R}^{n}$. I don't exactly understand the claim. What is this approximation? (Maybe they are referring to using the heat kernel as some sort of mollifier?) And why is the hypothesis of being integrable necessary?
The approximation they are referring to is likely the one given by convolving $f$ with the heat kernel. Define $$ f_t(x) = \int f(y)p(x,y;t)\,dy. $$ Note that you can readily find proofs of the claim that are technically simpler, though similar in spirit, if the function $f$ is assumed to be much nicer like a Schwartz function, or smooth and compactly supported, or even integrable. A proof of this particular result only assuming that $f_t(x)$ is well-defined can be found in Widder's book The Heat Equation, end of p.65, through to the top of p.68.
Here is a sketch of why the result is true. There are details that need to be worked out, so think of this more as intuition rather than a proof.
Claim 0: $f_t(x)$ is well-defined (by the assumption that $f(\cdot)p(x,\cdot\,;t)$ is integrable). In fact, $f$ is locally integrable by this assumption.
Claim 1: $f_t(x)$ is continuous as a function of $x$.
Sketch. It comes down to proving for a fixed $x$ that $$\int f(y)[p(x+h,y;t)-p(x,y;t)]\,dy \to 0\quad\text{as $h\to 0$}.$$ We know $p(x+h,\cdot\,;t)$ is roughly concentrated in the ball of radius $t^{\frac12}$ centered at $x+h$, and similarly $p(x,\cdot\,;t)$ is concentrated near $x$. As $h$ gets small, we will see a lot of cancellation.
More concretely, you could try the following. For fixed $x$, $p(x+h,y;t)-p(x,y;t)\to 0$ uniformly for $y$ satisfying $|x-y|\le 100\,t^{\frac12}$ as $h\to 0$. For $|x-y|\ge 100\,t^{\frac12}$, use the mean value theorem to estimate $|p(x+h,y;t)-p(x,y;t)|$. Laplace's method is also relevant in this connection, and is brought up in Widder's exposition, too.
Claim 2: $f_t(x)\to f(x)$ $x$-a.e. as $t\to 0^+$.
By definition, $f_t(x) = (f\ast \mathcal H_t)(x)$, where $\mathcal H_t(x) := p(x,0;t)$ is the heat kernel. Intuitively, $$\mathcal H_t\approx \frac{\chi_{B(0,t^{\frac12})}}{|B(0,t^{\frac12})|},$$ or in other words, $(f\ast \mathcal H_t)(x)$ looks a lot like the average value of $f$ on the ball $B(x,t^{\frac12})$. By the Lebesgue differentiation theorem, we know that for a locally integrable $f$, and a.e. $x$, $$ (f\ast \frac{\chi_{B(0,t^{\frac12})}}{|B(0,t^{\frac12})|})(x)\xrightarrow{t\to 0^+} f(x), $$ so it is plausible that this holds for $f\ast \mathcal H_t$, too. One way to prove this may be to mimic the proof of the Lebesgue differentiation theorem and work with the maximal function $$ \mathcal M_{heat}f(x) := \sup_{t>0}|f\ast \mathcal H_t|(x), $$ and prove that it satisfies the weak-type estimate: $$ \text{measure}(\{x: \mathcal M_{heat}f(x) > \lambda\})\le \frac{\text{const}}{\lambda}\int |f| $$ for all $f\in L^1$. I would be interested to know if this approach works to prove the claim under the assumption that $f_t(x)$ is well-defined, rather than assuming $f\in L^1$.