The heat kernel of a compact Riemannian manifold is the only smooth function $k=k(t,x,y):\mathbb{R}_{>0}\times M \times M\to \mathbb{R} $ such that $k(\cdot,x,\cdot)$ is a solution to the heat equation with initial distribution $k(0,x,\cdot)$ is the Dirac delta centered in $x$.
(By $k(0,x,\cdot)$, I mean the limit in the sense of distribution of $k(t,x,\cdot)$ as $t\to 0^+$).
Since distributions act on the space of test functions $C^\infty_c(M)=C^\infty(M)$, the condition $k(0,x,\cdot)=\delta_x$ should mean that $$\lim_{t\to 0^+}\int_M k(t,x,y)f(y)=f(x)$$ for any $f\in C^\infty(M)$. However, many important references like this one and this one, require an (apparently) stronger condition. The first reference requires the equality above to hold for any $f\in L^2(M)$ and the second one for any $f\in C^0(M)$.
Are these three requests all equivalent?
Those conditions are actually equivalent. In general, it is defined with test functions, and we show then that it remains true for any $L^2$ function where the limit when $t \rightarrow 0$ is an $L^2$ convergence. Let $T_0 = \mathrm{Id}$ and for all $t > 0$, $$ T_t : \left\{\begin{array}{rcl} \left(\mathcal{C}^\infty(M),\|\cdot\|_{L^2(M)}\right) & \rightarrow & \left(L^2(M),\|\cdot\|_{L^2(M)}\right) \\ f & \mapsto & x \mapsto \displaystyle \int_M k(t,x,y)f(y) \, dy \end{array}\right. $$ $T_t$ is a linear operator for all $t$ and for all test function $f$, \begin{align*} \|T_t(f)\|_{L^2(M)}^2 & = \int_M \left(\int_M k(t,x,y)f(y) \, dy\right)^2 \, dx\\ & \leqslant \int_M \int_M k(t,x,y)^2 \, dy \int_M f(y)^2 \, dy \, dx\\ & = \|k(t,\cdot,\cdot)\|_{L^2(M^2)}^2\|f\|_{L^2(M)}^2. \end{align*} We deduce that $T_t$ is continuous and $\|T_t\| \leqslant \|k(t,\cdot,\cdot)\|_{L^2(M^2)}$ (this remains true for $t = 0$). In particular, $T_t$ can be uniquely extended to $L^2(M)$ by density of the space of test functions. Therefore, $\int_M k(t,x,y)f(y) \, dy$ makes sens when $f \in L^2(M)$ and is itself an $L^2$ function.
Now, consider some $f = \sum_{i \geqslant 0} a_i\phi_i$ where the $\phi_i$ are a Hilbert basis of $L^2(M)$ of eigenvectors of $\Delta$ with associated eigenvalues $\lambda_i$ and the $(a_i)$ are real numbers with $\sum_{i \geqslant 0} a_i^2 = \|f\|_{L^2(M)}^2 < +\infty$. We know that $k$ verifies for all $t$ and $i$, $\int_M k(t,x,y)\phi_i(y) \, dy = e^{-\lambda_it}\phi_i(x)$, therefore, \begin{align*} \|T_tf - f\|_{L^2(M)}^2 & = \left\|\sum_{i \geqslant 0} a_iT_t\phi_i - \sum_{i \geqslant 0} a_i\phi_i\right\|_{L^2(M)}^2\\ & = \left\|\sum_{i \geqslant 0} a_i(e^{-\lambda_it} - 1)\phi_i\right\|_{L^2(M)}^2\\ & = \sum_{i \geqslant 0} a_i^2(e^{-\lambda_it} - 1)^2. \end{align*} Now, if $\varepsilon > 0$, there is a rank $j$ such that $\sum_{i \geqslant j} a_i^2 \leqslant \frac{\varepsilon}{2}$ and for all $i < j$, we can find some $t_i$ such that for all $t < t_i$, $a_i^2(e^{-\lambda_it} - 1)^2 \leqslant \frac{\varepsilon}{2j}$. If we set $s = \min_{i < j}\{t_i\} > 0$, we have that for all $t \leqslant s$, $$ \|T_tf - f\|_{L^2(M)} \leqslant \sum_{i < j} a_i^2(e^{-\lambda_it} - 1)^2 + \sum_{i \geqslant j} a_i^2(e^{-\lambda_it} - 1)^2 \leqslant \sum_{i < j} \frac{\varepsilon}{2j} + \frac{\varepsilon}{2} = \varepsilon. $$ We deduce that $T_tf \rightarrow f$ i.e. $$ \lim_{t \rightarrow 0} \int_M k(t,x,y)f(y) \, dy = f(x) \textrm{ for the $L^2$ norm.} $$ Note however that unless $\Delta$ is compact (I am pretty sure it never happens), the convergence is not uniform because, $$ \|T_{1/\lambda_i}\phi_i - \phi_i\|_{L^2(M)} = \|e^{-\lambda_i \times 1/\lambda_i}\phi_i - \phi_i\|_{L^2(M)} = 1 - e^{-1} > 0, $$ and $\frac{1}{\lambda_i} \rightarrow 0$ when $i \rightarrow +\infty$. It means that for $f \in L^2(M)$ fixed, $t \mapsto T_tf$ is a continuous function from $\mathbb{R}_+$ to $L^2(M)$ but not uniformly with respect to $f \in L^2(M)$.