I am studying the Dirac delta and I am struggling with the intuition of the following two points.
I understand the Dirac delta as a distribution with all of its mass at one point, perhaps as the limiting case of a zero-mean normal distribution when the standard deviation approaches zero. As such, it integrates to one: $\int_{-\infty}^{\infty} \delta(x)\,dx = 1$. This seems a little counter-intuitive to me, since the integral is doing something akin to computing the area under a point.
Intuitively, the expected value of a random variable with the Dirac delta distribution is zero, because the whole mass is centred at $x=0$. That being said, $\delta(x)=\infty$ at $x=0$. How come this does not affect the "weighting" in the expected value calculation?
Dirac's delta is not an ordinary function; it is a distribution. The "definition" you are referring to in the second part of your post, i.e. $$\delta(x):=\begin{cases} 0\quad\text{if}\quad x\neq0 \\ +\infty\quad\text{if}\quad x=0\end{cases}\tag{1}$$ is not rigorous. It is a sloppy way of saying that the weight is non-zero only at the point where the $\delta$ function is centered ($x=0$ in the present case). Another way to formulate this is$^1$ $$\begin{cases} \delta(x)=0\quad\text{if}\quad x\neq0\\ \int\limits_{-\infty}^{+\infty}\delta(x)dx=1\end{cases}\tag{2}$$ This will remind you of the definition you proposed as the limit of a normalized Gaussian distribution. You can easily see that $(2)$ is not possible for any ordinary function. In fact, such a function would be zero when $x\neq0$, as required, so it could only be non-zero at $x=0$, which is a set of measure zero since it consists of a single point. The integral of a function that vanishes outside a set of measure zero is zero, so the integral in $(2)$ would be $0$ rather than $1$. This is the reason why in "definition" $(1)$ the delta is infinite at the point where it is centered: it is an attempt to squeeze a finite area out of a single point.
Nonetheless, $(2)$ is also ill-defined: since no ordinary function satisfies these properties, the integral of such an object is itself undefined.$^2$
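To make the "limit of a normalized Gaussian" picture concrete, here is a minimal numerical sketch. It only uses the Python standard library; the helper names `gaussian`, `integrate` and `area_and_peak` are my own illustrative choices, and the midpoint rule on $[-10\sigma, 10\sigma]$ is just one convenient way to approximate the integral.

```python
import math

def gaussian(x, sigma):
    """Normalized Gaussian density with mean 0 and standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def integrate(f, a, b, n=20_000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def area_and_peak(sigma):
    """Return (area under the Gaussian, its value at x = 0)."""
    # Assumption: the tails beyond 10 standard deviations are negligible.
    area = integrate(lambda x: gaussian(x, sigma), -10 * sigma, 10 * sigma)
    peak = gaussian(0.0, sigma)
    return area, peak

for sigma in (1.0, 0.1, 0.01, 0.001):
    area, peak = area_and_peak(sigma)
    print(f"sigma={sigma:<6} area={area:.6f} peak={peak:.3f}")
```

The area stays at one for every $\sigma$, while the peak $1/(\sigma\sqrt{2\pi})$ grows without bound as the bump narrows. Every member of the family is an ordinary function, but the pointwise limit (zero away from the origin, infinite at the origin) is not something an ordinary integral can assign area one to.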
So we have understood that this is not just a regular real function of a real variable. What, then, is this $\delta$ distribution? Let $V$ be the linear space of all real functions defined on the real line (with the usual operations). The (linear) functional $$\delta:V\rightarrow\mathbb{R}$$ defined by $$\delta(f)=f(0)\qquad\forall f\in V$$ is what we call Dirac's $\delta$.
With $V$ defined this way, $f$ can be any function $f:\mathbb{R}\rightarrow\mathbb{R}$, e.g. $$f(x)=\cosh(x)\quad(\implies f(0)=1)$$
Then $$\delta(\cosh)=1$$
If we were to use integral notation, the same statement would be written as
$$\int_{-\infty}^{+\infty}\delta(x)\cosh(x)dx=1$$
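As a rough numerical illustration of why the integral notation is tolerated, one can replace $\delta$ by a narrow normalized Gaussian and watch the integral against $f$ approach $f(0)$. This is only a sketch under that assumption (the Gaussian is one of many possible approximating families), and the helper names `gaussian` and `pair` are hypothetical, not standard.

```python
import math

def gaussian(x, sigma):
    """Normalized Gaussian density with mean 0 and standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def pair(f, sigma, n=20_000):
    """Midpoint-rule approximation of the integral of gaussian(x, sigma) * f(x)."""
    # Assumption: contributions beyond 10 standard deviations are negligible.
    a, b = -10 * sigma, 10 * sigma
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        total += gaussian(x, sigma) * f(x)
    return h * total

for sigma in (1.0, 0.1, 0.01):
    print(
        f"sigma={sigma:<5}",
        f"cosh: {pair(math.cosh, sigma):.6f}",    # tends to cosh(0) = 1
        f"x: {pair(lambda x: x, sigma):.2e}",     # tends to 0, the "expected value"
        f"1: {pair(lambda x: 1.0, sigma):.6f}",   # stays at 1, the total mass
    )
```

For the Gaussian family the first integral equals $e^{\sigma^2/2}$, which tends to $\cosh(0)=1$ as $\sigma\to0$. The choices $f(x)=x$ and $f(x)=1$ correspond to $\delta(f)=0$ and $\delta(f)=1$: the "infinite height" never enters as a weight, because all the functional does is read off the value of $f$ at $0$.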
I should also mention that there are other definitions; this is the one used in distribution theory. The space on which $\delta$ is defined generally carries some additional regularity restrictions that I did not mention here.
$^1$This is the definition originally used by Paul Dirac.
$^2$ In fact, using the integral symbol for distributions is a matter of convenience, and, as the definition of $\delta$ as a functional in the main text shows, it camouflages what is actually going on.