Suppose I've rolled two dice and took the sum, for $25$ times; then plotted the results on below histogram.
If I added up all the heights, I get the total $25$ as expected : $$1+1+1+3+2+7+2+1+4+2+1=25$$ No issues so far.
Next, if I want a relative frequency histogram, I just need to scale the heights of the bars by $1/25$. Here if I add up all the heights, I will get $1$.
No issues here too.
In this video of khan academy and everywhere they say that the "area" under a relative frequency histogram equals 1. I don't know how this is true and it is throwing me off completely. I only see that the heights add up to 1. Maybe I'm missing something... Any help ?
EDIT : In above histogram the bin width is $1$, so it may not be a good example. Kindly also consider general histograms like below :


I would like to try to answer the question somewhat broader, if I may. A histogram is an approximation to a probability measure. The broader question is: How should such an approximation look like? This does immediately answer the question about the construction of a histogram.
Which kind of probability measure do you want to approximate. Is it a discrete or a(n) (absolutely) continuous measure?
A discrete measure say on $\mathbb{Z}$ has a probability mass function $f:\mathbb{Z} \rightarrow [0,1]$. In which case the probability of some event $A \subseteq \mathbb{Z}$ is defined as $$P(A) = \sum_{a \in A} f(a).$$
If you want to approximate such a measure that for instances assigns probabilities to categorial parameters or parameters on countable spaces, the way you describe above is the right way to go to approximate consistently the probability mass function $f$. The normalisation is indeed correct.
Moving on to the continuous case. Say, you want to approximate a probability measure on $\mathbb{R}$. Given is now a probability density function $g$. An event $A := [a,b]$, $a \leq b$ has probability: $$P(A) = \int_{a}^b g(x) \mathrm{d}x$$ Note that $P(\mathbb{R}) = 1$. Hence, if you want to approximate $g$, you need to make sure that $g$ integrates to $1$. One could for instance employ the following histogram: $$\hat{g}(x) = \sum_{n= -\infty}^\infty a_n\mathbf{1}_{(n,n+1]}(x),$$
where $\mathbf{1}_{(n,n+1]}(x) = 1$, if $x \in (n,n+1]$ and $0$ otherwise and $$a_n = \frac{\text{Number of data points in }(n,n+1]}{\text{Total number of data points}}.$$
Obviously, this can be generalised to any partition of measurable sets of $\mathbb{R}$, not just the set system $\{(n, n+1] : n \in \mathbb{Z}\}$.
Which normalisation is the right for you, depends only on your particular data set and the type of measure you want to approximate.