I don't understand why the two equations below are equivalent.
\begin{align} \boldsymbol{\theta}_{\rm ML} &= \mathop{\rm argmax}_\boldsymbol{\theta} \sum_{i=1}^{m} \log p_{\rm model}(\boldsymbol{x}^{(i)}; \boldsymbol{\theta}) \tag{5.58}\label{5.58} \\ \boldsymbol{\theta}_{\rm ML} &= \mathop{\rm argmax}_\boldsymbol{\theta} \mathbb{E}_{\mathbf{x} \sim \hat{p}_{\rm data}} \log p_{\rm model}(\boldsymbol{x}; \boldsymbol{\theta}) \tag{5.59}\label{5.59} \end{align}
Quoted from chapter 5 of Deep Learning:
Because the $\mathop{\rm argmax}$ does not change when we rescale the cost function, we can divide by $m$ to obtain a version of the criterion that is expressed as an expectation with respect to the empirical distribution $\hat{p}_{\rm data}$ defined by the training data.
It just takes a bit of time to track down where the symbols are defined; no real computation is involved in this question. \begin{align} \boldsymbol{\theta}_{\rm ML} &= \mathop{\rm argmax}_\boldsymbol{\theta} \sum_{i=1}^{m} \log p_{\rm model}(\boldsymbol{x}^{(i)}; \boldsymbol{\theta}) \tag{5.58}\label{558} \\ &= \mathop{\rm argmax}_\boldsymbol{\theta} \underbrace{\frac1m}_\text{const.} \sum_{i=1}^{m} \log p_{\rm model}(\boldsymbol{x}^{(i)}; \boldsymbol{\theta}) \tag{divide by $m$}\label{frac1m} \end{align}
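As a quick sanity check (not from the book), here is a small numerical sketch with made-up data: a Gaussian model with unknown mean $\boldsymbol{\theta}$ and unit variance, evaluated on a grid. Multiplying the objective by the positive constant $1/m$ leaves the maximiser unchanged.

```python
import numpy as np

# Hypothetical example: m i.i.d. draws from a Gaussian with true mean 2.0.
rng = np.random.default_rng(0)
m = 100
x = rng.normal(loc=2.0, scale=1.0, size=m)

# Model: N(theta, 1). Evaluate log p_model(x_i; theta) on a grid of thetas.
thetas = np.linspace(-5, 5, 1001)
loglik = -0.5 * np.log(2 * np.pi) - 0.5 * (x[None, :] - thetas[:, None]) ** 2

summed = loglik.sum(axis=1)    # objective of eq. (5.58): sum over i
averaged = loglik.mean(axis=1) # same objective divided by m

# Rescaling by the constant 1/m does not move the argmax.
assert np.argmax(summed) == np.argmax(averaged)
```

For this model the maximum-likelihood estimate is the sample mean, so the grid maximiser sits next to `x.mean()` up to the grid resolution.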
In \eqref{559}, the $\boldsymbol{x}^{(i)}$'s are replaced by $\boldsymbol{x}$, and a new symbol $\hat{p}_{\rm data}$ is introduced, so it's worth scrolling up the page to see where they are defined.
Observe the difference in boldface styles and their corresponding meanings.
\begin{array}{|c|c|c|} \hline \text{$\rm \LaTeX$ code} & \backslash\texttt{boldsymbol\{x\}} & \backslash\texttt{mathbf\{x\}} \\ \hline \text{output} & \boldsymbol{x} & \mathbf{x} \\ \hline \text{meaning} & \text{realized value} & \text{random variable} \\ \hline \text{usage} & \boldsymbol{x}^{(i)} & \hat{p}_{\rm data}(\mathbf{x}) \\ \hline \end{array}
Let's take a closer look at the subscript in \eqref{559}.
\begin{equation} \boldsymbol{\theta}_{\rm ML} = \mathop{\rm argmax}_\boldsymbol{\theta} \mathbb{E}_{\mathbf{x} \sim \hat{p}_{\rm data}} \log p_{\rm model}(\boldsymbol{x}; \boldsymbol{\theta}). \tag{5.59}\label{559} \end{equation}
It reads
$${\huge \mathbf{x} \sim \hat{p}_{\rm data}}. \tag{subscript} \label{sub}$$
From the previous quoted text, the $\boldsymbol{x}^{(i)}$'s are i.i.d. draws from the (unknown) data-generating distribution, and $\hat{p}_{\rm data}$ is the *empirical* distribution they define: it places probability mass $1/m$ on each of the $m$ observed points. In \eqref{558}, we evaluate $\log p_{\rm model}(\boldsymbol{x}^{(i)}; \boldsymbol{\theta})$ at these $m$ realisations $\boldsymbol{x}^{(i)}$, $i = 1,\dots, m$, and in \eqref{frac1m} we take their simple average. The symbol $\mathbb{E}$ captures this idea of "average", and the subscript \eqref{sub} indicates the underlying probability distribution; since that distribution is the empirical one, the expectation is exactly the simple average.
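Writing the expectation out makes the equivalence explicit (assuming for simplicity that the training points are distinct, so each carries mass exactly $1/m$):
\begin{align} \mathbb{E}_{\mathbf{x} \sim \hat{p}_{\rm data}} \log p_{\rm model}(\boldsymbol{x}; \boldsymbol{\theta}) &= \sum_{i=1}^{m} \hat{p}_{\rm data}\bigl(\boldsymbol{x}^{(i)}\bigr) \log p_{\rm model}(\boldsymbol{x}^{(i)}; \boldsymbol{\theta}) \\ &= \frac1m \sum_{i=1}^{m} \log p_{\rm model}(\boldsymbol{x}^{(i)}; \boldsymbol{\theta}), \end{align}
which is precisely the rescaled objective in \eqref{frac1m}, so \eqref{558} and \eqref{559} share the same maximiser.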