One representation is $E[(y-g(x))^2]$ and the other is $\frac{1}{2m}\sum_{i=1}^m (g(x^{(i)})-y^{(i)})^2$, where $m$ is the number of training examples in the training set.
The first is from a statistical learning book, the second from machine learning. How can I show that the first expectation equals the second expression?
Let $x\sim U(S)$ be a random data sample drawn from a discrete uniform distribution over a set $S$ of size $|S|=m$. Then the probability of choosing $x$ at random is $p(x)=1/m$. Let $y_x$ be the associated label of $x$ (written $x_i$ and $y_i$ for a specific realization). Let $g(x)$ be your learned regression function. Then \begin{align*} \mathbb{E}_{x\sim U(S)}\left[ (y_x - g(x))^2 \right] &= \sum_{x\in S} (y_x - g(x))^2 p(x) \hspace{0.51in} \text{By definition of expectation}\\ &= \frac{1}{m} \sum_{x\in S} (y_x - g(x))^2 \hspace{0.59in} \text{By $p(x)=1/m$}\\ &= \frac{1}{m} \sum_{i= 1}^m (y_i - g(x_i))^2 \hspace{0.5in} \text{By enumerating $S$ as $x_1,\dots,x_m$}\\ &=: \text{MSE}_S(g) \end{align*} is the mean-squared error loss of $g$ on $S$. This should be no surprise, since "expectation" essentially means taking a weighted average with respect to some probability distribution (here, uniform).
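As a quick numerical sanity check of this equality (a sketch using NumPy; the data and the function `g` below are arbitrary illustrative choices, not from the question):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 5
xs = rng.normal(size=m)        # the set S of m samples
ys = 2 * xs + 1                # associated labels y_i (illustrative)
g = lambda x: 1.9 * x + 0.8    # some learned regression function (illustrative)

# Expectation over U(S): sum over x in S of (y_x - g(x))^2 * p(x), with p(x) = 1/m
expectation = sum((y - g(x)) ** 2 * (1 / m) for x, y in zip(xs, ys))

# Mean-squared error: (1/m) * sum_i (y_i - g(x_i))^2
mse = np.mean((ys - g(xs)) ** 2)

assert np.isclose(expectation, mse)  # the two agree
```

The weighted sum under the uniform distribution and the plain average are the same computation, which is the whole content of the derivation above.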
In the machine learning case, we often set the training loss to be $$\mathcal{L}_S(g) = \frac{1}{2}\text{MSE}_S(g),$$ so that when taking the derivative of $\mathcal{L}$ with respect to the parameters of $g$, the $1/2$ cancels the $2$ that comes from differentiating the square in the objective. It's merely a notational convenience. See also this post.
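You can see the cancellation numerically. For a toy linear model $g_\theta(x)=\theta x$ (my own illustrative choice), the analytic derivative of $\frac{1}{2m}\sum_i(\theta x_i - y_i)^2$ is $\frac{1}{m}\sum_i(\theta x_i - y_i)x_i$, with no stray factor of $2$; a finite-difference check confirms it:

```python
import numpy as np

# Illustrative data for a linear model g(x) = theta * x
xs = np.array([0.5, 1.0, 2.0])
ys = np.array([1.5, 2.2, 4.1])

def half_mse(theta):
    # L(theta) = (1/2) * MSE_S = 1/(2m) * sum_i (theta*x_i - y_i)^2
    return np.mean((theta * xs - ys) ** 2) / 2

def analytic_grad(theta):
    # dL/dtheta = (1/m) * sum_i (theta*x_i - y_i) * x_i  -- the 1/2 cancelled the 2
    return np.mean((theta * xs - ys) * xs)

theta, eps = 1.7, 1e-6
numeric = (half_mse(theta + eps) - half_mse(theta - eps)) / (2 * eps)
assert np.isclose(numeric, analytic_grad(theta))
```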
In other words, the two expressions are not equal; they differ by a factor of $1/2$. But the minimizing parameters of $g$ are unchanged by this scaling.