In my project I am trying to give a heuristic proof of Itō's lemma. I have shown that $E[dW_t^2] = dt$.
I take $g(x,t)$ to be a twice continuously differentiable function and $dt$ to be infinitesimally small.
Applying Taylor's theorem to $g$ gives me,
$dg(W,t) = \frac{\partial g}{\partial t}dt + \frac{\partial g}{\partial x}dW + \frac{1}{2}\frac{\partial^2 g}{\partial x^2}(dW)^2 + \frac{1}{2}\frac{\partial^2 g}{\partial t^2}(dt)^2+ \frac{\partial^2 g}{\partial t \partial x}(dt)(dW) + \dots$.
Question 1: How can I justify having $dW_t^2 = dt$?
Question 2: On what basis do I neglect the other terms? I guess it is because they are of higher order but what does this actually mean?
I end up with $dg(W,t) = \frac{\partial g}{\partial t}dt + \frac{\partial g}{\partial x}dW + \frac{1}{2}\frac{\partial^2 g}{\partial x^2}dt$.
The Itō integral $I\left(t\right)=\int_0^t f\left(s\right) dW\left(s\right)$ has the property that $$ \mathbb{E}\left[I^2\left(t\right)\right]=\mathbb{E}\left[\int_0^t f^2\left(s\right)ds\right] $$ (this is called the Itō isometry). Taking $f\left(t\right)=1$ gives you the equation $\mathbb{E}\left[\left(\int_0^t dW\left(s\right)\right)^2 \right]=t$ (or $\mathbb{E}\left[dW^2\left(t\right)\right]=dt$ for short, as you wrote it).
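The isometry can be checked numerically. The following is a minimal Monte Carlo sketch, not a proof: the helper name `ito_second_moment`, the uniform grid, the left-endpoint approximation of the integral, and the test integrand $f(s)=s$ are all choices made here for illustration.

```python
import math
import random


def ito_second_moment(f, t, n_steps=40, n_paths=10_000, seed=0):
    """Monte Carlo estimate of E[I(t)^2] for I(t) = int_0^t f(s) dW(s).

    The integral is approximated by the left-endpoint sum
    sum_j f(t_j) * (W(t_{j+1}) - W(t_j)) on a uniform grid.
    """
    rng = random.Random(seed)
    dt = t / n_steps
    sd = math.sqrt(dt)  # each Brownian increment is N(0, dt)
    total = 0.0
    for _ in range(n_paths):
        integral = 0.0
        for j in range(n_steps):
            integral += f(j * dt) * rng.gauss(0.0, sd)
        total += integral * integral
    return total / n_paths


# f = 1: the integral is W(t), so the estimate should be near t.
est_const = ito_second_moment(lambda s: 1.0, t=2.0)
# f(s) = s: the isometry predicts int_0^1 s^2 ds = 1/3.
est_linear = ito_second_moment(lambda s: s, t=1.0)

print(f"f = 1, t = 2: {est_const:.3f} (isometry predicts 2)")
print(f"f = s, t = 1: {est_linear:.3f} (isometry predicts 1/3, up to grid error)")
```

Note that this averages over many paths, so it only tests a statement about expectations.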
As for the higher order terms, from Shreve and Karatzas, here's what happens to one of the higher order terms that disappears (the rest are similar). Here $f$ plays the role of your $g$, not the integrand above:
\begin{align*} & \lim\left|\sum_{j=0}^{n-1}f_{tx}\left(t_{j},W\left(t_{j}\right)\right)\left(t_{j+1}-t_{j}\right)\left(W\left(t_{j+1}\right)-W\left(t_{j}\right)\right)\right|\\ \leq & \lim\sum_{j=0}^{n-1}\left|f_{tx}\left(t_{j},W\left(t_{j}\right)\right)\right|\left(t_{j+1}-t_{j}\right)\left|W\left(t_{j+1}\right)-W\left(t_{j}\right)\right|\\ \leq & \lim\max_{0\leq k\leq n-1}\left|W\left(t_{k+1}\right)-W\left(t_{k}\right)\right|\cdot\lim\sum_{j=0}^{n-1}\left|f_{tx}\left(t_{j},W\left(t_{j}\right)\right)\right|\left(t_{j+1}-t_{j}\right)\\ = & 0\cdot\int_{0}^{T}\left|f_{tx}\left(t,W\left(t\right)\right)\right|dt=0 \end{align*}
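The symbol $\lim$ should be understood as the limit over partitions $0=t_0<t_1<\dots<t_n=T$ as $n\rightarrow \infty$ with the mesh $\max_j\left(t_{j+1}-t_j\right)\rightarrow 0$. The factor $\max_k\left|W\left(t_{k+1}\right)-W\left(t_k\right)\right|$ goes to zero because Brownian paths are continuous, hence uniformly continuous on $\left[0,T\right]$. The other higher order terms also go to zero with arguments like the above.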
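Heuristically, "higher order" means the following: an increment $dW$ has typical size $\sqrt{dt}$, so $dt\,dW$ is of size $dt^{3/2}$ and $(dt)^2$ is of size $dt^2$. Summed over the $T/dt$ steps of a partition, these give totals of order $\sqrt{dt}$ and $dt$, which vanish, whereas $(dW)^2$ is of size $dt$ and sums to something of order $T$. A rough numerical sketch on one simulated path (the function name `second_order_sums` and the uniform grid are assumptions made here for illustration):

```python
import math
import random


def second_order_sums(T, n, seed=0):
    """Return (sum dW^2, sum dt*|dW|, sum dt^2) on a uniform grid of n steps."""
    rng = random.Random(seed)
    dt = T / n
    sd = math.sqrt(dt)  # each Brownian increment is N(0, dt)
    sum_dw2 = 0.0
    sum_dt_dw = 0.0
    for _ in range(n):
        dw = rng.gauss(0.0, sd)
        sum_dw2 += dw * dw
        sum_dt_dw += dt * abs(dw)
    sum_dt2 = n * dt * dt
    return sum_dw2, sum_dt_dw, sum_dt2


results = {n: second_order_sums(T=1.0, n=n) for n in (10, 1_000, 100_000)}
for n, (dw2, dt_dw, dt2) in results.items():
    print(f"n={n:>7}: sum dW^2={dw2:.4f}  sum dt|dW|={dt_dw:.5f}  sum dt^2={dt2:.6f}")
```

As $n$ grows, the last two columns should shrink toward zero while the first stays near $T=1$, which is why only the $(dW)^2$ term survives in the second-order part of the expansion.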
Edit: Whoops, I misread your question! You also asked about $dW^2\left(t\right)=dt$. This has to do with quadratic variation, which is not the same thing as the variance of the Itō integral (i.e. $\mathbb{E}\left[I^2\left(t\right)\right]$). The quadratic variation (written $\left[\cdot,\cdot\right]\left(t\right)$) of the Itō integral satisfies
$$ \left[I,I\right]\left(t\right)\equiv\lim\sum_{j=0}^{n-1}\left[I\left(t_{j+1}\right)-I\left(t_j\right)\right]^2 = \int_0^t f^2\left(s\right)ds. $$
Taking $f\left(t\right)=1$ gives you $\left[W,W\right]\left(t\right)=t$ (or $dW^2\left(t\right)=dt$ for short, as you wrote it).
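The difference from the isometry is that the quadratic variation is a statement about each path, with no expectation taken. A small sketch of this, under assumptions chosen here for illustration (the name `quadratic_variation`, a uniform grid, the increment $I(t_{j+1})-I(t_j)$ approximated by $f(t_j)\,\Delta W_j$, and the test integrand $f(s)=s$):

```python
import math
import random


def quadratic_variation(f, t, n_steps=100_000, seed=0):
    """Sum of squared increments of I(t) = int_0^t f(s) dW(s) along ONE path."""
    rng = random.Random(seed)
    dt = t / n_steps
    sd = math.sqrt(dt)  # each Brownian increment is N(0, dt)
    qv = 0.0
    for j in range(n_steps):
        # Approximates I(t_{j+1}) - I(t_j) by f(t_j) * (W(t_{j+1}) - W(t_j)).
        increment = f(j * dt) * rng.gauss(0.0, sd)
        qv += increment * increment
    return qv


# Each seed is a different sample path; int_0^1 s^2 ds = 1/3 for every one.
qv_by_seed = [quadratic_variation(lambda s: s, t=1.0, seed=seed) for seed in range(3)]
for seed, qv in enumerate(qv_by_seed):
    print(f"path {seed}: sum of squared increments = {qv:.4f} (limit is 1/3)")
```

Every individual path should land close to $1/3$ on a fine grid, not just the average over paths.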
For proofs of the isometry property and the quadratic variation, I would read Shreve and Karatzas. I can also vouch for Oksendal's book as being a good read, but I do not remember if he covers the proofs of these claims (he may have skipped over them to focus on other topics).
Second edit:
Here's the proof for the quadratic variation from Shreve's book. He uses $\Delta$ instead of $f$.