Edit 2:
Edit 1 gives the essential question of this post. This edit is meant to lay out what is left to be explained.
How do you translate $\mathrm{d}W(t) \mathrm{d}W(t)$ into integral notation? One source writes
$$\int^t_s (\mathrm{d}W(x))^2\text{.} \tag{9}$$
This makes sense as a "translation", but it doesn't mean anything to me mathematically. What does this integral mean? How do we compute an integral of this form? The current answer says the translation should be
$$\int^t_0 \mathrm{d}(W_s)^2\text{.}\tag{10}$$
This makes sense mathematically, and appears to be what we want, but it doesn't make sense as a "translation." Do $(9)$ and $(10)$ mean the same thing? If not, is there an explanation for why $(10)$ is the "correct" translation?
Edit 1:
The answer to this question and another answer on this site suggest that the "differential" form is just shorthand for an integral form. That leaves me with this question.
How do I interpret
$$\mathrm{d}I(t) \mathrm{d}I(t) = \Delta^2(t) \mathrm{d}W(t) \mathrm{d}W(t)\tag{6}$$
as an integral? I assume the LHS is
$$(I(t)-I(s)) * (I(t)-I(s))\text{,}\tag{7}$$
but the RHS appears ambiguous. If I interpret the symbols on the RHS the same as I did on the LHS, I would write
$$(W(s)-W(t))\int^t_s \Delta^2(t) \mathrm{d}W(t) \text{.}\tag{8}$$
If this is incorrect (and I believe it is), then the LHS appears ambiguous as well.
Original:
I'm reading Shreve's Stochastic Calculus for Finance. Between pages 125 and 132, he introduces the Itô integral:
$$I(t) = \int^{t}_{0}\Delta(u) \,\mathrm{d} W(u)\text{.}\tag{1}$$
He defines this integral precisely and I understand it. However, he uses another notation that he says is equivalent to this integral:
$$\mathrm{d}I(t) = \Delta(t) \mathrm{d}W(t)\text{.}\tag{2}$$
This kind of notation has never made much sense to me. I don't know what it means formally. I don't know how to prove statements made using this notation. So I had a few questions about this.
First, should $(2)$ be read as
$$\frac{\mathrm{d}I(t)}{\mathrm{d}t} = \Delta(t) \frac{\mathrm{d}W(t)}{\mathrm{d}t}\text{ ?}\tag{3}$$
Second, if the answer to the above is "yes", how do I show that $(3)$ is equivalent to $(1)$?
Third, on page 132, Shreve states without proof that
$$\mathrm{d}I(t) \mathrm{d}I(t) = \Delta^2(t) \mathrm{d}W(t) \mathrm{d}W(t)\tag{4}$$
"is another way of reporting the result" that "the quadratic variation accumulated up to time $t$ by the Itô integral" is
$$[I, I](t) = \int ^t _0 \Delta^2(u) \, \mathrm{d}u\text{.}\tag{5}$$
I can see intuitively why $(4)$ and $(5)$ are the same, but I'm not sure how to show it, again, formally. It appears to be significantly more complicated than some "take the integral of both sides" solution that might be the answer to my second question. How do I show this?
You are right that the expression $\frac{\text{d}W}{\text{d}t}$ does not make sense: the paths of Brownian motion are almost surely nowhere differentiable in the classical sense.
The differential notation for stochastic differential equations is therefore only an intuitive shorthand. By definition, it stands for the corresponding integral equation: $$ \text{d}X_{t} = \mu \,\text{d}t + \sigma \,\text{d}W_{t} \iff X_{t}-X_{0} = \int_{0}^{t}\mu\,\text{d}s + \int_{0}^{t}\sigma \,\text{d}W_{s} $$ The expression $\text{d}W$ has no meaning on its own; it should always be read as part of an integral.
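As a concrete instance, here is that equivalence applied to your $(2)$. This is only a sketch in your notation, and it assumes $I(0) = 0$, which holds for the Itô integral defined in $(1)$:
$$ \text{d}I(t) = \Delta(t)\,\text{d}W(t) \iff I(t) - I(0) = \int_{0}^{t}\Delta(u)\,\text{d}W(u) $$
With $I(0) = 0$ the right-hand side is exactly $(1)$, so $(2)$ is $(1)$ written in shorthand and there is no need to pass through the quotient form $(3)$.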
I am not familiar with the source you are citing, but the covariation process $[X,Y]$ of two processes $X$ and $Y$ has the following defining property: it is the unique continuous, predictable process of bounded variation starting at $0$ such that $$ XY - [X,Y]\quad \text{is a (local) martingale} $$ One can show that, for stochastic integrals (with suitable integrands $H,K$), this process must be given by $$ \left[\int H\text{d}X, \int K\text{d}Y\right]_{t} = \int_{0}^{t}H_{s}K_{s}\text{d}[X,Y]_{s} $$ where the right-hand side is a Lebesgue-Stieltjes integral. For Brownian motion we have $[W,W] _{t}= t$. Taking $H = K = \Delta$ and $X = Y = W$ therefore gives $[I,I]_{t} = \int_{0}^{t}\Delta^{2}(u)\,\text{d}u$, which is your $(5)$. The only proof of the general formula that I know is rather lengthy: it starts from simple processes $H,K$ and uses the defining property of the covariation process, and then extends to larger spaces of predictable processes through density arguments.
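If a numerical sanity check helps, the formula above with $H = K = \Delta$ and $X = Y = W$ can be illustrated by simulation. The following is a minimal sketch, not a proof. It assumes a hypothetical deterministic integrand $\Delta(u) = 1 + u$, a uniform grid on $[0, t]$, and Gaussian increments with variance $\text{d}t$. The function name and parameters are my own choices for illustration and do not come from Shreve.

```python
import math
import random


def simulate_quadratic_variation(t=1.0, n_steps=100_000, seed=0):
    """Return (realized QV of W, realized QV of I, Riemann sum of Delta^2 du) on [0, t].

    Assumption for illustration only: Delta(u) = 1 + u (deterministic).
    """
    rng = random.Random(seed)
    dt = t / n_steps
    sqrt_dt = math.sqrt(dt)

    qv_w = 0.0      # sum of (dW)^2, should be close to [W, W]_t = t
    qv_i = 0.0      # sum of (dI)^2, should be close to [I, I]_t
    integral = 0.0  # left Riemann sum approximating int_0^t Delta(u)^2 du

    for k in range(n_steps):
        u = k * dt
        delta = 1.0 + u                # hypothetical integrand Delta(u)
        dw = rng.gauss(0.0, sqrt_dt)   # Brownian increment W(u + dt) - W(u)
        di = delta * dw                # increment of the Ito integral I
        qv_w += dw * dw
        qv_i += di * di
        integral += delta * delta * dt

    return qv_w, qv_i, integral


if __name__ == "__main__":
    qv_w, qv_i, integral = simulate_quadratic_variation()
    print("sum of (dW)^2        :", qv_w)
    print("sum of (dI)^2        :", qv_i)
    print("sum of Delta^2 * du  :", integral)
```

On a fine grid the first number should be close to $t$, and the second and third should be close to each other. That is the sense in which $\text{d}W\,\text{d}W = \text{d}t$ and $(4)$ "report" the quadratic variation in $(5)$: the squared increments are summed, not the increments themselves.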