My confusion arose from a commonly mentioned exercise:
Show that the quadratic variation of the Wiener process is $\langle W\rangle_{T}=T$.
Note that the quadratic variation here is the non-decreasing continuous process in the Doob–Meyer decomposition, and we know that $$\lim_{\|\Pi\|\rightarrow 0}V_{T}^{2}(\Pi):=\lim_{\|\Pi\|\rightarrow 0}\sum_{i=1}^{m}(X_{t_{i}}-X_{t_{i-1}})^{2}=\langle X\rangle_{T}\ \text{in probability}.$$
I tried to prove it as follows:
For a fixed $T>0$, let $\Pi=\{t_{0}, t_{1},\cdots, t_{m}\}$ with $0=t_{0}\leq t_{1}\leq\cdots\leq t_{m}=T$ be a partition of $[0,T]$; we then need to consider $$V_{T}^{2}(\Pi):=\sum_{k=1}^{m}(W_{t_{k}}-W_{t_{k-1}})^{2}.$$ It suffices to show that $$\mathbb{E}(V_{T}^{2}(\Pi))^{2}\longrightarrow T\ \text{as}\ \|\Pi\|\rightarrow 0,$$ for if this were true, then in view of Markov's inequality we would know that $$V_{T}^{2}(\Pi)\longrightarrow T\ \text{in probability},\ \text{as}\ \|\Pi\|\rightarrow 0,$$ and thus, in view of the identity above, we would be able to conclude that $\langle W\rangle_{T}=T$.
We first do some preliminary computation: \begin{align*} \mathbb{E}(V_{T}^{2}(\Pi))^{2}&=\mathbb{E}\Big(\sum_{k=1}^{m}(W_{t_{k}}-W_{t_{k-1}})\Big)^{2}=\sum_{i,j=1}^{m}\mathbb{E}\Big[(W_{t_{i}}-W_{t_{i-1}})(W_{t_{j}}-W_{t_{j-1}})\Big]\\ &=\sum_{i=1}^{m}\mathbb{E}(W_{t_{i}}-W_{t_{i-1}})^{2}+\sum_{i\neq j}^{m}\mathbb{E}\Big[(W_{t_{i}}-W_{t_{i-1}})(W_{t_{j}}-W_{t_{j-1}})\Big]. \end{align*}
The first term can be evaluated using the centered Gaussian increment property of the Wiener process: $$\sum_{i=1}^{m}\mathbb{E}(W_{t_{i}}-W_{t_{i-1}})^{2}=\sum_{i=1}^{m}(t_{i}-t_{i-1})=T.$$
My confusion then arises from the following argument:
To evaluate the second term, note that if $j=1$ is fixed, then $i$ cannot be $1$, and therefore the sum for $j=1$ is $$\mathbb{E}(W_{t_{2}}-W_{t_{1}})(W_{t_{1}}-W_{t_{0}})+\mathbb{E}(W_{t_{3}}-W_{t_{2}})(W_{t_{1}}-W_{t_{0}})+\cdots+\mathbb{E}(W_{t_{m}}-W_{t_{m-1}})(W_{t_{1}}-W_{t_{0}}),$$ but note that the Wiener process has independent increments, i.e. $W_{t_{m}}-W_{t_{m-1}}$ is independent of $\sigma(W_{t_{0}},W_{t_{1}},\cdots, W_{t_{m-1}})$, so each expectation above is $0$, since independent random variables are uncorrelated.
Similarly, this will happen for all $j=1,2,\cdots, m$, and thus the second sum is $0$.
Hence, $\mathbb{E}(V_{T}^{2}(\Pi))^{2}=T\longrightarrow T,\ \text{as}\ \|\Pi\|\rightarrow 0$, as desired.
By the above computation, it seems that the quadratic variation of the Wiener process is the same over every partition... Is this true? Am I missing anything in my computation? Thank you!
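As a sanity check on the question itself (not part of any proof), here is a quick Monte Carlo sketch in Python; the partition sizes, sample counts, and helper name `qv_sample` are my own choices. It samples $V_{T}^{2}(\Pi)$ on uniform partitions and suggests that $V_{T}^{2}(\Pi)$ is genuinely random for a coarse partition, while concentrating near $T$ as the mesh shrinks.

```python
import random

random.seed(0)
T = 1.0

def qv_sample(m):
    """Draw one sample of V_T^2(Pi) on the uniform partition of [0, T] with m intervals."""
    dt = T / m
    # Increments W_{t_k} - W_{t_{k-1}} ~ N(0, dt) are independent.
    return sum(random.gauss(0.0, dt ** 0.5) ** 2 for _ in range(m))

coarse = [qv_sample(10) for _ in range(5)]      # coarse mesh: samples visibly fluctuate
fine = [qv_sample(100_000) for _ in range(5)]   # fine mesh: each sample is close to T

print("coarse:", [round(v, 3) for v in coarse])
print("fine:  ", [round(v, 3) for v in fine])
```

So the realized quadratic variation over a fixed partition is a random variable, not a constant; only its limit as $\|\Pi\|\rightarrow 0$ is deterministic.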
Edit 1:
My proof above is not correct: in the preliminary computation, in the first equality, the first term should be $$\mathbb{E}\Big(\sum_{k=1}^{m}(W_{t_{k}}-W_{t_{k-1}})^{2}\Big)^{2}.$$ This changes everything, not to mention that $\mathbb{E}(V_{T}^{2}(\Pi))^{2}\longrightarrow T$ does not imply $L^{2}$-convergence.
After several attempts, I found the correct proof is really interesting! I will answer my own post to give the proof.
For a fixed $T>0$, let $\Pi=\{t_{0}, t_{1},\cdots, t_{m}\}$ with $0=t_{0}\leq t_{1}\leq\cdots\leq t_{m}=T$ be a partition of $[0,T]$; we then need to consider $$V_{T}^{2}(\Pi):=\sum_{k=1}^{m}(W_{t_{k}}-W_{t_{k-1}})^{2}.$$
It suffices to show that $$\mathbb{E}(V_{T}^{2}(\Pi)-T)^{2}\longrightarrow 0\ \text{as}\ \|\Pi\|\rightarrow 0,$$ for if this were true, then in view of Markov's inequality we would know that $$V_{T}^{2}(\Pi)\longrightarrow T\ \text{in probability as}\ \|\Pi\|\rightarrow 0,$$ and thus, in view of the note after my question in the post, we would be able to conclude that $\langle W\rangle_{T}=T$.
This proof has a really interesting technique: note that $$\mathbb{E}(V_{T}^{2}(\Pi))=\sum_{k=1}^{m}\mathbb{E}(W_{t_{k}}-W_{t_{k-1}})^{2}=\sum_{k=1}^{m}(t_{k}-t_{k-1})=T.$$ This means that the random variable $V_{T}^{2}(\Pi)-T$ is centered, and therefore $$\mathbb{E}(V_{T}^{2}(\Pi)-T)^{2}=Var(V_{T}^{2}(\Pi)-T)=Var(V_{T}^{2}(\Pi)),$$ where the last equality holds since variance is invariant under translation by a non-random constant.
Thus, we have $$\mathbb{E}(V_{T}^{2}(\Pi)-T)^{2}=Var(V_{T}^{2}(\Pi))=Var\Big(\sum_{k=1}^{m}(W_{t_{k}}-W_{t_{k-1}})^{2}\Big).$$ Since the Wiener process has independent increments, the random variables $(W_{t_{k}}-W_{t_{k-1}})^{2}$, $k=1,\cdots, m$, are independent, and thus $$Var\Big(\sum_{k=1}^{m}(W_{t_{k}}-W_{t_{k-1}})^{2}\Big)=\sum_{k=1}^{m}Var(W_{t_{k}}-W_{t_{k-1}})^{2}.$$
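The additivity of the variance over independent squared increments is the crux of this step, so here is a small numerical check (the partition points and sample size are arbitrary choices of mine): it estimates both sides of the last display by simulation and confirms they agree.

```python
import random

random.seed(1)

# An arbitrary (hypothetical) partition of [0, 1]
times = [0.0, 0.1, 0.3, 0.6, 1.0]
dts = [b - a for a, b in zip(times, times[1:])]
n = 100_000

# Sample each squared increment (W_{t_k} - W_{t_{k-1}})^2 independently, n times.
sq = [[random.gauss(0.0, dt ** 0.5) ** 2 for _ in range(n)] for dt in dts]

def variance(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

# Variance of the sum vs. sum of the variances: independence makes them match.
var_of_sum = variance([sum(col) for col in zip(*sq)])
sum_of_vars = sum(variance(row) for row in sq)
print(round(var_of_sum, 3), round(sum_of_vars, 3))
```

Both estimates should agree up to Monte Carlo error, which is what the independence of increments guarantees.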
Now we evaluate $Var(W_{t_{k}}-W_{t_{k-1}})^{2}$: \begin{align*} Var(W_{t_{k}}-W_{t_{k-1}})^{2}&=\mathbb{E}(W_{t_{k}}-W_{t_{k-1}})^{4}-\Big(\mathbb{E}(W_{t_{k}}-W_{t_{k-1}})^{2}\Big)^{2}\\ &=\mathbb{E}(W_{t_{k}}-W_{t_{k-1}})^{4}-(t_{k}-t_{k-1})^{2}\\ &=(t_{k}-t_{k-1})^{2}\mathbb{E}\Big(\dfrac{W_{t_{k}}-W_{t_{k-1}}}{\sqrt{t_{k}-t_{k-1}}}\Big)^{4}-(t_{k}-t_{k-1})^{2}. \end{align*}
The reason for the transformation in the last equality is the following: since $W_{t_{k}}-W_{t_{k-1}}\sim\mathcal{N}(0, t_{k}-t_{k-1})$, we must have $$\dfrac{W_{t_{k}}-W_{t_{k-1}}}{\sqrt{t_{k}-t_{k-1}}}\sim\mathcal{N}(0,1).$$ Therefore, $$\mathbb{E}\Big(\dfrac{W_{t_{k}}-W_{t_{k-1}}}{\sqrt{t_{k}-t_{k-1}}}\Big)^{4}=\mathbb{E}\Big(\dfrac{W_{t_{k}}-W_{t_{k-1}}}{\sqrt{t_{k}-t_{k-1}}}\Big)^{2\times 2}=1^{2}(2\times 2-1)!!=3!!=3\cdot 1=3.$$
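The value $\mathbb{E}Z^{4}=3$ for $Z\sim\mathcal{N}(0,1)$ is easy to sanity-check with a quick Monte Carlo estimate (the sample size is arbitrary):

```python
import random

random.seed(2)

# Monte Carlo estimate of E[Z^4] for Z ~ N(0, 1); the exact value is (2*2 - 1)!! = 3.
n = 1_000_000
m4 = sum(random.gauss(0.0, 1.0) ** 4 for _ in range(n)) / n
print(round(m4, 2))
```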
Therefore, $$Var(W_{t_{k}}-W_{t_{k-1}})^{2}=3(t_{k}-t_{k-1})^{2}-(t_{k}-t_{k-1})^{2}=2(t_{k}-t_{k-1})^{2},$$ which gives us \begin{align*} \mathbb{E}(V_{T}^{2}(\Pi)-T)^{2}=\sum_{k=1}^{m}Var(W_{t_{k}}-W_{t_{k-1}})^{2}&=2\sum_{k=1}^{m}(t_{k}-t_{k-1})^{2}\\ &\leq 2\sup_{1\leq k\leq m}|t_{k}-t_{k-1}|\sum_{k=1}^{m}(t_{k}-t_{k-1})\\ &=2\|\Pi\|T\longrightarrow 0, \ \text{as}\ \|\Pi\|\rightarrow 0, \end{align*} as desired.
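To see the rate $\mathbb{E}(V_{T}^{2}(\Pi)-T)^{2}=2\sum_{k=1}^{m}(t_{k}-t_{k-1})^{2}$ numerically, here is a hedged sketch on uniform partitions, where the right-hand side equals $2T^{2}/m$; the number of paths and the partition sizes are my own choices.

```python
import random

random.seed(3)
T = 1.0

def mse(m, n_paths=5_000):
    """Monte Carlo estimate of E[(V_T^2(Pi) - T)^2] on the uniform m-interval partition."""
    dt = T / m
    total = 0.0
    for _ in range(n_paths):
        v = sum(random.gauss(0.0, dt ** 0.5) ** 2 for _ in range(m))
        total += (v - T) ** 2
    return total / n_paths

results = {m: mse(m) for m in (5, 50, 500)}
for m, est in results.items():
    # The proof predicts 2 * sum(dt^2) = 2 * T^2 / m for the uniform partition.
    print(m, round(est, 4), round(2 * T * T / m, 4))
```

The estimates should track $2T^{2}/m$ and shrink as the mesh $\|\Pi\|=T/m$ goes to $0$, exactly as the bound above predicts.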
We also used the fact that for $X\sim\mathcal{N}(0,\sigma^{2})$, we have $$\mathbb{E}X^{2n}=(\sigma^{2})^{n}(2n-1)!!.$$