I am looking into arguments that yield the result $(dB_t)^2=dt$. Authors start by letting $P=\{t_0, t_1, \dots, t_n\}$ be a partition of the interval $[0,T]$ with $t_i=\frac{i}{n}T$, and then use the approximation $$\int_{0}^{T}(dB_t)^2=\lim_{n\to \infty}\sum_{i=1}^{n} (B_{t_i}-B_{t_{i-1}})^2.$$ Properties of Brownian motion (independent increments with $B_{t_i}-B_{t_{i-1}}\sim \mathcal{N}(0, T/n)$) then give $$\lim_{n \to \infty}\sum_{i=1}^{n} (B_{t_{i}}-B_{t_{i-1}})^2=\lim_{n \to \infty}T\sum_{i=1}^{n}\frac{Z_i^2}{n},$$ where $Z_i\sim \mathcal{N}(0, 1)$.
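As a numerical sanity check (not part of the original argument), here is a short Python sketch of the sum above: each increment $B_{t_i}-B_{t_{i-1}}$ is drawn as an independent $\mathcal{N}(0, T/n)$ variable, and the sum of squared increments should approach $T$ as $n$ grows. The function name `quadratic_variation` is mine, chosen for illustration.

```python
import math
import random

def quadratic_variation(T, n, rng):
    """Simulate sum_{i=1}^n (B_{t_i} - B_{t_{i-1}})^2 on [0, T].

    Each Brownian increment over a step of length dt = T/n is an
    independent N(0, dt) draw, so its square averages dt, and the
    whole sum should concentrate around n * dt = T.
    """
    dt = T / n
    return sum(rng.gauss(0.0, math.sqrt(dt)) ** 2 for _ in range(n))

if __name__ == "__main__":
    rng = random.Random(0)
    T = 2.0
    for n in (10, 1_000, 100_000):
        print(n, quadratic_variation(T, n, rng))
```

For small $n$ the sum fluctuates noticeably around $T$; by $n = 100{,}000$ it is within a fraction of a percent, consistent with the standard deviation $T\sqrt{2/n}$ of the sum.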
Where I start to get unsure is that texts opt to use the Weak Law of Large Numbers, obtaining convergence in probability. My question is: why do they avoid the Strong Law, when it would give the stronger result $$P\Big(\lim_{n \to \infty}\sum_{i=1}^{n} (B_{t_{i}}-B_{t_{i-1}})^2=T\Big)=1?$$
The Strong Law requires a single sequence $\{X_i\}$ of i.i.d. random variables. Here $Z_i$ actually depends on $n$ (the $Z_i$ form a triangular array, one row per partition size), so there is no single i.i.d. sequence $\{X_i\}$ to apply it to. For the same reason you cannot apply the Weak Law directly either, but a simple argument using Chebyshev's inequality gives convergence in probability: since $\operatorname{Var}(Z_i^2)=2$, $$\operatorname{Var}\Big(\frac{T}{n}\sum_{i=1}^{n}Z_i^2\Big)=\frac{2T^2}{n},$$ so $$P\Big(\Big|\frac{T}{n}\sum_{i=1}^{n}Z_i^2-T\Big|\ge \varepsilon\Big)\le \frac{2T^2}{n\varepsilon^2}\to 0.$$
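The key quantity in the Chebyshev argument is the variance $2T^2/n$ of the quadratic-variation sum, which shrinks as the partition refines. A minimal Python sketch (my own, not from the answer) estimates that variance empirically for two partition sizes and compares it to the theoretical value:

```python
import math
import random

def sample_qv(T, n, rng):
    # One realization of sum_{i=1}^n (B_{t_i} - B_{t_{i-1}})^2,
    # with each increment drawn as an independent N(0, T/n) variable.
    dt = T / n
    return sum(rng.gauss(0.0, math.sqrt(dt)) ** 2 for _ in range(n))

def empirical_variance(T, n, trials, rng):
    # Monte Carlo estimate of Var(sum of squared increments).
    samples = [sample_qv(T, n, rng) for _ in range(trials)]
    mean = sum(samples) / trials
    return sum((s - mean) ** 2 for s in samples) / trials

if __name__ == "__main__":
    rng = random.Random(1)
    T = 1.0
    for n in (50, 500):
        est = empirical_variance(T, n, 2_000, rng)
        print(n, est, 2 * T**2 / n)  # estimate vs. theoretical 2T^2/n
```

The estimated variance drops by roughly a factor of ten when $n$ goes from 50 to 500, matching the $2T^2/n$ rate that drives the Chebyshev bound to zero.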