Section 4.2 in Loring Tu's Differential Geometry:
My Question: Since $D_XY - D_YX = [X,Y]$, why define the quantity $T(X,Y)=D_XY - D_YX - [X,Y]$? Isn't $T$ always equal to $0$? I am very confused, and I want to know whether I have misunderstood something.


You're completely correct. This is a slightly unfortunate sentence. Later in the book, Tu will introduce the more general notion of an affine connection $\nabla$ on $TM$. This is a gadget quite similar to $D$, in that it is a map $\nabla:\mathfrak{X}(M)\times \mathfrak{X}(M)\to \mathfrak{X}(M)$ which is written $\nabla(X,Y)=\nabla_X Y$ and "differentiates" $Y$ with respect to $X$.
Moreover, it is $C^\infty(M)$-linear in $X$, $\Bbb{R}$-linear in $Y$, and satisfies the Leibniz rule $\nabla_X(fY)=(Xf)Y+f\nabla_XY$ for $f\in C^\infty(M)$. The point of saying all of this is that for a general affine connection $\nabla$, we define the quantity $T(X,Y)=\nabla_X Y-\nabla_Y X-[X,Y]$ to be the torsion of $\nabla$, which is a tensor that eats a pair of vector fields and returns a vector field. Unlike for $D$, it need not vanish.
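
To see concretely why $T\equiv 0$ for $D$ on $\Bbb{R}^n$, here is a short check in the standard coordinates. The component notation $X=\sum_i X^i\partial_i$, $Y=\sum_i Y^i\partial_i$ is my own choice and may differ from Tu's. Using $D_XY=\sum_i X(Y^i)\,\partial_i$ and the coordinate formula for the Lie bracket, we get
$$ D_XY-D_YX=\sum_i\big(X(Y^i)-Y(X^i)\big)\,\partial_i=[X,Y], $$
so $T(X,Y)=0$. For a general connection the same expression is still $C^\infty(M)$-linear in each slot, which is what "tensor" means here. Assuming the Leibniz rule $\nabla_Y(fX)=(Yf)X+f\nabla_YX$ and the identity $[fX,Y]=f[X,Y]-(Yf)X$, a sketch of the check in the first slot is
$$ \begin{aligned} T(fX,Y)&=f\nabla_XY-\big((Yf)X+f\nabla_YX\big)-\big(f[X,Y]-(Yf)X\big)\\ &=f\big(\nabla_XY-\nabla_YX-[X,Y]\big)=f\,T(X,Y). \end{aligned} $$
The two $(Yf)X$ terms cancel, which is exactly why the bracket is subtracted in the definition. The second slot then follows from $T(X,Y)=-T(Y,X)$.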
The reason we want to introduce this terminology is that a Riemannian manifold $(M,g)$ has a unique torsion-free connection $\nabla$ compatible with the metric $g$. Compatibility here means that for all $X,Y,Z\in \mathfrak{X}(M)$, we have
$$ X g(Y,Z)=g(\nabla_XY,Z)+g(Y,\nabla_X Z)\quad\text{(a version of the product rule)}. $$
We call this the Levi-Civita connection, and its existence shows that a Riemannian manifold comes for free with a "canonical" choice of connection. This is in turn useful, because a connection gives us a notion of parallel transport of vectors along curves. Given a parametrized curve $\gamma:I\to M$, we say that a vector field $V$ along $\gamma$ is parallel with respect to $\nabla$ if
$$ \nabla_{\gamma'(t)}V=0\quad\text{(parallel transport equation)}. $$

In Anonymous's answer at https://mathoverflow.net/questions/20493/what-is-torsion-in-differential-geometry-intuitively there is an example of a connection on $\Bbb{R}^3$ which is not the Levi-Civita connection (because it has nonzero torsion) and with respect to which parallel translation rotates a vector as it "moves" along a curve. This perhaps explains why it is called torsion: $T(X,Y)\equiv 0$ means (roughly) that there is no twisting in the translation.
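
As a sketch of how nonzero torsion produces twisting, consider the following connection on $\Bbb{R}^3$. It is an illustrative choice of mine in the spirit of the linked answer, and I am not claiming it is identical to the example given there. With $D$ the usual directional derivative and $\times$ the cross product, set
$$ \nabla_XY=D_XY+X\times Y. $$
This is $C^\infty$-linear in $X$ and satisfies the Leibniz rule in $Y$, and it is compatible with the Euclidean metric because $(X\times Y)\cdot Z+Y\cdot(X\times Z)=0$. Its torsion is
$$ T(X,Y)=\big(D_XY-D_YX-[X,Y]\big)+X\times Y-Y\times X=2\,X\times Y\neq 0 $$
for linearly independent $X,Y$, so it is not the Levi-Civita connection. Along the straight line $\gamma(t)=(t,0,0)$, with $\gamma'(t)=e_1$ and $V=(V^1,V^2,V^3)$, the parallel transport equation reads
$$ V'(t)+e_1\times V(t)=0\quad\Longleftrightarrow\quad (V^1)'=0,\quad (V^2)'=V^3,\quad (V^3)'=-V^2, $$
and the solution with $V(0)=(a,b,c)$ is
$$ V(t)=\big(a,\; b\cos t+c\sin t,\; -b\sin t+c\cos t\big). $$
So the transported vector rotates about the direction of motion, whereas for $D$ (the Levi-Civita connection of the Euclidean metric) the same equation is $V'(t)=0$ and $V$ stays constant.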