I am studying the proof of the Poincaré-Birkhoff-Witt theorem from Brian Hall's Lie Groups, Lie Algebras, and Representations. This is $\S 9.3$ and $\S9.4$ in the second edition.
Notation and setup
Let $\mathfrak{g}$ be a finite-dimensional Lie algebra with basis $\{ X_1,\dots,X_k \}$. Let $T(\mathfrak{g})$ denote the tensor algebra, and let $(U(\mathfrak{g}),i)$ be the universal enveloping algebra of $\mathfrak{g}$. The PBW theorem says that the set $$ \{ i(X_{j_1}) \cdots i(X_{j_N}) : 1 \leq j_1 \leq \dots \leq j_N \leq k, \text{ and } N \in \mathbb{Z}_{\geq 0} \} \tag{1} $$ is linearly independent in $U(\mathfrak{g})$.
To prove the PBW theorem, the author defines a linear map $\delta : T(\mathfrak{g}) \to D$, where $D$ is a vector space with basis $$ \{ v_{(j_1,\dots,j_N)} : 1 \leq j_1 \leq \dots \leq j_N \leq k \}, $$ such that $\delta$ satisfies $$ \delta(X_{j_1} \otimes \dots \otimes X_{j_N}) = v_{(j_1,\dots,j_N)}\tag{2} $$ for all nondecreasing tuples $(j_1,\dots,j_N)$, and $$ \delta(X_{j_1} \cdots X_{j_m} X_{j_{m+1}} \cdots X_{j_N} - X_{j_1} \cdots X_{j_{m+1}} X_{j_{m}} \cdots X_{j_N}) = \delta(X_{j_1}\cdots [X_{j_m},X_{j_{m+1}}]\cdots X_{j_N})\tag{3} $$ for all tuples $(j_1,\dots,j_N)$, not necessarily nondecreasing, and all $m$. (Here, the $\otimes$ symbol is being omitted to simplify the notation.)
Recall that $U(\mathfrak{g})$ was constructed as the quotient of $T(\mathfrak{g})$ by the ideal $J$ generated by the set $$ \{ X \otimes Y - Y \otimes X - [X,Y] : X,Y \in \mathfrak{g} \}. $$ If such a map $\delta$ exists, then by $(3)$ and linearity it vanishes on $J$, so it descends to a linear map $\gamma : U(\mathfrak{g}) \to D$ such that $$\gamma(i(X_{j_1}) \cdots i(X_{j_N})) = v_{(j_1,\dots,j_N)}$$ for all nondecreasing tuples $(j_1,\dots,j_N)$. Since the $v_{(j_1,\dots,j_N)}$ are linearly independent in $D$, this proves that $(1)$ is a linearly independent set.
Construction of $\delta$
The degree of a tuple $(j_1,\dots,j_N)$ is defined to be $N$. The index of a tuple $(j_1,\dots,j_N)$ is defined to be the number of pairs $m_1 < m_2$ such that $j_{m_1} > j_{m_2}$. The map $\delta$ is defined inductively on the degree $N$, and for each $N$ inductively on the index.
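To make the two notions concrete, here is a small Python sketch. The function names `index` and `swap_adjacent` are my own and do not come from Hall's book, and positions are 0-based in the code whereas $m$ is 1-based in the text. It also illustrates the fact that makes the induction on the index work: swapping an adjacent pair with $j_m > j_{m+1}$ lowers the index by exactly one.

```python
def index(js):
    """Number of pairs m1 < m2 with js[m1] > js[m2] (the inversions of js)."""
    n = len(js)
    return sum(1 for m1 in range(n) for m2 in range(m1 + 1, n) if js[m1] > js[m2])


def swap_adjacent(js, m):
    """Swap the entries at 0-based positions m and m + 1."""
    return js[:m] + (js[m + 1], js[m]) + js[m + 2:]


word = (3, 2, 1)
print(len(word), index(word))                     # degree 3, index 3
for m in range(len(word) - 1):
    if word[m] > word[m + 1]:
        # Each adjacent descent, once swapped, gives a tuple of index 2.
        print(m, index(swap_adjacent(word, m)))
```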
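For $N = 0$ and $N=1$, there is nothing much to show. So, assume that $\delta$ has been defined on all terms $X_{j_1} \cdots X_{j_N}$, where $N \leq n-1$ and $n > 1$. For $N = n$, we define $\delta$ on $X_{j_1} \cdots X_{j_n}$ by induction on the index, denoted by $p$. If $p = 0$, then $\delta$ is defined by $(2)$. Now assume that $\delta$ is defined on all those terms of degree $n$ and index $\leq p-1$, where $p > 0$. Let $X_{j_1} \cdots X_{j_n}$ be a term of index $p$. Since $p > 0$, there is at least one $m$ such that $j_m > j_{m+1}$; choose such an $m$ and define $\delta$ on this term so that $(3)$ holds, that is, $$ \delta(X_{j_1} \cdots X_{j_m} X_{j_{m+1}} \cdots X_{j_n}) := \delta(X_{j_1} \cdots X_{j_{m+1}} X_{j_{m}} \cdots X_{j_n}) + \delta(X_{j_1}\cdots [X_{j_m},X_{j_{m+1}}]\cdots X_{j_n}). $$ Both terms on the right are already defined: swapping $j_m$ and $j_{m+1}$ lowers the index to $p-1$, and the bracket term is a linear combination of terms of degree $n-1$.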
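The inductive definition can be sketched as a recursive function. This is only an illustration under assumptions of my own: the Lie algebra is given by a table `c` of structure constants, elements of $D$ are dicts mapping nondecreasing tuples to coefficients, and `pick` is a hypothetical parameter that selects which position $m$ (0-based in the code) to use. The example algebra is $\mathfrak{sl}(2)$ with the ordering $X_1 = E$, $X_2 = H$, $X_3 = F$, which is my choice and not taken from the book.

```python
def bracket(a, b, c):
    """[X_a, X_b] as a dict {basis index: coefficient}, using antisymmetry."""
    if (a, b) in c:
        return dict(c[(a, b)])
    if (b, a) in c:
        return {i: -v for i, v in c[(b, a)].items()}
    return {}  # covers a == b and commuting pairs


def delta(word, c, pick):
    """delta(X_{j_1} ... X_{j_n}) as a dict {nondecreasing tuple: coefficient}.

    `pick` chooses one position from the list of adjacent descents.
    """
    descents = [m for m in range(len(word) - 1) if word[m] > word[m + 1]]
    if not descents:                      # index 0: rule (2)
        return {word: 1}
    m = pick(descents)
    a, b = word[m], word[m + 1]
    # Rule (3): swapped term (index p - 1) plus bracket term (degree n - 1).
    result = dict(delta(word[:m] + (b, a) + word[m + 2:], c, pick))
    for i, coeff in bracket(a, b, c).items():
        lower = delta(word[:m] + (i,) + word[m + 2:], c, pick)
        for key, val in lower.items():
            result[key] = result.get(key, 0) + coeff * val
    return {key: val for key, val in result.items() if val != 0}


# Assumed example: sl(2) with X_1 = E, X_2 = H, X_3 = F, so that
# [E, H] = -2E, [E, F] = H, [H, F] = -2F.
SL2 = {(1, 2): {1: -2}, (1, 3): {2: 1}, (2, 3): {3: -2}}

print(delta((2, 1), SL2, max))   # {(1, 2): 1, (1,): 2}
# Always taking the largest m and always taking the smallest m agree here:
print(delta((3, 2, 1), SL2, max) == delta((3, 2, 1), SL2, min))   # True
```

Passing `max` or `min` as `pick` fixes one particular choice of $m$ at every step. That the two results coincide in this example is an instance of the independence of the choice of $m$; the code checks a single case and proves nothing in general.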
My question
At this point, the author says that it needs to be verified that the definition of $\delta$ on the term $X_{j_1} \cdots X_{j_n}$ of index $p$ is independent of the choice of $m$ (since there could be several positions $m$ with $j_m > j_{m+1}$), and he proceeds to prove that it is indeed well-defined.
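As a concrete instance (assuming $k \geq 3$), take the term $X_3 X_2 X_1$, which has degree $3$ and index $3$. Both $m = 1$ and $m = 2$ satisfy $j_m > j_{m+1}$, so $(3)$ offers two candidate definitions: $$ m = 1: \quad \delta(X_3 X_2 X_1) := \delta(X_2 X_3 X_1) + \delta([X_3,X_2] X_1), $$ $$ m = 2: \quad \delta(X_3 X_2 X_1) := \delta(X_3 X_1 X_2) + \delta(X_3 [X_2,X_1]). $$ In each case the right-hand side involves only terms of index $2$ or of degree $2$, so it is already defined. The claim to be verified is that the two right-hand sides agree.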
My question is: why not fix a particular $m$ when defining $\delta$ on this term, so that well-definedness is not a problem? For example, we could say, "Choose the largest $m$ such that $j_m > j_{m+1}$, and use $(3)$ with this $m$ to define $\delta$ on the term $X_{j_1} \cdots X_{j_n}$ of index $p$."
Wouldn't this simplify the proof?
It might simplify the proof of the well-definedness of $\delta$, but you'd get stuck in the proof of the Poincaré-Birkhoff-Witt theorem later on.
To clarify this, I shall refer to Hall's definition of $\delta$ as the non-deterministic definition, since it involves a choice of $m$. As for your definition, I shall call it the deterministic definition, since it picks the maximal possible $m$.
Now, at the very end of the proof of the PBW theorem, Hall writes:
Here, (9.12) is your equation (3) (except that his $k$ is your $m$).
Why exactly does your equation (3) follow from the well-definedness of $\delta$? Fix an $N$-tuple $\left(j_1, j_2, \ldots, j_N\right)$ and any $m \in \left\{1,2,\ldots,N-1\right\}$. We want to prove (3). If $j_m < j_{m+1}$, then swapping $j_m$ and $j_{m+1}$ in this $N$-tuple merely replaces both sides of (3) by their negatives (because $\left[X_{j_{m+1}}, X_{j_m}\right] = - \left[X_{j_m}, X_{j_{m+1}}\right]$), so it yields an equivalent equality. Thus, we may assume WLOG that $j_m \geq j_{m+1}$. If $j_m = j_{m+1}$, then (3) holds trivially, with both sides being $0$ (the two terms on the left coincide, and $\left[X_{j_m}, X_{j_m}\right] = 0$). Thus, we may further assume that $j_m > j_{m+1}$. Since $\delta$ is well-defined, the non-deterministic definition of $\delta$ can be applied with this particular $m$, and it yields \begin{align} \delta\left(X_{j_1} \cdots X_{j_m} X_{j_{m+1}} \cdots X_{j_N}\right) &= \delta\left(X_{j_1} \cdots X_{j_{m+1}} X_{j_m} \cdots X_{j_N}\right) + \delta\left(X_{j_1} \cdots \left[X_{j_m}, X_{j_{m+1}}\right] \cdots X_{j_N} \right). \end{align} By the linearity of $\delta$, this is exactly (3).
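With the deterministic definition, we could not have drawn this conclusion, because nothing guarantees that the $m$ we have been working with is actually the largest $m$ satisfying $j_m > j_{m+1}$. So the deterministic definition, by itself, only ensures that (3) holds for some values of $\left(j_1,j_2,\ldots,j_N\right)$ and $m$, essentially those for which $m$ is the largest position with $j_m > j_{m+1}$. Proving (3) for the remaining values of $m$ would require the same kind of work that the well-definedness proof does.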
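Here is a small Python sketch of this point, under assumptions of my own. `delta_det` implements the deterministic definition (always the largest $m$, 0-based in the code), and `relation_holds` tests (3) at a given position. `SL2` encodes $\mathfrak{sl}(2)$ with $X_1 = E$, $X_2 = H$, $X_3 = F$. `BROKEN` is a hypothetical antisymmetric bracket that violates the Jacobi identity, so it is not a Lie algebra; it is included only because the deterministic definition never uses the Jacobi identity and therefore still produces a map.

```python
def bracket(a, b, c):
    """[X_a, X_b] as a dict {basis index: coefficient}, using antisymmetry."""
    if (a, b) in c:
        return dict(c[(a, b)])
    if (b, a) in c:
        return {i: -v for i, v in c[(b, a)].items()}
    return {}


def add_into(acc, vec, scale=1):
    """acc += scale * vec, dropping entries that become zero."""
    for key, val in vec.items():
        total = acc.get(key, 0) + scale * val
        if total:
            acc[key] = total
        else:
            acc.pop(key, None)


def delta_det(word, c):
    """Deterministic delta: always use the largest m with word[m] > word[m+1]."""
    descents = [m for m in range(len(word) - 1) if word[m] > word[m + 1]]
    if not descents:
        return {word: 1}
    m = descents[-1]
    a, b = word[m], word[m + 1]
    result = {}
    add_into(result, delta_det(word[:m] + (b, a) + word[m + 2:], c))
    for i, coeff in bracket(a, b, c).items():
        add_into(result, delta_det(word[:m] + (i,) + word[m + 2:], c), coeff)
    return result


def relation_holds(word, m, c):
    """Check equation (3) for `word` at 0-based position m."""
    a, b = word[m], word[m + 1]
    lhs, rhs = {}, {}
    add_into(lhs, delta_det(word, c))
    add_into(lhs, delta_det(word[:m] + (b, a) + word[m + 2:], c), -1)
    for i, coeff in bracket(a, b, c).items():
        add_into(rhs, delta_det(word[:m] + (i,) + word[m + 2:], c), coeff)
    return lhs == rhs


SL2 = {(1, 2): {1: -2}, (1, 3): {2: 1}, (2, 3): {3: -2}}
BROKEN = {(1, 2): {2: 1}, (2, 3): {1: 1}}   # hypothetical; Jacobi fails

for name, c in (("sl2", SL2), ("broken", BROKEN)):
    print(name, [relation_holds((3, 2, 1), m, c) for m in (0, 1)])
# sl2 [True, True]
# broken [False, True]
```

For `BROKEN`, (3) holds at the largest position by construction but fails at the other one. For $\mathfrak{sl}(2)$ it holds at both positions, but the deterministic definition alone does not explain why: any proof has to use the Jacobi identity, which is what Hall's well-definedness argument does.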