I'm following a proof of "Axiom of Choice implies Well Ordering Theorem" (from a not-well-known book we use for class), and the author uses this procedure:
First, he takes any non-empty set $X$ and defines the relation $\le$ so that for every $b \in X$, the following is true:
$b=f\bigl(X-\{ x\in X : x \neq b \text{ and } x \le b \}\bigr) = \text{least element of } \bigl(X-\{ x \in X : x \neq b \text{ and } x \le b \}\bigr)$
Where $f$ is a choice function. In other words, the idea is to have $f$ select some element of $X$, and we call this the least element. Then we do the same for $X$ minus the elements we have already ordered, and so on.
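To see the procedure in action on a case where it obviously works, here is a small runnable sketch (my own illustration, not from the book). On a finite set of numbers, the role of the choice function $f$ is played, hypothetically, by Python's `min`:

```python
# A finite illustration of the procedure. The choice function f is
# stood in for by `min` here; any function picking one element of a
# non-empty set would do.
def well_order_greedily(X, f):
    """Repeatedly apply the choice function f to what is left of X,
    recording each chosen element as the next one in the ordering."""
    remaining = set(X)
    ordering = []          # ordering[i] is the (i+1)-th least element
    while remaining:
        b = f(remaining)   # b = f(X - {elements already ordered})
        ordering.append(b)
        remaining.remove(b)
    return ordering

# On a finite set the loop clearly exhausts X:
print(well_order_greedily({3, 1, 4, 5}, min))  # -> [1, 3, 4, 5]
```

On a finite set the loop terminates because each pass removes one element; the subtlety the book is worried about is exactly that no such termination argument is available for infinite $X$.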
It seems as if this procedure captures all the elements of the set, but the book then notes that this can't be guaranteed for infinite sets. That's why the author uses the following method:
We consider subsets $B$ of $X$ whose elements have already been ordered by $f$, meaning $(B,\le_B)$ is a well-ordered set, such that for all $b\in B$:
$b=f\bigl(X-\{ x \in B : x \neq b \text{ and } x \le_B b \}\bigr)$
($\le_B$ is the order relation induced on $B$ by $\le$)
We then take the union of all these sets and obtain $(X,\le)$; the book spends two more pages on this part and on completing the proof.
My question is: why isn't it guaranteed that we get all the elements of $X$ using the original procedure? Why do we need to create these subsets and then take their union?
Taken literally, the question has an easy (but probably unsatisfying) answer: The subsets that the proof creates and forms the union of are, as you wrote, all the well-ordered sets $(B,\leq_B)$ such that each element $b$ of $B$ is the result of applying the choice function $f$ to the complement of the set of strict predecessors of $b$. One such $B$ is the empty set. Another is the singleton $\{a\}$ where $a=f(X)$. And there are (for reasonably large $X$) lots of other $B$'s that are nowhere near all of $X$.
Now let me take the question less literally, so as to address what you probably meant. My guess is that you intended $B$ to be not just any old well-ordered set of the sort described above, but rather the one you get by continuing as long as possible. That is, as long as what you've put into $B$ so far isn't all of $X$, you should apply $f$ to the rest of $X$ to select an element that you then put into $B$ as the next element. You stop only when your well-ordering has exhausted $X$.
That's the correct intuition, but it isn't mathematically precise. It uses concepts like "put into $B$", "so far", "continue"; these would need to be defined (and some of their basic properties would have to be proved) in order to convert the idea into a real proof. There are some serious problems in a direct attempt to make these concepts precise. For example, the concepts seem to involve a notion of time, but the cardinality of $X$ might exceed the cardinality of the real line, so this notion of time couldn't be the usual, physical one.
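For what it's worth, one standard way to make the temporal idea precise (a different route from the one your book takes) is transfinite recursion over the ordinals, with ordinal stages playing the role of "time":

```latex
% Transfinite recursion along the ordinals replaces "time": at stage
% \alpha, the choice function picks the next element from what is left.
\[
  a_\alpha \;=\; f\bigl(X \setminus \{\, a_\beta : \beta < \alpha \,\}\bigr),
  \qquad \text{as long as } X \setminus \{a_\beta : \beta < \alpha\} \neq \varnothing .
\]
% The recursion must halt at some ordinal (by Hartogs' theorem, there is
% no injection of arbitrarily large ordinals into X), and when it halts,
% X has been exhausted and thereby well-ordered.
```

This route needs the recursion theorem for ordinals as a prerequisite, which is presumably why your book prefers the union-of-$B$'s argument instead.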
The idea of taking the union of all the possible $B$'s, including those that, like my examples in the first paragraph, are obviously "too short", is to provide a mathematically precise substitute for these temporal notions. The key is that, among all the possible $B$'s, the "right" one is the biggest one. Since all the possible $B$'s (the right one and the too-short ones) form a linearly ordered chain, you can get the biggest one by taking the union of them all.
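For the record, here is a sketch of why any two such $B$'s are comparable, which is the claim doing the real work in the union step (my paraphrase, not the book's exact argument):

```latex
% Why the B's form a chain: suppose (B,\le_B) and (B',\le_{B'}) both
% satisfy the defining condition
\[
  b \;=\; f\bigl(X \setminus \{\, x : x <_B b \,\}\bigr)
  \quad\text{for all } b \in B \text{ (and likewise for } B'\text{).}
\]
% Compare them from the bottom up: whenever the sets of strict
% predecessors agree, say both equal P, the next elements agree too,
% since each equals f(X \setminus P). So the two orderings coincide
% until one of them runs out, i.e. one is an initial segment of the
% other. A chain of initial segments has a well-ordered union.
```

So the union is itself a set of the same kind, and it is the "biggest" one: it can have no element of $X$ left over, since otherwise applying $f$ once more would extend it.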