The following is an excerpt from Introduction to Algorithms by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein (aka CLRS):
$\mathcal{O}(g(n,m)) = \{ f(n,m): \text{there exist positive constants }c, n_0,\text{ and } m_0\text{ such that }0 \le f(n,m) \le cg(n,m)\text{ for all }n \ge n_0\text{ or }m \ge m_0\}$
Consider an alternative possible extension of Big-$\mathcal{O}$ notation to multiple variables:
$\mathcal{O}(g(n,m)) = \{ f(n,m):\text{there exist positive constants $c$, $n_0$, and $m_0$ such that }0 \le f(n,m) \le cg(n,m)\text{ for all }n \ge n_0\text{ and }m \ge m_0\}$
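To make the quantifier difference concrete, here is a small illustrative Python sketch (the helper names `holds_or` and `holds_and` are my own, not standard); note it can only test the bound on a finite grid for specific constants, not verify an asymptotic claim:

```python
def holds_or(f, g, c, n0, m0, N):
    """CLRS-style condition: the bound 0 <= f(n, m) <= c * g(n, m) must hold
    at every grid point with n >= n0 OR m >= m0, checked up to a cutoff N."""
    return all(0 <= f(n, m) <= c * g(n, m)
               for n in range(N) for m in range(N)
               if n >= n0 or m >= m0)

def holds_and(f, g, c, n0, m0, N):
    """Alternative condition: the bound must hold only where n >= n0 AND
    m >= m0, i.e. on a smaller region of the grid."""
    return all(0 <= f(n, m) <= c * g(n, m)
               for n in range(N) for m in range(N)
               if n >= n0 and m >= m0)
```

Every point that `holds_and` inspects is also inspected by `holds_or`, so the OR-style condition is the stronger requirement: any $(c, n_0, m_0)$ witnessing the first definition also witnesses the second, but not conversely.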
Are there any reasons why the first definition is used as an extension instead of the second?
Good question. I think the reason why the first definition is "better" is that it can be stated independently of the number of parameters as:

$$\mathcal{O}(g) = \{ f : \text{there exist a positive constant $c$ and a finite set $F \subseteq X$ such that } 0 \le f(x) \le cg(x) \text{ for all } x \in X \setminus F \}$$

where $f$ and $g$ are functions on a set $X$.
Then the usual one-parameter definition results when we take $X$ to be $\mathbb N$, and the one you quote from CLRS when we take $X$ to be $\mathbb N^2$.
In contrast, your proposed second definition would allow unbounded functions such as $$f(n,m) = \begin{cases} n+m & n=100 \lor m=100 \\ 1 & \text{otherwise} \end{cases}$$ to be $\mathcal{O}(1)$, which doesn't seem to be useful for reporting algorithmic complexity.
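As a quick sanity check (a minimal Python sketch; the grid cutoffs are arbitrary), the piecewise function above is bounded by a constant on the whole quadrant $n \ge 101, m \ge 101$, yet unbounded along the line $n = 100$:

```python
def f(n, m):
    """The piecewise function from the text: n + m when n == 100 or m == 100,
    and 1 otherwise."""
    return n + m if n == 100 or m == 100 else 1

# With c = 1 and n0 = m0 = 101, the bound f(n, m) <= c * 1 holds whenever
# n >= n0 AND m >= m0, so the second definition would put f in O(1)...
assert all(f(n, m) <= 1 for n in range(101, 300) for m in range(101, 300))

# ...even though f is unbounded: along the line n = 100 it grows without limit.
assert f(100, 10**6) == 100 + 10**6
```

Under the CLRS OR-style definition no witness exists: any region $\{n \ge n_0 \text{ or } m \ge m_0\}$ contains the points $(100, m)$ for all $m \ge m_0$, where $f(100, m) = 100 + m$ exceeds every constant.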
More abstractly, remember that asymptotics are always defined relative to some limit $x\to x_0$. In the case above the limit is $x\to (+\infty,+\infty)$. Limits involving infinity are often treated as a special case in real analysis, but there's a natural topology on $\mathbb N^2\cup\{\infty\}$ (one in which the neighborhoods of $\infty$ are exactly the sets with finite complement) such that the definition above is equivalent to: $f \in \mathcal{O}(g)$ if and only if there exist a positive constant $c$ and a neighborhood $U$ of $\infty$ such that $0 \le f(x) \le cg(x)$ for all $x \in U \setminus \{\infty\}$.
This view encompasses the asymptotics for $x\to 0$ we see in analysis. Additionally, it offers the opportunity to get your second definition back in the game, if you redefine the topology on $\mathbb N^2\cup\{\infty\}$ to give $\infty$ smaller neighborhoods (for instance, taking the sets $\{(n,m): n\ge n_0 \text{ and } m\ge m_0\}$, together with $\infty$, as a neighborhood base at $\infty$).
Then the distinction between the two definitions you quote is simply that they are based on two different limits.