Suppose that $G$ is a connected compact Lie group, $g(a)\in G$ is one of its elements, and $a=(a_1,a_2,...,a_r)$ are the parameters of the group element $g(a)$. Then the infinitesimal generators of the group can be defined as

\begin{equation} X_i=\frac{\partial{g(a)}}{\partial{a_i}}\bigg|_{a=0} \end{equation}

An infinitesimal element is then $g(\delta a)=E+\sum_i \delta a_i X_i$, in which $\delta a\in O({\bf0})$, $E=g({\bf0})$ is the identity element and $O({\bf0})$ is a neighborhood of the point ${\bf 0}$ in parameter space. We know that the parameter of the product of two infinitesimal elements is the sum of their parameters (to first order in the infinitesimal parameters), that is

$$ g(\delta a)g(\delta b)=(E+\sum_i \delta a_i X_i)(E+\sum_i \delta b_i X_i)=E+\sum_i (\delta a_i+\delta b_i) X_i=g(\delta a+\delta b). \tag{1} $$

Every book I have read then says that any finite element $g(a)$ can be generated by $\{X_i\}$: let $\delta a=a/N$,

\begin{align} g(a)=g(N\frac{a}{N})=g(N\delta a)&=g(\delta a+\delta a+...+\delta a)\\ &=*=g(\delta a)g(\delta a)...g(\delta a)=g(\delta a)^N\\ &=(E+\sum_i\delta a_iX_i)^N=(E+\frac{\sum_i a_iX_i}{N})^N\\ &=\exp(\sum_i a_iX_i)~~[\text{let } N\rightarrow \infty] \tag{2} \end{align}

In the above proof, the key step is the $(=*=)$ step, which uses the relation $g(\delta a+\delta b)=g(\delta a)g(\delta b)$ from Eq.(1). But this relation is ONLY valid to first order, and only for a product of two infinitesimal elements. It cannot be used to derive the $(=*=)$ step directly, because $a$ is finite and there are $N$ factors here.
My questions are:
- Since the ($=*=$) step is wrong, is the conclusion Eq.(2) also wrong? (not limited to this proof)
- If the conclusion Eq.(2) is right, how can it be proved correctly?
I think I have a counterexample. Using the same logic as the above proof, consider a scalar function $f(x)$ of a single parameter, with the property $f(0)=1$. Then the "infinitesimal generator" of $f(x)$ is just its derivative at 0; call it

$$ t=\frac{\partial f(x)}{\partial x}\bigg|_{x=0}=f'(0) $$

Then for any infinitesimal parameters $\delta x$ and $\delta y$, to first order, we have $f(\delta x)=f(0)+f'(0)\delta x=1+\delta x\cdot t$ and $f(\delta y)=1+\delta y\cdot t$, and

$$ f(\delta x)f(\delta y)=(1+\delta x\cdot t)(1+\delta y\cdot t)=1+(\delta x+\delta y)t=f(\delta x+\delta y) $$

So, if the logic in Eq.(2) is right, we can also derive the conclusion: "For any finite $x$, there is the relation $f(x)=\exp(tx)$". But this conclusion is obviously wrong, because here $f(x)$ can be any function satisfying $f(0)=1$, e.g. $f(x)=1+\sin(x)$. Obviously, $1+\sin(x)\ne\exp(tx)$!

Only when $f(x)$ satisfies $f(x+y)=f(x)f(y)$ for any $x$ and $y$ (not limited to infinitesimal parameters) can we derive $f(x)=\exp(tx)$.
Formulas (1) and (2) are fine, and so is (*), properly interpreted. But you may have been misreading the otherwise sensible texts you dealt with; you might consider some texts recommended by the Greek chorus of comments above.
Let's start from your false counterexample, which goes to the heart of your misconception. It is, of course, Lie's celebrated advective flow. (Beware of the perverse inverted use of variables from here!)
Any continuous one-parameter Abelian group is equivalent to the group of translations. Take your $$ f(z+\delta x) \equiv g(\delta x) ~ f(z)= \left (1+\delta x \frac{\partial}{\partial z}\right )f(z) +O(\delta x ^2), $$ to leading order in $\delta x$. Here, $g$ represents the finite, not the infinitesimal, group element: it is only approximated by the binomial. Note that the generator $t=\partial/\partial z$ does not involve a specific $f$; it acts on all of them.
You may now translate further $$ f(z+\delta x + \delta y)= g(\delta y)f(z+\delta x)=g(\delta x)g(\delta y)f(z)=g(\delta x +\delta y)f(z). $$
Composing N of these group elements amounts to Taylor expanding around an ever-shifting argument, not just z, as you apparently are reading the group composition to mean. The group property (*) always works, not just for infinitesimal argument. The Nth power of the binomial is only the dominant term of the Nth power of g, corrected by $O(1/N)$.
For infinite N, you have your exponential, Lagrange's translation operator which summarizes the Taylor expansion compactly, $$ g(x)f(z)=e^{xt} f(z)=e^{x \frac{\partial}{\partial z }} f(z)= f(z+x). $$ You may now set z=0; I kept it to emphasize t is an operator--a gradient of the variable of all f. Thus, $g(x)f(0)=f(x)$.
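A quick numerical check of this limit, as a minimal sketch: it uses a made-up cubic polynomial for $f$ (so that the derivative can be taken exactly on a coefficient list) and composes $N$ first-order steps $(1+\frac{x}{N}\partial_z)$. The helper names `derivative`, `step`, `evaluate` and `translate` are illustrative only. The error against $f(z+x)$ should shrink roughly like $1/N$, while the naive guess $\exp(f'(0)\,x)$ stays far off.

```python
import math

# Polynomials are coefficient lists [c0, c1, c2, ...] = c0 + c1*z + c2*z**2 + ...

def derivative(coeffs):
    """Coefficients of d/dz applied to the polynomial."""
    return [k * c for k, c in enumerate(coeffs)][1:] or [0.0]

def step(coeffs, eps):
    """Apply the first-order operator (1 + eps * d/dz) once."""
    d = derivative(coeffs)
    d = d + [0.0] * (len(coeffs) - len(d))  # pad to equal length
    return [c + eps * dc for c, dc in zip(coeffs, d)]

def evaluate(coeffs, z):
    """Evaluate the polynomial at z with Horner's rule."""
    total = 0.0
    for c in reversed(coeffs):
        total = total * z + c
    return total

def translate(coeffs, x, n_steps):
    """Compose n_steps first-order translations, each by x / n_steps."""
    for _ in range(n_steps):
        coeffs = step(coeffs, x / n_steps)
    return coeffs

# Hypothetical example: f(z) = 1 - 2 z + 0.5 z^2 + 3 z^3, so f(0) = 1.
f = [1.0, -2.0, 0.5, 3.0]
x, z = 0.7, 0.0

exact = evaluate(f, z + x)  # f(z + x), the finite translation
errors = {n: abs(evaluate(translate(f, x, n), z) - exact)
          for n in (10, 100, 1000, 10000)}

# The misreading in the question: exponentiating the number f'(0).
naive_error = abs(math.exp(f[1] * x) - exact)

for n, err in errors.items():
    print(f"N = {n:6d}   |(1 + x/N d/dz)^N f(0) - f(x)| = {err:.3e}")
print(f"|exp(f'(0) x) - f(x)| = {naive_error:.3e}")
```

The point of the comparison is that the exponentiated object is the operator $\partial_z$ acting on $f$, not the number $f'(0)$.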
This could all appear like a triviality, except, in principle, you (Lie) have now solved the problem for arbitrary coordinates and flows.
That is, if, instead of the gradient, you had a messier (reasonable) function, $$ t=\beta(z) \partial_z, $$ you could change variable $z$ to canonical coordinates $h(z)$ such that $$ h'(z)=1/\beta(z), $$ so that $$ t=\frac{\partial}{\partial h} , $$ and you'd use this "Abel function" to translate your $f$ through functional conjugacy. (Read up on Wikipedia, if interested, but no matter.)
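As an illustration of the conjugacy, here is a small sketch under an assumed choice $\beta(z)=z$ (the dilation generator), for which $h(z)=\ln z$ on $z>0$. The flow generated by $t$ should then move the argument $z$ to $h^{-1}(h(z)+x)=z\,e^{x}$. The code compares that closed form with a direct Runge-Kutta integration of $dz/ds=\beta(z)$; the function names are hypothetical.

```python
import math

def beta(z):
    """Assumed example: t = z d/dz, the generator of dilations."""
    return z

def h(z):
    """Canonical coordinate with h'(z) = 1/beta(z); valid for z > 0."""
    return math.log(z)

def h_inv(u):
    """Inverse of h."""
    return math.exp(u)

def flow_by_conjugacy(z, x):
    """Translate by x in the canonical coordinate, then map back."""
    return h_inv(h(z) + x)

def flow_by_integration(z, x, n_steps=1000):
    """Classical RK4 for dz/ds = beta(z), from s = 0 to s = x."""
    ds = x / n_steps
    for _ in range(n_steps):
        k1 = beta(z)
        k2 = beta(z + 0.5 * ds * k1)
        k3 = beta(z + 0.5 * ds * k2)
        k4 = beta(z + ds * k3)
        z += ds * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return z

z0, x = 1.5, 0.8
via_h = flow_by_conjugacy(z0, x)
via_ode = flow_by_integration(z0, x)
print(via_h, via_ode, z0 * math.exp(x))
```

For this $\beta$, acting with $e^{xt}$ on any $f$ just rescales its argument, $f(z)\mapsto f(e^{x}z)$.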
Back to your question. Since you are referring to physics texts, let's stick to $n\times n$ matrices for the generators $X$. For one generator $t$ and one parameter $a$ (a unique sum as you have), you just repeat the above. This is, in fact, equivalent to the definition of the matrix exponential.
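For a single matrix generator, the limit can be checked directly. The sketch below assumes the $2\times 2$ generator of plane rotations as an example and computes $(E+aX/N)^N$ with $N=2^k$ by repeated squaring, then compares it with the rotation matrix by angle $a$. The helpers `matmul`, `binomial_power` and `max_error` are written out here just for the illustration.

```python
import math

def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def binomial_power(x_gen, a, doublings):
    """(E + a X / N)^N with N = 2**doublings, via repeated squaring."""
    n = len(x_gen)
    big_n = 2 ** doublings
    m = [[(1.0 if i == j else 0.0) + a * x_gen[i][j] / big_n
          for j in range(n)] for i in range(n)]
    for _ in range(doublings):
        m = matmul(m, m)
    return m

# Assumed example: generator of rotations in the plane, angle a.
X = [[0.0, -1.0], [1.0, 0.0]]
a = 1.2
exact = [[math.cos(a), -math.sin(a)], [math.sin(a), math.cos(a)]]

def max_error(doublings):
    """Largest entrywise deviation from the exact rotation matrix."""
    approx = binomial_power(X, a, doublings)
    return max(abs(approx[i][j] - exact[i][j])
               for i in range(2) for j in range(2))

for k in (5, 10, 15, 20):
    print(f"N = 2^{k:<2d}  max error = {max_error(k):.3e}")
```

The deviation falls off roughly like $a^2/(2N)$, consistent with the binomial being only the leading term of $g(a/N)$.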
For two or more generators, you utilize the Trotter formula, well, actually the whole enchilada, the Campbell-Baker-Hausdorff (CBH) expansion formula, underlying Lie's third fundamental theorem, to see that individual exponentials compose to a generic exponential of Lie algebra elements.
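The Trotter statement, $\left(e^{A/N}e^{B/N}\right)^N\to e^{A+B}$, can also be checked numerically. The following is only a sketch: it assumes two non-commuting $3\times 3$ rotation generators with arbitrary angles, and uses a plain truncated Taylor series for the matrix exponential (adequate for these small matrices, not a general-purpose routine).

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(a, s):
    return [[s * x for x in row] for row in a]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def expm(m, terms=30):
    """Truncated Taylor series of exp(m); fine for small-norm matrices."""
    result, term = identity(len(m)), identity(len(m))
    for k in range(1, terms):
        term = scale(matmul(term, m), 1.0 / k)
        result = add(result, term)
    return result

def matpow(m, n):
    """m**n by plain repeated multiplication."""
    result = identity(len(m))
    for _ in range(n):
        result = matmul(result, m)
    return result

def dist(p, q):
    """Largest entrywise difference between two matrices."""
    return max(abs(x - y) for rp, rq in zip(p, q) for x, y in zip(rp, rq))

# Assumed example: rotation generators about the x and y axes.
Lx = [[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
Ly = [[0.0, 0.0, 1.0], [0.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]
A, B = scale(Lx, 0.9), scale(Ly, 0.6)

target = expm(add(A, B))            # exp(A + B)
naive = matmul(expm(A), expm(B))    # exp(A) exp(B), differs since [A, B] != 0

def trotter(n):
    """(exp(A/n) exp(B/n))**n."""
    return matpow(matmul(expm(scale(A, 1.0 / n)), expm(scale(B, 1.0 / n))), n)

print("exp(A)exp(B) vs exp(A+B):", dist(naive, target))
for n in (10, 100, 1000):
    print(f"Trotter, N = {n:4d}:", dist(trotter(n), target))
```

The single product $e^{A}e^{B}$ misses $e^{A+B}$ by an amount set by the commutator, while the Trotter product closes the gap roughly like $1/N$.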
When in doubt, check with familiar elementary rotations on a sphere (our globe!), using Pauli or 3x3 matrices, which I'm sure you've mastered before moving on to recondite Lie Theory texts.