I am confused about the direct sum of groups, which is something I thought I had understood for a long time.
By definition, the direct sum $\oplus_{\alpha}G_{\alpha}$ of groups $G_\alpha$ consists of tuples $(g_{\alpha})$ with $g_\alpha \in G_\alpha$, where all but finitely many $g_\alpha$ are zero.
However, sometimes we think of the elements of $\oplus G_\alpha$ as finite sums of the form $\sum_{\alpha}g_\alpha$ where $g_\alpha \in G_\alpha$. Why do these two representations of elements give the same group?
Writing a tuple as a sum is fine: it is just notation. You are free to define and use whatever notation you like, as long as it is unambiguous.
For instance, $a+b^2$ unambiguously denotes the pair $(a, b^2)\in G\times H$, provided that the reader knows that $a\in G$, $b\in H$, and $G\cap H=\emptyset$.
In $\Bbb Z^2$, things would get more complicated as $2+3$ already has a meaning in another context. You could then decide to use $2\oplus 3$ and agree that the first summand corresponds to the first factor $\Bbb Z$. Or (what people usually do when they really want an additive notation) give a nickname such as $e_i$ or $v_i$ to a generator of the $i$-th factor, which results in the unambiguous $2e_1+3e_2$.
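To spell out the correspondence, here is a short sketch. It assumes the groups are abelian and written additively, and the symbols $\iota_\beta$ and $S$ are my own notation, not part of the question. Let $\iota_\beta : G_\beta \to \oplus_\alpha G_\alpha$ send $g$ to the tuple that has $g$ in position $\beta$ and zero everywhere else. For a tuple $(g_\alpha)$ with finite support $S=\{\alpha : g_\alpha\neq 0\}$, adding componentwise gives

$$
(g_\alpha)_\alpha \;=\; \sum_{\alpha\in S}\iota_\alpha(g_\alpha),
\qquad\text{for example}\qquad
(2,3) \;=\; 2\,(1,0)+3\,(0,1) \;=\; 2e_1+3e_2 \ \text{ in } \Bbb Z^2 .
$$

The "finite sum" $\sum_\alpha g_\alpha$ is this same expression with the $\iota_\alpha$ left out. Each element has exactly one such expression with nonzero terms, because the term indexed by $\alpha$ is determined by the $\alpha$-th coordinate of the tuple. So the two descriptions name the same elements and the same addition.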