Uusually, an elliptic curve is defined to be a smooth cubic curve in $\mathbb{P}^2$. But twisted Edward curves defined by $$ ax^2 + y^2 = 1 + dx^2y^2 $$ over fields of characteristics $\neq 2$, have degree 4, and they have singularities at infinity: $(0:1:0)$ and $(1:0:0)$. I know that a twisted Edwards curve is birationally equivalent to a smooth cubic curve in Montgomery form. But birational equivalence does not equal isomorphism.
Is a twisted Edwards curve including points at infinity a group and why? If it's not, which part of it forms a group and why?
You've written an affine equation for the twisted Edwards curve $C$, giving it as a subset of $\newcommand{\A}{\mathbb{A}} \newcommand{\P}{\mathbb{P}} \mathbb{A}^2$, but in mentioning the points $(0:1:0)$ and $(1:0:0)$, you are implicitly considering it as a subset of $\mathbb{P}^2$. However, there are other projective varieties naturally containing $\mathbb{A}^2$ as an open, and we can equally as well take the closure of $C$ in one of these---see the Extended coordinates section of the Wikipedia article for a few examples.
Here's a different proposal from those listed at that page: consider $C \subseteq \mathbb{A}^2 \subseteq \mathbb{P}^1 \times \mathbb{P}^1$ and take its closure in $\mathbb{P}^1 \times \mathbb{P}^1$. Writing $((X_0 : X_1), (Y_0 : Y_1))$ for the coordinates on $\mathbb{P}^1 \times \mathbb{P}^1$, then we can identify $\A^2$ as the open subset where $X_1 \neq 0$ and $Y_1 \neq 0$, with affine coordinates $x = X_0/X_1, y = Y_0/Y_1$. (There is a reason for choosing $\P^1 \times \P^1$ coming from toric geometry and the Newton polygon of the Edwards curve equation.) Substituting for $x$ and $y$ and clearing denominators, we find that the closure $\overline{C}$ of $C$ in $\P^1 \times \P^1$ is given by the bihomogeneous equation $$ \overline{C} : a X_0^2 Y_1^2 + X_1^2 Y_0^2 = X_1^2 Y_1^2 + d X_0^2 Y_0^2 \, . $$ (This equation is bihomogeneous of degree $(2,2)$, so $\overline{C}$ has genus $(2-1)(2-1) = 1$, as expected.)
Setting $X_1 = 0$, we have \begin{align*} 0 &= a X_0^2 Y_1^2 - d X_0^2 Y_0^2 = X_0^2 (a Y_1^2 - d Y_0^2) = X_0^2 \left(\sqrt{a} Y_1 - \sqrt{d} Y_0\right) \left(\sqrt{a} Y_1 + \sqrt{d} Y_0\right) \, . \end{align*} Since $X_0 \neq 0$ (we already have $X_1 = 0$), we find the two points $$ ((1:0), (\sqrt{a} : \sqrt{d})), \quad ((1:0), (\sqrt{a} : -\sqrt{d})) \, , $$ possibly defined over some extension of the base field. Setting $Y_0 = 0$ and proceeding similarly, we find two more points $$ ((1:\sqrt{d}), (1:0)), \quad ((1:-\sqrt{d}), (1:0)) $$ giving 4 total points at infinity. Moreover, one can show that $\overline{C}$ is nonsingular at each of these 4 points, so is nonsingular everywhere. (This is assuming $a \neq 0, d \neq 0$, and $a \neq d$, as required in the definition of a twisted Edwards curve.)
I claim that all the points of $\overline{C}$, including the 4 points at infinity found above, form a group. Moreover, we can obtain the group law by appropriately homogenizing the formula for the group law on the usual affine piece, given in section 6 of Bernstein et. al's Twisted Edwards Curves. The formula for $((X_0 : X_1), (Y_0 : Y_1)) + ((X_0' : X_1'), (Y_0' : Y_1'))$ is \begin{equation} \begin{aligned} \left(\left(X_0 Y_0' X_1' Y_1 + X_0' Y_0 X_1 Y_1' : X_1 X_1' Y_1 Y_1' + d X_0 X_0' Y_0 Y_0'\right),\\ \left(Y_0 Y_0' X_1 X_1' - a X_0 X_0' Y_1 Y_1' : X_1 X_1' Y_1 Y_1' - d X_0 X_0' Y_0 Y_0' \right) \right) \, . \end{aligned} \tag{$*$} \label{add} \end{equation}
This formula works even for the points at infinity. For instance, doubling the points $((1:0), (\sqrt{a} : \pm\sqrt{d}))$ using (\ref{add}), we obtain \begin{align*} 2 \cdot ((1:0), (\sqrt{a} : \pm\sqrt{d})) &= ((0 : d \cdot \sqrt{a} \cdot \sqrt{a}), (-a \cdot (\pm\sqrt{d}) \cdot (\pm \sqrt{d}) : -d \cdot \sqrt{a} \cdot \sqrt{a}))\\ &= ((0:ad), (-ad : -ad)) = ((0:1), (1:1)) \, . \end{align*} This is the identity of the group law, so $((1:0), (\sqrt{a} : \pm \sqrt{d}))$ are $2$-torsion points. Doubling $((1 : \pm \sqrt{d}), (1:0))$ using (\ref{add}), we find $$ 2 \cdot ((1 : \pm \sqrt{d}), (1:0)) = ((0:1), (1:-1)) \, . $$ Doubling $((0:1), (1:-1))$, we obtain $2 \cdot ((0:1), (1:-1)) = ((0:1), (1:1))$, showing that $((1 : \pm \sqrt{d}), (1:0))$ are $4$-torsion points. This agrees with the observations in the section Exceptional Points for the Birational Equivalence of Bernstein et al.