In Introduction to the Theory of Computation, Sipser defines a regular expression as follows:
Say that R is a regular expression if R is
- a for some a in the alphabet $\Sigma$,
- $\epsilon$,
- $\emptyset$,
- $(R_1 \cup R_2)$, where $R_1$ and $R_2$ are regular expressions,
- $(R_1 \cdot R_2)$, where $R_1$ and $R_2$ are regular expressions, or
- $(R_1^*)$, where $R_1$ is a regular expression.
In items 1 and 2, the regular expressions $a$ and $\epsilon$ represent the languages $\{a\}$ and $\{\epsilon\}$, respectively. In item 3, the regular expression $\emptyset$ represents the empty language. In items 4, 5, and 6, the expressions represent the languages obtained by taking the union or concatenation of the languages $R_1$ and $R_2$, or the star of the language $R_1$, respectively.
Reading this definition and Wikipedia's version, I am wondering: is a regular expression a string/word, or a set of strings, i.e., a formal language? Thanks!
As the definition says, a regular expression represents a language. You can regard the expression itself as a string if you like: not a string in the language being represented, but a string in the language of all regular expressions over $\Sigma$. In this respect it is no different from other expressions; for instance, you can regard the expression "$3+5$" as a string if you like, as long as you're aware that it represents a number and not a string. The possible confusion in the case of regular expressions arises because the regular expression may itself contain letters of the alphabet of the language being represented. Note, however, that it also contains other symbols (such as '$\cup$', '$*$' and the parentheses) which are not taken from that alphabet. So if you want to regard the regular expression itself as a string in some language, then the alphabet of that language is a proper superset of the alphabet of the language being represented.
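To make the distinction concrete, here is a minimal Python sketch that keeps the two things apart: `show` produces the expression as a string, while `language` computes (part of) the set of strings it represents. The class names, the function names, and the `max_len` cutoff are all assumptions made for this illustration and are not from Sipser; the cutoff is only there because the language of a starred expression is usually infinite, and symbols are assumed to be single characters.

```python
from dataclasses import dataclass

# Hypothetical syntax tree: one class per clause of the definition.
@dataclass(frozen=True)
class Sym:            # a, for some a in the alphabet
    a: str

@dataclass(frozen=True)
class Eps:            # epsilon
    pass

@dataclass(frozen=True)
class Empty:          # the empty-set symbol
    pass

@dataclass(frozen=True)
class Union:          # (R1 ∪ R2)
    r1: object
    r2: object

@dataclass(frozen=True)
class Concat:         # (R1 · R2)
    r1: object
    r2: object

@dataclass(frozen=True)
class Star:           # (R1*)
    r1: object

def show(r) -> str:
    """The expression itself, as a string over Sigma plus the extra symbols."""
    if isinstance(r, Sym):
        return r.a
    if isinstance(r, Eps):
        return "ε"
    if isinstance(r, Empty):
        return "∅"
    if isinstance(r, Union):
        return f"({show(r.r1)}∪{show(r.r2)})"
    if isinstance(r, Concat):
        return f"({show(r.r1)}·{show(r.r2)})"
    if isinstance(r, Star):
        return f"({show(r.r1)}*)"
    raise TypeError(f"not a regular expression: {r!r}")

def language(r, max_len: int) -> set:
    """All strings of length <= max_len in the language represented by r."""
    if isinstance(r, Sym):
        return {r.a} if max_len >= 1 else set()
    if isinstance(r, Eps):
        return {""}
    if isinstance(r, Empty):
        return set()
    if isinstance(r, Union):
        return language(r.r1, max_len) | language(r.r2, max_len)
    if isinstance(r, Concat):
        left, right = language(r.r1, max_len), language(r.r2, max_len)
        return {u + v for u in left for v in right if len(u + v) <= max_len}
    if isinstance(r, Star):
        # Repeatedly append non-empty strings of the base language
        # until no new string within the length bound appears.
        base = language(r.r1, max_len) - {""}
        result, frontier = {""}, {""}
        while frontier:
            frontier = {u + v for u in frontier for v in base
                        if len(u + v) <= max_len} - result
            result |= frontier
        return result
    raise TypeError(f"not a regular expression: {r!r}")

r = Star(Union(Sym("a"), Sym("b")))
expression_as_string = show(r)        # a single string: "((a∪b)*)"
represented_language = language(r, 2) # a set of strings over {a, b}
```

Here `expression_as_string` is one string containing the symbols `(`, `)`, `∪` and `*`, none of which belong to the alphabet $\{a, b\}$, whereas `represented_language` is a set of strings that use only `a` and `b`.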