I'm recently trying to understand the basics of intuitionistic type theory, and I think I have grasped much of it. However, there is this question on my mind. For instance, can the type constructor $\to$ be regarded as an element of type $U_i\to U_i\to U_i$, where $U_i$ is some universe?
There are two main problems I can think of. First, this creates a circular reference of $\to$, which might offer the.chance of some non-normalizing construct. Second, this might cause universe inconsistencies, since $\to$ can be used with types of all universe levels.
Any help is appreciated!
The answer to this is pretty much, "it depends." There are generally multiple ways of presenting (roughly equivalent) type theories, and the answer might be different depending on presentation.
So, for instance, there is a way of presenting type theory in which types never themselves have a type. Instead they are just judged to be types like so:
$$Γ ⊢ A\ \mathsf{type}$$
whereas terms are judged like so:
$$Γ ⊢ e : A$$
And the way of presenting universes such that this distinction is maintained is to say that universes contain 'codes' and there is a family that 'decodes' those to the corresponding type. So:
$$\frac{}{Γ ⊢ U\ \mathsf{type}} \ \ \ \ \ \ \frac{Γ ⊢ u : U}{Γ ⊢ T(u)\ \mathsf{type}}\ \ \ \ \ \ \frac{}{Γ ⊢ \hat{ℕ} : U}\ \ \ \ \ \ T(\hat{ℕ}) = ℕ$$
and so on. This is sometimes called a Tarski style universe. But, one can also present things in a way that is called a Russell style universe, in which we just say that (some) types are themselves terms of the universe, which might have the following rules corresponding to the above:
$$\frac{}{Γ ⊢ U\ \mathsf{type}}\ \ \ \ \ \ \frac{Γ ⊢ A : U}{Γ ⊢ A\ \mathsf{type}}\ \ \ \ \ \ \frac{}{Γ ⊢ ℕ : U}$$
No computation rule being necessary, of course.
In this latter case, it might make some sense to say that something like $→$ has a type, although it might actually have many types, depending on how many universes there are. These also aren't necessarily the only ways of presenting things. Many computer implementations might more accurately be thought of as not having the $Γ ⊢ A\ \mathsf{type}$ judgment at all, and instead just talking about universes, because there are universes containing every particular type. Those would definitely rely on there being a sort of schematic type for things like $→$.
Note that there isn't exactly a circularity in these cases. For instance, if in the Russell case we say $⊢\; →\; : U → U → U$, this isn't even the initial specification of $→$, which is specified with a $\mathsf{type}$ judgment. It is just a description of which types are in $U$. If your goal were a single (non-schematic) rule that subsumed all specification of $→$, that might be more in danger of paradox, I suppose.