For example, consider the proof using Rosser's trick as shown on Wikipedia.
https://en.wikipedia.org/wiki/Rosser%27s_trick#The_Rosser_sentence
That proof isn't carried out inside the arithmetical theory $T$, but in some higher-level metatheory. So if I wanted to do meta-reasoning about Gödel's proof (or Rosser's, etc.), I would need a system that could refer to or even encode statements like "For any fixed arithmetical theory $T$, let Proof$_T(x,y)$..." In what formal system would the statement "For any fixed arithmetical theory $T$..." exist? It obviously wouldn't be in $T$ itself, since that would be tantamount to a proof inside $T$ that $T$ is consistent, in which case $T$ would be inconsistent by Gödel's second incompleteness theorem, making $T$ useless as a formal system. Is it second-order logic? ZFC?
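You write that the argument "obviously wouldn't be in $T$ itself" because that would amount to $T$ proving its own consistency.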
This is a common misconception. The incompleteness theorem, appropriately phrased, can be proved in (first-order) $\mathsf{PA}$ or indeed much less. Here's the precise statement of the theorem: every computably axiomatizable consistent theory interpreting Robinson arithmetic is incomplete.
Note that consistency is folded into the hypothesis. In particular, $\mathsf{PA}$ can prove "If $\mathsf{PA}$ is consistent, then $\mathsf{PA}$ is incomplete" (since it's easy for $\mathsf{PA}$ to prove that $\mathsf{PA}$ is computably axiomatizable and interprets Robinson arithmetic), but this doesn't contradict the inability of $\mathsf{PA}$ to prove its own consistency.
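To see why there is no conflict, it may help to write both claims side by side as arithmetical sentences. The notation here is my own shorthand, introduced only for illustration: $\mathrm{Prov}_{\mathsf{PA}}(x)$ is a standard provability predicate for $\mathsf{PA}$, $\mathrm{Con}(\mathsf{PA})$ abbreviates $\neg\mathrm{Prov}_{\mathsf{PA}}(\ulcorner 0=1\urcorner)$, $\mathrm{Sent}(x)$ says that $x$ codes a sentence, and $\mathrm{neg}(x)$ is the definable function taking the code of a sentence to the code of its negation.

$$\mathsf{PA} \vdash \mathrm{Con}(\mathsf{PA}) \rightarrow \exists x\,\bigl[\mathrm{Sent}(x) \wedge \neg\mathrm{Prov}_{\mathsf{PA}}(x) \wedge \neg\mathrm{Prov}_{\mathsf{PA}}(\mathrm{neg}(x))\bigr]$$

$$\mathsf{PA} \nvdash \mathrm{Con}(\mathsf{PA}) \quad \text{(assuming $\mathsf{PA}$ is consistent)}$$

The first line is a conditional whose hypothesis $\mathsf{PA}$ cannot discharge, so proving the conditional gives $\mathsf{PA}$ no route to proving $\mathrm{Con}(\mathsf{PA})$ itself.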
That said, there is a "fundamentally meta" aspect to the claim that the incompleteness theorem can be proved in $\mathsf{PA}$.
What $\mathsf{PA}$ actually proves is a sentence $(*)$ about numbers, which we claim is a faithful rephrasing of "Every computably axiomatizable consistent theory interpreting Robinson arithmetic is incomplete" in some sense. But where is that claim made and evaluated?
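As a rough sketch of the shape such a sentence might take (the symbols below are hypothetical shorthand, not notation fixed anywhere above): let $T_e$ be the theory whose axioms are enumerated by the $e$-th Turing machine, let $\mathrm{Con}(e)$ and $\mathrm{Int}_{\mathsf{Q}}(e)$ be arithmetical formulas intended to express "$T_e$ is consistent" and "$T_e$ interprets Robinson arithmetic", let $\mathrm{Prov}(e,x)$ express "$T_e$ proves the sentence coded by $x$", and let $\mathrm{neg}(x)$ be the definable function sending the code of a sentence to the code of its negation.

$$\forall e\,\Bigl[\mathrm{Con}(e) \wedge \mathrm{Int}_{\mathsf{Q}}(e) \rightarrow \exists x\,\bigl(\mathrm{Sent}(x) \wedge \neg\mathrm{Prov}(e,x) \wedge \neg\mathrm{Prov}(e,\mathrm{neg}(x))\bigr)\Bigr]$$

Every quantifier here ranges over natural numbers. That this sentence means what the English statement means is exactly the claim that has to be made and assessed outside $\mathsf{PA}$.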
Basically, on some level we have to commit to the position that a particular formal system successfully describes (at least to a certain extent) the behavior of formal systems themselves. This can feel like a bit of a rabbit hole, but remember: we do need to start with something.