Here's an issue I ran into on this MathOverflow thread: https://mathoverflow.net/questions/18787/montagues-reflection-principle-and-compactness-theorem/18804#18804
Likewise, see Joel David Hamkins's answer here: https://mathoverflow.net/questions/15685/is-it-necessary-that-model-of-theory-is-a-set/15713#15713
A final subtle point, which I hesitate to mention because it can be confusing even to experts, is that it is a theorem that every model M of ZFC has an object m inside it which M believes to be a first order structure in the language of set theory, and if we look at m from outside M, and view it as a structure of its own, then m is a model of full ZFC.
In short: we can prove using Levy Reflection that for any model M of ZFC, some element of M satisfies ZFC (although the model may not believe that the structure does). But this seems like a problem for the axiom of foundation: can't we construct a downward chain of membership composed of the first model, the model which is an element of that model, and so on?
The only idea I've come up with is that somehow the descending chain of these submodels is not an element of the starting model M. But it's not clear to me why we couldn't guarantee that it is, by taking some sufficiently expansive initial model like the Von Neumann Universe. Is there a principle reason that the downward chain would not be a member of the universe? Or is there another way of resolving the issue that I've missed?
No, it doesn't. The quick slogan is "tables, chairs, and beer mugs" - when we speak of a model of set theory (within an ambeint model $M$), the membership relation of that model need have nothing to do with actual membership (in the sense of $M$), so "$a\in^b b\in^MM$" doesn't let us conclude anything about whether $a\in^Mb$ or not.
More explicitly: when we say that $M$ contains an $\{\in\}$-structure (a model of set theory or otherwise) such that [stuff], all we mean is that there is some $a\in M$ such that $M$ thinks "$a$ is an ordered pair $(x, e)$, where $e$ is a binary relation on $x$ and $x$ is nonempty" (nonemptiness is only an issue since the semantics of first-order logic rules out empty structures) such that [stuff].$^*$ Now $a$, as a structure, may itself "contain a structure" in the sense that there may be some $b\in^M x$ such that according to $a$, $b$ is an ordered pair $(y, f)$ where $f$ is a binary relation on $y$ and $y\not=\emptyset$. But note that since $a$'s membership relation is $e$, from the perspective of $M$ (and $\in^M$) $b$ might not look anything like what $a$ thinks $b$ looks like.
For example, take $M$ to be $V$ for simplicity; then for any countably infinite $\{\in\}$-structure $\mathfrak{S}$ and any countably infinite set $A$, there is a binary relation $E\subseteq A^2$ such that $(A, E)\cong\mathfrak{S}$. But of course $E$ need look nothing at all like $\in$.
For a more concrete example - again, working in $V$ for simplicity - let $x=\{\emptyset\}$ and $e=\{\{\{\emptyset\}\}\}$. Then:
In $V$, $x$ is nonempty and $e$ is a set of ordered pairs of elements of $x$ (specifically, $e=\{\langle \emptyset,\emptyset\rangle\}$).
This means that $a:=(x,e)$ is an $\{\in\}$-structure.
In the sense of $a$, we have that $\emptyset$ (by which I mean, and will continue to mean, $\emptyset^V$) is an ordered pair whose left coordinate, $\emptyset$, is nonempty (it has precisely one element, namely itself) and whose right coordinate, $\emptyset$, is a set of ordered pairs of elements of the left coordinate (it consists of a single ordered pair, saying that $\emptyset$ is related to $\emptyset$).
We can repeat this analysis, and show that in the sense of $a$ it is the case that in the sense of $\emptyset$ it is the case that $\emptyset$ is an $\{\in\}$-structure. Note that in the sense of $a$, $\emptyset$ is an element of $\emptyset$.
However, in the sense of $V$ the set $\emptyset$ is certainly not an $\{\in\}$-structure at all, let alone one that contains a further $\{\in\}$-structure, let alone one which contains itself as a further $\{\in\}$-structure.
$^*$This isn't quite the right general approach, since we will usually want the ability to distinguish between different binary relation symbols, but this restrictive approach is sufficient for the current question and is easier to think about.