In logic (and categorical logic) one learns that infinitary logic is like finitary logic, only that the formulas may contain set-indexed disjunctions and conjuctions such as $\bigvee_{i\in I}\phi_i$. I am trying to formalise the notion of a geometric theory inside the internal language of a Grothendieck topos. The problem that I encounter is that I am not really sure what a geometric formula actually is, and hence how type of formulas of an internal theory $\mathbb T$ shall look like.
A finitary theory consists of finite strings of symbols from some alphabet (which can be any internal set). It seems plausible that this can be formalised with help of the internal $List$ type-constructors of the topos. But a geometric formula does not look like this. How can I formalize what it is? When I for example allow extra symbols for all internal sets in the alphabet, and for all families of geometric formulas already build from them, then the resulting collection can't be a small type.
The problem already appears in ordinary mathematics. One can not form a set of all geometric formulas, when the index sets in the disjunctions are already allowed to range over all sets. What is a proper definition of "geometric" or "infinitary" theory?
Here is a definition that works in "ordinary mathematics". Given an ordinal $\alpha$ and a finite variable context $x$, let's define a geometric formula of complexity $\leq \alpha$ in context $x$ to be:
Formally, we're defining, by transfinite recursion, a set $F_{\alpha,x}$ (the set of geometric formulas of complexity $\leq \alpha$ in context $x$) for all ordinals $\alpha$ and all finite variable contexts $x$. Then the proper class of all geometric formulas in context $x$ is $\bigcup_{\alpha\in \mathrm{Ord}} F_{\alpha,x}$. In other words, a geometric formula in context $x$ is an element of $F_{\alpha,x}$ for some ordinal $\alpha$.
I believe this kind of definition should internalize to a Grothendieck topos.
It may also be helpful to note that up to logical equivalence, the geometric formulas actually form a set. This is because every geometric formula is logically equivalent to one of the form $\bigvee_{i\in I} \varphi_i$, where each $\varphi_i$ is a regular (or positive primitive) formula: built from atomics using finite conjunctions and existential quantifiers. The collection of all regular formulas is a set, and it follows that (up to logical equivalence), a geometric formula can be identified with a set of regular formulas.
This works because finite conjunctions and existential quantifiers distribute over infinite disjunctions. The same trick doesn't work for the full infinitary logic $L_{\infty,\omega}$, where there really is a proper class of formulas up to logical equivalence.