First, a little background: In XML there is the ability for one part of an XML document to reference another part of the document (i.e., a cross-reference). Below is an example. The BookSigning element references a Book element:
<Library>
<BookCatalogue>
...
<Book isbn="0-440-34319-4">
<Title>Illusions</Title>
<Author>Richard Bach</Author>
</Book>
...
</BookCatalogue>
<BookSignings>
...
<BookSigning isbn_ref="0-440-34319-4" />
...
</BookSignings>
</Library>
That is, isbn_ref points to isbn. The value of isbn_ref and isbn match.
If there is no matching isbn value then the isbn_ref is dangling and the XML is invalid.
I want to know if cross-references in XML can be expressed using a Context-Free Grammar (CFG)? Or, does the use of cross-references make XML Context-Sensitive?
Dealing with the XML syntax is much too complicated, so I would like to abstract the problem to something more manageable. I believe that cross-referencing in XML is analogous to this: Let x represent any XML element and a represent one end of a cross-reference. We could have an XML document without any cross-references, which corresponds to a sequence of x's:
xxxxxxxxxxxxx
Suppose that the XML document has a cross-reference, then it has an a and somewhere else in the document there must be exactly one other a. So amongst the x's there must be zero a's or exactly two a's:
xxxxxxaxxxxxxaxxxx
I wrote a CFG for that language:
S -> X | A
X -> xX | empty
A -> XaB | BaX
B -> XaX
So, if I have a correct abstraction of cross-referencing in XML, then I have proven that cross-references in XML are context-free. The problem is that I am not convinced that my abstraction faithfully represents cross-referencing in XML. Do I have a correct abstraction? Are cross-references context-free?
I think there's a pretty convincing but informal argument that cross-references are not context-free. A context-free language can be matched by a push-down automaton. However, if you have multiple cross-references and the IDs are drawn from an unbounded set, you need to "remember" all of the IDs you've seen, so you can't guarantee that the one you need will be at the top of the stack when you encounter a reference.
(For what it's worth, I think your abstraction also breaks down on the possibility of multiple references to the same ID, but that's by the by).