2010-12-23

MicroRNG

This is a contribution to the MicroXML conversation. It's a stripped-down version of RELAX NG suitable for validating MicroXML documents. It excludes namespaces, since MicroXML doesn't have them either. Somewhat reluctantly, I have also jettisoned all simple types but a few and all value types except the default, since I figure that MicroXML will mostly be used by applications that need to validate string values in more complicated ways anyhow.

Generalized interleave has a high implementation cost, so I've removed it as well, except for mixed content, which I consider essential. Finally, I've ditched lists, datatype libraries other than stripped-down XSD, foreign markup, name classes, nested grammars, external file inclusion, the notAllowed pattern, divs (which are just for documentation), and definition combining methods.

Here's what's left, in the form of a grammar in the RELAX NG compact syntax. When translated to the XML syntax, this is also a MicroRNG grammar (modulo namespace issues).

start = elementElem | grammarElem

grammarElem = element grammar {startElem, defineElem*}

startElem = element start {elementElem | refElem | element choice {(elementElem | refElem)+}}

defineElem = element define {attribute name {text}, pattern+}

pattern = elementElem | textElem | mixedElem | attributeElem | valueElem | groupElem | choiceElem | optionalElem | zeroOrMoreElem | oneOrMoreElem | refElem | dataElem

elementElem = element element {attribute name {text}, (emptyElem | pattern+)}

emptyElem = element empty {empty}

textElem = element text {empty}

mixedElem = element mixed {pattern+}

attributeElem = element attribute {attribute name {text}, (valueElem | textElem)?}

valueElem = element value {text}

groupElem = element group {pattern+}

choiceElem = element choice {pattern+}

optionalElem = element optional {pattern+}

oneOrMoreElem = element oneOrMore {pattern+}

zeroOrMoreElem = element zeroOrMore {pattern+}

refElem = element ref {attribute name {text}}

dataElem = element data {"string" | "decimal" | "double" | "integer" | "date" | "dateTime" | "boolean" | "base64Binary"}
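
To make the grammar concrete, here is a rough sketch of what a schema in the XML syntax might look like, along with a small Python check that a schema document stays inside the element vocabulary listed above. The address-book schema, the MICRORNG_ELEMENTS set, and the unknown_elements function are all made up for illustration; the check looks only at element names, not at how they nest, so it is far from a validator.

```python
import xml.etree.ElementTree as ET

# Element names that appear in the grammar above (collected by hand).
MICRORNG_ELEMENTS = {
    "grammar", "start", "define", "element", "empty", "text", "mixed",
    "attribute", "value", "group", "choice", "optional", "oneOrMore",
    "zeroOrMore", "ref", "data",
}

# Hypothetical schema for a tiny address book; one element per definition.
SAMPLE_SCHEMA = """
<grammar>
  <start><ref name="addressBook"/></start>
  <define name="addressBook">
    <element name="addressBook">
      <zeroOrMore><ref name="card"/></zeroOrMore>
    </element>
  </define>
  <define name="card">
    <element name="card">
      <attribute name="id"><text/></attribute>
      <ref name="name"/>
      <optional><ref name="email"/></optional>
    </element>
  </define>
  <define name="name">
    <element name="name"><text/></element>
  </define>
  <define name="email">
    <element name="email"><text/></element>
  </define>
</grammar>
"""


def unknown_elements(schema_xml):
    """Return the sorted names of elements outside the MicroRNG vocabulary."""
    root = ET.fromstring(schema_xml)
    return sorted({el.tag for el in root.iter() if el.tag not in MICRORNG_ELEMENTS})


if __name__ == "__main__":
    # An empty list means every element name is in the vocabulary.
    print(unknown_elements(SAMPLE_SCHEMA))
```

A schema that used one of the removed features, such as interleave, would have that element name reported by unknown_elements.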

In addition, MicroRNG allows only a single element element in a definition (that is, no more than one definition of an element), even though that reduces the convenience of RNG definitions to that of their DTD equivalents. There are other possible simplifications, like getting rid of element elements as the root, or removing zeroOrMore elements in favor of optional elements wrapped around oneOrMore elements, but I judge them to be more annoying to schema authors than helpful to implementers.
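
The zeroOrMore simplification mentioned above is mechanical, which is part of why dropping it would save implementers so little. As a minimal sketch (the function name expand_zero_or_more is hypothetical, and the schema is assumed to be namespace-free XML as elsewhere in this post), the rewrite could be done like this in Python:

```python
import xml.etree.ElementTree as ET


def expand_zero_or_more(root):
    """Rewrite each zeroOrMore, in place, as optional wrapped around oneOrMore."""
    # Take a snapshot of the matches first, since the tree is modified below.
    for el in list(root.iter("zeroOrMore")):
        inner = ET.Element("oneOrMore")
        children = list(el)
        for child in children:
            el.remove(child)      # detach from the old zeroOrMore
        inner.extend(children)    # reattach under the new oneOrMore
        el.tag = "optional"       # reuse the existing node as the wrapper
        el.append(inner)
    return root


if __name__ == "__main__":
    schema = ET.fromstring(
        '<element name="list"><zeroOrMore><ref name="item"/></zeroOrMore></element>'
    )
    expand_zero_or_more(schema)
    print(ET.tostring(schema, encoding="unicode"))
```

An implementation could apply a rewrite like this when loading a schema and then handle only optional and oneOrMore, while schema authors keep the more convenient zeroOrMore.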

Comments are gratefully solicited either here or at James Clark's blog.