Warning: You don't know about MicroXML without you have read a blog post by the name of "More on MicroXML"; but that ain't no matter, because you can click on the link and read all about it.
Warning Too: Carefully note the word "and" in the title. There's a reason why it's not "versus".
The whole point of MicroXML is to provide an XML spec (and associated data model) which is small and simple enough, and easy enough to implement, that it can go where no XML has gone before. Of course, JSON is already filling part of that niche, and it's even simpler than MicroXML. So MicroXMLers have two choices: think up reasons why JSON is bad, or figure out ways to coexist with it. My personality being what it is, I choose the second.
The goals of this posting are a) to specify a way to losslessly and uniquely transform JSON documents into MicroXML documents and back, and b) to specify a way to add markup to an arbitrary MicroXML document to explain how to transform it to JSON, which probably involves some amount of loss, because if MicroXML weren't more expressive than JSON, it wouldn't have a reason to exist. Consequently, a non-goal is to specify a way to losslessly and uniquely transform MicroXML to JSON and back.
JSON values have six possible types: objects (key-value mappings), arrays (ordered lists of values), strings, numbers, booleans, and
null. The simplest approach to the first goal that could possibly work is to define a MicroXML vocabulary with six elements in it, named
object, array, string, number, boolean, and
null, and that's what I'm going to specify. So JSON converted to MicroXML looks pretty much like JSON itself, only more verbose. Why do this at all? So that the converted JSON can be fed into a MicroXML-based or XML-based pipeline and possibly converted back to JSON at the other end. Of course, if you don't need to do that, no problem: just don't convert to MicroXML in the first place.
Five of the six types are easy to represent: an
array element represents the elements of the array using its child elements; a
boolean element contains the string, number or boolean value as character content, and a
null element is always empty.
Next we must choose how to represent the key-value pairs within an object. They can't be represented as attributes (that is, with the key as the attribute name and the value as the attribute value), because the JSON RFC only says that keys SHOULD be unique, not that they MUST be unique, and attribute names in XML elements MUST be unique. So we'll represent each key-value pair as a child element, and represent the value of the pair using the content of the element.
But what about the key? There are two plausible choices: use an element with the fixed name
pair and specify the key (which must be a string) using a
key attribute, or use the name of the element directly as the key. The first solution is general but verbose; the second solution is not general, because only a subset of strings can appear as a MicroXML (or XML) element name. We'll require MicroXML-to-JSON converters to accept both (be liberal in what you accept), but require JSON-to-MicroXML converters to use the second solution unless the key contains a character that's not valid in XML names (be conservative in what you send). So
pair becomes a seventh name in the MicroXML vocabulary for JSON.
(The characters U+FFFE and U+FFFF can appear literally in a JSON string, key, or value, but can't appear in XML character content, not even using character references. These aren't likely to actually occur in JSON documents, but just for completeness we'll say that they must be escaped with JSON escaping as
\uFFFF. This constitutes a minor violation of the rule of verbatim round-tripping, since JSON->MicroXML->JSON will always produce escape sequences for these characters even if the original document had them appear literally, but no realistic JSON application will notice the difference.)
So much for the first goal. What about the second? We'll require JSON->MicroXML translators to adopt the rules above to begin with. What about elements and attributes present in the MicroXML that have other names? We'll say that if an element has the attribute
json-type, then the value of that attribute tells us how to process it. Thus an element named
list with a
json-type attribute of
array will be converted to a JSON array. In this process, the actual name of the element and any inappropriate content is discarded, including any character content in an element with a
array and any content at all of an element with a
null. We don't discard child elements in elements with a
boolean: instead we use the XPath value of the element, which is the same as the content of the element with any tags ignored.
What about MicroXML attributes? We discard them for all elements except those with a
object, where we treat them as additional key-value pairs (excepting of course any
As usual, comments are solicited.