Warning: You don't know about MicroXML without you have read a blog post by the name of "More on MicroXML"; but that ain't no matter, because you can click on the link and read all about it.
Warning Too: Carefully note the word "and" in the title. There's a reason why it's not "versus".
The whole point of MicroXML is to provide an XML spec (and associated data model) which is small and simple enough, and easy enough to implement, that it can go where no XML has gone before. Of course, JSON is already filling part of that niche, and it's even simpler than MicroXML. So MicroXMLers have two choices: think up reasons why JSON is bad, or figure out ways to coexist with it. My personality being what it is, I choose the second.
The goals of this posting are a) to specify a way to losslessly and uniquely transform JSON documents into MicroXML documents and back, and b) to specify a way to add markup to an arbitrary MicroXML document to explain how to transform it to JSON, which probably involves some amount of loss, because if MicroXML weren't more expressive than JSON, it wouldn't have a reason to exist. Consequently, a non-goal is to specify a way to losslessly and uniquely transform MicroXML to JSON and back.
JSON values have six possible types: objects (key-value mappings), arrays (ordered lists of values), strings, numbers, booleans, and `null`. The simplest approach to the first goal that could possibly work is to define a MicroXML vocabulary with six elements in it, named `object`, `array`, `string`, `number`, `boolean`, and `null`, and that's what I'm going to specify. So JSON converted to MicroXML looks pretty much like JSON itself, only more verbose. Why do this at all? So that the converted JSON can be fed into a MicroXML-based or XML-based pipeline and possibly converted back to JSON at the other end. Of course, if you don't need to do that, no problem: just don't convert to MicroXML in the first place.
Five of the six types are easy to represent: an `array` element represents the elements of the array using its child elements; a `string`, `number`, or `boolean` element contains the string, number or boolean value as character content, and a `null` element is always empty.
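Here's a minimal sketch in Python of the five easy cases, using the standard library's `xml.etree.ElementTree` as a stand-in for a MicroXML data model. The function name `easy_value_to_element` is made up for illustration, and serializing numbers with `json.dumps` is my assumption about what the character content should look like; objects are deliberately left out.

```python
import json
import xml.etree.ElementTree as ET


def easy_value_to_element(value):
    """Convert a non-object JSON value into an element (hypothetical helper)."""
    if value is None:
        return ET.Element("null")          # a null element is always empty
    if isinstance(value, bool):            # test bool first: bool is a subclass of int
        elem = ET.Element("boolean")
        elem.text = "true" if value else "false"
        return elem
    if isinstance(value, (int, float)):
        elem = ET.Element("number")
        elem.text = json.dumps(value)      # assumption: reuse JSON's own number syntax
        return elem
    if isinstance(value, str):
        elem = ET.Element("string")
        elem.text = value
        return elem
    if isinstance(value, list):
        elem = ET.Element("array")
        for item in value:                 # array members become child elements, in order
            elem.append(easy_value_to_element(item))
        return elem
    raise TypeError("objects are not handled in this sketch")


doc = easy_value_to_element(json.loads('[1, "two", true, null]'))
serialized = ET.tostring(doc, encoding="unicode")
print(serialized)
```

With ElementTree's default serialization this prints `<array><number>1</number><string>two</string><boolean>true</boolean><null /></array>`.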
Next we must choose how to represent the key-value pairs within an object. They can't be represented as attributes (that is, with the key as the attribute name and the value as the attribute value), because the JSON RFC only says that keys SHOULD be unique, not that they MUST be unique, and attribute names in XML elements MUST be unique. So we'll represent each key-value pair as a child element, and represent the value of the pair using the content of the element.
But what about the key? There are two plausible choices: use an element with the fixed name `pair` and specify the key (which must be a string) using a `key` attribute, or use the name of the element directly as the key. The first solution is general but verbose; the second solution is not general, because only a subset of strings can appear as a MicroXML (or XML) element name. We'll require MicroXML-to-JSON converters to accept both (be liberal in what you accept), but require JSON-to-MicroXML converters to use the second solution unless the key contains a character that's not valid in XML names (be conservative in what you send). So `pair` becomes a seventh name in the MicroXML vocabulary for JSON.
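The choice between the two forms can be sketched in Python as follows. The helper names `is_microxml_name` and `key_element` are hypothetical, and the regular expression assumes that MicroXML names are XML 1.0 (Fifth Edition) names with the colon removed; check that against the MicroXML spec before relying on it.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: XML 1.0 5th-edition NameStartChar / NameChar, minus the colon.
_NAME_START = (
    "A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
    "\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF"
    "\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
_NAME_REST = _NAME_START + "\\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"
_NAME_RE = re.compile(f"[{_NAME_START}][{_NAME_REST}]*")


def is_microxml_name(key):
    """True if the key can be used directly as an element name."""
    return _NAME_RE.fullmatch(key) is not None


def key_element(key, value_elem):
    """Wrap an already-converted value in the element that carries its key."""
    if is_microxml_name(key):
        wrapper = ET.Element(key)                    # preferred: key is the element name
    else:
        wrapper = ET.Element("pair", {"key": key})   # fallback: <pair key="...">
    wrapper.append(value_elem)
    return wrapper


def string_element(text):
    elem = ET.Element("string")
    elem.text = text
    return elem


obj = ET.Element("object")
# A list of pairs rather than a dict, so duplicate keys would survive.
for k, v in [("name", "John"), ("first name", "John")]:
    obj.append(key_element(k, string_element(v)))
print(ET.tostring(obj, encoding="unicode"))
```

The second key contains a space, so it falls back to the `pair` form, while the first is used directly as an element name.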
(The characters U+FFFE and U+FFFF can appear literally in a JSON string, key, or value, but can't appear in XML character content, not even using character references. These aren't likely to actually occur in JSON documents, but just for completeness we'll say that they must be escaped with JSON escaping as `\uFFFE` and `\uFFFF`. This constitutes a minor violation of the rule of verbatim round-tripping, since JSON->MicroXML->JSON will always produce escape sequences for these characters even if the original document had them appear literally, but no realistic JSON application will notice the difference.)
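For the JSON-to-MicroXML direction, that escaping might look like the Python sketch below. The function name `escape_noncharacters` is hypothetical, and the sketch covers only the outbound step: it replaces each of the two characters with the six-character JSON escape sequence so that the result is legal XML character content.

```python
import xml.etree.ElementTree as ET


def escape_noncharacters(text):
    """Replace U+FFFE and U+FFFF with their JSON escape sequences (hypothetical helper)."""
    return text.replace("\uFFFE", "\\uFFFE").replace("\uFFFF", "\\uFFFF")


raw = "a\uFFFEb\uFFFFc"                  # a string holding both characters literally
safe = escape_noncharacters(raw)

elem = ET.Element("string")
elem.text = safe
serialized = ET.tostring(elem, encoding="unicode")

# The escaped form contains only characters XML allows, so it parses back cleanly.
reparsed = ET.fromstring(serialized)
print(reparsed.text)
```

Strings that contain neither character pass through unchanged.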
So much for the first goal. What about the second? We'll require JSON->MicroXML translators to adopt the rules above to begin with. What about elements and attributes present in the MicroXML that have other names? We'll say that if an element has the attribute `json-type`, then the value of that attribute tells us how to process it. Thus an element named `list` with a `json-type` attribute of `array` will be converted to a JSON array. In this process, the actual name of the element and any inappropriate content are discarded, including any character content in an element with a `json-type` of `object` or `array` and any content at all of an element with a `json-type` of `null`. We don't discard child elements in elements with a `json-type` of `string`, `number`, or `boolean`: instead we use the XPath value of the element, which is the same as the content of the element with any tags ignored.
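A rough Python sketch of the `json-type` rules for everything except objects follows. The function names are hypothetical, and a few details are my assumptions rather than part of the rules above: whitespace around number and boolean content is stripped, number content is parsed with `json.loads`, and an element with neither a `json-type` attribute nor a vocabulary name is treated as an error.

```python
import json
import xml.etree.ElementTree as ET

VOCABULARY = {"array", "string", "number", "boolean", "null"}


def json_type_of(elem):
    """Use the json-type attribute if present, else fall back to a vocabulary name."""
    kind = elem.get("json-type")
    if kind is None and elem.tag in VOCABULARY:
        kind = elem.tag
    return kind


def to_json_value(elem):
    kind = json_type_of(elem)
    if kind == "null":
        return None                              # any content at all is discarded
    if kind == "array":
        # Iterating an element yields only child elements, so character
        # content sitting directly inside the array is dropped.
        return [to_json_value(child) for child in elem]
    text = "".join(elem.itertext())              # content with any tags ignored
    if kind == "string":
        return text
    if kind == "number":
        return json.loads(text.strip())          # assumption: JSON number syntax
    if kind == "boolean":
        return {"true": True, "false": False}[text.strip()]
    raise ValueError(f"cannot convert element {elem.tag!r} in this sketch")


doc = ET.fromstring(
    '<list json-type="array">stray text'
    '<item json-type="string">Hello, <b>world</b></item>'
    '<count json-type="number">3</count>'
    '<flag json-type="boolean">true</flag>'
    '<nothing json-type="null">ignored</nothing>'
    '</list>'
)
result = to_json_value(doc)
print(json.dumps(result))
```

This prints `["Hello, world", 3, true, null]`: the element names, the stray text in the array, and the content of the null element all disappear.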
What about MicroXML attributes? We discard them for all elements except those with a `json-type` of `object`, where we treat them as additional key-value pairs (excepting of course any `json-key` attribute).
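The attribute rule can be sketched in Python like this. The function name `attribute_pairs` is hypothetical, and two things are assumptions on my part: that attribute values become JSON strings, and that the `json-type` attribute itself is skipped along with `json-key`.

```python
import xml.etree.ElementTree as ET

# Assumption: both control attributes are excluded from the output pairs.
CONTROL_ATTRIBUTES = {"json-type", "json-key"}


def attribute_pairs(elem):
    """Return the extra (key, value) pairs contributed by an element's attributes."""
    if elem.get("json-type") != "object":
        return []                                # attributes are discarded elsewhere
    return [
        (name, value)                            # assumption: values are JSON strings
        for name, value in elem.attrib.items()
        if name not in CONTROL_ATTRIBUTES
    ]


person = ET.fromstring(
    '<person json-type="object" json-key="owner" id="42" lang="en"/>'
)
tags = ET.fromstring('<tags json-type="array" id="7"/>')

person_pairs = attribute_pairs(person)
tags_pairs = attribute_pairs(tags)
print(person_pairs, tags_pairs)
```

Only the `id` and `lang` attributes of the object-typed element survive; the attribute on the array-typed element is dropped.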
As usual, comments are solicited.