2008-10-14

Converting Restricted XML to Good-Quality JSON

Here are some ideas for converting restricted forms of XML to good-quality JSON. The restrictions are as follows:
  • The XML can't contain mixed content (elements with both children/attributes and text).
  • The XML cannot depend on the order of child elements with distinct names (order dependence in children with the same name is okay).
  • There can't be any attributes with the same name as child elements.
  • There can't be any elements or attributes that differ only in their namespace names.
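Three of these restrictions can be checked mechanically. The one about order dependence describes what the document means, so no program can verify it. The following is a minimal sketch in Python using the standard `xml.etree.ElementTree` module. The helper names `local_name` and `check_restrictions` are hypothetical, and the sketch assumes the namespace restriction applies among the children, and among the attributes, of a single element.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """'{namespace}local' -> 'local' (ElementTree's Clark notation)."""
    return tag.split("}", 1)[-1]


def check_restrictions(elem, path=""):
    """Return a list of violations of the checkable restrictions in elem's subtree."""
    problems = []
    here = f"{path}/{local_name(elem.tag)}"
    children = list(elem)
    # Mixed content: non-whitespace text alongside children or attributes.
    has_text = bool((elem.text or "").strip()) or any(
        (c.tail or "").strip() for c in children
    )
    if has_text and (children or elem.attrib):
        problems.append(f"{here}: mixed content")
    # Group full (namespace-qualified) names by local name.
    child_names, attr_names = {}, {}
    for c in children:
        child_names.setdefault(local_name(c.tag), set()).add(c.tag)
    for a in elem.attrib:
        attr_names.setdefault(local_name(a), set()).add(a)
    for kind, names in (("element", child_names), ("attribute", attr_names)):
        for name, full_names in names.items():
            if len(full_names) > 1:
                problems.append(f"{here}: {kind}s named {name!r} differ only by namespace")
    # An attribute and a child element sharing a local name would collide as JSON keys.
    for name in sorted(child_names.keys() & attr_names.keys()):
        problems.append(f"{here}: attribute and child element both named {name!r}")
    for c in children:
        problems.extend(check_restrictions(c, here))
    return problems
```

Indentation whitespace in pretty-printed XML is ignored, so only real text next to children or attributes counts as mixed content.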
You also need to know the following things for each child element:
  • Whether it MUST appear at most once (a singleton element) or MAY appear more than once (a multiplex element).
  • Whether it only contains text (an element with simple type) or child elements and/or attributes (an element with complex type).
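Ideally this information comes from a schema. When only sample documents are available, a rough heuristic can guess it. The sketch below (Python, `xml.etree.ElementTree`; `infer_model` is a hypothetical name) treats a name as multiplex if it ever occurs twice under one parent, and as complex if any occurrence has child elements or attributes. It keys everything by local name, assuming a name means the same thing wherever it appears.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """'{namespace}local' -> 'local'."""
    return tag.split("}", 1)[-1]


def infer_model(*roots):
    """Guess (multiplex, complex_) sets of local names from sample documents."""
    multiplex, complex_ = set(), set()
    for root in roots:
        for elem in root.iter():  # includes root itself
            if len(elem) or elem.attrib:
                complex_.add(local_name(elem.tag))
            # A name seen twice under the same parent MAY appear more than once.
            counts = Counter(local_name(c.tag) for c in elem)
            multiplex.update(name for name, n in counts.items() if n > 1)
    return multiplex, complex_
```

A sample in which every book happens to have a single author would wrongly classify `author` as a singleton, which is why schema information is the safer source.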
Now, to convert the XML to JSON, apply these rules recursively:
  • A singleton element of simple type, and likewise an attribute, is converted to a JSON simple value: a number or boolean if syntactically possible, otherwise a string.
  • A multiplex element of simple type is converted to a JSON array of simple values.
  • A singleton element of complex type is converted to a JSON object that maps the local names of child elements and attributes to their content. Namespace names are discarded.
  • A multiplex element of complex type is mapped to a JSON array of JSON objects that map the local names of child elements and attributes to their content. Namespace names are discarded.
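The rules above can be sketched in Python with the standard `xml.etree.ElementTree`, `re`, and `json` modules. This is a minimal illustration under stated assumptions, not a reference implementation: the function names are hypothetical, the per-element information is passed in as two sets of local names (`multiplex` and `complex_`), and "syntactically possible" is interpreted as matching JSON's own number and boolean syntax, so values like `0441013597` or `NaN` stay strings.

```python
import json
import re
import xml.etree.ElementTree as ET

# JSON's number grammar: no leading zeros, no NaN or Infinity.
JSON_NUMBER = re.compile(r"-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?")


def local_name(tag):
    """'{namespace}local' -> 'local'; namespace names are discarded."""
    return tag.split("}", 1)[-1]


def simple_value(text):
    """A JSON boolean or number if the text is spelled like one, else the string."""
    s = (text or "").strip()
    if s == "true":
        return True
    if s == "false":
        return False
    if JSON_NUMBER.fullmatch(s):
        return float(s) if any(ch in s for ch in ".eE") else int(s)
    return text or ""


def convert(elem, multiplex, complex_):
    """Convert an element's content (not its name) to a JSON-ready value."""
    if local_name(elem.tag) not in complex_:
        return simple_value(elem.text)  # simple type
    obj = {}
    # Attributes behave like singleton elements of simple type.
    for attr, value in elem.attrib.items():
        obj[local_name(attr)] = simple_value(value)
    for child in elem:
        name = local_name(child.tag)
        value = convert(child, multiplex, complex_)
        if name in multiplex:
            obj.setdefault(name, []).append(value)  # array, even for one occurrence
        else:
            obj[name] = value
    return obj


if __name__ == "__main__":
    book = ET.fromstring(
        '<book xmlns="urn:x" id="7"><title>Dune</title><price>9.99</price>'
        "<author>Frank Herbert</author><inPrint>true</inPrint></book>"
    )
    print(json.dumps(convert(book, {"author"}, {"book"})))
    # prints {"id": 7, "title": "Dune", "price": 9.99, "author": ["Frank Herbert"], "inPrint": true}
```

Because `author` is declared multiplex, it becomes an array even when only one author is present, which keeps the shape of the JSON predictable for the code that consumes it.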
Comments are very welcome.

5 comments:

Stephan.Schmidt said...

"There can't be any attributes with the same name as child elements."

Some people map attributes to @attribute or {"@" : { "attribute": value } }

http://stephan.reposita.org/archives/2008/10/13/david-pollak-was-right-about-xml-and-json/

Peace
Stephan

Anonymous said...

Why not ordered?

{
orderedList: [
{foo:bar},
{baz:bat}
]
}

You can have order. What am I missing?

John Cowan said...

It's not that you can't have child element order, or namespaces, or coinciding child-element and attribute names. It's that you can only do so by substantially uglifying your JSON.

Indeed, you can provide a JSON translation for every feature in an XML document, even including the DTD. But what you get is awkward to navigate and tends to make your average JavaScript programmer retch. It's just way too general, which is not the spirit of JSON.

My proposal here allows you to translate a broad but limited set of XML documents into clean JSON-represented data structures that are easy to extract and manipulate in a simple and intuitive way. That entails imposing certain restrictions on the XML.
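To make the tradeoff in this exchange concrete, here is a sketch of the order-preserving encoding from the earlier comment, written in Python with `xml.etree.ElementTree`. `ordered` is a hypothetical helper, and it ignores attributes for brevity.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """'{namespace}local' -> 'local'."""
    return tag.split("}", 1)[-1]


def ordered(elem):
    """Order-preserving encoding: each child becomes a single-key object in a list."""
    if len(elem) == 0:
        return elem.text or ""  # leaf: its text (attributes ignored here)
    return [{local_name(c.tag): ordered(c)} for c in elem]
```

With this encoding, finding `baz` means scanning the list (for example `next(d["baz"] for d in enc if "baz" in d)`), whereas the keyed objects produced by the rules in the post allow a direct lookup like `obj["baz"]`.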

Anonymous said...

OK, but using an array to order child objects is common:

* a directory tree
* tabs
* window pane layout
* and of course data

I use it quite a bit.

best,
-Rob

Anonymous said...

I have a simple recipe for a declarative approach to generating JSON from XML in Amara:

http://wiki.xml3k.org/Amara2/Recipes/XML_to_JSON

I'll look to update that to take advantage of your rules, maybe auto-generating from schema info the declarative model that drives the conversion (BOOK_MODEL in linked example).

Happy New Year!