Here are some ideas for converting restricted forms of XML to good-quality JSON.
The restrictions are as follows:
- The XML can't contain mixed content (elements with both children/attributes and text).
- The XML cannot depend on the order of child elements with distinct names (order dependence in children with the same name is okay).
- There can't be any attributes with the same name as child elements.
- There can't be any elements or attributes that differ only in their namespace names.
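
Three of these restrictions can be checked mechanically against a sample document. The following is only a rough sketch using Python's standard `xml.etree.ElementTree`; the names `violations`, `duplicates`, and `local_name` are made up for illustration. It assumes that name clashes only matter among the attributes and children of the same parent, and it cannot check the order restriction, since order dependence is a property of the vocabulary rather than of any one document.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip ElementTree's '{namespace}' prefix, keeping only the local name."""
    return tag.rsplit("}", 1)[-1]


def duplicates(names):
    """Return the sorted names that occur more than once."""
    seen, dupes = set(), set()
    for name in names:
        (dupes if name in seen else seen).add(name)
    return sorted(dupes)


def violations(element, path=""):
    """Yield one message per restriction broken by `element` or a descendant."""
    here = f"{path}/{local_name(element.tag)}"

    # Mixed content: non-whitespace text alongside children or attributes.
    texts = [element.text] + [child.tail for child in element]
    has_text = any(t and t.strip() for t in texts)
    if has_text and (len(element) > 0 or element.attrib):
        yield f"{here}: mixed content"

    # Distinct qualified names among the children (repeats collapse here).
    child_tags = {child.tag for child in element}
    attr_locals = {local_name(a) for a in element.attrib}
    child_locals = {local_name(t) for t in child_tags}

    # An attribute sharing its local name with a child element.
    for name in sorted(attr_locals & child_locals):
        yield f"{here}: attribute and child element both named '{name}'"

    # Names that differ only in their namespace.
    for name in duplicates(local_name(t) for t in child_tags):
        yield f"{here}: child elements named '{name}' differ only by namespace"
    for name in duplicates(local_name(a) for a in element.attrib):
        yield f"{here}: attributes named '{name}' differ only by namespace"

    for child in element:
        yield from violations(child, here)


sample = '<a xmlns:x="urn:x" id="1"><id>2</id><b>text<c/></b><x:d/><d/></a>'
for message in violations(ET.fromstring(sample)):
    print(message)
```

For this sample, the sketch reports the `id` attribute/child clash and the two `d` elements on `/a`, then the mixed content in `/a/b`.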
You also need to know the following things for each child element:
- Whether it MUST appear at most once (a singleton element) or MAY appear more than once (a multiplex element).
- Whether it contains only text (an element of simple type) or child elements and/or attributes (an element of complex type).
Now, to convert the XML to JSON, apply these rules recursively:
- A singleton element of simple type, and likewise an attribute, is converted to a JSON simple value: a number or boolean if syntactically possible, otherwise a string.
- A multiplex element of simple type is converted to a JSON array of simple values.
- A singleton element of complex type is converted to a JSON object that maps the local names of child elements and attributes to their content. Namespace names are discarded.
- A multiplex element of complex type is converted to a JSON array of JSON objects, each of which maps the local names of child elements and attributes to their content. Namespace names are discarded.
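
To make the rules concrete, here is a minimal sketch in Python using the standard `xml.etree.ElementTree` and `json` modules. The function names and the `multiplex` parameter are hypothetical, and the sketch makes several simplifying assumptions: the set of multiplex names is supplied by the caller and keyed by local name only; an element is treated as simple type when the instance has no children and no attributes (a real converter would take this from the schema); "syntactically possible" is read as matching JSON's number grammar or the literals `true`/`false`; and surrounding whitespace is trimmed.

```python
import json
import re
import xml.etree.ElementTree as ET

# JSON's number grammar, so values like "nan" or "1_000" stay strings.
JSON_NUMBER = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")


def local_name(tag):
    """Strip ElementTree's '{namespace}' prefix, discarding the namespace name."""
    return tag.rsplit("}", 1)[-1]


def simple_value(text):
    """Return a boolean or number if the text looks like one, else a string."""
    text = (text or "").strip()
    if text in ("true", "false"):
        return text == "true"
    if JSON_NUMBER.fullmatch(text):
        is_float = "." in text or "e" in text.lower()
        return float(text) if is_float else int(text)
    return text


def convert(element, multiplex):
    """Convert an element's content; `multiplex` holds local names that may repeat."""
    # Assumption: no children and no attributes means simple type.
    if len(element) == 0 and not element.attrib:
        return simple_value(element.text)

    # Complex type: attributes and children share one JSON object.
    result = {local_name(k): simple_value(v) for k, v in element.attrib.items()}
    for child in element:
        name = local_name(child.tag)
        value = convert(child, multiplex)
        if name in multiplex:
            # Multiplex elements collect into an array, even if only one occurs.
            result.setdefault(name, []).append(value)
        else:
            result[name] = value
    return result


doc = """<order xmlns="urn:example" id="42">
  <customer>Ann</customer>
  <paid>true</paid>
  <item><sku>A1</sku><qty>2</qty></item>
  <item><sku>B7</sku><qty>1</qty></item>
</order>"""

print(json.dumps(convert(ET.fromstring(doc), multiplex={"item"})))
# {"id": 42, "customer": "Ann", "paid": true, "item": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}]}
```

Note that a multiplex element that is absent from the instance simply produces no key here; whether it should produce an empty array instead is a design choice the rules above leave open.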
Comments are very welcome.