2008-03-03

Elements or attributes?

Here's my contribution to the "elements vs. attributes" debate:

General points:

  1. Attributes are more restrictive than elements, and all designs have some elements, so an all-element design is simplest -- which is not the same as best.

  2. In a tree-style data model, elements are typically represented internally as nodes, which use more memory than the strings used to represent attributes. Sometimes the nodes are instances of different application-specific classes, and in many languages the class definitions themselves also take up memory.

  3. When streaming, elements are processed one at a time (possibly even piece by piece, depending on the XML parser you are using), whereas all the attributes of an element and their values are reported at once, which costs memory, particularly if some attribute values are very long.

  4. Both element content and attribute values need to be escaped, so escaping should not be a consideration in the design.

  5. In some programming languages and libraries, processing elements is easier; in others, processing attributes is easier. Beware of using ease of processing as a criterion. In particular, XSLT can handle either with equal facility.

  6. If a piece of data should usually be shown to the user, use an element; if not, use an attribute. (This rule is often violated for one reason or another.)

  7. If you are extending an existing schema, do things by analogy to how things are done in that schema.

  8. Sensible schema languages, meaning RELAX NG, treat elements and attributes symmetrically. Older and cruder schema languages tend to have better support for elements.
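
Points 3 and 4 can be seen in a small sketch. It uses Python's standard xml.sax and xml.sax.saxutils modules; the Recorder class, the helper functions, and the note/item sample markup are made up for illustration and are not part of any particular design.

```python
import xml.etree.ElementTree as ET
import xml.sax
from xml.sax.saxutils import escape, quoteattr


class Recorder(xml.sax.ContentHandler):
    """Hypothetical handler that just records the events it receives."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # All attributes of the element, with their full values,
        # arrive together in this single call.
        self.events.append(("start", name, dict(attrs.items())))

    def characters(self, content):
        # Element content may arrive in one call or in several pieces,
        # depending on the parser.
        self.events.append(("chars", content))


def record_events(xml_text):
    handler = Recorder()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events


def as_element(value):
    # Element content needs & and < escaped.
    return "<note>" + escape(value) + "</note>"


def as_attribute(value):
    # Attribute values need escaping too, plus care with the quote characters.
    return "<note text=" + quoteattr(value) + "/>"
```

Either way the value has to be escaped on the way out and comes back unchanged on the way in, so escaping does not favor one form over the other.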

Using elements:

  1. If something might appear more than once in a data model, use an element rather than introducing attributes with names like part1, part2, part3, ...

  2. If order matters between two pieces of data, use elements for them: attributes are inherently unordered.

  3. If a piece of data has, or might have, its own substructure, put it in an element: getting substructure into an attribute is always messy. Similarly, if the data is a constituent part of some larger piece of data, put it in an element.

  4. An exception to the previous rule: multiple whitespace-separated tokens can safely be put in an attribute. In principle, the separator can be anything, but schema-language validators are currently only able to handle whitespace, so it's best to stick with that.

  5. If a piece of data extends across multiple lines, use an element: XML parsers will change newlines in attribute values into spaces.

  6. If a piece of data is in a natural language, put it in an element so you can use the xml:lang attribute to label the language being used. Some kinds of natural-language text, like Japanese, also require annotations that are conventionally represented using child elements; right-to-left languages like Hebrew and Arabic may similarly require child elements to manage bidirectionality properly.
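
Several of these points (repetition, order, whitespace-separated tokens, multi-line data) can be illustrated with a short sketch using Python's standard xml.etree.ElementTree. The recipe document and the read_recipe helper are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: repeated, ordered steps are elements;
# a whitespace-separated token list lives in an attribute.
DOC = """<recipe tags="quick  vegan
gluten-free">
  <step>Chop the onions.</step>
  <step>Fry them
until golden.</step>
</recipe>"""


def read_recipe(xml_text):
    root = ET.fromstring(xml_text)
    raw_tags = root.get("tags")   # the parser has already turned the newline into a space
    tags = raw_tags.split()       # splitting on whitespace recovers the tokens
    steps = [step.text for step in root.findall("step")]  # document order is kept
    return raw_tags, tags, steps
```

The line break inside the tags attribute is gone by the time the application sees it, which is harmless for a token list but would damage multi-line data; the line break inside the second step element survives.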

Using attributes:

  1. If the data is a code from an enumeration, code list, or controlled vocabulary, put it in an attribute if possible. Language tags, currency codes, and medical diagnostic codes, for example, are best handled as attributes.

  2. If a piece of data is really metadata on some other piece of data (for example, representing a class or role that the main data serves, or specifying a method of processing it), put it in an attribute if possible.

  3. In particular, if a piece of data is an ID (either a label or a reference to a label elsewhere in the document) for some other piece of data, put the identifying piece in an attribute. When it's a label, use the name xml:id for the attribute.

  4. Hypertext references (hrefs) are conventionally put in attributes.

  5. If a piece of data is applicable to an element and any descendant elements unless it is overridden in some of them, it is conventional to put it in an attribute. Well-known examples are xml:lang, xml:space, xml:base, and namespace declarations.

  6. If terseness is really the most important thing, use attributes, but consider gzip compression instead -- it works very well on documents with highly repetitive structures.
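
The inheritance convention in point 5 is easy to sketch for xml:lang. This is only an illustration using Python's standard xml.etree.ElementTree; the effective_lang helper is a made-up name, and a real application might use a library that already tracks parents.

```python
import xml.etree.ElementTree as ET

# ElementTree spells the xml: prefix out as its namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_lang(root, target):
    """Return the xml:lang in force for target, or None if none applies."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    node = target
    while node is not None:
        lang = node.get(XML_LANG)
        if lang is not None:
            return lang            # the nearest declaration wins
        node = parents.get(node)   # otherwise keep climbing toward the root
    return None
```

For a document such as <book xml:lang="en"><title>Hello</title><quote xml:lang="fr"><p>Bonjour</p></quote></book>, the title inherits "en" from the root, while the paragraph inside the quote picks up the overriding "fr".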

Michael Kay says:

Beginners always ask this question.
Those with a little experience express their opinions passionately.
Experts tell you there is no right answer.

I say:

Newbies always ask:
     "Elements or attributes?
Which will serve me best?"
     Those who know roar like lions;
     Wise hackers smile like tigers.
          --a tanka, or extended haiku

Final words:

Break any or all of these rules rather than create a crude, arbitrary, disgusting mess of a design if that's what following them slavishly would give you. In particular, random mixtures of attributes and child elements are hard to follow and hard to use, though it often makes good sense to use both when the data clearly fall into two different groups such as simple/complex or metadata/data.


6 comments:

Unknown said...

According to my notes (and it earned a place in my Favorite Quotes file a long time ago), the "Beginners always ask this question..." was Mike Kay, not Len: http://lists.xml.org/archives/xml-dev/200006/msg00285.html

In fact, I think I was the one who suggested to Peter Flynn that he include it in the XML FAQ discussion of this question.

Peter should also add a pointer to your discussion here.

Bob

John Cowan said...

Hmm, you're right. I was misled by another posting into thinking Len was saying it rather than quoting it.

Fixed.

Anonymous said...

Is the missing 'W' in the last line just a typo, or is there a deeper symbolism (wise hackers do without the W?)

Nice list, by the way.

John Cowan said...

Thanks, Michael. Fixed.

No, no subtlety here, just a matter of sloppy mouse selecting.

Unknown said...

I think it's worth mentioning that whitespace is normalized in attributes.

Anonymous said...

Great post - I'm doing xml design and didn't know where to start. Thanks for writing about this!