This is a report on Extreme Markup 2007 for Friday.
Friday was a half-day, but Extreme 2007 saved the best of all its many excellent papers for almost the last. Why is it that we repeat the "content over presentation" mantra so incessantly, but throw up our hands when it comes to tables? All the standard table models -- HTML, TEI, CALS -- are entirely presentational: a table is a sequence of rows, each of which is a sequence of cells. There are ways to give each column a label, but row labels are just the leftmost cell, perhaps identified presentationally but certainly not in a semantic way. If have a table of sales by region in various years, pulling out North American sales for 2005 with XPath is no trivial matter. Why must this be? Inquiring minds (notably David Birnbaum's) want to know.
David's proposal, really more of a meta-proposal, is to use semantic markup throughout. Mark up the table using an element expressing the sort of data, perhaps
salesdata. Provide child elements naming the regions and years, and grandchildren showing the legal values of each. Then each cell in the table is simply a
sales element whose value is the sales data, and whose attributes signify the region and year that it is data for. That's easy to query, and not particularly hard to XSLT into HTML either. (First use of the verb to XSLT? Probably not.)
Of course, there is no reason to stop at two categories. You can have a cube, or a hypercube, or as many dimensions as you want. The OLAP community knows all about the n-cubes you get from data mining, and the Lotus Improv tradition, now proundly carried on at quantrix.com (insert plug from highly satisfied customer here) has always been about presenting such n-cubes cleverly as spreadsheets.
The conference proper ended in plenary session with work by Henry Thompson of W3C that extends my own TagSoup project, which provides a stream of SAX events based on nasty, ugly HTML, to have a different back end. TagSoup is divided into a lexical scanner for HTML, which uses a state machine driven by a private markup language, and a rectifier, which is responsible for outputting the right SAX events in the right order. It's a constraint on TagSoup that although it can retain tags, it always pushes character content through right away, so it is streaming (modulo the possibility of a very large start-tag).
Henry's PyXup replaces the TagSoup rectifier, which uses another small markup language specifying the characteristics of elements, with his own rectifier using a pattern-action language. In TagSoup, you can say what kind of element an element is and something about its content model, but you can't specify the actions to be taken without modifying the code. (The same is technically true of the lexical scanner, but the list of existing actions is pretty generic.) In PyXup, you can specify actions to take, some of which take arguments which can be either constants or parts of the input stream matched by the associated pattern. This is a huge win, and I wish I'd thought of it when designing TagSoup. The question period involved both Henry and I giving answers to lots of the questions.
To wrap up we had, as we always have (it's a tradition) a talk by Michael Sperberg-McQueen. These talks are not recorded, nor are any notes or slides published, still less a full paper. You just hafta be there to appreciate the full (comedic and serious) impact. The title of this one was "Topic maps, RDF, and mushroom lasagne"; the relevance of the last item was that if you provide two lasagnas labeled "Meat" and "Vegetarian", most people avoid the latter, but if it's labeled "Mushroom" (or some other non-privative word) instead, it tends to get eaten about as much as the meat item). The talk was, if I have to nutshell it, about the importance of doing things rather than fighting about how to do them. At least that was my take-away; probably everyone in the room had a different view of the real meaning.
That's all, folks. Hope to see you in Montreal next August at Extreme Markup 2008.