2007-08-13

Extreme Markup 2007: Wednesday

This is my report on the Wednesday of Extreme Markup 2007.

The first talk of the morning was by David Dubin, and was about an alternative approach to reifying RDF statements so that one can make RDF claims about existing RDF statements (such as who made them, and where and when, and whether and how much you should believe them). The classical approach is to model the statement as a node, and then assert Subject, Predicate, and Object properties about this node. David's approach involves using the RDF/XML serialization of the claim itself as the object of RDF claims. I can't say I understood what he was driving at very well: it seems to me that the main deficiency of RDF that makes reification necessary is that you can't state an RDF sentence without also claiming that it is true. This is convenient in simple cases, but annoying when you want to do meta-RDF.
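
For reference, the classical pattern looks something like this in RDF/XML (the ex: names are invented for illustration). The point to notice is that reifying a statement describes it without asserting it:

  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:ex="http://example.org/ns#">
    <!-- A node standing for the statement "doc1 has author Alice" -->
    <rdf:Statement rdf:ID="claim1">
      <rdf:subject rdf:resource="http://example.org/doc1"/>
      <rdf:predicate rdf:resource="http://example.org/ns#author"/>
      <rdf:object>Alice</rdf:object>
      <!-- Claims about the claim; the triple itself is never asserted -->
      <ex:claimedBy rdf:resource="http://example.org/people/bob"/>
      <ex:confidence>0.8</ex:confidence>
    </rdf:Statement>
  </rdf:RDF>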

Paolo Marinelli analyzed alternative approaches to streaming validation. W3C XML Schema provides what he calls a STEVE streaming discipline: at the Start tag, you know the Type of an element, and at the End tag, you can make a Validity Evaluation. XML Schema 1.1 proposes to provide various kinds of conditional validation using a subset of XPath 1.0, but does not (in the current draft) provide the full power of what is actually streamable in XPath.
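
For concreteness, conditional type assignment in the XML Schema 1.1 draft looks roughly like this (the type names are invented); as I understand the draft, the test expressions are restricted to, in effect, the element's own attributes, precisely so that the type is still known at the start tag:

  <xs:element name="message">
    <xs:alternative test="@kind = 'text'"   type="TextMessage"/>
    <xs:alternative test="@kind = 'binary'" type="BinaryMessage"/>
    <xs:alternative type="xs:error"/>
  </xs:element>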

The core of the paper is a classification of XPath expressions by axis and operation, specifying when the value of the expression can be determined and at what memory cost ("constant" and "linear-depth" mean streamable; "linear-size" means not streamable). See Table 1 in the paper for the details.
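
To give a flavor, with illustrations of my own (the paper's Table 1 is the authority, not me):

  @status = 'draft'          constant: attributes arrive on the start tag itself
  ancestor::appendix         linear-depth: keep the stack of currently open elements
  count(child::item) > 10    linear-size if the answer is needed at the start tag,
                             since in the worst case the whole subtree must be buffered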

Finally, Paolo proposes an extended streamability model called LATVIA, in which there are no restrictions on such XPaths, and schemas are marked streamable or non-streamable by their authors (or their authors' tools, more likely). The difficulty here is that implementations that depend on knowing element types early will wind up being unable to process certain schemas, which will result in a process of negotiation: "Can't you make this streamable?" "Well, no, because ..." "That will cost you one billion stars."

Moody Altamimi gave a superb talk on normalizing mathematical formulae. There are two flavors of MathML, Presentation and Content; the former is about layout (it's close to LaTeX conceptually), while the latter is meant to express meaning, and is closer to Lisp. Even in Content MathML, matching formulae will produce lots of false negatives because of non-significant differences in the way authors express things: it's all one whether you say an integral runs from m to n or over the interval [m,n], but Content MathML has two different notations for this. Similarly, we can presume that + represents an associative operation, and do some transforms to unify (+ a b c), (+ (+ a b) c), and (+ a (+ b c)). On the other hand, we don't want to go overboard and unify (+ a 0) with a; if the author wrote "a + 0", presumably there was a good reason.
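
For instance, the same definite integral of f from m to n can be written in Content MathML either with explicit limits or with an interval (sketched from memory, so treat the details with suspicion):

  <apply><int/>
    <bvar><ci>x</ci></bvar>
    <lowlimit><ci>m</ci></lowlimit>
    <uplimit><ci>n</ci></uplimit>
    <apply><ci>f</ci><ci>x</ci></apply>
  </apply>

  <apply><int/>
    <bvar><ci>x</ci></bvar>
    <interval><ci>m</ci><ci>n</ci></interval>
    <apply><ci>f</ci><ci>x</ci></apply>
  </apply>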

The next talk I attended was about hierarchical processing in SQL database engines, and was the worst presentation of the conference: it was a marketing presentation rather than a technical one ("their products bad, our product good"), and yet it was a marketing presentation consisting entirely of technical details, and as such exceedingly boring. It also assumed a detailed ANSI SQL background rather than an XML one. I'll read the paper, because I'm interested in the subject, but I'm not very hopeful.

Liam Quin of the W3C told us all about XQuery implementations, and which ones support what and how well, and what kinds of reasons you'd have for choosing one over the other, all without making any specific recommendations himself. He said that in general conformance was good across the major implementations, and performance depended too heavily on the specifics of the query to generalize.

At this point I got pretty burned out and skipped all the remaining regular talks for the day, except for Ann Wrightson's excellent overview of model-driven XML: why the instance documents are practically unreadable (too high a level of meta-ness, basically), and what can be done about it (not much, short of defining local views that map into the fully general modeled versions; see the caricature below). I spent a fair amount of time in the poster room, though I didn't take notes. I also attended the nocturne that evening on URIs and names, where I defended the Topic Maps view of things with vim, vigor, and vitality.
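
Here is the caricature (mine, not Ann's): a model-driven instance tends to look like the first fragment below, while the local view a human would rather read and write looks like the second:

  <object class="Person">
    <slot name="familyName"><value>Smith</value></slot>
    <slot name="givenName"><value>Mary</value></slot>
  </object>

  <person>
    <familyName>Smith</familyName>
    <givenName>Mary</givenName>
  </person>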

2 comments:

Alex said...

Hi John; I'd be interested in hearing more about the URI and naming session. Was it mostly about the PSI side of things, or the ambiguity of URI naming, or something else? Or all of the above? Why did you have to defend the TM way? What are the opposing views? Etc.

Oh, and I wish I could have been at Extreme. *sigh* Sounds darn fun and interesting!

Danny said...

I don't know anything about David Dubin's alternative to RDF reification, but it doesn't seem that far removed from techniques I've seen discussed elsewhere.

Named graphs is the biggy: give the bunch of statements a URI, then make statements about that (the graph could exist online as an RDF/XML doc at that URI). This is pretty much the preferred approach of implementors, and is kind-of documented in the SPARQL spec; there is also a Jeremy Carroll paper on it.

Another approach (which I've only seen used once) is to include the graph as the literal object of a regular statement.

Funnily enough this stuff is the subject of a thread currently on the semantic-web@w3.org list.
