2005-09-28

Thurrodowism

One of the differences between Thurrodowism and Christianity is that although both have an Old Testament and a New Testament, in Thurrodowism not only can the New Testament correct and update the Old Testament, but the Old Testament can correct and update the New Testament.

For example, the Younger Canon warns against enterprises that require getting dressed up, whereas the Elder Canon tells us that sometimes getting dressed up is exactly what people want to do. This is pointed out in the notes to Grandmother Little Bear Woman's "collation/ripoff" of the Elder Canon, The Way and The Power of the Way.

And if you ask how this can be true, I merely reply that Laozi was one hell of a clever bastard.

2005-09-24

The Song of the End-of-lifed

I posted this to xml-dev on a Friday afternoon back in 2003. No tune is required, but you can sing it to any of various tunes that fit the original if you want to.

We have fed our code with a thousand docs
  And still it cries unfed,
Though there's never a tag of all those tags
  But marks our hackers dead:
We have sent our best through the standards mess,
  Past the Borg and the Borg's own gull,
If blood be the price of SGML,
  By Charles, we ha' paid in full!

There's never a flood on the list we love
  But it trashes the posts we wrote;
There's never an ebb of sensible mail
  But the poets come out, quote unquote,
With their haiku (bad) and their limericks (worse),
  And their double dactyls too.
If blood be the price of XML,
If blood be the price of XML,
  By Jon, we ha' pushed it through!

So we'll parse our well-formed marked-up text,
  For that is our doom and our pride,
As it was when they punched on the 029
  So it is with the Code Worldwide.
Though we disagree upon every point
  To this one fact we can swear:
If blood be the price of XSLT,
If blood be the price of XSLT,
If blood be the price of XSLT,
  By James, we ha' paid it fair!
      --Not Rudyard Kipling

2005-09-23

Funny things

Well, that last post was #150, though I didn't notice that while posting it.

It's a funny thing, isn't it, that the Latin-based preface is English of several centuries' standing, whereas foreword, apparently quite Saxon, is in fact a calque of German Vorwort and was still being condemned by purists in the latter part of the 20th century?

Another funny thing, and a sort of footnote to my posting on doublets: the Greek root authent- came into English twice: once via Latin and French, giving us authentic, authentication and so on; and a second time via Turkish, giving us effendi 'a man of property, authority, or education in an eastern Mediterranean country'.

Trigrams

Trigrams are sequences of three letters in running text, counting space as a letter. I've had this table around for a while, but never publicized it on my web site index. Now I have.

The table is derived from the Brown Corpus, a million words of American English prose, and contains every trigram that appears at least once in every 10,000 words in that corpus (not once in every 10,000 times, as the previous version had it).
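Here is a minimal Python sketch of how such a table could be built. It assumes the text is lowercased and that every run of non-letters (punctuation, digits, line breaks) collapses to a single space. The post doesn't say how the original table handled those characters, so treat that normalization as a guess. The function names are made up for illustration.

```python
import re
from collections import Counter


def trigram_counts(text: str) -> Counter:
    """Count overlapping three-character sequences, treating space as a letter.

    Assumption: the text is lowercased and every run of non-letters becomes
    a single space, with one space added at each end.
    """
    normalized = " " + re.sub(r"[^a-z]+", " ", text.lower()).strip() + " "
    return Counter(normalized[i:i + 3] for i in range(len(normalized) - 2))


def frequent_trigrams(text: str, per_words: int = 10_000) -> dict:
    """Keep only trigrams occurring at least once per `per_words` words of text."""
    counts = trigram_counts(text)
    word_count = len(re.findall(r"[a-z]+", text.lower()))
    threshold = word_count / per_words  # e.g. 100 occurrences for a million words
    return {tri: n for tri, n in counts.items() if n >= threshold}
```

On a corpus the size of the Brown Corpus, about a million words, the cutoff works out to 100 occurrences per trigram.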

2005-09-20

The Big Eight

Languages with more than a hundred million speakers, that is; not accounting firms, which in any case are now reduced to the Big Four thanks to mergers and malpractice.

Update: Blast it, Japanese has 122 million speakers. Why does Ethnologue claim there are only 8 languages in this range? Screws up the whole post. It makes me wonder if there are more hiding in the database but not readily accessible from the Web pages. Anyhow, thanks to JibberJim for the heads-up here.

Chinese. 873 million native speakers. The Big One, really; it's almost three times the size of Spanish and English, the next largest languages. It's also the most widely taught second language in the world, though most of the people who learn it as a second language live in China, and almost half of the second-language speakers use another Sinitic language as their mother tongue.

Spanish. 322 million native speakers. The widespread one: it's an official or national language in 21 countries plus New Mexico, a state of the U.S. It's truly remarkable how well this old colonial empire has held together linguistically despite vast political and geographical differences; though it's possible to tell where a Spanish-speaker comes from, there are no serious barriers to mutual intelligibility (though there are many jokes about it).

English. 309 million native speakers. English now belongs to the world: about 200 million people speak it as a second language with varying degrees of proficiency, making it the most widely taught foreign language. If we counted English as a Second Language as a separate language, it would make this list the Big Nine.

Bengali. 211 million native speakers, half of them in the small but very densely populated country of Bangladesh (which indeed is named after the language). Most people who speak Bengali, interestingly, are descended from second-language speakers, which gives it an unusual degree of consistency for such a large language.

Arabic. 206 million native speakers. This is the diverse one: "Arabic" is actually a cover term for over 30 closely related (but not always mutually intelligible) languages, unified by Modern Standard Arabic, which is an updated version of Classical Arabic, the language of the Quran. Nobody actually speaks Modern Standard Arabic, except for politicians making speeches, and even then the speeches have to be translated into the local Arabic language.

Hindi. 181 million native speakers. If you count the closely related Urdu (different script, different religion, different set of twenty-dollar words), tack on another 60 million native speakers. There are a lot of other languages in India, some of them quite large, and the term "Hindi" covers a lot of linguistic territory: it has been said that the local language in India changes significantly every 100 kilometers.

Portuguese. 177 million native speakers. Bizarrely, the Ethnologue calls this "a language of Portugal", even though only a tiny fraction of Portuguese speakers live there. The overwhelming majority, of course, live in Brazil, the other large non-Spanish-speaking American country.

Russian. 145 million native speakers. Here the much more recent colonial empire didn't hold together so well: except in Russia itself, Belarus, Kazakhstan, and Kyrgyzstan, the language is not official anywhere, though there are plenty of Russian-speakers in what Russians call "the near foreign".

So how are we doing if we know all those languages (and hardly anyone does, I'll bet)? Well, we can now handle the native languages of just 40% of the world's population. That's how linguistically diverse the planet is. If you throw in the next 75 languages, with more than 10 million but less than 100 million speakers, you do much better, reaching 79%; and adding to that the 264 languages with more than a million but less than 10 million speakers, and coverage goes up to 93%.

After that, it's the Long Tail, with 6,565 languages with less than a million speakers, covering the remaining 7%. To be sure, sometimes native languages aren't everything: in the Pacific island nation of Vanuatu, with about 200,000 people, no single language has more than about 7000 native speakers, but almost everyone can handle at least some Bislama, an English-based creole (which itself has only 5000 native speakers).
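The coverage figures above are cumulative, so the Long Tail is easier to see once each band is turned into its own share. This minimal Python sketch uses only the numbers quoted in the post. The 100% for the last band is implied by "the remaining 7%", and the labels are illustrative.

```python
# Cumulative coverage figures quoted above (Ethnologue, 15th edition):
# (speaker band, number of languages, cumulative % of world population).
BANDS = [
    ("100 million or more", 8, 40),
    ("10 to 100 million", 75, 79),
    ("1 to 10 million", 264, 93),
    ("under 1 million", 6565, 100),
]


def band_shares(bands):
    """Turn cumulative percentages into each band's own share of the population."""
    shares = []
    previous = 0
    for label, n_languages, cumulative in bands:
        shares.append((label, n_languages, cumulative - previous))
        previous = cumulative
    return shares


if __name__ == "__main__":
    total = sum(n for _, n, _ in BANDS)
    print(f"{total} languages in all")
    for label, n, share in band_shares(BANDS):
        print(f"{label:>20}: {n:5} languages, {share:3}% of speakers, "
              f"{share / n:.3f}% per language on average")
```

The four bands add up to 6,912 languages. The eight biggest average about 5% of humanity each. Each of the 6,565 smallest averages about a thousandth of a percent.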

Essentially all the data here comes from the 15th edition of the Ethnologue.

2005-09-10

The seven dirty words

George Carlin referred to the Seven Words You Can Never Say On Television as "Anglo-Saxon" (i.e. Old English). Piss, however, is not Old English at all; it's a straight borrowing from French. The agent ending -er in cocksucker and motherfucker is also a borrowing from French -ier, though some words ending in -arius, the Latin ancestor of -ier, entered Old English and now show -er as well.

Fuck and cunt are certainly not French, but don't look like Old English either, for phonological reasons; they are probably borrowings from other Germanic languages. There may be a remote connection to Latin cunnus, as in Latin and English cunnilingus.

The remaining two words (shit and tits) as well as the roots cock, suck, and mother, would be quite as clear to Anglo-Saxons (allowing for changes in pronunciation) as to us.

A few funny sayings

Glendower:
I can call spirits from the vasty deep.
Hotspur:
Why, so can I, or so can any man;
But will they come when you do call for them?

--Shakespeare, Henry IV, part 1, Act III, Scene 1

"Of course, a certain number of scientists have to go mad, just to keep the tradition alive."

--Matt Ruff, Fool on the Hill

"His movements could be called cat-like, except that he did not stop to spray urine up against things."

--Terry Pratchett, Night Watch

"From the way you attack your consonants as if they were an enemy swordsman and swallow your vowels as if they were a light snack, I would judge that you were raised in the East. Is that not so?"

--Sethra Lavode to Morrolan
--Steven Brust, Lord of Castle Black

From Cbits to Qbits

Here's a really excellent article (available in several formats) by the physicist N. David Mermin of Cornell. It's intended to teach computerniks just enough mathematical quantum mechanics to understand quantum computing. I now feel like I grok it myself. The "Where is ħ‽" section near the end is particularly clever.

Here's to the Duke, God bless her!

The Channel Islands sit in the English Channel off the coast of France, but they are not part of France. Their history and politics are most amazing things, and I can hardly do better than point you all to the Wikipedia article. So go there, read at least the first two sections, and then come back here for one final anecdote (I don't vouch for its accuracy, which is why I haven't merged it into Wikipedia).

When at the height of the Napoleonic Wars the U.K. attempted to provide Guernsey with a Smuggling Act, the States of Guernsey protested that such an act was against the constitution of Guernsey, and therefore could be of no effect there. After all, it was the Channel Islands — as part of Normandy — who had conquered England, not the other way about.

Eventually a compromise was reached whereby all instances of smuggling in the Channel Islands would be tried solely in local courts, whereas any other instance of smuggling outside the U.K. would be tried in the county of Middlesex, as usual. (At the time, piracy and smuggling were, formally speaking, committed "on the high seas in the county of Middlesex".) As you can well imagine, there were no convictions, and the trade routes between England and France remained open.

2005-09-08

Tutorials at XML 2005

I'll be presenting two tutorials at XML 2005 in Atlanta.

On Friday, November 18, I'll be re-presenting the half-day tutorial "RESTful Web Services: building them without WSDL, SOAP, or tears" (OpenOffice|PowerPoint|PDF) that I gave at Extreme Markup Languages 2005.

On Monday, November 14, I'll be giving a full-day tutorial on XML schema languages. The first half will cover DTDs, RELAX NG, and Schematron; the second half will be devoted to W3C XML Schema. The first half shares a lot of material with the tutorial "RELAX NG: DTDs On Warp Drive" (OpenOffice|PowerPoint|PDF) that I've given in the past.

Show up, or else!

2005-09-05

The black guard and the house wives

Nowadays the word blackguard is a somewhat archaic insult, but the black guard were originally the kitchen servants, who were so-called because they had to deal with coal. As often, words for lower-class people become words for bad people: villain used to mean 'serf' (this meaning is usually spelled villein these days, but the words are the same), and before that it meant 'village-dweller'.

The narrator of Chaucer's Canterbury Tales says of the Knight:

He nevere yet no vileynye ne sayde
In al his lyf unto no maner wight.

which (in addition to being a spectacular example of Middle English multiple negation) means that the Knight 1) was never rude and 2) never behaved like a peasant.

The standard pronunciation of "blackguard" is "blaggard". It's typical for compounds to be pronounced as single words when they get established, and later to undergo sound-change as if they were single words. For example, English has created three separate compounds from house and wife: the modern formation housewife, the Middle English hussif (obsolete now, but still current in the 19th century) meaning 'sewing-kit', and the one dating back to Old English times, which now takes the form hussy.

2005-09-04

Yet another filk

Not many people have heard of the song Lillibulero today, though many have heard the tune on the BBC World Service without knowing what it was, or that it's generally attributed to Henry Purcell. I was reading an article about the near-total loss of snow on Mount Kilimanjaro in Tanzania, made famous by the Ernest Hemingway story "The Snows of Kilimanjaro". Shortly thereafter, I found the chorus of this song dancing in my head. The classic BBC version is on YouTube; it has about the right tempo, but it's only one verse + chorus. You'll need to put it on Loop (right click on the video before starting it) if you want to sing along. The original lyrics, history, other versions, and much more are at Wikipedia.

Ah, brother mine, have you read the report?
Kilimanjaro's melting away.
The summers grow long, the winters grow short:
Kilimanjaro's melting away.

Chorus:

'Jaro, 'Jaro, Kilimanjaro,
All of your snows are melting away;
'Jaro, 'Jaro, 'Jaro, 'Jaro,
Kilimanjaro, melting away.

The rain it will fall, the storm it will storm,
Kilimanjaro's melting away.
Our planet is growing unpleasantly warm,
Kilimanjaro's melting away.

Chorus

There is a prophecy found in old books,
Kilimanjaro's melting away,
The world will be ruled by dimwits and crooks,
Kilimanjaro's melting away.

Chorus

And if we don't change our culture of waste,
Kilimanjaro won't be alone.
Antarctican ice will slip off its base,
Katrina will look like a weekend at home.

Chorus:

London, London, New York and London,
Bangkok and Singapore under the waves.
Global warming, global warming ‒
Hundreds of millions in watery graves.

'Jaro, 'Jaro, Kilimanjaro,
All of your snows are melting away;
'Jaro, 'Jaro, 'Jaro, 'Jaro,
Kilimanjaro, melting away.

2005-09-01

Dido. As in Queen of Carthage.

Long and long and long ago, my children, before the Internet became a haven for porn, there was a free email-based erotica server. If you sent an appropriately formatted email to louvre at dido dot fa dot indiana dot edu, you'd get back a story that had been posted to the Usenet group rec.arts.erotica. And all was well... until the postmaster at indiana.edu contacted the archive maintainer, and pointed out to him the large number of undeliverable emails that were coming in addressed to louvre at dildo dot fa dot indiana dot edu!

I don't think these addresses work any more, but I've obfuscated them anyhow. Why give the indiana.edu sysadmins even more spam to deal with? By the way, we're talking Indiana University here, not to be confused with Indiana University of Pennsylvania, so called because it's in a town with the unlikely name of Indiana, Pennsylvania (Jimmy Stewart was born there).

Blood groups and true parents

You can't respond anonymously any more. Just make up an identity. Sorry, but the spam got out of control.

Blood type calculator: enter blood types for both parents and find out possible blood types for a child, or blood types for one parent and one child and find out possible blood types for the other parent. Please use this first!

I got a letter two years ago from someone who very much wanted help disentangling her family history. She wrote to me:

I have just found my deceased father's blood group, and it has got me worried. I am AB, my brother is O, my sister is O, and so was my father. My mother was type AB, I think. So the burning question is, Is my father really my father?

I replied:

As you may know, you have two copies of every gene in each cell of your body, and you get one from your mother and one from your father.

For example (and to oversimplify a lot), there are two forms of the gene for eye color, one for brown and one for blue. If you have both genes for blue, you will have blue eyes; if you have one or two genes for brown, you will have brown eyes. I will write B for brown and b for blue. So blue-eyed people have bb genes, whereas brown-eyed people can have BB, Bb, or bB genes. (The gene from the father comes first, so Bb means you got brown from your father and blue from your mother.)

Two blue-eyed parents can only have blue-eyed children, whereas two brown-eyed parents can have blue-eyed children if both of them are of the Bb or bB types, and both happen to give their child the b gene. (You probably know some exceptions: I am one, because my father had blue eyes and my mother had brown ones, whereas my own eyes are blue. But looking closely shows that there are flecks of brown in my eye color; blue here means 100% true blue.)

Moving on to the ABO blood type system. There are three kinds of genes here, A, B, and O. The A gene will cause a person to have red blood cells with the A protein in them, and the B gene will cause a person to have red blood cells with the B protein in them. The O gene doesn't do either one. So if someone's genes are AA or AO or OA, they will have A protein and be of blood type A. Someone whose genes are BB or BO or OB will have B protein and be of blood type B. Someone whose genes are AB or BA will have both proteins and be of blood type AB. And finally, someone whose genes are OO will have neither protein and be of type O.

In your case, your father's, brother's, and sister's genes are OO. Your mother is AB or BA and so are you. Your mother gave you either an A or a B gene, and you had to get the other B (or A, as the case may be) from somewhere. Your father is OO, so where did the other gene come from?

But that's not the whole story. Your brother and sister are OO, and your mother could not give them an O gene (since she has only an A gene and a B gene), so where did their O genes come from? One possibility is that you're wrong in thinking your mother was AB.

The most probable explanations are adoption, sperm donation, or something else that makes you and your siblings have different genetic parents. A DNA test of you and your siblings, preferably both of them, will nail this very reliably, and I would encourage you to get one. By some estimates, about 15% of human beings are mistaken about their genetic fathers.

There is one more possibility. Another gene, known as the H gene, comes in two varieties: H (working) and h (not working). (The whole issue of ABO and H versus h does not make any difference to health, of course.) Neither the A nor the B protein can be made in your body unless you have at least one H gene. So people who have hh in their genes always appear to have blood type O, because no A or B protein is being made in their bodies even though the A or the B gene might be physically present. So your father might actually have had an A or B gene to give you even though his apparent blood type was O, if he also had hh. However, the h gene is quite rare and the hh combination even rarer, so this isn't a very likely explanation.

Finally, she added:

I would be grateful for any help you can give me. I will always love him either way; I just need to know.

I replied:

Of course. As an adoptive parent whose daughter has always known that she is adopted, I know that genetics has very little to do with how we feel about our children or how they feel about us.
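The reasoning in the reply above can be checked mechanically. Here is a minimal Python sketch that uses the same simplified one-gene models as the reply: brown over blue for eyes; A, B, and O for blood type; and H as a separate on/off gene. The sketch assumes H is inherited independently of ABO. It is an illustration, not a genetic test, and all the function names are invented.

```python
from itertools import product


def eye_color(genes: str) -> str:
    # Simplified one-gene model from the text: B (brown) dominates b (blue).
    return "brown" if "B" in genes else "blue"


def abo_type(genes: str) -> str:
    # A and B each make their own protein; O makes neither.
    proteins = set(genes) - {"O"}
    if proteins == {"A", "B"}:
        return "AB"
    return proteins.pop() if proteins else "O"


def apparent_type(abo: str, h: str) -> str:
    # Without at least one working H gene, no A or B protein is made at all.
    return abo_type(abo) if "H" in h else "O"


def children(father: str, mother: str, phenotype) -> set:
    """Phenotypes a child can have: one gene from each parent, father's first."""
    return {phenotype(f + m) for f, m in product(father, mother)}


def abo_genotypes(blood_type: str) -> set:
    """All ABO gene pairs that show up as the given blood type."""
    return {f + m for f, m in product("ABO", repeat=2) if abo_type(f + m) == blood_type}


def children_by_type(father_type: str, mother_type: str) -> set:
    """Possible child blood types when only the parents' blood types are known."""
    result = set()
    for fg in abo_genotypes(father_type):
        for mg in abo_genotypes(mother_type):
            result |= children(fg, mg, abo_type)
    return result


def children_with_h(father: tuple, mother: tuple) -> set:
    """father, mother: (abo_genes, h_genes), e.g. ("BO", "hh").

    Assumes the ABO and H genes are passed on independently of each other.
    """
    return {
        apparent_type(fa + ma, fh + mh)
        for fa, ma, fh, mh in product(father[0], mother[0], father[1], mother[1])
    }
```

For the family in the letter, `children_by_type("O", "AB")` comes out as only A or B, which is exactly the puzzle: neither an AB daughter nor O children fit. If the father had been a hypothetical BO with hh, `children_with_h(("BO", "hh"), ("AB", "HH"))` does include AB.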

Update:

This post has obviously struck a nerve: it has gotten more comment than anything I've ever written. If you are going to comment to ask a question, two things, please:

  1. The Rh blood types (+ and -) are separate from the ABO blood types. The only thing to say about them is that two - parents will always have - children; every other combination is possible.
  2. Look in the following A/B/O chart first. Find your mother's blood type across the top, your father's along the side, and your possible blood types in the box.

Father \ Mother   A            B            AB           O
A                 A or O       Any          Any but O    A or O
B                 Any          B or O       Any but O    B or O
AB                Any but O    Any but O    Any but O    A or B
O                 A or O       B or O       A or B       O
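The Rh rule in item 1 falls out of a one-gene model in which Rh+ is dominant over Rh-. The gene letters D and d used below are an assumption of this minimal Python sketch, not something from the post.

```python
from itertools import product

# Simplified Rh model (assumption): D (Rh+) is dominant over d (Rh-).
RH_GENOTYPES = {"+": {"DD", "Dd", "dD"}, "-": {"dd"}}


def possible_rh_children(father: str, mother: str) -> set:
    """Rh types a child can have, given each parent's Rh type ("+" or "-")."""
    return {
        "+" if "D" in f + m else "-"
        for fg in RH_GENOTYPES[father]
        for mg in RH_GENOTYPES[mother]
        for f, m in product(fg, mg)  # one gene from each parent
    }
```

Two Rh- parents can only pass on d, so every child is Rh-. Any other pairing can produce either type, because an Rh+ parent may be carrying a hidden d.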

Half a breath

The standard instructions for having a chest X-ray taken are to take a full breath and hold it, then press yourself against the X-ray plate and stand still. When I first got one taken, both the first and second attempts were unusable. An expert was called in (there are geniuses in every field, as Richard Feynman says), glanced at the film, and told me "Only take half a breath." Apparently my lungs when fully inflated are larger than the standard chest X-ray plate!

It's spelled "colonel", but it's pronounced "kernel"

But why?

Well, because it's French.

But the French word colonel, amazingly for a French word, is pronounced exactly as it's spelled, with no r sound whatever.

The story turns out to be that the Italian word colonello, from Latin columnellus, the leader of a (military) column, got borrowed into French twice. The first time, it became coronel in French, possibly on the notion that it was from Latin corona 'crown' rather than columna.

The form coronel spread to English and Spanish before being replaced in French itself by a second borrowing from Italian, this time more correctly as colonel. The spelling, but not the pronunciation, of this second form then entered English, leaving us with l in the spelling and r in the pronunciation.

Go figure.

Yes, there is that joke

A German professor of philosophy once wrote a three-volume work on das Komische ('the comic'). For the rest of his days, whenever anyone said something funny, he would nod his head soberly and say "Ja, es gibt den Witz."

A diamond-shaped poem on XML schema languages

First, here's the poem:

WXS:
Complex, inflexible,
Boggling, wearying, proliferating,
Circumscribed, inadequate. Future-proofed, compartmentalized,
Systematizing, pleasing, sufficing,
Straightforward, simplified --
RNG!

The constraints on this kind of poem are that there are seven lines: lines 1 and 7 contain one noun, lines 2 and 6 contain two adjectives, lines 3 and 5 contain three present participles, and line 4 contains four past participles. I cheated slightly: inadequate is derived from a Latin past participle.

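I decided to add a syllabic constraint as well: the syllable counts of the lines are 3, 6, 10, 15, 10, 6, 3. The rising half is the triangular numbers, which are the binomial coefficients C(3,2) through C(6,2), so the line lengths swell and taper symmetrically.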
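Both constraints can be checked in a few lines of Python. In this sketch a word is a run of letters, possibly hyphenated, so "Future-proofed" counts once and the "--" does not count at all. The sketch only computes the target syllable pattern, because counting syllables automatically is a much harder problem. The function names are illustrative.

```python
import re
from math import comb

POEM = """WXS:
Complex, inflexible,
Boggling, wearying, proliferating,
Circumscribed, inadequate. Future-proofed, compartmentalized,
Systematizing, pleasing, sufficing,
Straightforward, simplified --
RNG!"""


def words_per_line(poem: str) -> list:
    # A word is a run of letters, possibly hyphenated ("Future-proofed").
    return [len(re.findall(r"[A-Za-z][A-Za-z-]*", line)) for line in poem.splitlines()]


def diamond_syllables(rows: int = 4) -> list:
    # Rising half: C(3,2), C(4,2), ..., i.e. the triangular numbers 3, 6, 10, 15.
    rising = [comb(n, 2) for n in range(3, 3 + rows)]
    # Mirror it without repeating the middle line.
    return rising + rising[-2::-1]
```

`words_per_line(POEM)` gives the 1, 2, 3, 4, 3, 2, 1 shape, and `diamond_syllables()` gives 3, 6, 10, 15, 10, 6, 3.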