2005-06-29

"To Althea From Prison"

To Althea From Prison
Richard Lovelace (1618-1658)

When Love with unconfined wings
Hovers within my gates,
And my divine Althea brings
To whisper at my grates;
When I lie tangled in her hair
And fettered with her eye
The birds that wanton in the air
Know no such liberty.

When flowing cups pass swiftly round
With no allaying Thames,
Our careless heads with roses crowned,
Our hearts with loyal flames;
When thirsty grief in wine we steep,
When healths and draughts go free,
Fishes that tipple in the deep
Know no such liberty.

When, linnet-like confined, I
With shriller throat shall sing
The mercy, sweetness, majesty
And glories of my King;
When I shall voice aloud how good
He is, how great should be,
The enlargèd winds, that curl the flood,
Know no such liberty.

Stone walls do not a prison make,
Nor iron bars a cage;
Minds innocent and quiet take
That for an hermitage:
If I have freedom in my love,
And in my soul am free,
Angels alone, that soar above,
Enjoy such liberty.

The famous lines (the first two of the last stanza) also express a legal truth: you can be "falsely imprisoned" without any sign of stone walls or iron bars.

Absolute and relative names

Before DNS was heard of, there was the Arpanet HOSTS.TXT file, which specified absolute names for the small subset of connected computers on the Arpanet, and then there was UUCP naming. Since UUCP-style names are dead or the nearest thing to it, I will explicate.

Once upon a time, if you weren't lucky enough to be on the Arpanet, you typically addressed your email something like this:

host1!host2!host3!...!hostn!username

Interpreted as routing instructions, this meant to pass the email to the computer named host1, which would pass it to the computer named host2, which ... would pass it to the computer named hostn for delivery to hostn's local user named username. Well and good.

But what was the source of these names: host1, host2, etc.? They had a purely local interpretation. Thus, host1 was the name of that particular computer according to your local computer, which had names for all the computers it could reach directly. And host2 in turn was the name of that second computer according to host1. Your computer might have a completely different name for host2, or no name at all. And as for hostn, that name might be known only to hostn-1. There was neither in principle nor in practice any central registry like DNS where these names could be looked up.

This system, while theoretically completely general, had an obvious practical difficulty: how did you give someone your email address short of actually sending an email? As an email traveled through the system, the sender's address was continuously transformed by prepending the local name of the last relay, thus making it possible to reply (provided naming was sufficiently reciprocal, as sometimes it was not). But how to specify it in the first place?

In practice, people tended to specify "bang paths" (bang = !) from certain "well-known hosts" such as ihnp4, seismo, ucbvax, decvax. (I name them so that old-timers can have the pleasure of seeing these hostnames once again.) So if you knew, as people generally did, the correct bang path from your system to the well-known hosts, you could prepend that to your intended recipient's partial bang path, and hopefully the mail got through.
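For those who never had the pleasure, here is a minimal sketch of how such relaying and reply-path rewriting could work. It is not a reconstruction of any actual UUCP software: the machines A, B, and C, their local names for one another, and the function names are all hypothetical, chosen only to show that each name is looked up at the hop where it is used, and that the reply address grows by one local name per relay.

```python
# Hypothetical neighbor tables (illustration only): each machine maps ITS OWN
# local names for directly reachable machines to the machine they denote.
NEIGHBORS = {
    "A": {"bee": "B"},
    "B": {"sea": "C", "ay": "A"},
    "C": {"bravo": "B"},
}


def local_name(at, machine):
    """Return the name that host `at` uses for `machine`, or None if it has none."""
    for name, target in NEIGHBORS[at].items():
        if target == machine:
            return name
    return None


def deliver(origin, bang_path, sender):
    """Relay mail hop by hop; return (final machine, user, reply path)."""
    *hops, user = bang_path.split("!")
    here, reply = origin, sender
    for name in hops:
        # Each name is interpreted by the machine currently holding the mail.
        nxt = NEIGHBORS[here][name]  # KeyError if this hop has no such name
        # The receiving machine prepends its own name for the last relay.
        back = local_name(nxt, here)
        if back is None:
            raise LookupError(f"{nxt} has no name for {here}; no reply possible")
        reply = back + "!" + reply
        here = nxt
    return here, user, reply


machine, user, reply = deliver("A", "bee!sea!carol", "alice")
print(machine, user, reply)
```

In this toy network the naming happens to be reciprocal, so feeding the accumulated reply path back in from machine C lands the answer on alice's machine again. Delete the "ay" entry from B's table and the outbound mail can no longer build a usable return address.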

There was some effort made to map the system as a directed graph with named links with attached costs, and a program called pathalias was used to rewrite the bang paths into cheaper routes with the same effect. But the mapping effort, being post hoc, never quite caught up with reality, and mail loops whereby hosta routed an email to hostb, which routed it back to hosta, were far from rare.
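The route-rewriting idea can be sketched as a cheapest-path search over a directed graph with costs on the links. This is only an illustration of the principle, not pathalias itself: the host names and costs are invented, the function names are mine, and I assume (as the mapping effort effectively had to) that host names in the map are globally unique.

```python
import heapq

# Hypothetical map: LINKS[x][y] is the cost of the directed link from x to y.
LINKS = {
    "here": {"hosta": 200, "hostb": 500},
    "hosta": {"hostb": 100, "hostc": 800},
    "hostb": {"hostc": 300},
    "hostc": {},
}


def cheapest_routes(source):
    """Dijkstra's algorithm: map each reachable host to (cost, bang path)."""
    best = {source: (0, [])}
    queue = [(0, source, [])]
    while queue:
        cost, host, path = heapq.heappop(queue)
        if cost > best[host][0]:
            continue  # stale queue entry; a cheaper route was already found
        for nxt, link_cost in LINKS.get(host, {}).items():
            new_cost = cost + link_cost
            if nxt not in best or new_cost < best[nxt][0]:
                best[nxt] = (new_cost, path + [nxt])
                heapq.heappush(queue, (new_cost, nxt, path + [nxt]))
    return {h: (c, "!".join(p)) for h, (c, p) in best.items() if h != source}


def rewrite(address, routes):
    """Replace the route in a bang path, keeping only the final host and user."""
    *hops, user = address.split("!")
    cost, route = routes[hops[-1]]
    return route + "!" + user


routes = cheapest_routes("here")
print(rewrite("hostb!hostc!carol", routes))
```

With these made-up costs the direct link to hostb is expensive, so the rewritten path is longer in hops but cheaper in total. Of course the result is only as good as the map, which is exactly where the real system fell down.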

The DNS, on the other hand, grew out of the inability of all Arpanet (later Internet) hosts to keep up with the growth of the HOSTS.TXT file, and the administrative difficulties of managing such a large but flat absolute name space. Converting the name space to a hierarchy, and writing a distributed hierarchical implementation, made the DNS sufficiently scalable to support the many-orders-of-magnitude larger Internet of today.

For many a year, the domainists and bangists met and did intellectual battle on the fields of Usenet. Pathalias was even modified to understand bang paths that contained domain names, but bang paths were finally sent to the scrap heap by the invention of the ISP, which allowed even leaf computers to be effectively, if intermittently, on the Internet. And now though sometimes our email addresses are annoyingly long, or short but arbitrary, and even from time to time still contain the dreaded %-hack (which is not documented anywhere, including here), we can even put them on our business cards, along with those other absolute names, our postal addresses and worldwide telephone numbers.

Shall relative names return? Who is willing to stand up and be counted?

The Antipodes

The Antipodes are the English-speaking countries in the South Pacific Ocean: Australia, New Zealand, Papua New Guinea, and associated locations.

Ironically, not a single scrap of land is actually antipodal to Australia. This is pretty surprising, considering how big Australia is; it is as if a cylindrical core of the Earth had been pushed downunderward, leaving a depression in the North Atlantic Ocean and a rise constituting Australia. Bermuda is almost antipodal, but it actually fits into the bay at Perth.
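The Bermuda claim is easy to check with a little arithmetic: the antipode of a point has the opposite latitude and a longitude shifted by 180 degrees. The sketch below is mine, not Quine's, and the coordinates are rough values I am assuming for Bermuda and central Perth.

```python
def antipode(lat, lon):
    """Return the point diametrically opposite (lat, lon), in degrees.

    North latitude and east longitude are positive.
    """
    opposite_lon = lon + 180.0 if lon <= 0 else lon - 180.0
    return (-lat, opposite_lon)


# Approximate coordinates, assumed for illustration.
BERMUDA = (32.3, -64.8)
PERTH = (-31.95, 115.86)

lat, lon = antipode(*BERMUDA)
print(f"Antipode of Bermuda: {lat:.2f}, {lon:.2f}")
print(f"Perth:               {PERTH[0]:.2f}, {PERTH[1]:.2f}")
```

On these figures the antipode of Bermuda comes out a fraction of a degree south and west of Perth, which puts it in the water just off the coast rather than on Australian soil.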

The United States similarly represents a core that has been pushed upoverward; the only land antipodal to it is a small slice of Kerguelen Island, glacial and uninhabited, which is antipodal to part of Montana. Hawaii is an exception.

Finally, Antarctica is antipodal to the Arctic Ocean, mostly, as if its core had been pushed southward. How to visualize these three cores intersecting at the center of the Earth: Well.

These facts are recycled from W.v.O. Quine's book Quiddities. (No, he didn't write a book called Haecceities, in case you're wondering.)

2005-06-28

Geolibertarianism in rhyme

There was a man named Henry George
Of land monop'ly quite the scourge.
Wealth, said he, is what we make
For profit or consumption's sake:
Stone axe, print book, or Jedi saber,
We make them with Capital, Land, and Labor.
The natural world, you understand,
Is what economists mean by "Land";
Including sea and sky and soil,
And iron, forests, coal, and oil.
"Capital"'s wealth used for production;
On "Labor" we need no instruction.
The return on Labor we call "Wages"
(All this is written on many pages,
Paper and Web; I can barely tap it all);
"Interest" is what's paid to Capital.
And those who by some accident
Own Land, we pay them what's called "Rent".
This maxim Henry carved in stone:
Ourselves, ourselves are what we own.
The products of our mind and hand
Are at no other man's command.
But what is not of our own making
Is anyone's at all for taking,
Provided (this point is due to Locke)
They leave enough so as not to block
Others from taking Land as good
That still is free, it's understood.
If this is so, then why endure
So many who are so very poor?
Simple: we've decided to pay
All Rent to those who (as we say)
By hook or crook have gotten hold
By being there first and being bold.

Then they, or their heirs, get to collect
(Waking or sleeping) what they expect
The traffic will bear. The rest of us
Must pay what they ask, without any fuss,
Or else go scratch -- for without Land
There is no scope for mind or hand.
The amount of Land is fixed, you know,
Higher prices won't make it grow.
(The self-same law we can construe
Of those who own ideas too.)
Why should the rest of us pay Rents
To their current recipients?
Their title's not one we should endorse,
It is commonly based on force.
Instead, community Rent collection
(The so-called "Single Tax") based on inspection
And assessment, would be just
And pay for services that we must
Have, like national defense,
Safe water, protection of innocence.
What's this to do with XML?
Quite a bit. You see, Ideas as well
As Land belong to all mankind.
If we technologists allow the blind
And greedy to enclose this space,
A commons of the human race,
We and our children will be paying
Forevermore (it goes without saying)
To Concept-owners, past all praying.
So by this maxim be impressed:
Use the tools that work the best.
Do not yield your sovereign judgement,
To any sort of political fudgement.
The criterion of sound design
Should be, must be, your guideline.
And if you're designing documents,
Try RNG. We charge no rents.

House numbering

In the United States, at least, house numbers have nothing to do with the Postal Service, but are assigned by city planning commissions and the like. How they do it, however, varies extremely.

In Manhattan, for example, where I live, streets run both east and west from the central spine, and are numbered starting at the spine and working outward. However, even numbers are always on the south side of a street. Manhattan avenues are numbered northwards; for details, see the Comments.

In Queens the houses were renumbered in 1926, inducing Queens resident Ellis Parker Butler (best known for the immortal "Pigs Is Pigs") to have this mnemonic rhyme published in the New York Times:

In Queens to find locations best —
Avenues, roads and drives run west;
But ways to north and south, 'tis plain
Are street or place or even lane;
While even numbers you will meet
Upon the west and south of street.

(You can sing it to the tune of "Little Brown Jug".)

What about the "100 house numbers per block" convention? This does not hold in the older parts of older U.S. cities (Manhattan does not obey it south of 8th St. or so), but is quite general in the U.S. as a whole.

In rural parts, it is not uncommon for houses to be neither named nor numbered; my house in the country has no "address" at all, and only post office boxes are provided (no mail delivery). Anyone who wants to reach me by snail (extremely snail) mail, can do so at:

Cowan
12017-0042
U.S.A.

2005-06-27

Careers for calves

A bull calf has four possible fates: he can become a veal calf, a steer, an ox, or a bull.

  • Veal calves are castrated and killed at about 16 weeks and eaten as veal.
  • Steers are castrated and killed while mature but still fairly young (14-16 months) and eaten as beef.
  • Oxen are castrated and used as work animals. When they die, the meat is too tough to eat, even in Big Mac format.
  • Bulls are not castrated and are used as studs. When they die, the meat is also too tough to eat and probably tastes funny to boot.

(We castrate most calves for two main reasons: it makes the animals docile and easy to handle, and testosterone affects the flavor of the meat, making it gamy and unacceptable to wishy-washy modern palates.)

The number of bulls and oxen is insignificant compared to the other types. Veal calves are generally from dairy herds, of which they are a basically unwanted byproduct.

More bogo-translations

Someone asked me who wrote one of my signature lines:

I amar prestar aen, han mathon ne nen, han mathon ne chae, a han noston ne 'wilith.

I replied that it was written by David Salo, and said by Cate Blanchett. In the book, it's Treebeard who says it, and in the Common Speech:

The world is changing; I feel it in the water, I feel it in the earth, I smell it in the air.

But the point of this posting is to show off two bogo-translations of the Sindarin version into English, made by members of the Conlang list:

  • I am priester than a priest, I have not run one marathon, and I do not wish to run the marathon, for I prefer to stay in my village.
  • I am one bitter priest, I have not a single mathom, I have no mathom to keep, and I have an unwilting mind.

How modern Burmese got two grammars

First there were the Buddhist scriptures in Pali, with Burmese interlinear translation, with conventional markers of number, tense, case, and mood corresponding to Pali ones, just like the 1SG and ACC and PL that linguists use nowadays.

Next were the same texts, but with the Pali original left out: a Burmese relexification of Pali, with conventional markers aforesaid. This is called "Nissaya Burmese": Burmese surface representations, Pali morphology and syntax.

Next were original works written directly in Nissaya Burmese.

Then came more original works in modified Nissaya, with some of the Pali markers left out. The more Pali-esque, the more high-toned the work was considered to be.

Next came writing in plain Burmese, but heavily influenced by Nissaya conventions. This is roughly the level of newspaper writing in Burmese today.

Next came elegant spoken Burmese, which was native morphosyntax with many conventions taken from prose style, which was itself a mixture of native and Nissaya.

Finally, colloquial spoken Burmese picked up many of these conventions as well.

It seems that one cannot fully describe the linguistic habits of the Burmese without using two sets of grammatical rules, one just like the grammar of Pali, the other more characteristic of colloquial Burmese. [Burling]

So are the most Nissaya-ish varieties really Burmese, or are they really Pali? Well, the grammar is surely Pali, but they are connected by an unbroken chain of mutually intelligible 'lects to colloquial Burmese, whereas Pali itself is completely shut off by an impermeable lexical barrier. (Burmese has borrowed many Pali morphemes, but not so much that Pali is even vaguely intelligible without learning it.)

Note that the Pali influence in Burmese has been going on for a thousand years: Burmese itself has evolved considerably from the form in which the interlinear scriptures were first recorded. Pali morphosyntax is now so deeply intertwingled in Burmese that we can only recover a "pure native" grammar from comparative Tibeto-Burman considerations.

References:

John Okell, "Nissaya Burmese", Indo-Pacific Linguistic Studies (Lingua 15), (Amsterdam: North Holland Publishing Company, 1965) [the original]

John Okell, "Nissaya Burmese", Journal of the Burma Research Society, 50 (1967) pp. 95-123 [expanded version]

Robbins Burling, Man's Many Voices (Holt, Rinehart and Winston, 1970), pp. 180-83 [the secondary source that I actually read]

2005-06-24

Giving food

As a resident of New York City, I think it worth pointing out that many homeless people are quite suspicious of gifts of unwrapped food, as there are sickos out there who try to hand out food that they have contaminated with something that will make the homeless person sick (ipecac and laxatives seem to be popular).

In my opinion, the better approach, if you want to give a homeless person food, is to offer to take them into a deli (they are all over the place here) and have them order a sandwich for which you pay. In addition to eliminating the above issue, and making sure that your money actually goes toward food and not something else, this also allows the person the dignity of choosing their own food.

Russian with a Russian accent

When I took a semester of college Russian back in 1975, my Russian teacher spent about the first week of the course teaching us to speak English with a Russian accent. The idea was to learn to do that so thoroughly that when we started to learn Russian, we would be able to speak that with a Russian accent also (i.e. perfectly). I have long since forgotten all the Russian I learned, but hey, my Russian-accented English is still pretty good.

The short ling

Boontling is the short ling that Boonters used to harp, 1880-1920. Only a few codgy harpers are still with us (the rest piked to the dusties long ago, luckily not before the ling was written down), but there is no question about the usability of Boont, as in this version of a familiar English-language nursery rhyme:

The eeld'm piked for the chigrel nook
      For gorms for her bahl beljeemer;
The gorms had shied, the nook was strung,
      And the bahl beljeemer had nemer.

Still, the Mendocino Mushroom Forager issue for 1980 (the "Boont Ite-Steak Greeley Sheet") contained this sentence by Chipmunk (mundanely Bob Glover), warning against ignorantly mistaking poisonous mushrooms for edible ones:

You must do much graymatterin fore pikin for seekin Ite steaks to gorm, cause the sockers might not be bahlers, but nonchers with dusties dust, so deek your bok well.

Note the slight syntactic difference between Boontling (shared with many non-standard English dialects) and standard English: "pikin for seekin" as opposed to English "walking/traveling to hunt".

"The Brightlighter's Jonnem" is a story in neo-Boont with an English translation.

From this hour to that hour

Someone asked what the etymology is of the French word encore (Italian ancora, Rumantsch aunc).

The bizarre but apparently uncontested etymology is Latin hinc ad horam '[from] this [hour] to [that] hour'. Hey, it's no worse than ampersand, which is from and per se and, as in the old style of reciting the alphabet, ending with "x y z and per se [by itself] &".

Update: The Romanian word încă mentioned in an earlier version of this posting is probably from Latin unquam.

2005-06-22

Grandmother Little Bear Woman on conflict

Modernist manuals of writing often conflate story with conflict. This reductionism reflects a culture that inflates aggression and competition while cultivating ignorance of other behavioral options. No narrative of any complexity can be built on or reduced to a single element. Conflict is one kind of behavior. There are others, equally important in any human life, such as relating, finding, losing, bearing, discovering, parting, changing.
     —Ursula K. Le Guin, Steering the Craft

"Do you think I've been lying about it? What do you take me for?"

From Roughing It, Mark Twain's 1872 fictionalized account of his life out West. The applicability of this is up to you.

The Admiral seldom read newspapers; and when he did he never believed anything they said. He read nothing, and believed in nothing, but "The Old Guard," a secession periodical published in New York. He carried a dozen copies of it with him, always, and referred to them for all required information. If it was not there, he supplied it himself, out of a bountiful fancy, inventing history, names, dates, and every thing else necessary to make his point good in an argument. Consequently he was a formidable antagonist in a dispute.

Whenever he swung clear of the record and began to create history, the enemy was helpless and had to surrender. Indeed, the enemy could not keep from betraying some little spark of indignation at his manufactured history — and when it came to indignation, that was the Admiral's very "best hold." He was always ready for a political argument, and if nobody started one he would do it himself. With his third retort his temper would begin to rise, and within five minutes he would be blowing a gale, and within fifteen his smoking-room audience would be utterly stormed away and the old man left solitary and alone, banging the table with his fist, kicking the chairs, and roaring a hurricane of profanity. It got so, after a while, that whenever the Admiral approached, with politics in his eye, the passengers would drop out with quiet accord, afraid to meet him; and he would camp on a deserted field.

But he found his match at last, and before a full company. At one time or another, everybody had entered the lists against him and been routed, except the quiet passenger Williams. He had never been able to get an expression of opinion out of him on politics. But now, just as the Admiral drew near the door and the company were about to slip out, Williams said:

"Admiral, are you certain about that circumstance concerning the clergymen you mentioned the other day?" — referring to a piece of the Admiral's manufactured history.

Every one was amazed at the man's rashness. The idea of deliberately inviting annihilation was a thing incomprehensible. The retreat came to a halt; then everybody sat down again wondering, to await the upshot of it. The Admiral himself was as surprised as any one. He paused in the door, with his red handkerchief half raised to his sweating face, and contemplated the daring reptile in the corner.

"Certain of it? Am I certain of it? Do you think I've been lying about it? What do you take me for? Anybody that don't know that circumstance, don't know anything; a child ought to know it. Read up your history! Read it up ―, and don't come asking a man if he's certain about a bit of ABC stuff that the very southern [epithet]s know all about."

Here the Admiral's fires began to wax hot, the atmosphere thickened, the coming earthquake rumbled, he began to thunder and lighten. Within three minutes his volcano was in full irruption and he was discharging flames and ashes of indignation, belching black volumes of foul history aloft, and vomiting red-hot torrents of profanity from his crater. Meantime Williams sat silent, and apparently deeply and earnestly interested in what the old man was saying. By and by, when the lull came, he said in the most deferential way, and with the gratified air of a man who has had a mystery cleared up which had been puzzling him uncomfortably:

"Now I understand it. I always thought I knew that piece of history well enough, but was still afraid to trust it, because there was not that convincing particularity about it that one likes to have in history; but when you mentioned every name, the other day, and every date, and every little circumstance, in their just order and sequence, I said to myself, this sounds something like — this is history — this is putting it in a shape that gives a man confidence; and I said to myself afterward, I will just ask the Admiral if he is perfectly certain about the details, and if he is I will come out and thank him for clearing this matter up for me. And that is what I want to do now — for until you set that matter right it was nothing but just a confusion in my mind, without head or tail to it."

Nobody ever saw the Admiral look so mollified before, and so pleased. Nobody had ever received his bogus history as gospel before; its genuineness had always been called in question either by words or looks; but here was a man that not only swallowed it all down, but was grateful for the dose. He was taken aback; he hardly knew what to say; even his profanity failed him. Now, Williams continued, modestly and earnestly:

"But Admiral, in saying that this was the first stone thrown, and that this precipitated the war, you have overlooked a circumstance which you are perfectly familiar with, but which has escaped your memory. Now I grant you that what you have stated is correct in every detail — to wit: that on the 16th of October, 1860, two Massachusetts clergymen, named Waite and Granger, went in disguise to the house of John Moody, in Rockport, at dead of night, and dragged forth two southern women and their two little children, and after tarring and feathering them conveyed them to Boston and burned them alive in the State House square; and I also grant your proposition that this deed is what led to the secession of South Carolina on the 20th of December following. Very well."

Here the company were pleasantly surprised to hear Williams proceed to come back at the Admiral with his own invincible weapon — clean, pure, manufactured history, without a word of truth in it.

"Very well, I say. But Admiral, why overlook the Willis and Morgan case in South Carolina? You are too well informed a man not to know all about that circumstance. Your arguments and your conversations have shown you to be intimately conversant with every detail of this national quarrel. You develop matters of history every day that show plainly that you are no smatterer in it, content to nibble about the surface, but a man who has searched the depths and possessed yourself of everything that has a bearing upon the great question. Therefore, let me just recall to your mind that Willis and Morgan case — though I see by your face that the whole thing is already passing through your memory at this moment.

"On the 12th of August, 1860, two months before the Waite and Granger affair, two South Carolina clergymen, named John H. Morgan and Winthrop L. Willis, one a Methodist and the other an Old School Baptist, disguised themselves, and went at midnight to the house of a planter named Thompson — Archibald F. Thompson, Vice President under Thomas Jefferson, — and took thence, at midnight, his widowed aunt, (a Northern woman,) and her adopted child, an orphan — named Mortimer Highie, afflicted with epilepsy and suffering at the time from white swelling on one of his legs, and compelled to walk on crutches in consequence; and the two ministers, in spite of the pleadings of the victims, dragged them to the bush, tarred and feathered them, and afterward burned them at the stake in the city of Charleston. You remember perfectly well what a stir it made; you remember perfectly well that even the Charleston Courier stigmatized the act as being unpleasant, of questionable propriety, and scarcely justifiable, and likewise that it would not be matter of surprise if retaliation ensued. And you remember also, that this thing was the cause of the Massachusetts outrage. Who, indeed, were the two Massachusetts ministers? and who were the two Southern women they burned?

"I do not need to remind you, Admiral, with your intimate knowledge of history, that Waite was the nephew of the woman burned in Charleston; that Granger was her cousin in the second degree, and that the woman they burned in Boston was the wife of John H. Morgan, and the still loved but divorced wife of Winthrop L. Willis. Now, Admiral, it is only fair that you should acknowledge that the first provocation came from the Southern preachers and that the Northern ones were justified in retaliating. In your arguments you never yet have shown the least disposition to withhold a just verdict or be in anywise unfair, when authoritative history condemned your position, and therefore I have no hesitation in asking you to take the original blame from the Massachusetts ministers, in this matter, and transfer it to the South Carolina clergymen where it justly belongs."

The Admiral was conquered. This sweet spoken creature who swallowed his fraudulent history as if it were the bread of life; basked in his furious blasphemy as if it were generous sunshine; found only calm, even-handed justice in his rampart partisanship; and flooded him with invented history so sugarcoated with flattery and deference that there was no rejecting it, was "too many" for him. He stammered some awkward, profane sentences about the ― Willis and Morgan business having escaped his memory, but that he "remembered it now," and then, under pretence of giving Fan [his dog —JC] some medicine for an imaginary cough, drew out of the battle and went away, a vanquished man.

Then cheers and laughter went up, and Williams, the ship's benefactor was a hero. The news went about the vessel, champagne was ordered, and enthusiastic reception instituted in the smoking room, and everybody flocked thither to shake hands with the conqueror. The wheelman said afterward, that the Admiral stood up behind the pilot house and "ripped and cursed all to himself" till he loosened the smokestack guys and becalmed the mainsail.

See also "The Person Sitting in Darkness", a 1901 account of U.S. imperialism.

2005-06-20

The folk process

I got curious about a song half-remembered from my childhood and spent a few hours tracking it down. It makes a marvelous example of the folk process at work, as well as what happens to Irish when the Americans (even those of Irish or Scots-Irish descent) get a hold of it.

The original song is "Shule Aroon", and the first verse and chorus look like this:

I would I were on yonder hill
'Tis there I'd sit and cry my fill,
And every tear would turn a mill,
Is go dtéidh tú, a mhuirnín, slán!
Siúl, siúl, siúl, a rúin!
Siúl go socair, agus siúl go ciúin,
Siúl go dtí an doras agus éalaigh liom,
Is go dtéidh tú, a mhuirnín, slán!

On arrival in the colonies, the song split into two versions. The better-known one shed its Irish altogether, acquired a Revolutionary War motif, and became:

Here I sit on Buttermilk Hill,
Who should blame me cry my fill?
And every tear would work a mill,
Johnny has gone for a soldier.

Buttermilk Hill is in Westchester County, New York; supposedly dairy cattle were hidden there during the Revolution to protect them from raiders from either side. The tune changed too, but all versions can be sung to all tunes, so I ignore this.

But in the southern U.S., where there were lots of Irish and Scots-Irish people, the Irish was retained in singing but its meaning was forgotten and its phonetics garbled. This version was collected in Arkansas in 1958, when I was busily being born:

Well, I wish I was on yonders hill
There I'd set and cry my fill
Every drop would turn a mill
Ish come bibble ahly-boo-so-real.
Shule-shule-shule-a-roo
Shule-like-shagrus-spilly-bolly-qule
First time I saw spilly-bolly-eel
Ish come bibble ahly-boo-so-real.

Not too much later, I learned the "Buttermilk Hill" version but with the following chorus:

Shool, shool, shool a rool,
Shool a rack-a-shack, shool-a-barbecue,
When I saw my Sally-baba-yeel,
Come bibble in the boo-shy laurie.

And so over the past 200+ years, Irish has slowly turned to complete gibberish.... Ghu only knows what will happen to the song if Americans keep singing it for the next 200 years!

Update: The Irish means roughly "Go, go, go, my love / Go quietly and go peacefully / Go to the door and fly with me / And may you go safely, my darling."

Writing Chinese

This was written in response to someone who wondered how much the common Chinese writing system helped people communicate across the various Chinese languages (usually called "dialects").

Until about 1915 almost all Chinese writing was done in Classical Chinese, using conventions utterly divorced from those of any of the eight to twelve living Sinitic languages. The nearest Western analogy would be the 18th century, when most learned works were still written in Latin, but read -- by translation -- in the local vernacular. Subject matter aside, any written text of 1900 would be perfectly intelligible to a literate Tang Dynasty person of more than a thousand years before. Contrariwise, it would be literally impossible to read any written document (except for a few marginal cases) word for word in any modern language whatever and produce anything but nonsense.

As one of the many knock-on effects from the collapse of the Qing Dynasty in 1911, a widespread tradition of writing arose using the lexical and grammatical conventions of modern Peking Mandarin. This made life much easier for the large majority who spoke Mandarin, and who had most of the political and economic power as well. The new baihua ('plain writing') style, however, was almost as artificial for Southerners speaking non-Mandarin languages as Classical Chinese had been. Rather than having to learn Middle Chinese in order to become literate, they now had to learn modern Mandarin. After a period of confusion in language matters, the PRC government nailed down the Mandarin language and the Peking pronunciation standard as official in 1956. It was optimistically renamed putonghua 'the common language'.

By the 1980s, knowledge of Mandarin had become widespread in the South in all public matters. It is the language of schooling past the first year or two, and it is now possible for non-locals to get along with only the standard language, which was certainly never true before. Learning the standard language, however, is not thought of as "language instruction"; what is learned, explicitly, is reading and writing: speaking and understanding are treated as a by-product of this. Similarly, non-locals who must learn a non-Mandarin language think of it more like adapting to local speech habits rather than learning a truly foreign language like English.

It's interesting to note that the native alphabet of Chinese, Zhuyin Fuhao (or informally Bopomofo), was first used to show pronunciations in the official post-dynastic dictionary of Mandarin, the 1919 edition. The spellings attempted to preserve as many distinctions as possible, not only in the dialects of Mandarin, but across the non-Mandarin Sinitic languages as well. The result was a sort of pseudo-Chinese that resembled nothing ever heard before, and that no one except the great Chinese phonetician Yuen Ren Chao was ever able to pronounce. He made a set of records demonstrating the new official pronunciation, but they were hopeless for pedagogical purposes. In the end, the 1932 revision abandoned the attempt, and recorded the actual pronunciations of the Peking dialect.

2005-06-17

Your rights when you buy a book

When you buy a book, you have the right to read the book silently or out loud (but not to an audience), you can act on the information it gives you, you can study it to see how it is written in the hope of writing a better book yourself, you can even write a different book based on the same facts expressed differently, and scurvily give the author no credit whatever. You may write and publish a review praising or condemning the book in almost unlimited terms.

On a less intellectual plane, you may set the book on fire, or use it to insulate your basement or to check erosion in a gully. You may give or sell it to anyone you please, or leave it around in public (absent littering laws) for the delectation of the next person to pick it up. You may lend it to your friends or the public, though you may have problems if you take money for this.

All of this applies equally to movies, sound recordings, sculptures, magazines, computer programs, and any other copyrightable works. For computer programs, you also have the (U.S.) statutory right to make copies reasonably necessary for the use of the program or for backup.

Copyright gives the author five and only five rights:

  • to control the making of copies
  • to control the making of derivative works
  • to control the distributing of copies and derivative works
  • to control the public performance of the work
  • to control the public display of the work

(There may also be "moral rights" that depend on the country.)

Let's keep it that way.

The art of making order

As the chief cook and former chief bottle-washer (my daughter has replaced me at that function) in my house, I love Le Guin's term for drudge-work: "the art of making order where people live". We are now paying someone once a week to do the physical order part, since Gale's not physically up to it any more (back troubles), but she still does all the organization and strawbossing.

I long ago devised the following classification of males into four grades:

  • Grade 1: Will not do housework, period.
  • Grade 2: Still will not do housework: feels guilty over it.
  • Grade 3: Does anything you (fem.) ask, more or less cheerfully.
  • Grade 4: Sees what needs to be done and does it.

Most men are still 1s or 2s; I am a 3 to a 4, depending on what's going on.

The real 221B Baker Street

The house with its seventeen steps that Dr. Watson called "221B Baker Street" was in fact officially known as "30 York Place", but York Place was a short street joining Baker Street and Upper Baker Street, later relabeled. (York Place ran from Paddington Street/Crawford Street north to Marylebone Road.) There were and are lots of other York this-n-thats on the London map, so "York Place" without specifying "Baker Street" would have been hopelessly ambiguous. The modern (post-1930) number is 111 Baker Street.

Holmes's house was definitively identified by one Dr. Gray Chandler Briggs, based on his chance discovery around 1930 of a building actually bearing the plaque "Camden House", which we are told in "The Adventure of the Empty House" stands directly across from Holmes's building.

(Doyle claimed this was a total coincidence, and said he had not been in Baker Street for at least thirty years -- but from the Holmesian point of view, the identification is far too satisfying to give up.)

Threes

3s (to be sung by Niels Bohr)

I think that I shall never c
A # lovelier than 3;
3 < 6 or 4,
And than 1 it's slightly more.

All things in nature come in 3s,
Like ∴s, trios, Q.E.D.'s;
And $s gain more dignity
When thus augmented: 3 × 3.

A 3 whose slender curves are pressed
By banks, for compound interest;
Oh would that, paying loans or rent,
My rates were only 3%!

3² expands with rapture free,
And reaches toward ∞,
3 complements each x and y
And intimately lives with π.

A circle's # of °s
Are best ÷d up by 3s,
But wrapped in dim obscurity
Is √-3.

Atoms are split by men like me,
But only God is 1 in 3.
     --John Atherton

Why we all hate normalization checking

Posted to the XML Core Working Group mailing list before XML 1.1 became a W3C Recommendation:

There are two kinds of people who didn't want to make XML 1.1 require normalization checking: the Lazy Document Generators and the Lazy Parser Programmers.

Lazy Document Generators want to be able to spew their random Unicode cruft straight into XML documents without worrying about what semantics it might have, and recreate said cruft at the receiver exactly as sent. The fact that the document might contain one million consecutive COMBINING CIRCUMFLEX ACCENTs bothers them not in the least. It's someone else's problem.

Lazy Parser Programmers don't want to bother to put together the necessary few lines of code, according to a well-documented algorithm, to check that documents do not contain gratuitous decompositions like LATIN SMALL LETTER A followed by COMBINING CIRCUMFLEX ACCENT, when obviously LATIN SMALL LETTER A WITH CIRCUMFLEX is what everyone has in their Latin-1 fonts and keyboards, and so is likely to expect. What do they care if their users go blind poring over hex dumps of their documents, trying to figure out where the discrepancy comes from? It's someone else's problem.

LDGs don't want it to be the case that "it is an error" (not necessarily detected) for a document to violate normalization. LPPs don't want to require parsers to check normalization at user option, since then they have to write the code even if it is not used much of the time. The Core WG will have to decide whether to p*ss off one group, both, or neither.

Of course, there are also XML 1.0 Forever types, who sit on xml-dev and chant "No Change! No Change!". X1Fs demand to be paid only in paper money.

This discharges my action.

The upshot was that XML 1.1 parsers SHOULD check for normalization but don't have to.
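To show how few lines the Lazy Parser Programmers were being asked for, here is a minimal sketch of an NFC check using Python's standard unicodedata module. The function name is_nfc is mine, and XML 1.1's notion of a fully normalized document covers more than this one test, so treat it as an illustration rather than a conforming checker.

```python
import unicodedata


def is_nfc(text: str) -> bool:
    """Return True if text is already in Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text) == text


decomposed = "a\u0302"   # LATIN SMALL LETTER A + COMBINING CIRCUMFLEX ACCENT
composed = "\u00e2"      # LATIN SMALL LETTER A WITH CIRCUMFLEX

print(is_nfc(composed))     # True
print(is_nfc(decomposed))   # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```

The two strings render identically, which is exactly why the discrepancy sends users to the hex dumps.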

Naming things, or, Plan ahea

The naming of stars is a difficult matter, not one of your everyday holiday games. The sky is divided into 88 arbitrary areas of varying size called constellations, and ordinary stars are named in order of brightness by a Greek letter followed by the name of the constellation (in Latin, traditionally in the genitive case). Thus Alpha Centauri is the brightest star in the constellation of the Centaur, and Tau Ceti is the 19th brightest star in the constellation of the Whale. This system is fairly simple and rational, since stars are naturally going to be discovered in order from brightest to dimmest, as telescopes become more powerful.

(From the 25th brightest star on, numbers (which are assigned by location in the sky, not by brightness) are used, and for obscure stars, numbers in star catalogues.)

Stars which are, for any reason, of variable brightness don't fit neatly into the rank order. If the star already had an ordinary name before its variability was noticed, it keeps it. Otherwise, variable stars are given Latin-letter names in order of discovery, followed again by the name of the constellation, thus: R, S, T, U, V, W, X, Y, Z. S Doradus, for example, one of the most luminous (and bizarre) stars known, is the second variable star discovered in the constellation of the Dolphinfish. (The reason for beginning with R is to avoid colliding with the letters designating spectral types, which are O, B, A, F, G, K, M -- but that's a story for another day.) All was well.

But some constellations were found to contain more than nine variable stars.

No problem: astronomers went to two Latin letters for the 10th star onwards, thus: RR, RS, ... RZ, SS, ST, ... SZ, TT, ... ZZ, the second letter never coming earlier in the alphabet than the first. All was well.

But some constellations were found to contain more than 54 variable stars.

No problem: astronomers wrapped around the Latin alphabet, thus: AA, AB, ... AZ, BB, ... BZ, ... QZ, omitting the letter J (most of this system was invented in Germany, which was still on Fraktur at the time). All was well.

But some constellations were found to contain more than 334 variable stars.

The remaining two-letter sequences, those whose second letter precedes the first (RA, ... RQ, SA, ... SQ, ... ZQ and the like), had been ruled out at an earlier stage, so they were rejected. Instead the final stage of nomenclature became (at very long last) V335, V336, .... Which could and should have been done in the first place instead of the fourth place.
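For the curious, here is a minimal sketch of the whole scheme in Python. It assumes the convention of the standard variable-star catalogues, in which the second letter of a pair is never earlier in the alphabet than the first; that rule is what makes the letter names run out at exactly 334. The function name designation is mine.

```python
import string

# The 25 letters actually used: the Latin alphabet without J.
LETTERS = [c for c in string.ascii_uppercase if c != "J"]


def _pairs(firsts):
    """All two-letter names whose second letter is not earlier than the first."""
    return [a + b for a in firsts for b in LETTERS[LETTERS.index(a):]]


SINGLES = list("RSTUVWXYZ")                     # stars 1-9
LATE_PAIRS = _pairs(SINGLES)                    # RR ... ZZ, stars 10-54
EARLY_PAIRS = _pairs(LETTERS[:LETTERS.index("R")])  # AA ... QZ, stars 55-334
NAMES = SINGLES + LATE_PAIRS + EARLY_PAIRS


def designation(n: int) -> str:
    """Name prefix of the n-th variable star found in a constellation (n >= 1)."""
    if n < 1:
        raise ValueError("stars are counted from 1")
    if n <= len(NAMES):
        return NAMES[n - 1]
    return "V" + str(n)


print(designation(2) + " Doradus")   # S Doradus
print(len(NAMES))                    # 334
print(designation(335))              # V335
```

Note that the fourth branch alone, applied from n = 1, would have served for every star.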

Caveat nomenclator.

Grammatical gender has its advantages

Grammatical gender is a pain in the ass for people who speak English (or Turkish, or ...) to learn. But it can have advantages, if only by accident. Consider the English sentence:

He removed the manuscript from the briefcase and cast it into the sea.

What went into the sea, the manuscript or the briefcase?

In French, these words happen to differ in gender, so the pronoun neatly disambiguates the sentence:

Il retira le manuscrit de la serviette et le jeta dans la mer.

has "le jeta", meaning that the manuscript (masc.) went in the drink, whereas

Il retira le manuscrit de la serviette et la jeta dans la mer.

has "la jeta", meaning that it's the briefcase (fem.) that got drowned.

Of course, there are other ways to express this distinction. In Bislama, the English-based creole of Vanuatu (a Pacific island nation), you'd say

Hem i tekemaout pepa long kes blong hem, hem i sakem long solwota.

to dunk the manuscript, whereas the briefcase goes under with:

Hem i tekemaout pepa long kes blong hem pastaem, nao hem i sakem kes blong hem long solwota.

These will be easier for people who read Standard English to read if I respell them thus:

Him he take'em-out paper belong case belong him, him he chuck'em belong saltwater.

and

Him he take'em-out paper belong case belong him past-time, now him he chuck'em case belong him belong saltwater.

Note the difference between blong, which is specifically possessive (the case "belongs" to him) and long, which is a general-purpose preposition, both from English belong. English can use "of" for both, but Bislama sharply distinguishes them.

The English and French versions are by Willard van Orman Quine; the Bislama version by Jacques Guy.

The strange case of the word "cell"

Linguistic borrowing is a funny thing.

Old English borrowed "cell" from Latin CELLA, in the sense of a small room (e.g. a monk's cell). Modern English, too, has the word "cell" in the same sense; the more frequent sense "smallest unit of an organism" is derived from it. But Modern English "cell" cannot be the descendant of Old English "cell".

Old English "c" originally always represented the sound [k], but underwent palatalization before the front vowels "i" and "e" to [tʃ], the sound written "ch" in Modern English. This happened within the Old English period itself, though not reflected in the spelling until much later. For example, the analogous Old English borrowing "cist" < Latin CISTA appears in Modern English as "chest". So if "cell" had survived into Modern English, it would be spelled "chell" and pronounced accordingly: [tʃɛl].

In French, of course, original Latin [k] was similarly palatalized, but to [s], which accounts for most of the words written with "c" and pronounced [s] in English today. Since English "cell" is indeed pronounced [sɛl], it must be a French borrowing that replaced the inherited form [tʃɛl].

"Every word has its own story."

A song from Patience

A magnet hung in a hardware shop,
And all around was a loving crop
Of scissors and needles, nails and knives,
Offering love for all their lives;

But for iron the magnet felt no whim,
Though he charmed iron, it charmed not him;
From needles and nails and knives he'd turn,
For he'd set his love on a Silver Churn!

CHORUS: A Silver Churn!

A Silver Churn!

His most aesthetic,
Very magnetic
Fancy took this turn—
"If I can wheedle
A knife or a needle,
Why not a Silver Churn?"

CHORUS: His most aesthetic,
Very magnetic
Fancy took this turn—
"If I can wheedle
A knife or a needle,
Why not a Silver Churn?"
     —W.S. Gilbert

2005-06-16

Macaronics

And now, for your listening pleasure, some Latin/English macaronics. The second one works better if given the "old", or English, pronunciation, saying "Bi" like "bye".

Malum Opus

Prope ripam fluvii solus
A senex silently sat;
Super capitum ecce his wig,
Et wig super, ecce his hat.

Blew Zephyrus alte, acerbus,
Dum elderly gentleman sat;
Et a capite took up quite torve
Et in rivum projecit his hat.

Tunc soft maledixit the old man,
Tunc stooped from the bank where he sat
Et cum scipio poked in the water,
Conatus servare his hat.

Blew Zephyrus alte, acerbus,
The moment it saw him at that;
Et whisked his novum scratch wig,
In flumen, along with his hat.

Ab imo pectore damnavit
In coeruleus eye dolor sat;
Tunc despairingly threw in his cane
Nare cum his wig and his hat.

     L'envoi

Contra bonos mores, don't swear,
It is wicked, you know (verbum sat.),
Si this tale habet no other moral,
Mehercle! You're gratus to that!
     —J.A. Morgan

Motor Bus

What is it that roareth thus?
Can it be a Motor Bus?
Yes, the smell and hideous hum
Indicat Motorem Bum!
Implet in the Corn and High
Terror me Motoris Bi:
Bo Motori clamitabo
Ne Motore caedar a Bo —
Dative be or Ablative
So thou only let us live:
Whither shall thy victims flee?
Spare us, spare us, Motor Be!
Thus I sang; and still anigh
Came in hordes Motores Bi,
Et complebat omne forum
Copia Motorum Borum.
How shall wretched lives like us
Cincti Bis Motoribus?
Domine, defende nos
Contra hos Motores Bos!
     —A. D. Godley

The dangers of heuristic instantiation

Hypothetical case: A customer A sends a large sell order in a falling market to its broker B, using a pre-agreed SDV. Due to data corruption, the order as received by B does not conform to the SDV. B, using heuristic processing, interprets the sell order as a buy order, and processes it as such, to the great financial detriment of A.

In fact this actually happened. Back in the 19th century, a customer in San Francisco telegraphed an order to its New York broker in code. Unfortunately, the codeword for "sell" was BAY (dah-di-di-dit di-dah dah-di-dah-dah). A duplicated telegraphic "dit" transmuted this to BUY (dah-di-di-dit di-di-dah dah-di-dah-dah), which was interpreted by the broker as plain English and executed as such, though the rest of the telegram was in code.

If the broker had observed the then-and-now standard policy with coded telegrams, namely not to act on them in the presence of coding errors, the order would have been rejected and the customer would not have suffered.
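A small sketch of the difference between the two policies, in Python. The Morse patterns are the ones given above; the one-entry codebook and the function names are hypothetical, invented only to make the point (BAY meaning "sell" is the single entry the story supplies).

```python
# Morse patterns for just the letters in the story.
MORSE = {"A": ".-", "B": "-...", "U": "..-", "Y": "-.--"}

# Hypothetical codebook; only BAY = sell comes from the anecdote.
CODEBOOK = {"BAY": "sell"}


def to_morse(word: str) -> str:
    """Spell a word in Morse, letters separated by spaces."""
    return " ".join(MORSE[c] for c in word)


def heuristic_decode(word: str) -> str:
    """What the broker did: fall back to plain English on an unknown codeword."""
    return CODEBOOK.get(word, word.lower())


def strict_decode(word: str) -> str:
    """Standard policy: refuse to act on anything that is not in the codebook."""
    if word not in CODEBOOK:
        raise ValueError("coding error, order rejected: " + word)
    return CODEBOOK[word]


print(to_morse("BAY"))            # -... .- -.--
print(to_morse("BUY"))            # -... ..- -.--
print(heuristic_decode("BUY"))    # buy
```

Calling strict_decode("BUY") raises instead of guessing, which is the whole of the policy.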

The customer tried to recover from Western Union, which in accordance with its standard contract, was held liable only for the cost of the corrupted telegram (about $3 at the time), not for the consequential damages. If the sender had requested (for an additional half-price) for the telegram to be repeated back from New York to San Francisco, Western Union would have been liable. (Error-correcting code had not yet been invented.)

Source: David Kahn, The Codebreakers.

Ozymandias

I met a traveller from an antique land
Who said:—Two vast and trunkless legs of stone
Stand in the desert. Near them on the sand,
Half sunk, a shatter'd visage lies, whose frown
And wrinkled lip and sneer of cold command
Tell that its sculptor well those passions read
Which yet survive, stamp'd on these lifeless things,
The hand that mock'd them and the heart that fed.
And on the pedestal these words appear:
"My name is Ozymandias, king of kings:
Look on my works, ye mighty, and despair!"
Also the names of Emory P. Gray,
Mr. and Mrs. Dukes, and Oscar Baer
Of Seventeen West Fourth Street, Oyster Bay.
     --Morris Bishop

On constructive criticism

There is a pervasive equivocation on "constructive" (as in criticism) that is typically overlooked.

When people ask for "constructive criticism", they mean both 1) "helpful", and 2) "assisting in the further construction of the work". Unfortunately, often the most helpful criticism that can be given is that the work is misconceived or misexecuted, and should be abandoned or restarted. Actually saying this leads to such howls for "constructiveness" that many people won't even try.

Here are some fine examples of constructive criticism in that sense:

  1. When carving a statue of an elephant, one must start with a block of material and cut away anything that doesn't look like an elephant. You have already cut away too much.

  2. A reply to a review, by the musician Max Reger (also attributed to many others):

    I am sitting in the smallest room in my house. Your review is before me. Soon it will be behind me.

  3. In the 18th century, an anonymous playwright sent an English theatre manager his verse tragedy on Mariamne, the wife of King Herod. The play, to say the least, did not meet contemporary standards of commercial value. The manager therefore published the following epigram:

    Poet, whoe'er thou art, God damn thee;
    Go hang thyself, and burn thy Mariamne.

2005-06-15

The Pony Express and camels too

The nominal speed of the Pony Express was 60 miles in 6 hours on 6 horses, which over the 1966-mile run should have been about 8 days 5 hours. (60 miles ~ 100 km.) Each horse thus went 20 miles a day, there and back again. Two minutes was allowed for changing horses. Actual end-to-end latency was about 10 days, 12 to 16 days in winter.

How was the Pony Express as a communications network? Throughput was 1 rider in each direction per day carrying a theoretical maximum of 640 half-ounce letters and an average of 56 (but small packages were also carried, so this figure is inflated). The 165 stations were covered by 400 horses, only 2.4 horses per station on average.
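The arithmetic in the two paragraphs above is easy to check. A quick sketch in Python, using only the figures already quoted (the variable names are mine):

```python
ROUTE_MILES = 1966
STAGE_MILES, STAGE_HOURS, STAGE_HORSES = 60, 6, 6   # 60 miles in 6 hours on 6 horses
HORSES, STATIONS = 400, 165

speed_mph = STAGE_MILES / STAGE_HOURS                # nominal speed
total_hours = ROUTE_MILES / speed_mph
days, hours = divmod(total_hours, 24)

leg_miles = STAGE_MILES / STAGE_HORSES               # one horse, one way
round_trip_miles = 2 * leg_miles                     # there and back again

print(int(days), round(hours, 1))        # 8 4.6
print(round_trip_miles)                  # 20.0
print(round(HORSES / STATIONS, 1))       # 2.4
```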

The Pony Express operated for only 18 months in 1860-61, charging $5 per half oz (14 g), which would be about $110 today. Although the price was later reduced to $1 ($22 today), the service was a flop, businesswise. However, only one rider was killed and only one mailbag lost.

For comparison, Arabian camels can do 100 miles (161 km) on a good day, though I don't know how long they can keep it up. Bactrian camels, OTOH, can carry up to half a tonne at standard pace, and can do it in 140 F (60 C) temperatures or arctic-like conditions (on the Tibetan plateau) almost as well. Camel endurance races run about 27 miles (44 km) a day and last for 15 days, no camel-changing allowed.

The U.S. Army tried Arabian camels in the American Southwest as a cavalry mount. The experiment was not a success, because a camel's feet are better adapted for the Sahara's soft sand dunes than for the rocks and thorns of Arizona. They kept going lame. The U.S. Civil War, plus a program of road-building, had a lot to do with aborting the experiment too.

There'll never be a Camel Express, but they're hard to beat for hauling large loads overland, and it's no wonder that camels made wheeled vehicles obsolete in their areas of use, until the coming of internal combustion. Camels rule. On the other hand, camels also have nasty dispositions, and they stink.

Dostoyevsky pessimality

Pareto optimality is the economic condition in which it is not possible to make anyone better off without making at least one person worse off. By the same token, I introduce the term "Dostoyevsky pessimality" for the condition in which everyone is so badly off that it is not possible to make anyone worse off without making at least one person better off.

The English horn (cor anglais in French) is an alto oboe with two bends in the sound pipe (unlike the ordinary oboe, which is straight). It is supposed that its name was originally cor anglé, the bent horn, and was changed in French by folk etymology, which was then translated into English. There is no documentary proof of this, however. The name of the oboe is also interesting. It is Italian in origin, and came into English in the usual way, by copying the spelling and applying an English pronunciation /owbow/ to it; the French version was hautbois, which was at the time /o:bwe/, very like the Italian pron. (English took up "hautboy" for a while but eventually abandoned the word.)

When building a house, a single foundation is a Good Thing. The world is round, though, and the universe is a hypersurface whose circumference is everywhere and its center nowhere. If two walk together, and one should fall, the other may lift him up. Or consider Escher's "Drawing Hands". Or even more simply: ☯.

Two are one, life and death, lying Like lovers together in kemmer Like hands joined together the end and the Way. --Ursula K. Le Guin, "Tormer's Lay"

"I published a response arguing that Higginbotham was entirely wrong about the facts, and he replied indignantly ... that I was completely wrong about him being wrong, and naturally I believe that he is completely wrong about me being wrong about him being wrong. (These things tend to drag on; in future work, Higginbotham will argue that my eyes are too close together, and I will argue that, on the contrary, his head is too round.)" --Geoffrey Pullum (citations omitted)

Within the appropriate range of mechanical forces, steel is far more elastic than rubber. The difference between a finger-tight nut (on a bolt) and a wrench-tightened one is precisely the elastic takeup of the metal, which is why you never make a ceramic nut (e.g. on a toilet seat) more than finger-tight: it will crack instead of deforming elastically.

Hideous Headline: CLUB FIGHT BLOCKS RAIL RIVER TUBE PLAN. This unpacks to "A dispute in a (political) club is preventing a plan for building a railroad tunnel under the river from coming to fruition."

"It is chiefly through books that we enjoy intercourse with superior minds." --William Ellery Channing

If you have any European ancestors at all, you are descended from Charlemagne. He lived about 40 generations ago, and is known to have living descendants, so his line has not died out. A person with European ancestry has about 2^39 ~ 10^12 theoretical male ancestors from his time. However, the male European population was only about 15 million, so there is a great deal of convergence. So the probability that any one of those 2^39 possible male ancestors is not Charlemagne is about 1 - 1/1.5E7, or 0.999999933. But the probability that all of them are not Charlemagne is vanishingly small: 0.999999933^(2^39), or about 1E-16000.
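The last number is far too small for floating point, so it has to be worked in logarithms. A minimal sketch in Python, assuming (as above) 2^39 male ancestor slots and a male population of 15 million, and treating every slot as an independent draw, which is of course the crude part of the argument:

```python
import math

slots = 2 ** 39            # theoretical male ancestors 40 generations back
population = 1.5e7         # male European population in Charlemagne's time

p_one_not = 1 - 1 / population
# log10 of p_one_not ** slots; log1p keeps precision for a tiny argument
log10_all_not = slots * math.log1p(-1 / population) / math.log(10)

print(round(p_one_not, 9))       # 0.999999933
print(round(log10_all_not))      # roughly -15917
```

So the chance of missing Charlemagne in every slot is a decimal point followed by nearly sixteen thousand zeros.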

More tidbits of truth and fiction

I decided this was too long and moved the top half to another posting.

The state of New York, where I live, is divided into counties, and each county is fully partitioned into towns, cities, and Indian reservations. The term "village" also has legal significance: a village must be contained in a county, can overlap various towns, but cannot overlap any cities. A city can unilaterally (if the state legislature agrees) incorporate any towns adjacent to it, but cannot incorporate an adjacent city without a reciprocal agreement. Therefore, New York City is completely surrounded by technical cities, some of them quite small.

It's been said that technology cannot solve business problems, but automobile technology solved the business problems of the buggy-whip manufacturers rather well. Internal-combustion buggy whips might have saved the industry, but with an engine in the handle, the whips were too unbalanced and hit the horses too hard.

In Latin class, I once said something to the effect that Caesar won such-and-such a battle multo equo 'with much horse'. Priscian's head shrank and cracked. It should have been multis equis, of course -- 'with many horses'. I'm still blushing.

Roses are red,
violets are blue,
I have D.I.D.,
and so do I.

There are basically five known roots of the writing-system tree: Egyptian hieroglyphics (which led to the Semitic syllabaries), Chinese characters, cuneiform, Cretan (Linear B etc.), and Mesoamerican. In some cases, we are dealing with "stimulus diffusion" rather than direct inheritance: the idea of writing Georgian must have been derived from the Greek alphabet, as the order of the letters shows, but their actual shapes seem to owe nothing to Greek script.

English is full of idioms involving the word "Dutch". Here's a few: Dutch anchor 'a useless object (archaic)', Dutch uncle 'someone who talks to you patronizingly', Dutch treat 'each pays for himself, thus not a treat at all', Dutch auction 'the price is lowered until someone bids', if that's so I'm a Dutchman '[emphatic negation]', Dutch courage 'courage induced by alcohol', double Dutch 'jargon'.

I'm glad to see that H. L. Mencken's rewrite of the U.S. Declaration of Independence in spoken American made it to the Net.

At International Falls, Minnesota, on the Canadian border in the heart of the heart of North America, the annual temperature range is about -25 C to +25 C, with the records (not in the same year) being -51 and +46. Average precipitation days per year is only 132, certainly far from rain-forest conditions, but still involving plenty of rain (660 mm per year) and snow (1524 mm per year). International Falls is distinctly noticeable on the U.S. weather map as usually being at the center of the most brightly colored spot.

My wife, many moons ago, worked for GTE in Florida as a long-distance operator — this was before, or nearly before, subscriber-dialed long distance. People typically asked for numbers by city and local number, or even city and subscriber name, and she set up the trunk connection. When she moved to Denver, she got the same job with Mountain Bell. One day, a subscriber asked to be connected with such-and-such a number in "Hollywood". He meant Hollywood, California, of course, but out of habit my wife promptly forwarded him to the same number in Hollywood, Florida. And there was much confusion.

URIs, URNs, and SGML/XML public identifiers

When I was helping to develop the spec for URIs beginning urn:publicid:, which are the URI equivalent of SGML and XML public identifiers (either formal or not), I worked out this table of what was and was not legal according to publicid syntax, URN syntax, and URI syntax. (I'm referencing an older RFC for URI syntax rather than the current one, because I used the older one and some terminology has changed.)

Character Name(s)         Pubid   URN        URI        Status
LATIN CAPITAL LETTER ?    yes     upper      upalpha    NORM
LATIN SMALL LETTER ?      yes     lower      lowalpha   NORM
DIGIT *                   yes     number     digit      NORM
HYPHEN-MINUS              yes     other      mark       NORM
LEFT PARENTHESIS          yes     other      mark       NORM
RIGHT PARENTHESIS         yes     other      mark       NORM
FULL STOP                 yes     other      mark       NORM
EXCLAMATION MARK          yes     other      mark       NORM
ASTERISK                  yes     other      mark       NORM
LOW LINE                  yes     other      mark       NORM
PLUS SIGN                 yes     other      reserved   AVAIL
COMMA                     yes     other      reserved   AVAIL
COLON                     yes     other      reserved   AVAIL
EQUALS SIGN               yes     other      reserved   AVAIL
SEMICOLON                 yes     other      reserved   AVAIL
COMMERCIAL AT             yes     other      reserved   AVAIL
DOLLAR SIGN               yes     other      reserved   AVAIL
QUESTION MARK             yes     reserved   reserved   ENCODE
SOLIDUS                   yes     reserved   reserved   ENCODE
NUMBER SIGN               yes     reserved   delims     ENCODE
PERCENT SIGN              yes     reserved   delims     ENCODE
SPACE                     yes     excluded   space      ENCODE
APOSTROPHE                yes     excluded   mark       ENCODE
AMPERSAND                 no      excluded   reserved   AVAIL
TILDE                     no      excluded   mark       NULL
REVERSE SOLIDUS           no      excluded   delims     NULL
QUOTATION MARK            no      excluded   delims     NULL
LESS-THAN SIGN            no      excluded   delims     NULL
GREATER-THAN SIGN         no      excluded   delims     NULL
LEFT SQUARE BRACKET       no      excluded   unwise     NULL
RIGHT SQUARE BRACKET      no      excluded   unwise     NULL
CIRCUMFLEX                no      excluded   unwise     NULL
GRAVE ACCENT              no      excluded   unwise     NULL
LEFT CURLY BRACE          no      excluded   unwise     NULL
VERTICAL LINE             no      excluded   unwise     NULL
RIGHT CURLY BRACE         no      excluded   unwise     NULL

What the codewords in the table mean:

URN
  upper, lower, number, other: MAY be used without %-encoding.
  reserved: SHOULD NOT be used without %-encoding.
  excluded: MUST NOT be used without %-encoding.

URI
  lowalpha, upalpha, digit, mark: MAY be used without %-encoding; %-encoding MUST NOT affect semantics.
  reserved: MAY be used without %-encoding; %-encoding MAY affect semantics.
  space, delims, unwise: MUST NOT be used without %-encoding.

Status
  NORM: No encoding needed, can't be used as syntax.
  ENCODE: MUST be encoded (%-encoded or privately).
  AVAIL: Available for use as syntax character if literal use is %-encoded (AMPERSAND has no literal use).
  NULL: Not usable in pubids, included for completeness.
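To see the Status column at work, here is a minimal Python sketch that %-encodes a public identifier according to it. This is not the urn:publicid: transcription itself, only the plain %-encoding option; the function name and the syntax_chars parameter (the AVAIL characters some hypothetical scheme has claimed as syntax) are my own inventions for illustration.

```python
import string

# Status classes transcribed from the table (pubid-legal characters only).
NORM = set(string.ascii_letters + string.digits + "-().!*_")
AVAIL = set("+,:=;@$")          # AMPERSAND omitted: it is not legal in pubids
ENCODE = set("?/#% '")


def percent_encode_pubid(pubid: str, syntax_chars: str = "") -> str:
    """%-encode ENCODE characters, plus literal uses of any claimed AVAIL ones."""
    claimed = set(syntax_chars)
    if not claimed <= AVAIL:
        raise ValueError("only AVAIL characters can be claimed as syntax")
    out = []
    for ch in pubid:
        if ch in ENCODE or ch in claimed:
            out.append("%{:02X}".format(ord(ch)))
        elif ch in NORM or ch in AVAIL:
            out.append(ch)
        else:
            raise ValueError("not legal in a public identifier: {!r}".format(ch))
    return "".join(out)


print(percent_encode_pubid("-//W3C//DTD HTML 4.0//EN"))
# -%2F%2FW3C%2F%2FDTD%20HTML%204.0%2F%2FEN
```

The result is legal but hideous, which is the argument for spending some of the AVAIL characters on a private encoding instead.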

We spell it s-u-b-t-y-p-e around here

Tim Bray wonders why generic declarations extend rather than implement interfaces in Java 5.0.

The answer is simple: extends is how you spell "is a subtype of" in Java. The declarative semantics of implements is likewise "is a subtype of", but operationally it further means "provides an implementation for", and is a relationship specifically between a class and an interface. Types as such don't implement other types.

By the same token, interfaces extend their superinterfaces and always have. They also extend Object, which is a class.

On beyond Makefiles

Makefiles are ugly technology, but we're mostly stuck with them for now. People often misunderstand and misuse Makefile dependency declarations. For example, if foo.o is the compiled form of foo.c, which in turn includes foo.h, people wind up writing something like this:

foo.o: foo.c
foo.c: foo.h
        touch foo.c

But that's Just Wrong. It's not true that if you change foo.h you must rebuild foo.c; indeed, foo.c is maintained by hand and can't be "rebuilt". The Right Thing is of course:

foo.o: foo.c foo.h

The GCC compiler can generate rules like this automatically with the -M switch, and there is a makedepend program packaged with X Windows that slurps up preprocessor output and figures out the dependencies from that. But people have to make sure to use these features carefully to modify their Makefiles, and random changes to the Makefile can ruin them.

Furthermore, there are ugly problems with programs that have multiple source directories; people tend to write one Makefile per directory, and when there are cross-directory dependencies, things can go very wrong indeed. See Peter Miller's excellent paper Recursive Make Considered Harmful.

Fortunately, there are other possibilities out there. The Ada language imposes requirements that object files always be consistent with all the source files they depend on (at least two and frequently more). Classical Ada translators manage this with an "Ada software library", which keeps the programmer out of the frying pan -- and into the fire. But the implementors of GNAT (the GNU Ada Translator) have evolved an excellent general solution.

(I am not advocating a wholesale rewrite of the world's code in Ada! Though that might not be such a bad long-term idea in a few cases: Ada's design point is stand-alone high-reliability embedded programs, which is what an operating system kernel really is. The downside is the scarcity of Ada programmers relative to C programmers, of course.)

The GNAT approach is to maintain a parallel file for each .o file, called the .ali (Ada Library Information) file. The .ali file is conceptually part of the .o file, and is not physically incorporated into it only because GNAT has to handle a.out and other inflexible object formats. Every successful GNAT compilation run produces both an .o and an .ali file.

An .ali file contains the pathnames of all the source files read by GNAT to produce this file. It also contains the last modification times of those source files. An .o file is considered up-to-date if there is a corresponding .ali file and all the source files mentioned in it have the same timestamps as the actual sources. (If a particular source cannot be found, that may or may not be an error, depending on switch settings.)

Otherwise, the .o file is recompiled by compiling the Ada source with the same name, and the regenerated or newly generated .ali file is then examined to determine what to do next. Make-ing in GNAT therefore requires no error-prone Makefiles; just say gnatmake vmlinux :-) and everything needed will be compiled, pre-linked ("bound" in Ada jargon), and linked.
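The up-to-date test described above can be sketched in a few lines of Python. This is not GNAT's code or the real .ali format: the JSON sidecar file and the names write_deps and is_up_to_date are assumptions made purely for illustration.

```python
import json
import os


def write_deps(dep_path, sources):
    """Record each source path together with its current modification time.

    Hypothetical stand-in for the .ali file; real .ali files use GNAT's
    own text format, not JSON.
    """
    stamps = {src: os.stat(src).st_mtime_ns for src in sources}
    with open(dep_path, "w") as f:
        json.dump(stamps, f)


def is_up_to_date(obj_path, dep_path):
    """True if the object file and its dependency record both exist and
    every recorded source still has exactly the recorded timestamp."""
    if not (os.path.exists(obj_path) and os.path.exists(dep_path)):
        return False
    with open(dep_path) as f:
        stamps = json.load(f)
    for src, recorded in stamps.items():
        try:
            if os.stat(src).st_mtime_ns != recorded:
                return False
        except FileNotFoundError:
            # Treated as stale in this sketch; the text notes that GNAT
            # makes this case depend on switch settings.
            return False
    return True
```

Note that the comparison is for equality, not "newer than" as in make: a source restored from backup with an older timestamp still forces a recompile.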

For more detail on the tao of GNAT compilation, read The GNAT Compilation Model, part of the GNAT documentation. There's an excellent paper of the same name by Robert Dewar, but it seems the only online copy is behind the ACM's content firewall.

2005-06-12

Little Boxes (thanks, Malvina, wherever you are)

Little boxes in the colo,
Little boxes made of ticky-tacky,
Little boxes, little boxes,
Little boxes, all the same.
There's a Dell one and a Sun one
And a Blue one and a Compaq one
And they're all made out of ticky-tacky
And they all run just the same.

And the folks that write the programs
All go to the university,
And they all get put in cubicles,
Little boxes, all the same.
And write systems for e-commerce
And XML Web Services,
And they're all made out of ticky-tacky
And they all look just the same.

And they all play their RPGs,
And write in their blogs by night,
And they all have pretty pictures,
Pretty pictures, all the same.
And the pictures all have metadata
And they export it with RDF
And they all get put in aggregators
And they all come out the same.

And the geeks go into business,
And their start-ups raise some capital
And their code gets put in boxes,
Little boxes, all the same.
There's a Perl one and a Java one
And a Lisp one and a Python one
And they're all made out of ticky-tacky
And they all lose just the same.

Tune

English's nearest relative

If you look it up, you'll generally find that the language that's most closely related to English is Frisian, which is actually three closely related languages: West Frisian in the Netherlands, North Frisian on the Schleswig-Holstein peninsula and the offshore islands, and East Frisian (or Saterfrisian) in the Saterland in northern Germany. (The area called "East Frisia" no longer speaks Frisian, but Frisian-influenced Low German.)

But in fact, there is another language that's much closer to English but still separate: Scots. Not Scots Gaelic, which is closely related to Manx and Irish. And not Scottish English, which is a variety of English heavily influenced by Scots. But Scots itself, which is spoken in various dialects all over Scotland, in Northern Ireland, and in the Orkney Islands.

Like most minority languages, Scots went through a low point in the 20th century, and is now undergoing something of a revival: some of the Scottish Parliament's reports are written in it. Fortunately, it has had quite a lot of literature one way and another: I'm particularly fond of a horror story by Robert Louis Stevenson called "Thrawn Janet". That's a version written in modern standard Scots spelling: Stevenson's original, which is more English-adapted, is also available at the excellent website Scots-Online.org.

Anyhow, here's a bit of Scots that should be fairly easy going while still showing off the contrast between English and Scots: William Lorimer's New Testament in Scots (only the Devil speaks Standard English). Here's the first few verses of the Gospel of John:

IN THE BEGINNIN o aa things the Wurd wis there ense, an the Wurd bade wi God, an the Wurd wis God. He wis wi God i the beginnin, an aa things cam tae be throu him, an wiout him no ae thing cam tae be. Aathing at hes come tae be, he wis the life in it, an that life wis the licht o man; an ey the licht shines i the mirk, an the mirk downa slocken it nane.

There kythed a man sent frae God, at his name wis John. He cam for a witness, tae beir witness tae the licht, at aa men micht win tae faith throu him. He wisna the licht himsel; he cam tae beir witness tae the licht. The true licht, at enlichtens ilka man, wis een than comin intil the warld. He wis in the warld, an the warld hed come tae be throu him, but the warld miskent him. He cam tae the place at belanged him, an them at belanged him walcomed-him-na. But til aa sic as walcomed him he gae the pouer tae become childer o God; een tae them at pits faith in his name, an wis born, no o bluid or carnal desire o the will o man, but o God.

Sae the Wurd becam flesh an made his wonnin amang us, an we saw his glorie, sic glorie as belangs the ae an ane Son o the Faither, fu o grace an trowth. We hae John's witness til him: "This is him," he cried out loud, "at I spak o, whan I said, 'Him at is comin efter me is o heicher degree nor me, because he wis there afore iver I wis born'." Out o his fouth ilkane o us hes haen his skare, ay! grace upo' grace; for, atho' the Law wis gien throu Moses, grace an trowth hes come throu Jesus Christ. Nae man hes e'er seen God: but the ae an ane Son, at is God himsel, an liggs on the breist o the Faither, hes made him kent.

As the slogan says, "Scotland: A Twa-Leidit Folkrick" (a bilingual culture).

The deepest divide

Tim Bray said, "For what it's worth, I think that at the architecture level, the choice between demanding to see the bits-on-the-wire and being satisfied with an API is probably the deepest one there is."

I agree. "Zen is drinking water and knowing yourself that it is cold."

Monguor

Monguor is one of the offshoots of Mongolian that arose because the Mongols conquered China and ruled it for about a century. Most of China's border garrisons were composed of Mongol-speakers, some of whom took root. For example, there are a few Mongol-speakers near the Vietnamese border, others in northeast Manchuria (later relocated), others in the provinces of Gansu and Qinghai, and some even in northeastern Afghanistan. Their languages have of course diverged over the centuries from the standard Mongolian of the Mongolian Republic and (Chinese) Inner Mongolia.

The speakers of Monguor in Qinghai have been living with (Han) Chinese and especially Tibetans for a long time. Unlike any other Altaic language, their language has developed consonant clusters at the beginning of syllables. Well, nothing strange about that: some short vowels fell, and the result was a bunch of clusters: it's happened in lots of languages, including English (more at the end of syllables, though, as in /lɪvd/ < /lɪvəd/ 'lived').

But just which consonant clusters arose and which didn't? The Middle Mongolian verb stem [hudaru-] 'destroy' became [xtaru-] in earlier Monguor, and [stari-] in the current language. But [hulaːn] 'red' did not become *[xlaːn]. (In standard Mongolian, it has become [ulaːn], as in Ulaanbaatar 'Red Victory', the capital of Mongolia.) What made [xt] privileged and [xl] not?

Well, most Monguor have for a very long time been bilingual in Tibetan, or the local variety thereof, in which [xt] is a valid cluster but [xl] is not. So when the Monguor borrowed the Tibetan word [xtorma] 'sacrificial offering', they did not adapt it to their Mongolian habits, but took it in directly as [xtorma]. Having learned to pronounce [xt], they adapted some of their own words to use it as well. In fact, vowels fell if and only if the resulting consonant cluster occurred in Tibetan as well.

How is that for high?

Primary source: Róna-Tas, A. "Remarks on the phonology of the Monguor language", Acta Orientalia Academiae Scientiarum Hungaricae 10.3 (1960), pp. 263-67. A more accessible source is Robert Ramsey, The Languages of China.

OOP without inheritance

Well, I've read Schärli's Traits thesis, which is getting uptake in Perl 6 and Fortress [PDF] among other places. And it makes me furiously to think.

Do we really need inheritance (or delegation, which is inheritance at the object level) in a traits-based world? Why bother with overriding methods when they can be just replaced selectively? Sending to super is a marginal feature, and can easily be simulated by selective including and renaming from traits.

Here's my current vision, best if eaten by <date> yada yada:

Code is encapsulated in methods, so we still have methods, but classes go away in favor of two sort-of-new concepts, behaviors and types. A behavior is just a set of methods and associated private state variables. The methods in the behavior can be local (in which case they are not visible outside the behavior, and are basically just subroutines), standard, or abstract. You cannot instantiate a behavior as such, nor can a variable be typed (in a statically typed language) to a behavior. It's just a pile of things that an object might be able to do. If a behavior wants to expose its private state, it does so with a getter and/or setter method; a smart language will make it easy to specify that you want these things.

Types are used to instantiate objects and declare variables. They are composed out of behaviors: the methods available on an object of type T are the non-local methods provided by the behaviors out of which T is composed. Unless the type is itself abstract, any abstract methods in one behavior must be supplied by a standard method in another behavior, a sort of peg-and-hole operation. You can bring in a single method from a behavior, suppress a method in a behavior, or rename a method in a behavior when constructing a type; that allows you to compose behaviors that don't quite fit together perfectly. All this is verified when the type is compiled; a clever IDE can notice discrepancies and warn the programmer about them.
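To make the peg-and-hole idea concrete, here is a rough Python sketch. Everything in it is my own illustrative notation rather than anything from the Traits thesis: behaviors are plain dicts, ABSTRACT marks a hole, and compose, rename and suppress are hypothetical names. Private state and local methods are left out to keep it short.

```python
ABSTRACT = object()  # marks a method a behavior needs but does not supply


def compose(type_name, *behaviors, rename=None, suppress=()):
    """Build a type from behaviors (dicts of method name -> function or ABSTRACT).

    rename maps (behavior_index, old_name) -> new_name; suppress is a
    collection of (behavior_index, name) pairs to leave out.
    """
    rename = rename or {}
    methods, holes = {}, set()
    for i, behavior in enumerate(behaviors):
        for name, impl in behavior.items():
            if (i, name) in suppress:
                continue
            name = rename.get((i, name), name)
            if impl is ABSTRACT:
                holes.add(name)
            elif name in methods:
                raise TypeError(f"{type_name}: conflicting definitions of {name!r}")
            else:
                methods[name] = impl
    unfilled = holes - set(methods)
    if unfilled:
        raise TypeError(f"{type_name}: abstract methods not supplied: {sorted(unfilled)}")
    # No base classes beyond object: the type is only the sum of its behaviors.
    return type(type_name, (object,), methods)


def describe(self):
    return f"{self.name()} has area {self.area()}"


def area(self):
    return self.side ** 2


Describable = {"describe": describe, "name": ABSTRACT, "area": ABSTRACT}
Square = {"area": area, "label": lambda self: "square"}

# Square's "label" is renamed to fill Describable's "name" hole.
SquareType = compose("SquareType", Describable, Square, rename={(1, "label"): "name"})
s = SquareType()
s.side = 3
print(s.describe())
```

Leaving out the rename makes compose raise a TypeError, since nothing then fills the "name" hole; that is the check a compiler or IDE would do at type-definition time.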

We also need a notion of private vs. public methods, but it's not clear to me whether this should be declared directly on the method (i.e., where the behavior is) or at the type level. That's just notational, however. With that established, we can give an implementation-independent definition of subtypes and supertypes, declared at the type level. The compiler verifies that the methods of a declared {sub,super}type are a {super,sub}set of the type currently being specified, and that arguments are appropriately contravariant and results covariant (in a statically typed language) so as to provide minimum requirements for Liskov substitutability.
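The method-set half of that check is easy to sketch in Python; the variance half is not, since Python has no static argument types to compare. The function name missing_methods and the two example classes are hypothetical, and "public" here just means "name does not start with an underscore".

```python
def public_methods(cls):
    """Names of the public callables defined directly on cls (no inheritance)."""
    return {n for n, v in vars(cls).items() if callable(v) and not n.startswith("_")}


def missing_methods(sub, sup):
    """Methods that sup offers but sub lacks; an empty list means sub
    may be declared a subtype of sup as far as method names go."""
    return sorted(public_methods(sup) - public_methods(sub))


class Reader:
    def read(self):
        return ""


class ReadWriter:  # note: not derived from Reader
    def read(self):
        return ""

    def write(self, data):
        return len(data)


print(missing_methods(ReadWriter, Reader))
print(missing_methods(Reader, ReadWriter))
```

The point is that ReadWriter qualifies as a subtype of Reader purely by what it can do, with no implementation relationship between the two.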

I'm not sure yet what the constructor/destructor story might be: I like factory methods better than constructors anyhow, and perhaps that's the Right Thing.

Ideas? Comments? WAGs?

2005-06-11

Short takes on literature

From Laozi's time to James Joyce's, no one successfully wrote a sacred scripture in the comic mode. Finnegans Wake is the greatest Menippean satire ever written.

If we, like the inhabitants of Glubbdubdrib, could call up the ghost of Shakespeare, to ask him what he meant by such-and-such a passage, he could only reply with maddening iteration that he meant it to form part of the play.
     --Northrop Frye (from memory)

Some have called Theodore Dreiser a great writer who couldn't write very well. I reject the notion of artists who aren't craftspeople. Did anyone ever hear of a great painter who didn't happen to paint very well?

Tom Shippey points out in J. R. R. Tolkien: Author of the Century that the split between Tolkien-lovers and Tolkien-haters is basically not between the uneducated and the educated, but between the generally and the specially educated. Tolkien-despising has been a self-perpetuating passion among the English professors (and the literati) for fifty years.

Two Scottish lords in Samuel Johnson's time: "Have you read my latest book?" "I have not, my Lord. You write a great deal faster than I am able to read."

Why did the Hollywood agent respect Shakespeare? Because he said if he made enough money he'd quit and retire to Stratford, and when he made enough money, he (unlike many a Hollywood writer) quit and retired to Stratford.

Buying land from the Indians, as seen by the Indians

"Suppose a white man should come to me and say, 'Joseph, I like your horses. I want to buy them.'

"I say to him, 'No, my horses suit me; I will not sell them.'

"Then he goes to my neighbor and says to him, 'Joseph has some good horses. I want to buy them, but he refuses to sell.'

"My neighbor answers, 'Pay me the money and I will sell you Joseph's horses.'

"The white man returns to me and says, 'Joseph, I have bought your horses and you must let me have them.'

"If we sold our lands to the government, this is the way they bought them."

     --Chief Joseph (aka Thunder Rolling in the Mountains) of the Nez Perce

Nikolai Ivanovich Vavilov

The great Russian plant geneticist Nikolai Ivanovich Vavilov, who collected 200,000 different varieties of plants from literally every part of the world, collected a particular sunflower variety in West Texas in 1932. (Unfortunately, the Linnaean name is garbled in the article where I read this.) Forty years later, a hybrid descendant of this variety, now much richer in oil production, was reintroduced into West Texas by sunflower-oil producers, an early and sterling example of U.S.-U.S.S.R. cooperation.

During the German siege of Leningrad (9/41 to 1/44), which caused a famine in the city, either nine or fourteen of Vavilov's staff (accounts vary) died at their desks of starvation rather than allow any of the collection's tons of irreplaceable genetic stocks to be touched. "Patriotism is not enough." -- Edith Cavell

Vavilov himself wasn't there. He had openly defied the powerful pseudo-scientist Trofim Lysenko, who had Stalin's full support for his "Marxist genetics", and had been thrown into prison. He died there in 1943.

Songs from Cornwall

Cornwall is the southwesternmost part of England -- nowadays. But it wasn't always so; once it was a separate culture with its own language, Cornish, closely related to Welsh. Cornish died out sometime in the 18th century, and has been revived in our day, as Hebrew was, but less successfully. Still, there are about 3500 fluent speakers nowadays of the three different revived versions.

Here's a familiar nursery rhyme in English, Cornish, and Kerno. What's Kerno? A constructed language invented by Padraic Brown, based on the constructed language Brithenig. Both of them are Brito-Romance languages, a subfamily of languages that doesn't really exist, but are their creators' best efforts to figure out what would have happened if the Vulgar Latin that was spoken in parts of Britain until the 5th century hadn't gone extinct and been replaced by Welsh (and later by Scots Gaelic and English).

Baa, baa, black sheep
Have you any wool?
Yes, sir, yes, sir,
Three bags full.

One for my master
And one for my dame,
And one for the little boy
Who lives down the lane.

Brȳf, Brȳf, te dhavas dhu
ues genes gwlān?
Ues, syrra, lenwys yw
try sagh a'm ran

Onen ues dhe'm mēster lōs
ün aral ues dh'em dama
saw nyns ues man dhe'n meppyk plos
y'n vownder ues ow carma!

A charcat dhuv en dhuv; a charcat dhuv!
tens ty cholles le laine le laine; tens ty laine?
A vaysteor, vaysteor dhack; a vaysteoran,
trew sackes y vowghes traw; di laine llen.
Yen per li don li don; ce vowgga 'ci;
Yen per li dam li dam; il sackis 'ca;
Yen per li map li map; il l' ystrathe!

Chorus:
A charcat dhuv en dhuv; a charcat dhuv!
A cant commeck-commeck; a charcat dhuv!

Padraic comments:

That's Vorriseor Yowenck's moderny take on the old nursery rhyme. While they're more known for fusing Celtic and Cajun; they have at times turned their attentions to the truly weird.

This song is kind of slow, but not drudgy; kind of quiet and soothing. Think of "Donal agus Morag" and you'll not be far off the mark. It does feature the yspatha musical and the cornet, along with the expected tambeor, croutha and various background instruments.

Here's another song in the three languages, this one with a specifically Cornish theme, based on a historical event of 1688. The song, though, was written in the 19th century by Robert Stephen Hawker; some have called it the national anthem of Cornwall. Of course, in Ill Bethisad, the universe where Kerno and Brithenig are spoken, the historical event was a bit different.

A good sword and a trusty hand,
A merry heart and true!
King James's men shall understand
What Cornish men can do.
And have they fixed the where and when?
And shall Trelawny die?
Then twenty thousand Cornish men
Will know the reason why!

Dorn ues dh'y drestya, cledha da!
Lēl hudhyk an golon!
Tüs Mytern Jamys aswon a wrā
Gallus an Gernowyon
Yw ordnys lē ha prȳs dhodho?
A verow Trelawny brās?
Otomma ügans mȳl Kernow
a woffyth oll an cās!

Yn clathimoris dack, et yn lams vere!
Yn cor yoieos et vere;
compruindruront y varren le Jeamon
que pothont facer y vap Kernow
Ach fiskateor il couand' et jeond'?
Ach morris-s' il Drewlaunis?
Aci ce Kernow le ouygaint mil;
et savuront y pher-que!

You can read and hear the full anthem (this is just the first verse) in English and a different revived version of Cornish.

2005-06-10

Flow, Stuckness, and Interruptions

Programming work is not very much like digging ditches (the work for which project-management techniques were originally designed); it is a lot more like digging holes in very stony ground. Sometimes things go beautifully, sometimes a programmer spends whole days beating his head against the wall on a single point, until finally the light bulb lights up. These are two very different modes of mental operation, commonly known in the industry as Flow and Stuckness.

Flow is the mental state in which a programmer understands what he is doing and is directly interacting with the process of program writing or debugging. In Flow, hours can go by like minutes, and amazing productivities can be achieved. Error rates drop drastically, and work becomes a kind of highly productive play. Maintaining this state as much as possible is very important to successful programming work.

The difficulty with Flow is that it takes about 15 minutes to enter the state, and an interruption that must be dealt with (phone call, visit, bio-break) will terminate the state. Therefore, if there is one interruption every 15 minutes on average, the amount of Flow-assisted work (which in practice can mean the amount of work, period) drops to zero! It is no accident that programmers still report, as they have for the last 50 years, that the most productive programming times are after hours, because it becomes possible to enter Flow and remain in it hour after hour until the belly or the bladder demands attention.
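The arithmetic can be spelled out with a toy model. It assumes, more rigidly than real life, that interruptions arrive at perfectly regular intervals and that each one costs exactly the warm-up time; the function name flow_fraction is mine.

```python
def flow_fraction(interval_min, warmup_min=15):
    """Fraction of working time spent in Flow when an interruption arrives
    every interval_min minutes and re-entering Flow takes warmup_min minutes."""
    if interval_min <= 0:
        raise ValueError("interval must be positive")
    return max(0.0, interval_min - warmup_min) / interval_min


for interval in (15, 30, 60, 240):
    print(interval, flow_fraction(interval))
```

At one interruption every 15 minutes the fraction is 0.0; at one an hour it is 0.75; at one every four hours, roughly an uninterrupted evening, it is 0.9375.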

Stuckness is the reverse of Flow: the state where the programmer simply can't see what's wrong. The point may be and often is completely obvious in hindsight, but discovering what that point is, is literally the same kind of discovery process that scientists use (or mechanics, for that matter): make a hypothesis, test it, attempt a fix, see if it solves the problem, test to make sure a new bug hasn't been introduced. There is also a variety of Stuckness that arises during design rather than debugging, a kind of analysis paralysis where it is clear that the current way of doing things is wrong, but no clear vision of the correct way has yet emerged.

Stuckness cannot be overcome by will or planning, only by insight, and the arrival of insight is not predictable. Striving for insight can often be what prevents it from being achieved. The motto "Sleep on it" reflects the perception that blocking the conscious mind can be just the right thing to allow the unconscious mind to operate, often giving the key to the problem as if from nowhere. Distraction is an equally successful approach. During times of stuckness, therefore, interruptions are actually desirable, as they allow this useful distraction.

All these things make completion time estimates for programming projects something akin to a black art. One cannot predict from one day to the next when interruptions will occur, of course, but their effects on the project can vary from very good (intense Stuckness) to very bad (intense Flow disruption).

A relatively successful model for programming management, however, is based on the notion of trouble-ticket tracking. Since projects are not for the most part self-generated but rather arise out of requests from editorial, sales, or customers directly, each of them can be treated as a trouble ticket at whatever scale, and then the question becomes not "Is this project past its highly arbitrary deadline?" but rather "How many projects are being starved of the time and attention they need?" Mechanised and human support for this process is extremely helpful and is not usually provided to the extent necessary.

References:

Weinberg, Gerald. The Psychology of Computer Programming.

Brooks, Frederick. The Mythical Man-Month.

DeMarco, Tom, and Tim Lister. Peopleware.