Being a HAXEor

(thanks to d8uv for the title)

I usually describe myself as an "'ex' troglodyte", because I prefer the Unix line editor ex(1) to all other text editors. This makes people look at me like I'm something they found by turning over a rock, but what do I care?

(I know that ed(1) is the standard text editor, but I'm willing to trade off a little minimalism for a little convenience.)

Anyhow, in the tradition of Tim Bray's MARS, I will now say that I make my web site with HAXE, standing for HTML, Apache, and ex (reversed for euphony and cuteness).

Update: Though I don't like cokebottle editors, much of what's said in The Case For Emacs is relevant to me. I don't use ex(1), it is part of me.


"Irrumabo vos et pedicabo"

In my first year of college, long ago,
I took a class on Ovid and Catullus.
One of the sexual poems I found confusing,
and the book we were using
was quite devoid of commentary on it,
grammatical or otherwise.

So at the next class, I asked my professor
what the poet meant by such-and-such.
He was hesitating, doubtful, maybe-yes-maybe-no.
Yet at the following meeting of the class,
he was entirely changed:
he explained forthrightly just how the poem worked.

I could not understand the sudden change
until I looked about the studentry
and saw the only female student
absent that day.
I was shocked and outraged --
naïve nerd from a feminist family that I was --
to think that a professor! of the liberal arts!
and of Latin of all things! could be so sexist,
so crude, so utterly indifferent to his duties
to all his students.

Many years later, it occurred to me to wonder
if he had sunk so low as to ask her
to be absent that day so that he could answer my questions.
All the worse, I thought.
All the worse.

Looking back today, I think:
perhaps he was, poor man, in a cleft stick,
caught between the fear of being accused of harassment
by the woman for openly discussing sex in class,
and the fear of having his dean (who happened to be my mother)
coming down on him for neglecting the questions
of her precious darling (little did he know
that while she might have disapproved,
she would never have punished him for that --
my mother believed in justice).

It's a hell of a thing
when students can't learn
for fear or for shame
what the poets sing.


An annoying ambiguity about which nothing can be done now

The phrase "COMBINING DOUBLE" in a Unicode character name can mean either of two things. Sometimes the diacritical mark is doubled with respect to some other mark:


But sometimes it means that the mark extends over two characters, the one it applies to and the following one:


Of course U+1D18A MUSICAL SYMBOL COMBINING DOUBLE TONGUE  𝆊 is something else again.

Thank you. I feel much better now.
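
One way to tell the two senses apart mechanically is to look at the canonical combining class. Here is a minimal Python sketch, assuming the standard unicodedata module; the helper name describe and the choice of sample characters are mine, purely for illustration. Classes 233 and 234 are the ones Unicode assigns to marks that sit below or above a pair of characters.

```python
import unicodedata

# Canonical combining classes 233 (double below) and 234 (double above)
# are assigned to marks that span two base characters.
SPANNING_CLASSES = {233, 234}

def describe(ch: str) -> str:
    """Classify one character by the sense of COMBINING DOUBLE in its name."""
    name = unicodedata.name(ch)
    if "COMBINING DOUBLE" not in name:
        return "not a COMBINING DOUBLE character"
    if unicodedata.combining(ch) in SPANNING_CLASSES:
        return "spans two characters"
    return "other sense (a doubled mark, or something else again)"

# Sample characters: a doubled mark, a two-character mark, and the musical symbol.
for ch in ("\u030B", "\u035D", "\U0001D18A"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {describe(ch)}")
```

The name alone cannot answer the question, which is the whole annoyance; the combining class is a separate property that happens to record it.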

The lackmus test

In a post to one of the innumerable technical mailing lists I belong to, a native speaker of German used the phrase lackmus test, meaning a simple method for detecting differences. In English, the phrase is litmus test; why the difference?

Middle English had both the native English word lykemose and the Scandinavian borrowing litemose; only the latter has survived. The second morpheme in each case is that of English moss, but the first morphemes are different, meaning 'drip' and 'dye, color' respectively.

Litmus is made by drying and powdering certain lichens; it was originally used as a water-soluble dye, but is now generally used as a quick-and-dirty test for acidity, hence the metaphorical use of the term (it turns red in acids, blue in bases).

East is west and west is east

Little Diomede Island (U.S.) in the Bering Strait (not the Aleutians, as I mistakenly wrote earlier) is reckoned to be some tens of thousands of kilometers west of Big Diomede Island (Russia), despite the obvious fact that Little Diomede is about four kilometers east of Big Diomede.

The reason for that is that in the state of nature, Europe is east of North America, which is east of Asia, which is east of Europe. So it makes no sense to ask "Is X east or west of Y?" unless we have instituted a convention of some sort.

One possible convention is: "X is east of Y if and only if the easterly great-circle course between them is shorter than the westerly one." That's the rule we apply in ordinary life, and by that rule, the Russian island is west of the U.S. one.

But the navigator's convention unwraps the globe at the 180 degree meridian, and says that the entire Eastern Hemisphere is east of the entire Western Hemisphere. Using this convention, the Russian island is east of the U.S. one.

And by the same token, Alaska, since it sticks into the Eastern Hemisphere, is the easternmost U.S. state as well as the westernmost and the northernmost. The southernmost state is Hawaii. Of the 48 contiguous states, the westernmost is Washington, the easternmost Maine, the southernmost Florida (thanks to Key West), and the northernmost Minnesota, due to a surveying error.
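
The two conventions can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: it compares longitudes alone (east positive, west negative, in degrees), the function names are made up, and the two sample points are hypothetical ones straddling the 180 degree meridian rather than any real islands.

```python
def is_east_shorter_arc(lon_x: float, lon_y: float) -> bool:
    """X is east of Y if the eastward trip from Y to X is the shorter one."""
    eastward = (lon_x - lon_y) % 360.0  # degrees travelled going east from Y
    return 0.0 < eastward < 180.0

def is_east_navigator(lon_x: float, lon_y: float) -> bool:
    """X is east of Y if its signed longitude is larger (globe cut at 180)."""
    return lon_x > lon_y

# Two hypothetical points straddling the 180-degree meridian.
x = -179.0  # 179 degrees west
y = 179.0   # 179 degrees east

print(is_east_shorter_arc(x, y))  # True: two degrees eastward from y reaches x
print(is_east_navigator(x, y))    # False: x lies in the Western Hemisphere
```

The same pair of points gets opposite answers, which is all the paradox amounts to.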

Say who?

I got four stupid financial spams with interesting From: lines the other day. These are the ones that pick dictionary words for first and last names: what is that supposed to be about? Anyhow, here they are:

  • The peculiar Queueing M. Secretively,
  • The paradoxical Tough D. Frailty,
  • The malapropos Foolhardiness T. Phoneticians,
  • And the Marxist (tendance Chico) Spumoni P. Brickbat.

I guess it was the last one that put me over the top.


Recording your phone calls.

Can you record your own phone calls, incoming and outgoing? Usually, at least in the United States.

Most states are “one-party-consent law” states. If you live in one of these, you can always record your own in-state calls either openly or surreptitiously, since only one participant’s consent is needed. Likewise, you can get someone else to record them for you.

For interstate calls, it’s important to check this state-by-state summary, because both states’ laws apply, and you need to follow the more stringent one. For example, if you live in California or are even just speaking to someone in California (an “all-party-consent law” state), you must get the other party's permission to record the call, or risk having to pony up $5000 in statutory damages (or three times the actual damages, whichever is greater). In general, announcing your intent to record and letting the other party hang up if they don’t like it is sufficient in all states: continued participation implies consent.

The all-party-consent law states are: California, Connecticut, Florida, Illinois, Maryland, Massachusetts, Michigan, Montana, Nevada, New Hampshire, Pennsylvania, Washington. In Delaware, Indiana, Iowa, Mississippi, and probably New Mexico as well, a participant may record but a non-participant may not, even with consent. In Vermont the law is unsettled.

I am not a lawyer; this is not legal advice; laws change; errors happen.


Slashdot, eWeek, Microsoft, the OSI, Groklaw, and me

Well, it seems I've made Slashdot, quite unintentionally. The article there references an eWeek article about how I proposed that the Open Source Initiative approve two Microsoft licenses, the Microsoft Permissive License and the Microsoft Community License. Here's a FAQ:

  1. Why is this story news in August 2006? Ya got me. Groklaw reported on it back in December 2005, when it was in fact news.
  2. Do you speak for Slashdot, eWeek, Microsoft, the OSI, Groklaw, or any of your past or present employers? No, only for myself.
  3. Is what the eWeek story says about you true? Yes, except that I no longer volunteer for ccil.org; I did some work for them in the past.
  4. Why did you propose the licenses for OSI approval? Because I believe they meet the elements of the Open Source Definition.
  5. Are the licenses basically similar to other OSI-approved licenses? Yes.
  6. Then why ask OSI to approve them? Because I want to encourage Microsoft to release software under an OSI-approved license, even if they feel it necessary to use their own license.
  7. Microsoft release anything under an Open Source license? Surely you jest. No, actually. Microsoft released WiX under the Common Public License, an OSI-approved license. And there have been other such releases.
  8. Why did you withdraw the request for OSI approval? For a number of reasons, it's awkward for OSI to approve licenses that are not proposed by the author of the license. The OSI wants to keep all approved licenses on its site, and may not have copyright permission to do so. Furthermore, if the OSI wants to request changes, only the author can make them.
  9. Does that mean you have changed your mind about the licenses? No, only about the suitability of OSI approving them.
  10. Are you a shill/astroturfer for Microsoft? No.
  11. What's your view on open-source software? I use a lot of it and have released my own code and other stuff under several different open-source licenses.
  12. What do you want to do with your fifteen minutes of fame? Wait for it to pass.
  13. Can I leave a comment? Yes. However, as Le Guin says, I can take a little inaccuracy or a little accusation, but the combination is poison. I reserve the right to remove comments I think are poisonous.


The Elements of Style Revised

Get the little book!

I've been working off and on for the last few weeks on updating the original 1918 edition of William Strunk's short book on the basics of elementary composition. No, it isn't "Strunk and White"; White's additions are still in copyright and thus untouchable. Nor is it the book I would have written myself from scratch; that would look a lot more like Mapping the Model, except Rosemary Hake has already written it, so why should I? (Alas, that book is out of print....)

Here's part of the Reviser's Introduction, so you can see if it's for you:

My revisions to the original are founded on the principle that rules of usage and style cannot be drawn out of thin air, nor constructed a priori according to "logic"; they must depend on the actual practice of those who are generally acknowledged to be good writers. For a larger work founded on the same principles and giving much more detailed and up-to-date advice on usage, the reader is urged to consult the current edition of Merriam-Webster's Concise Dictionary of English Usage, as I have done with both pleasure and profit while preparing this revision.

I have attempted to remain within the scope of the original. This book, therefore, is intended as a compendium of helpful advice to novice writers in freshman composition classes, not a code of general laws of writing for all works by all writers in all circumstances. Violations of the rules can be found within the book itself — this is neither inconsistent nor hypocritical, as The Elements of Style Revised is not a paper written for a composition class.

In updating Strunk's work from the 19th century to the late early 21st century, I have retained as much of Strunk's spirit and characteristic style as I could. I have removed the obsolete, the erroneous, and the merely idiosyncratic (Strunk's arbitrary dislike of "student body", for example) both from Strunk's own usage and from the rules laid down in his book. Like White, I have also added a few points to Chapters IV and V that seemed to me important enough to justify their presence, as well as removing Strunk's Chapter VI on spelling. I have not hesitated to replace Strunk's opinions with contrary ones, though I was pleasantly surprised to find that many of those I expected to require changing (strictures against split infinitives and final prepositions, as well as the preposterous which/that rule) did not appear in the 1918 edition at all.

Share and enjoy, and of course send me critiques.


Well, wuddaya know

Here's a poem in blank verse:

Oh, moralists, who treat of happiness
And self-respect, innate in every sphere
Of life, and shedding light on every grain
Of dust in God's highway, so smooth below
Your carriage-wheels, so rough beneath the tread
   Of naked feet, bethink yourselves
   In looking on the swift descent
Of men who have lived in their own esteem,
That there are scores of thousands breathing now,
And breathing thick with painful toil, who in
That high respect have never lived at all
Nor had a chance of life! Go ye, who rest
So placidly upon the sacred Bard
Who had been young, and when he strung his harp
Was old...

Go, Teachers of content and honest pride,
   Into the mine, the mill, the forge,
The squalid depths of deepest ignorance,
And uttermost abyss of man's neglect,
And say can any hopeful plant spring up
In air so foul that it extinguishes
The soul's bright torch as fast as it is kindled!

Who wrote it? Well, Charles Dickens, in Martin Chuzzlewit. What's that? You didn't know Dickens was a poet? Well, the above passage appears, printed as prose, in Chapter 13, with the additional words "and had never seen the righteous forsaken, or his seed begging their bread" between the two stanzas, quoted from Ps. 37:25. Such things can prose writers fall into when they are trying to be high-flown and not watching themselves carefully.

Kudos to H. W. Fowler for spotting this example.


Futbol en masse

At the summer camp I attended as a yoot in the 60s, there was a game known as Mass Soccer. The chief feature of this game of games was that any number could play on either side. Kids being what they are, this meant that the entire center of the field was occupied by a permanent clot of forwards of all shapes and sizes, with a relative handful of backs whose chief task was to assist the goalies when a ball occasionally escaped from this huge scrum.

To keep things interesting, any number of balls could be in play at once. To keep things fair, each side was entitled to as many goalies as there were balls — when a ball went out of play, one of the goalies would do the usual thing to bring it back in while the game raged on around him. To keep things safe, the footwear rules were strictly enforced: no shoes allowed, socks required.

It was a hell of a lot of fun.


TagSoup 1.0 Final released!

TagSoup is a SAX-compliant parser written in Java that, instead of parsing well-formed or valid XML, parses HTML as it is found in the wild: nasty and brutish, though quite often far from short. TagSoup is designed for people who have to process this stuff using some semblance of a rational application design. By providing a SAX interface, it allows standard XML tools to be applied to even the worst HTML. TagSoup also includes a command-line processor that reads HTML files and can generate either clean HTML or well-formed XML that is a close approximation to XHTML.
TagSoup is free and Open Source software, licensed under the Academic Free License version 3.0, a cleaned-up and patent-safe BSD-style license which allows proprietary re-use. It's also licensed under the GNU GPL version 2.0, since unfortunately the GPL and the AFL are incompatible. You can choose to license TagSoup from me under either the GPL or the AFL.
This release represents the end of my current plans for TagSoup. I will continue to fix bugs, but it now does everything that I foresaw back in 2002 when I started this project, and a great deal more. Thanks to everyone on the tagsoup-friends mailing list for their efforts.



I've added a slide presentation/rant on RELAX NG to my home page.



This blog is about recycled knowledge. That means there are often facts or ideas in it that I remember, but don't remember the source of. If you read this and recognize your own ideas, please let me know and I'll fix it up.

Sometimes I deliberately don't say where an idea comes from or mention people in a "background" sort of way, because I don't know if they want their names attached to something they wrote in a more private space than the Web five or ten years ago. If you are one of those, and you do want credit, again let me know.


No cross, no crown

Someone used the phrase "No cross, no crown" on a mailing list, and explained it as meaning "Don't discuss religion or politics". I was fairly sympathetic with the intent, but unfamiliar with this use of the phrase: I had always understood it to mean "If you don't take pains you won't achieve anything", and to be a specifically Christian metaphor: "No earthly cross of metaphorical crucifixion, no heavenly crown of sainthood." I decided to look into the question.

I quickly checked the first 500 Google references to the phrase, and all of them except three clearly referred to the sense I already knew, drawn from all over Christianity, mostly Catholic and Quaker, but also Episcopal, AMEZ, Orthodox, and even Rosicrucian. The Christian Scientist symbol of a cross surmounted by a crown probably alludes to the saying as well. The saying is of course also found in purely secular contexts, with the same sense.

Two of the three exceptions are basically accidental: a song "Ojo por Ojo", which says "And in that place there is no cross / no crown, no sacred ground / all is done and left unsaid"; and a page denying that the Mormon Temple is a Christian edifice, enumerating the Christian symbols that it does not have (at least on the outside): "There is no cross, no crown, no alpha or omega, no icthys, no lion, no lamb, nor any other recognizable historical Christian symbol."

The final exception appears in a speech on Voltaire by Robert Ingersoll, the 19th-century agnostic, who is clearly using the expression in the mode of parody; he associates it with King James I's maxim "No bishop, no King", by which the King meant that if Presbyterianism came to dominate England as well as Scotland, he would swiftly find himself either out of a job or a tolerated figurehead.

The Quaker references clearly allude to a book of that name by William Penn, in which he says, "No pain, no palm; no thorns, no throne; no gall, no glory; no cross, no crown." The modern phrases "No pain, no gain" and "No guts, no glory" are clearly reminiscences of this. I also found "No pruning, no grapes; no grinding mill, no flour; no battle, no victory; no Cross, no Crown!" and "No laming, no naming, no struggle, no Promised Land; no cross, no crown" in the works of others.


Changing names

The names of Unicode characters, once published, can't ever be changed again, not even when they are obviously wrong. In this case, stability is considered to trump correctness.

A participant in the Unicode development process once complained: "If biologists had insisted that names once assigned could not be changed because of advances in knowledge, or even to correct errors, then surely the system would have broken down centuries ago."

But in fact, the international Linnaean names of plants and animals are not changed for either of those reasons, nor for any other reason whatsoever: though we now know that Basilosaurus is a proto-whale and not any sort of reptile, Basilosaurus it will remain forever.

The only thing that can happen in Linnaean nomenclature is the recognition that two names are synonymous. In that case, there is a question which shall be the preferred name, and normally it is the first name to be published, but exceptions sometimes occur. Thus when the dinosaurs Brontosaurus and Apatosaurus were found to be the same, Apatosaurus was chosen as the preferred name because it was published first; however, this is not properly to be described as "changing the name of Brontosaurus to Apatosaurus". Brontosaurus is a perfectly good name and may still be used even though it is dispreferred.

When are later names preferred to earlier ones? Usually when the earlier name has long been forgotten, and the later name is widely used in the scientific literature.

And per se and

The name of the & character, ampersand, is short for and per se and, meaning and by itself and. People used to recite it at the end of the alphabet. About that much there's no doubt.

But of the two ands in that phrase, which one designates the ampersand? Is it and per se & or & per se and? It seemed clear to me that the former is the correct reading, so I did a little desultory research.

Most sources say the derivation is & per se and, but the story of reciting the alphabet is firmly established and I don't see how and, the conjunction, could possibly appear at the end. The Morris Dictionary of Word and Phrase Origins takes my point of view.

An alternative adopted by the American Heritage and Merriam-Webster dictionaries is that the words and per se and are to be construed as & by itself [means] 'and', but that seems far more strained to me than the natural x, y, z, and per se &.

So I suppose you can say what you like.


It was a dark and stormy night. I stalked my enemy through the tall grass. I saw the flash of his muzzle as his shot went whistling over my head. I fired! I killed him!

I walked to the nearest town. Casually smoking a cigarette, I entered the nearest bar.

"I have killed a man!", said I.

"His name?" demanded a tall, dark, and handsome stranger at the other end of the bar.

"His name? His name was Zanzibar!"

"Zanzibar! He was my brother! We must meet."

It was a dark and stormy night....



"But this is terrible!" cried Frodo. "Far worse than the worst that I imagined from your hints and warnings. O Gandalf, best of friends, what am I to do? For now I am really afraid. What am I to do? What a pity that Bilbo did not stab that vile creature, when he had a chance!"

"Pity? It was Pity that stayed his hand. Pity, and Mercy: not to strike without need."

     --J.R.R. Tolkien


The War (after Simonides)

This is a villanelle, a very special verse form that I have taken some liberties with. I had always wanted to write one, but I never could come up with a couplet strong enough to support all the required repetitions. It finally dawned on me that the couplet didn't have to be original with me, if I was clever enough about it.

Even if you don't click on all the links, be sure to mouse over them: they provide a first-level commentary on the poem.

"Go and tell the Spartans, passerby,"
   The man of Keos sings in lines that soar,
"That here, obedient to their laws, we lie."
The king of Lakedaimon will not fly,
   Though "Kill them all!" the hordes of Persia roar:
Go and tell the Spartans, passerby.
The news is carried of their terse goodbye
   From Hot Gates to far Atlantic shore
That there, obedient to their laws, they lie.
The law of nations ready to defy,
   Atlantis rising plots aggressive war --
Go and tell the Persians, passerby.
The Archon smiles, and smiles: real men don't cry.
   His friends sell not reality but lore
As here, obedient to his will, they lie.
Boxed in flags, the dead all verse deny.
   They cannot serve their country any more.
"Go and tell our people, passerby,
That here, obedient to their laws, we lie."


Earth Day

This lyric was circulating around the time of the first Earth Day on April 22, 1970. George Carlin wrote the original version; this one's been modified by what Pete Seeger called the folk process. Sing it slowly, maestoso, and with great irony.

Oh beautiful for smoggy skies,
Insecticided grain,
For garbage mountain majesties
Above the asphalt plain:
America, America,
Man spreads his trash on thee,
And makes a mess with filthiness
From sea to oily sea.

(Original lyrics; melody.)


Celebes Kalossi 2.0

I've decided it's time to post about my object-oriented programming model, Celebes Kalossi, again. All previous statements are inoperative, so you don't have to look back at my earlier postings. This posting will be mostly about terminology.

In CK, there are classes. A class contains declarations of state variables (aka instance variables, fields, data members) and both declarations and definitions of methods. State variables are only accessible from within the class: they are all private in Java terminology.

A declaration of a method specifies the method's name and its signature; that is, the type of its return value and the names and types of its arguments. In the model, no two methods in a class can have the same name; an actual implementation might provide Java-style method overloading, since overloading is resolved at compile time and is basically convenient syntactic sugar.

A definition of a method specifies everything the corresponding declaration does, but also includes the code of the method. If a class contains a definition of a method, it has no need to contain its declaration too.

A method may be public, private, or neither; the third type will be called standard methods here. A public method can be called from anywhere, and can be invoked on any object. A private method cannot be invoked outside the class in which it is defined, so there is no point in declaring one. Basically, it's just a subroutine. The difference between standard and private methods will be explained in another posting.

Standard and private methods can only be invoked on the self (this in Java) object, implicitly or explicitly. The most important rule of CK is that you cannot invoke a method on self that is not declared (not necessarily defined) in the current class.
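
For readers who think better in code, here is a loose Python analogy to the declaration/definition distinction. It is not CK itself: Python enforces almost none of the rules above, the class and method names are invented for illustration, and I am assuming an abstract method is a fair stand-in for a declaration without a definition.

```python
from abc import ABC, abstractmethod

class Shape(ABC):
    @abstractmethod
    def area(self) -> float:
        """Declaration only: a name and a signature, no code."""

    def describe(self) -> str:
        # Definition: a declaration plus code. It may call area() on self
        # because area() is declared in this same class.
        return f"{type(self).__name__} with area {self.area():.1f}"

class Square(Shape):
    def __init__(self, side: float) -> None:
        self._side = side  # state variable, private by convention only

    def area(self) -> float:
        # Definition supplying the code for the method declared in Shape.
        return self._side * self._side

print(Square(2.0).describe())
```

The point of the analogy is only that describe can rely on area because the declaration is visible in the class where the call is written, whoever ends up supplying the code.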

Finally, by "Java" I mean "Java or C#". More later.

Speaking in Ander-Saxon

Some while back I wrote a posting on partially understanding languages that included a well-known quotation from Old English specialist Tom Shippey about how English became simplified over time.

Here's a translation (by me) of that explanation into Ander-Saxon, a variety of English in which French, Latin, and Greek words and roots are replaced by native English ones.

Reckon what happens when somebody who speaks, shall we say, good Old English from the south of the land runs into somebody from the northeast who speaks good Old Norse. They can without fear pass on with each other, but the hardnesses in both tongues are going to get lost. So if the Anglo-Saxon from the South wants to say (in good Old English) "I'll sell you the horse that pulls my cart", he says: Ic selle the that hors the draegeth minne waegn.
Now the old Norseman -- if he had to say this -- would say: Ek mun selja ther hrossit er dregr vagn mine. So, roughly speaking, they understand each other. One says waegn and the other says vagn. One says hors and draegeth; the other says hros and dregr, but broadly they are onpassing. They understand the root words. What they don't understand are the wizardly bits of the wholespeech.
For a showdeal, the man speaking good Old English says for one horse that hors, but for two horses he says tha hors. Now the Old Norse speaker understands the word hors all right, but he's not sound if it means one or two, byspring in Old English you say "one horse", "two horse". There is no apartness between the two words for "horse". The apartness is carted in the word for "the", and the old Norseman might not understand this, byspring his word for "the" doesn't behave like that. So: are you trying to sell me one horse or are you trying to sell me two horses? If you get enough sittings like that there is a strong drive toward straightening out the tongue.

(I ran this past Professor Shippey in email.)


On this and that

Here's a few little bits scoured up from here and there.

Boswell on Johnson's Dictionary:

A few of his definitions must be admitted to be erroneous. Thus, Windward and Leeward, though directly of opposite meaning, are defined identically the same way; as to which inconsiderable specks it is enough to observe, that his Preface announces that he was aware there might be many such in so immense a work; nor was he at all disconcerted when an instance was pointed out to him. A lady once asked him how he came to define Pastern the knee of a horse: instead of making an elaborate defence, as she expected, he at once answered, "Ignorance, Madam, pure ignorance." His definition of Network ["Any thing reticulated or decussated, at equal distances, with interstices between the intersections"] has been often quoted with sportive malignity, as obscuring a thing in itself very plain.

To which we may add his definition of lexicographer: "a writer of dictionaries, a harmless drudge".

On the names for people with variously colored hair:

Blond and blonde are masculine and feminine forms, though the latter is rarely used as an adjective nowadays, only as a noun. Brunette, on the other hand, is feminine only; the form brunet which is sometimes found is not French, not English, and entirely barbarous. -ette is inherently both feminine and diminutive (though the latter sense dominates in English, as in cassette, diskette, kitchenette, statuette), and not to be split up into two separate affixes.

On whiteboards:

Whiteboards are common in corporations, but I have never seen one in any educational establishment in the U.S. (which is by no means to say there are none). The coolest variety have a large canvas which can be scrolled left or right, by full screens or by smaller steps, and can even save copies of what's currently in view using a giant scanner; you can hook up a conventional printer for hard copy or (I suppose) put them on a network. I only got to use such a Wundergerät once or twice, alas.

On Latin in Great Britain:

The Great Vowel Shift that changed the pronunciation of the English long vowels in the 15th century affected not only English but also the spoken Latin of the monasteries. Indeed, there was a period where English and Scottish Latiners could not understand one another, because Scottish Latin did not undergo the Shift even though Scots itself (mostly) did!

On how history could have gone:

Could the Internet have been invented if telephones hadn't been invented first? I think so. Telegraphy is a lot simpler than telephony, and telegraph operators had something socially very like the Internet (but involving a lot fewer people, of course) more than a hundred years ago. There were even routers and protocol gateways, instantiated by human beings.

A technical civilization might well go from semaphore telegraphs to electric telegraphs to teletypewriters to Morse-code radio to high-speed wired and wireless digital transmissions, missing analog telephones and radio altogether.

On the root *tag-:

Ruminating over the English words tact and tactics led me to realize how interestingly convergent in meaning they have become, descending from the same PIE root *tag- through different branches, respectively Latin tangere, tactus 'touch(ed)'; Greek taktikh 'deployment < arrangement'.

On tornadoes:

Conventional wisdom says tornadoes never happen in the Eastern U.S. Conventional wisdom, as all too often, does not know its history. Tornadoes have been recorded in all of the fifty states and D.C. Indeed, only the following 10 states have not had a major tornado (causing death or property damage) since 1980:

Alaska (1959), Hawaii (1971), Indiana (1974), Iowa (1979), Kentucky (1974), Minnesota (1978), Missouri (1973), North Dakota (1978), Vermont (1970), West Virginia (1974).

A Creative Choice

I submitted this piece to the mailing list Heroic Stories. It appears here in slightly modified form.

I used to work as a programmer for a news service, a small subsidiary of a larger news and financial information company. We wrote and published medical news over the Internet; our customers included companies with medical websites, pharmaceutical companies, newspapers, and specialized and general-use web portals.

Back in 2002, advertising-supported media (which means most media) had fallen on hard times as a result of the slow economy. Our subsidiary, like many media companies, had to cut back on its staff. For us the need was particularly acute, as most of our customers were Internet-based, and about half of them went belly-up after the dot-com bubble burst in 2001.

We had staved off the problem for about a year, thanks to having annual contracts. But eventually we had to cut costs, and the only way we could do that and still maintain service to our remaining customers was to cut staff. As a result, in August 2002 the "powers that be" declared that one or two people would have to be sacrificed from each department: sales, financial, news-writing, and technical.

The financial department was abolished altogether and its functions transferred to a group in the parent company. Most of the other groups naturally suffered as a result of losing journalists, editors, and salespeople -- but they survived, still able to perform their missions.

Our technical department, however, consisted of just two programmers and a system administrator. Without the programmers, we couldn't maintain our existing systems and implement new ones. Without the system administrator, who doubled as a help-desk person, we would have been unable to support the rest of the subsidiary or keep our production systems backed up and running smoothly. Terminating any of us would have meant a massive workload for the remaining two, much of it work they were not trained to perform. It was an ugly choice to make.

The director of the technical department, Sandra, decided to meet the challenge in a creative way. She was going on maternity leave just after the announcement came out, and decided to terminate herself instead of one of her staff. She said that she considered herself the "most expendable" person in the technical department.

Management was shocked by the idea of losing a department manager instead of regular staff. They protested loudly and tried to make Sandra change her mind, but to no avail. Her clear-headed analysis prevailed, and it was decided that after Sandra's departure we would report jointly to a technical manager within the parent company and the CEO of our subsidiary.

Sandra returned from maternity leave and worked until the end of 2002, then left to devote herself to motherhood and free-lance work. As a result of her selfless action, the three of us who remained were able to fulfill our customers' and employer's needs. In the end, however, both of the other two were let go, leaving me to perform all the remaining technical functions until the end of 2005, when I too was laid off.


I had to say that

Josiah Willard Gibbs was the greatest American physicist of the 19th century, perhaps the only world-class American physicist of his time.

He was well-known among his friends and peers not only for his brilliance but also for his extraordinary modesty. They were astounded, therefore, when Gibbs was testifying as an expert witness on one occasion and, asked by the opposing lawyer what right he had to say such things, replied "I am the greatest living authority on the matter."

Gibbs's explanation after the fact: "I had to say that; I was on oath".

French fries are not chips

It's commonly held that English "chips" are American "French fries", but I deny it. McDonald's-style fried potatoes are canonical French fries, but they are not canonical chips. Leftpondians don't eat chips very often, and perhaps think of them as "fat French fries" if they don't know any better, but the point is that the two terms refer to different things. Congress is the American Parliament, no doubt, but it would be absurd to say that the terms had the same referent!

Rightpondians often do claim that the fried potato products sold by McDonald's are "chips". Since they have only one term available, they will tend to use it for all fried potato products other than crisps ("potato chips" in American English), whether the potatoes are julienned (as in French fries proper) or sliced in large wedges or bars (as in chips proper). But taking a transatlantic perspective, the two terms are not really interchangeable, for they have different prototypes. This is not the case with "crisps" vs. "potato chips", which have only one prototype.

But when an American goes to a place (whether in America or elsewhere; in my case, about 600 meters away) where "chips" are served and openly called by that name, s/he will have quite a different gustatory experience from what results from eating "French fries".

To consider the flip side of the issue for the moment, if I went to England and saw an Erithacus rubecula, I would have to call it a "robin", because no other English term is available. That doesn't mean that I don't know it's a different bird from the specimens of Turdus migratorius that I commonly denote by that term.

Ralph vs. the Tortoise

Consider the following statements:

  1. Ralph believes that Ortcutt is not a spy.
  2. Ralph believes that the man in the brown hat is a spy.
  3. The man in the brown hat is Ortcutt.

It follows that:

  1. Ralph believes of Ortcutt that he is not a spy.
  2. Ralph believes of Ortcutt that he is a spy.

This is apparently no problem, as long as Ralph does not believe "Ortcutt is a spy and Ortcutt is not a spy", which he does not. People with appropriate false beliefs or appropriate ignorance can believe (de re) contradictory things.

But now consider Hofstadter's Tortoise:

  1. The Tortoise affirms "My shell is green".
  2. The Tortoise affirms "My shell is not green".
  3. The Tortoise rejects "My shell is green and my shell is not green".

It seems to follow that:

  1. The Tortoise believes of his shell that it is green.
  2. The Tortoise believes of his shell that it is not green.

Must we accept that the Tortoise's beliefs are not contradictory de re, but only de dicto? The de re version seems exactly parallel to Ralph's de re beliefs. Yet Ralph is merely ignorant of a key point (viz. #3), whereas the Tortoise seems to be "logically insane".

Writing out XML

You can't just embed plain text into an XML element or attribute; character content and attribute values have to be escaped in a number of ways, not necessarily obvious. Here's a checklist of things to make sure to do. (Once again, this post will look terrible in RSS readers that don't fully understand Atom.)

  1. Escape all & characters as &amp;.
  2. Escape all < characters as &lt;.
  3. Escape all > characters as &gt;. Technically it's enough to do so only when they are preceded by ]] in character content, but in my opinion making that check is more trouble than it's worth.
  4. Escape all carriage-return characters as &#xD;. These should be very rare in XML content, as they will have been converted to line-feeds on parsing.
  5. Escape all tab characters in attribute values as &#x9;. You can escape them in character content if you want, but it's not necessary.
  6. Escape all line-feed/newline characters in attribute values as &#xA; (not D as I first wrote).
  7. Output all line-feed/newline characters in character content as the local line terminator: carriage-return (on Mac Classic), line-feed (on Unix) or both (on Windows). You can provide alternative line terminators at user option.
  8. Escape all characters that can't be represented in the output character set. If the output character set is UTF-8 or UTF-16 (in any flavor), this step is not necessary.
  9. Directly output everything else.
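
The checklist above can be sketched in a few lines of Python. This is only an illustration, not XOM's code: the names escape_text and escape_attr are made up for the example, and the attribute version also escapes the double quote as &quot;, which the list does not mention but which is needed if the attribute value is delimited by double quotes.

```python
import os


def _escape(s, table, encoding):
    """Apply the fixed replacements in `table`; fall back to a numeric
    character reference for anything `encoding` cannot represent (step 8)."""
    out = []
    for ch in s:
        if ch in table:
            out.append(table[ch])
            continue
        try:
            ch.encode(encoding)
            out.append(ch)                    # step 9: output directly
        except UnicodeEncodeError:
            out.append("&#x%X;" % ord(ch))    # step 8
    return "".join(out)


def escape_text(s, line_terminator=os.linesep, encoding="utf-8"):
    """Escape character content (steps 1-4 and 7)."""
    table = {
        "&": "&amp;",
        "<": "&lt;",
        ">": "&gt;",
        "\r": "&#xD;",
        "\n": line_terminator,   # local line terminator, overridable
    }
    return _escape(s, table, encoding)


def escape_attr(s, encoding="utf-8"):
    """Escape an attribute value (steps 1-6), assuming double-quote delimiters."""
    table = {
        "&": "&amp;",
        "<": "&lt;",
        ">": "&gt;",
        "\r": "&#xD;",
        "\t": "&#x9;",
        "\n": "&#xA;",
        '"': "&quot;",           # assumption: not in the checklist
    }
    return _escape(s, table, encoding)
```

If the escaped text is then written to a file, the file should be opened so that Python does not translate newlines a second time (for example with newline=""), since step 7 has already been applied.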

I'm glad to say that XOM, my favorite XML tree representation, does all these things in its Serializer class.


My favorite errata page



p. viii: for "ERRATA" read "ERRATUM".

How to write XHTML even if you don't know how

Warning to RSS users: this post may not be legible in newsreaders that don't understand Atom very well.

  1. Put all tag names and attribute names in lower case.
  2. Make sure every start-tag has an end-tag. This rule does not apply to the HTML empty tags, namely base, basefont, br, area, link, img, meta, param, hr, input, col, frame, and isindex. (If you don't know what some of these are, don't worry about it.)
  3. Replace the > at the end of an empty tag with the three-character sequence " />".
  4. Make sure all start-tags and end-tags are properly nested.
  5. Make sure all attribute values are in quotation marks, either single or double.
  6. Make sure attributes like "checked", that don't have values, are written "checked='checked'".
  7. Any & and < characters, even in scripts or stylesheets, must be replaced by &amp; and &lt; respectively.
  8. Don't wrap scripts in comment markers (<!-- ... -->).
  9. Make sure you use the semicolon after an entity reference like &aacute;.

That's all.
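
Most of these rules amount to XML well-formedness, so a stock XML parser can spot-check them. Here is a rough sketch in Python using the standard xml.etree.ElementTree module; the function name xhtml_problems is made up for the example. It catches unclosed or badly nested tags, unquoted or valueless attributes, and bare & and < characters, and it adds a separate check for upper-case names. One limitation: because no DTD is loaded, named entity references like &aacute; are reported as errors even when written correctly.

```python
import xml.etree.ElementTree as ET


def xhtml_problems(fragment):
    """Return a list of problems found in a single-rooted XHTML fragment."""
    try:
        root = ET.fromstring(fragment)
    except ET.ParseError as err:
        # Covers rules 2-7: missing end-tags, bad nesting, unquoted or
        # valueless attributes, and unescaped & or < characters.
        return ["not well-formed: %s" % err]

    problems = []
    for elem in root.iter():
        # Rule 1: tag and attribute names must be lower case.
        if elem.tag != elem.tag.lower():
            problems.append("tag not lower case: %s" % elem.tag)
        for name in elem.attrib:
            if name != name.lower():
                problems.append("attribute not lower case: %s" % name)
    return problems
```

For instance, xhtml_problems("<p>Hello<br /></p>") returns an empty list, while the same fragment written with a plain <br> returns a well-formedness complaint.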

Gorillas in the desert

Here's a story about an anthropologist working among the Yaqui, an Indian nation in northern Mexico. In New World Spanish, the word indio 'Indian' has two senses, one neutral, one derogatory, and it is all too commonplace for speakers to slip between the meanings without really being aware of it.

On this particular occasion, the anthropologist was sitting around a fire with some Yaquis. One of them, who was rather large and rather drunk, got up suddenly and began to circle the campfire, beating his chest and shouting "Soy indio ... soy indio ..." (as much as to say, "I'm an [epithet], and what are you going to do about it?").

The anthropologist felt a bit intimidated by this, and decided he had to do something to deflect possible violence. So he too got up, began to circle the fire, beat his chest, and shout "Soy judío ... soy judío ...". It all ended happily.


Why per-CPU pricing for software can be sensible

Contra Tim Bray, per-CPU pricing is actually quite a reasonable thing to do if you think your product is not readily replaceable (that is, if all competing products are merely substitutes rather than equivalents). It's a form of price discrimination, aka "charge the rich high, the poor low".

This approach is sustainable when 1) you can reliably tell who the rich are, and 2) you can prevent a secondary market from arising (so that the poor sell to the rich at a profit, undercutting you). Because (proprietary) software is copyright, and is licensed not sold, it meets the second condition; predicating high prices on expensive features of the platform meets the first condition.
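
A toy calculation shows why a vendor might like this. Every number below is invented purely for illustration: three hypothetical customers, the CPU count of each one's server, and the most each would be willing to pay. The sketch compares the best single flat price against a per-CPU price.

```python
# Hypothetical customers: (name, CPUs on their server, most they would pay).
CUSTOMERS = [
    ("small shop", 1, 1_000),
    ("mid-size firm", 4, 4_500),
    ("large bank", 16, 20_000),
]


def revenue_flat(price):
    """Revenue when everyone is charged the same price; a customer buys
    only if the price does not exceed what they are willing to pay."""
    return sum(price for _, _, budget in CUSTOMERS if budget >= price)


def revenue_per_cpu(price_per_cpu):
    """Revenue when the price scales with the customer's CPU count."""
    return sum(
        price_per_cpu * cpus
        for _, cpus, budget in CUSTOMERS
        if budget >= price_per_cpu * cpus
    )


# The best flat price is always one of the customers' own limits.
best_flat = max(revenue_flat(budget) for _, _, budget in CUSTOMERS)
per_cpu = revenue_per_cpu(1_000)

print("best flat-price revenue:", best_flat)
print("per-CPU revenue at 1,000 per CPU:", per_cpu)
```

With these made-up figures the best flat price sells to the bank alone, while the per-CPU price sells to all three customers and brings in slightly more. The CPU count is standing in for "who the rich are", which is the first condition above.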

Quine's Paradox

We all (being reasonable persons and not fanatics) are trapped by Quine's Paradox: namely, to believe a statement p is to believe that p is true, so I believe that each of my beliefs is true. Yet I also believe that some of my beliefs (I know not which) will turn out to be false if and when tested. (I believe I left my glasses in my bedroom today, but beliefs of this sort have turned out to be false often enough...)


I believe that each of my beliefs is true;
I believe that some of my beliefs are false.

Saith Quine: "I for one had hoped for better from reasonable persons."

Typographical variety

  1. The Polish acute accent is shorter and stubbier than the Western one.
  2. French likes to put spaces in front of certain terminal punctuations, notably semicolon and colon, and also inside guillemets.
  3. Quotation marks have at least six flavors in Europe alone:
    • 6-quotes ... 9-quotes (English, Dutch, Italian, Spanish, Turkish)
    • 9-quotes ... 9-quotes (Scandinavian languages)
    • low-9-quotes ... 6-quotes (German, Czech, Slovak)
    • low-9-quotes ... 9-quotes (Hungarian, Polish)
    • guillemets pointing in (Slovene, German sometimes)
    • guillemets pointing out (French, Greek, Russian)
  4. Some languages like initial dashes for dialogue, some don't.
  5. French c with cedilla can be written with a detached comma below, but not so in Portuguese or Catalan. Turkish insists on s with cedilla, Romanian on s with a comma below for their respective sh-sounds. (The story for Gagauz, which is a Turkic language spoken in Romania, is still uncertain.)
  6. Inverted punctuation marks are unique to Spanish.
  7. Lojban uses dots at the beginnings of words. :-)

To continue

Thanks, Tim, for getting me off my butt. I was gonna post today already, I was, I was. Been thinking about it all week. Really. No more excuses. Gonna post. Watch this space.

Oh yes. This is post #160.