2009-12-18
Noun-noun Compounds
After the English gloss of each compound, there's a list of non-English languages that use it. If the compound is not used in English, there is a definition as well. The abbreviations are explained below. If you don't care about the Lojban, you can ignore it.
1. The head represents an action, and the modifier then represents the object of that action.
pinsi kilbra = pencil sharpener (Hun)
zgike nunctu = music instruction (Hun)
mirli nunkalte = deer hunting (Hun)
finpe nunkalte = fish hunting (Tur, Kor, Udm, Aba 'fishing')
smacu terkavbu = mousetrap (Tur, Kor, Hun, Udm, Aba)
zdani turni = house ruler (Kar 'host')
zerle'a nunte'a = thief fear (Skt 'fear of thieves')
cevni zekri = god crime (Skt 'offense against the gods')
2. The head represents a set, and the modifier the type of the elements contained in that set.
zdani lijgri = house row
selci lamgri = cell block
karda mulgri = card pack (Swe)
rokci derxi = stone heap (Swe)
tadni girzu = student group (Hun)
remna girzu = human-being group (Qab 'group of people')
cpumi'i lijgri = tractor column (Qab)
cevni jenmi = god army (Skt)
cevni prenu = god folk (Skt)
3. Conversely: the head is an element, and the modifier represents a set in which that element is contained. Implicitly, the meaning of the head is restricted from its usual general meaning to the specific meaning appropriate for elements in the given set. Note the opposition between "zdani linji" in the previous group, and "linji zdani" in this one, which shows why this kind of compound is called "asymmetrical".
carvi dirgo = raindrop (Tur, Kor, Hun, Udm, Aba)
linji zdani = row house
4. The modifier specifies an object and the head a component or detail of that object; the compound as a whole refers to the detail, specifying that it is a detail of that whole and not some other.
junla dadysli = clock pendulum (Hun)
purdi vorme = garden door (Qab)
purdi bitmu = garden wall (Que)
moklu skapi = mouth skin (Imb 'lips')
nazbi kevna = nose hole (Imb 'nostril')
karce xislu = automobile wheel (Chi)
jipci pimlu = chicken feather (Chi)
vinji rebla = airplane tail (Chi)
5. Conversely: the modifier specifies a characteristic or important detail of the object described by the head; objects described by the compound as a whole are differentiated from other similar objects by this detail.
pixra cukta = picture book
kerfa silka = hair silk (Kar 'velvet')
plise tapla = apple cake (Tur)
dadysli junla = pendulum clock (Hun)
6. The head specifies a general class of object (a genus), and the modifier specifies a sub-class of that class (a species).
ckunu tricu = pine tree (Hun, Tur, Hop)
7. The head specifies an object of possession, and the modifier may specify the possessor (the possession may be intrinsic or otherwise). In English, these compounds have an explicit possessive element in them: "lion's mane", "child's foot", "noble's cow".
cinfo kerfa = lion mane (Kor, Tur, Hun, Udm, Qab)
verba jamfu = child foot (Swe)
nixli tuple = girl leg (Swe)
cinfo jamfu = lion foot (Que)
danlu skapi = animal skin (Ewe)
ralju zdani = chief house (Ewe)
jmive munje = living world (Skt)
nobli bakni = noble cow (Skt)
nolraitru ralju = king chief (Skt 'emperor')
8. The head specifies a habitat, and the modifier specifies the inhabitant.
lanzu tumla = family land
9. The head specifies a causative agent, and the modifier specifies the effect of that cause.
kalselvi'i gapci = tear gas (Hun)
terbi'a jurme = disease germ (Tur)
fenki litki = crazy liquid (Hop 'whisky')
pinca litki = urine liquid (Hop 'beer')
10. Conversely: the head specifies an effect, and the modifier specifies its cause.
djacu barna = water mark (Chi)
11. The head specifies an instrument, and the modifier specifies the purpose of that instrument.
taxfu dadgreku = garment rack (Chi)
tergu'i ti'otci = lamp shade (Chi)
xirma zdani = horse house (Chi 'stall')
nuzba tanbo = news board (Chi 'bulletin board')
12. More vaguely: the head specifies an instrument, and the modifier specifies the object of the purpose for which that instrument is used.
cpina rokci = pepper stone (Que 'stone for grinding pepper')
jamfu djacu = foot water (Skt 'water for washing the feet')
grana mudri = post wood (Skt 'wood for making a post')
moklu djacu = mouth water (Hun 'water for washing the mouth')
lanme gerku = sheep dog (dog for working sheep)
13. The head specifies a product from some source, and the modifier specifies the source of the product.
moklu djacu = mouth water (Aba, Qab 'saliva')
ractu mapku = rabbit hat (Rus)
jipci sovda = chicken egg (Chi)
sikcurnu silka = silkworm silk (Chi)
mlatu kalci = cat feces (Chi)
bifce lakse = bee wax (Chi 'beeswax')
cribe rectu = bear meat (Tur, Kor, Hun, Udm, Aba)
solxrula grasu = sunflower oil (Tur, Kor, Hun, Udm, Aba)
bifce jisra = bee juice (Hop 'honey')
tatru litki = breast liquid (Hop 'milk')
kanla djacu = eye water (Kor 'tear')
14. Conversely: the head specifies the source of a product, and the modifier specifies the product.
silna jinto = salt well (Chi)
kolme terkakpa = coal mine (Chi)
ctile jinto = oil well (Chi)
15. The head specifies an object, and the modifier specifies the material from which the object is made. This case is especially interesting, because the referent of the head may normally be made from just one kind of material, which is then overridden in the compound.
rokci cinfo = stone lion
snime nanmu = snow man (Hun)
kliti cipni = clay bird
blaci kanla = glass eye (Hun)
blaci kanla = glass eye (Que 'spectacles')
solji sicni = gold coin (Tur)
solji junla = gold watch (Tur, Kor, Hun)
solji djine = gold ring (Udm, Aba, Que)
rokci zdani = stone house (Imb)
mudri zdani = wood house (Ewe 'wooden house')
rokci bitmu = stone wall (Ewe)
solji carce = gold chariot (Skt)
mudri xarci = wood weapon (Skt 'wooden weapon')
cmaro'i dargu = pebble road (Chi)
sudysrasu cutci = straw shoe (Chi)
16. The head specifies a typical object used to measure a quantity and the modifier specifies something measured. The compound as a whole refers to a given quantity of the thing being measured. English does not have compounds of this form, as a rule.
tumla spisa = land piece (Tur 'piece of land')
tcati kabri = tea cup (Kor, Aba 'cup of tea')
nanba spisa = bread piece (Kor 'piece of bread')
bukpu spisa = cloth piece (Udm, Aba 'piece of cloth')
djacu calkyguzme = water calabash (Ewe 'calabash of water')
17. The head specifies an object with certain implicit properties, and the modifier overrides one of those implicit properties.
kensa bloti = spaceship
bakni verba = cattle child (Ewe 'calf')
18. The modifier specifies a whole, and the head specifies a part which normally is associated with a different whole. The compound then refers to a part of the modifier which stands in the same relationship to the whole modifier as the head stands to its typical whole.
kosta degji = coat finger (Hun 'coat sleeve')
denci genja = tooth root (Imb)
tricu stedu = tree head (Imb 'treetop')
19. The head specifies the producer of a certain product, and the modifier specifies the product. In this way, the compound as a whole distinguishes its referents from other referents of the head which do not produce the product.
silka curnu = silkworm (Tur, Hun, Aba)
20. The head specifies an object, and the modifier specifies another object which has a characteristic property. The compound as a whole refers to those referents of the head which possess the property.
sonci manti = soldier ant
ninmu bakni = woman cattle (Imb 'cow')
mamta degji = mother finger (Imb 'thumb')
cifnu degji = baby finger (Imb 'pinky')
pacraistu zdani = hell house (Skt)
fagri dapma = fire curse (Skt 'curse destructive as fire')
21. As a particular case (when the property is that of resemblance): the modifier specifies an object which the referent of the compound resembles.
grutrceraso jbama = cherry bomb
solji kerfa = gold hair (Hun 'golden hair')
kanla djacu = eye water (Kar 'spring')
bakni rokci = bull stone (Mon 'boulder')
22. The modifier specifies a place, and the head an object characteristically located in or at that place.
ckana boxfo = bed sheet (Chi)
mrostu mojysu'a = tomb monument (Chi 'tombstone')
jubme tergusni = table lamp (Chi)
foldi smacu = field mouse (Chi)
briju ci'ajbu = office desk (Chi)
rirxe xirma = river horse (Chi 'hippopotamus')
xamsi gerku = sea dog (Chi 'seal')
cagyce'u zdani = village house (Skt)
23. Specifically: the head is a place where the modifier is sold or made available to the public.
cidja barja = food bar (Chi 'restaurant')
cukta barja = book bar (Chi 'library')
24. The modifier specifies the locus of application of the head.
kanla velmikce = eye medicine (Chi)
jgalu grasu = nail oil (Chi 'nail polish')
denci pesxu = tooth paste (Chi)
25. The head specifies an implement used in the activity denoted by the modifier.
me.la.pinpan. bolci = Ping-Pong ball (Chi)
26. The head specifies a protective device against the undesirable features of the referent of the modifier.
carvi mapku = rain cap (Chi)
carvi taxfu = rain garment (Chi 'raincoat')
vindu firgai = poison mask (Chi 'gas mask')
27. The head specifies a container characteristically used to hold the referent of the modifier.
cukta vasru = book vessel (Chi 'satchel')
vanju kabri = wine cup (Chi)
spatrkoka lanka = coca basket (Que)
djacu calkyguzme = water calabash (Ewe)
rismi dakli = rice bag (Ewe, Chi)
tcati kabri = tea cup (Chi)
ladru botpi = milk bottle (Chi)
rismi patxu = rice pot (Chi)
festi lante = trash can (Chi)
bifce zdani = bee house (Kor 'beehive')
cladakyxa'i zdani = sword house (Kor 'sheath')
manti zdani = ant nest (Gua 'anthill')
28. The modifier specifies the characteristic time of the event specified by the head.
vensa djedi = spring day (Chi)
crisa citsi = summer season (Chi)
cerni bumru = morning fog (Chi)
critu lunra = autumn moon (Chi)
dunra nicte = winter night (Chi)
nicte ckule = night school (Chi)
29. The modifier specifies a source of energy for the referent of the head.
dikca tergusni = electric lamp (Chi)
ratni nejni = atom energy (Chi)
brife molki = windmill (Tur, Kor, Hun, Udm, Aba)
There are some compounds which don't fall into any of the above categories.
ladru denci = milk tooth (Tur, Hun, Udm, Qab)
kanla denci = eye tooth
It is clear that "tooth" is being specified, and that "milk" and "eye" act as modifiers. However, the relationship between "ladru" and "denci" is something like "tooth which one has when one is drinking milk from one's mother", a relationship certainly present nowhere except in this particular concept. As for "kanla denci", the relationship is not only not present on the surface, it is hardly possible to formulate it at all.
Here are some types of compounds where there is no effective difference between the modifier and the head. In some languages, it is common for these compounds to occur in the opposite order as well.
30. The compound may refer to things which are correctly specified by both components. Some of these instances may also be seen as asymmetrical compounds where the modifier specifies a material.
cipnrstrigi pacru'i = owl demon (Skt)
nolraitru prije = royal sage (Skt)
remna nakni = human-being male (Qab 'man')
remna fetsi = human-being female (Qab 'woman')
sonci tolvri = soldier coward (Que)
panzi nanmu = offspring man (Ewe 'son')
panzi ninmu = offspring woman (Ewe 'daughter')
solji sicni = gold coin (Tur)
solji junla = gold watch (Tur, Kor, Hun)
solji djine = gold ring (Udm, Aba, Que)
rokci zdani = stone house (Imb)
mudri zdani = wooden house (Ewe)
rokci bitmu = stone wall (Ewe)
solji carce = gold chariot (Skt)
mudri xarci = wooden weapon (Skt)
zdani tcadu = home town (Chi)
31. The compound may refer to all things which are specified by either of the compound components. English does not have compounds of this form, as a rule.
nunji'a nunterji'a = victory defeat (Skt 'victory or defeat')
donri nicte = day night (Skt 'day and night')
lunra tarci = moon stars (Skt 'moon and stars')
patfu mamta = father mother (Imb, Kaz, Chi 'parents')
tuple birka = leg arm (Kaz 'extremity')
nuncti nunpinxe = eating drinking (Udm 'cuisine')
bersa tixnu = son daughter (Chi 'children')
32. Alternatively, the compound may refer to things which are specified by either of the compound components or by some more inclusive class of things which the components typify.
curnu jalra = worm beetle (Mon 'insect')
jalra curnu = beetle worm (Mon 'insect')
kabri palta = cup plate (Kaz 'crockery')
jipci gunse = hen goose (Qab 'housefowl')
xrula tricu = flower tree (Chi 'vegetation')
33. The compound components specify crucial or typical parts of the referent of the compound as a whole. English does not have compounds of this form, as a rule.
tumla vacri = land air (Fin 'world')
moklu stedu = mouth head (Aba 'face')
sudysrasu cunmi = hay millet (Qab 'agriculture')
gugde ciste = state system (Mon 'politics')
prenu so'imei = people multitude (Mon 'masses')
djacu dertu = water earth (Chi 'climate')
Here are the explanations of the three-letter language-name abbreviations:
Aba = Abazin
Chi = Chinese
Eng = English
Ewe = Ewe
Fin = Finnish
Geo = Georgian
Gua = Guarani
Hop = Hopi
Hun = Hungarian
Imb = Imbabura Quechua
Kar = Karaitic Hebrew
Kaz = Kazakh
Kor = Korean
Mon = Mongolian
Qab = Qabardian
Que = Quechua
Rus = Russian
Skt = Sanskrit
Swe = Swedish
Tur = Turkish
Udm = Udmurt
2009-11-06
More of my blather
Update: Alas, this service is dead.
2009-10-24
More female programmers
I tried to post this comment to a public site, but failed repeatedly. The topic of the original post isn't relevant to my comment, which was in response to a comment that read, in its entirety:
Why would we would want more female programmers?
My answer:
The world needs more effectively mobilized brains. We can't afford to constrain ourselves on what size or shape or color the bodies are that house those brains. Also, diversity is good in itself: it improves flexible response, and it's silly to throw away a cheap source of diversity.
A major U.S. university with a strong CS program (I am contractually prevented from naming it) that had female CS undergraduate admissions in the single digits year after year was able to raise their admission to the same rate as other engineering programs by changing just one thing: they no longer gave people who already had programming experience preferential admission. There have been no changes in the overall performance of the student body in the years since.
2009-10-22
"Omnilingual"
This is to announce my edited version of H. Beam Piper's classic story of linguistic archaeology on Mars, "Omnilingual". Why edit a classic? Here's my Editor's Introduction:
H. Beam Piper's 1956 story "Omnilingual" is one of the few, and still one of the best, science fiction stories in which the science is linguistic archaeology. While the meat of the story holds up marvelously fifty years later, the particulars are firmly rooted in the 1950s. Everyone smokes like a chimney — on Mars! The women are called girls, and their gender is mentioned at every conceivable opportunity. All the work is still done with pencil and paper and sketching boards and looseleaf notebooks.
My edits, then, are intended to modernize the work, to help the 2009 reader not stumble over the details. Notebooks are computerized; sketchbooks have been replaced by tablets. Gender equality and the metric system are taken for granted. Smoking isn't even mentioned. I wedged in a mention of the Classic Maya decipherment of the 1980s (a counterexample to the story's thesis!), but let one of the characters dismiss it as irrelevant. I set the story, as Piper did, forty years in the future, but that is now 2049 rather than 1996. There are fewer This Is Science Fiction flags, so "Earth" instead of "Terra", "U.N." instead of "Federation Government".
Piper's Mars and his Martians are completely impossible based on what we know of Mars today. Rather than trying to change all that, which would have involved wholesale destruction and re-invention, I have changed the planet's name to Ares after the Greek rather than the Roman god of war. The intention is to suggest someplace analogous to Mars as we know it in 2009, but different in detail. The atmosphere on Ares is thin, but breathable with supplementary oxygen; the humidity, while low, supports plenty of life forms. As for the too-human Martians (or Areans), I have made them an offshoot of Homo sapiens whose presence on the fourth planet from the sun remains a mystery.
However, the characters, the plot, the underlying logic remain the same. Hopefully I haven't damaged the story too much in trying to adjust it to modern taste. Those who prefer the original form can easily find it at Project Gutenberg, who provided the public-domain base text from which this revision was made. They also have the original Frank Kelly Freas drawings, which I didn't feel right about using -- they were made in the 1950s, too, and no longer seemed to fit the revised text.
Read and enjoy!
2009-10-16
Fragments
David Moser's relentlessly self-referential story "This Is the Title of This Story, Which Is Also Found Several Times in the Story Itself" begins simply enough with the fairly ordinary sentence "This is the first sentence of this story."
But by the fourth paragraph, a harbinger of what is to come: "Introduces, in this paragraph, the device of sentence fragments. A sentence fragment. Another. Good device. Will be used more later."
True enough. "Incest. The unspeakable taboo. The universal prohibition. Incest. And notice the sentence fragments? Good literary device. Will be used more later."
A later passage from the same increasingly disconnected tale: "Bizarre. A sentence fragment. Another fragment. Twelve years old. This is a sentence that. Fragmented. And strangling his mother. Sorry, sorry. Bizarre. This. More fragments. This is it. Fragments. The title of this story, which. Blond. Sorry, sorry. Fragment after fragment. Harder. This is a sentence that. Fragments. Damn good device."
Still further down: "The purpose. Of this paragraph. Is to apologize. For its gratuitous use. Of. Sentence fragments. Sorry."
And then: "Or this sentence fragment? Or three words? Two words? One?"
Getting near the end: "By the throat. Harder. Harder, harder."
Lastly: "This is."
Read. The whole thing. Worthwhile. NSFW, technically.
2009-10-01
Why Are PHBs Stupid?
However we decide to define "manager", this group is certainly now the object of a complex of negative stereotypes. When and how did this start? I don't know, and I welcome suggestions. These attitudes may be connected to the antique European aristocratic disdain for those who are "in trade", and to the (I think related) modern intellectual disdain for the world of business. These attitudes seem to have been imported from the intelligentsia into industry through the medium of engineers and especially programmers, who (at least at lower levels) maintain a very different culture from the "suits" in finance, marketing, product planning, and so on.
I think Mark's right to speak of "engineers and especially programmers", and I think the key phrase is "maintain a very different culture". Historically, the boss that most people dealt with was the foreman, which the OED defines in the relevant sense as "the principal workman; specifically, one who has charge of a department of work." You began by doing the work, and if you got good at it, you ended up telling other people with less experience or less competence how to do it instead. This could go right up to the top: Thomas Edison began as an inventor, and wound up running a huge "invention factory", the first modern industrial research lab.
Two factors undermined this, though: the sense that promoting high-quality workmen instead of continuing to take advantage of their work made no sense, and the idea that management was or could be a profession abstracted from the particular work being managed. The first factor appeared particularly strongly in computer programming because of the huge disparity in productivity: the best programmers are literally two orders of magnitude more productive than the average. Losing a top steelworker to foremanship might cost the company the labor of 2-3 standard steelworkers, but losing the productivity of 100 merely competent programmers seemed insane. And of course geeks tend to like their jobs, and to be uninterested in (and incompetent at) people-managing. Companies had to deal with the widespread appearance of workers who did not want to be promoted, ever.
At the same time, the rise of the MBA spread the meme among the suits that managing people was a learned profession like law or medicine or engineering, where you primarily apply what you have learned from books, courses, etc. to the requirements of the job. Before that, management had always been seen as a job, like digging ditches or being President of the United States: you can prepare for it to some extent, but mostly you do a job by applying whatever you have to whatever you need to do.
Making management a profession was arguable; the associated notion that you could manage workers with no understanding of what they did was a disaster. Computer programmers were in the forefront of knowing what had happened: they quickly saw that their bosses had no idea of how the work was done, the necessary conditions for doing it, or the difference between what could be done, what could be done with extraordinary effort, and what could not be done at all. The boss had always been seen as a mean fellow (after all, he tells you what to do and can fire you), but now he also appeared clueless and even stupid, someone who could not be made to understand no matter what.
None of the early citations in the OED, nor the quotes that I find in LION, seem to reflect the modern Dilbertian managerial stereotype. That stereotype clearly predates Dilbert — but when did it arise? and where did it come from?
Scott Adams is not only a manager now, he has always been one by training: he was an economics major, not any kind of scientist or engineer, and he got an MBA before he worked with his first geek. He is extraordinarily observant (especially for an MBA, I add snarkily) and he actually does grasp how geeks think, but despite appearances he basically sees them from the outside. When I discovered this, the shock was so great that I started to see him as an outsider mocking my culture rather than an insider mocking its excesses (though to be sure Dilbert is harder on suits than on nerds), and I lost interest in the strip completely.
In this context, we have to return to Andrew's question: What is a manager, anyhow? By now, I suppose that the Dilbert empire employs a certain number of people, whom Scott Adams in some sense manages — does he thereby consider himself a "manager" in the relevant sense?
(Note: Even though Mark says he's been a manager since 1980, I think that industrial research and academia still basically run on the old model, and therefore their managers, including him, are mostly exempt from the trend I am reporting here.)
2009-09-21
Common Lisp symbols bound in more than one namespace
These are the Common Lisp symbols which are bound in more than one namespace: for example, + is both a function (addition) and a variable (the most recent form evaluated by the REPL). The links point into the Common Lisp Hyperspec.
- *
- +
- -
- /
- abort
- and
- atom
- bit
- character
- complex
- cons
- continue
- eql
- error
- float
- function
- lambda
- list
- logical-pathname
- member
- method-combination
- mod
- muffle-warning
- nil
- not
- null
- or
- pathname
- rational
- setf
- store-value
- string
- t
- type
- use-value
- values
- vector
2009-05-24
Two Kinds
There's three kinds of people in the world, those who can count and those who can't.
There's 10 kinds of people in the world, those who can do binary and those who can't.
There's 10 kinds of people in the world, those who understand trinary, those who don't understand trinary, and those who mistake it for binary.
And, of course, there's two kinds of people in the world, those who can tell a joke, and those who can't.
Or perhaps there are really three kinds, those who can tell a joke, those who can't, and those who can but run it into the ground.
But Little Anthony and the Imperials said it best.
2009-05-09
No more anonymous comments; sorry.
2008-12-26
Recycled Nursery Rhymes and Songs for Secular Babies
Air: Three Blind Mice
Dor-i-an, Dor-i-an
See who I am, see who I am,
I am the Drool- and the Burpinator,
I am the Fart- and the Poopinator,
I am the Squeal- and the Howlinator,
I'll be baaaack, I'll be baaaack.
Air: Puttin' on the Ritz
Who's that baby, what is he doin'
He's my grandson, he is a-chewin'
Dor-i-an . . . Chewin' on his bib.
Who's that baby, where is he goin'
I don't know and there is no knowin'
Dor-i-an . . . Chewin' on his bib.
Air: Jesus Loves Me
Grandpa loves me, this I know,
'Cause his caring tells me so,
Little me with him belongs,
Till I'm bold and brave and strong.
Yes, Grandpa loves me (3x)
His caring tells me so.
(This gets changed to Grandma or Mommy or even Grownups on occasion.)
Air: Deck the Halls
Fast away the bottle's draining,
Do-do-do-do-do, do-do-ri-an.
On the bib the drips are raining,
Do-do-do-do-do, do-do-ri-an.
Soon the back we will be pounding,
Do-do-do, do-do-do, Do-ri-an.
And the burps will be resounding,
Do-do-do-do-do, do-do-ri-an.
Air: Tell Me Why
Tell me why the stars do shine,
Tell me why the ivy twines,
Tell me why the sky's so blue,
Tell me, oh tell me, just why I love you.
Nuclear fusion makes stars to shine,
Tropism makes the ivy twine,
Scattering makes the sky so blue,
Gonadal hormones are why I love you.
(This is the only one I didn't make up myself.)
2008-10-14
Converting Restricted XML to Good-Quality JSON
This conversion applies only to XML that meets the following restrictions:
- The XML can't contain mixed content (elements with both children/attributes and text).
- The XML cannot depend on the order of child elements with distinct names (order dependence in children with the same name is okay).
- There can't be any attributes with the same name as child elements.
- There can't be any elements or attributes that differ only in their namespace names.
For each element, two things must be known:
- Whether it MUST appear at most once (a singleton element) or MAY appear more than once (a multiplex element).
- Whether it only contains text (an element with simple type) or child elements and/or attributes (an element with complex type).
The conversion rules are then as follows:
- A singleton element of simple type, and likewise an attribute, is converted to a JSON simple value: a number or boolean if syntactically possible, otherwise a string.
- A multiplex element of simple type is converted to a JSON array of simple values.
- A singleton element of complex type is converted to a JSON object that maps the local names of child elements and attributes to their content. Namespace names are discarded.
- A multiplex element of complex type is mapped to a JSON array of JSON objects that map the local names of child elements and attributes to their content. Namespace names are discarded.
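The four conversion rules above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not a reference implementation. The names `convert`, `simple_value`, `local_name` and the `multiplex` parameter are hypothetical. The sketch assumes the caller supplies the set of element names that may repeat (in practice this would come from a schema), and it guesses simple versus complex type from the instance itself: an element with no children and no attributes is treated as simple.

```python
import json
import re
import xml.etree.ElementTree as ET

# JSON number syntax, so that strings like "nan" or "007" stay strings.
JSON_NUMBER = re.compile(r"-?(?:0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?")


def local_name(qname):
    """Discard the '{namespace}' prefix that ElementTree puts on qualified names."""
    return qname.rsplit("}", 1)[-1]


def simple_value(text):
    """Number or boolean if syntactically possible, otherwise a string.

    Assumption: surrounding whitespace is not significant and is stripped.
    """
    text = (text or "").strip()
    if text in ("true", "false"):
        return text == "true"
    match = JSON_NUMBER.fullmatch(text)
    if match:
        # A fraction or exponent part means float; otherwise int.
        return float(text) if match.group(1) or match.group(2) else int(text)
    return text


def convert(element, multiplex):
    """Convert one element; `multiplex` is the set of local names that MAY repeat."""
    children = list(element)
    if not children and not element.attrib:
        return simple_value(element.text)  # simple type (inferred, see lead-in)
    result = {}
    for name, value in element.attrib.items():
        result[local_name(name)] = simple_value(value)
    for child in children:
        name = local_name(child.tag)
        value = convert(child, multiplex)
        if name in multiplex:
            result.setdefault(name, []).append(value)  # multiplex: JSON array
        else:
            result[name] = value  # singleton: plain value or object
    return result


doc = """<order xmlns="urn:example" id="17" rush="true">
  <customer>Ann</customer>
  <item sku="A1"><qty>2</qty></item>
  <item sku="B2"><qty>1.5</qty></item>
  <tag>gift</tag>
</order>"""

result = convert(ET.fromstring(doc), multiplex={"item", "tag"})
print(json.dumps(result, indent=2))
```

Note that `tag` becomes a one-element array because it is declared multiplex, even though it appears only once. In this sketch a multiplex element that never appears is simply absent from the object rather than mapped to an empty array.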
2008-09-17
I before E except after C
Here's a better version of the little poem. I don't know who wrote it; I touched it up a bit for better rhythm:
When IE and EI both say EE,
Who can tell which it should be?
After C, use E then I;
Otherwise IE will apply.
Some exceptions we may note
Which one needs to learn by rote:
Protein, caffeine, weird, and seize,
And in the U.S., leisure, please.
2008-07-01
2008-06-18
Dorian Sion Cowan
My grandson Dorian was born at 9:08 PM yesterday, June 17, 2008 (New York time). He weighed 9 lb 0.9 oz (4110 g) at birth, and was 22 inches (56 cm) long. And he is the Best Baby In The World.
(Well, when I say that, I make a mental reservation in favor of Irene, Dorian's mommy, who is now almost 21 but was certainly the Best Baby in her day.)
Baby and mother are doing wonderfully well -- Dorian is starting to breastfeed very nicely, and already knows a great many Proto-Indo-European roots. Irene's Caesarean incision is still very sore, and the IV is in her hand, not her arm, which makes handling him a little awkward for her. Her best friends have been hovering around the two of them, and so have Gale and I as far as we have been able. They will be coming home Friday morning.
Anyhow, I sang him a lullaby the night he was born, not that he needed it -- he was pretty well drifting off anyhow. But even though my voice was cracking, I needed to sing it to him. It's by Fred Small, and is called "Everything Possible". This is the slightly altered version of the chorus that Dorian actually got:
You can be anybody you want to be,
You can love whomever you will.
You can travel any country where your heart leads,
And know I will love you still.
You can live by yourself, you can gather friends around,
Or find one special one,
And the only measure of your words and your deeds
Is the love you leave behind you when you're gone.
And this is the second song he heard from me, this morning when I stopped by to see him:
Rockabye Dorian, on the tree-top
When you are fed, your poop will go plop
When you have plopped, your diaper we'll change
And then you'll be cleaned up and happy again.
Okay, it doesn't quite rhyme, but it's his.
Dorian, if you are reading this, you already know your grandfather is a crazy old man who embarrasses the hell out of people. You'll live this one down too.
2008-05-04
Essentialist Explanations, 14th edition
2008-04-25
Eulogy
The following was said of David Ricardo by Maria Edgeworth:
I never argued or discussed a question with any person who argues more fairly, or less for victory and more for truth. He gives full weight to every argument brought against him, and seems not to be on any side of the question for one instant longer than the conviction of his mind is on that side. It seems quite indifferent to him whether you find the truth or whether he finds it, provided it be found.
Or more concisely: He wanted to be right, whether or not he had been right.
Ricardo died at fifty-one. I myself am almost fifty, and if I were to die next year, I hope as much could truthfully be said of me.
2008-03-19
On the word "bumblebee"
The story of the word bumblebee is curious, but (contra Mr. Burns of the Simpsons) certainly doesn't lead back to a form like bumbled bee, in the way that ice cream leads back to iced cream, or the American form skim milk descends from the form skimmed milk still current elsewhere. The bee part is transparent, and there is a Middle English verb bomb(e)len, meaning to make a humming sound, presumably of imitative origin. So there you are.
However, it's clear that the older form was humble-bee, where hum(b)le is an intensive of hum, which is also presumably of imitative origin. Whether bumblebee is a new coinage based on bombelen, or whether it is an alteration of humble-bee by dissimilation, or a mixture of both, it's impossible to say.
But when we look in Pokorny's etymological dictionary of Indo-European for hum, we see it under the root kem²-, as expected by Grimm's Law, and with Lithuanian reflexes in k- and Slavic ones in ch- that also refer to humming noises and bees. That certainly does not sound imitative to me -- the sharp sound of [k] is nothing like a bee hum, which has no beginning and no end. So in the end the obvious imitative nature of bumblebee leads to a riddle wrapped in a mystery inside an enigma.
And there remains at least one dangling oddity: Pokorny also lists an Old Persian -- at least I think that's what "Ai." means -- reflex meaning "yak". Yaks grunt (as the Linnaean name Bos grunniens indicates), they don't hum, and what is Old Persian doing with an inherited word for "yak" anyhow? English, like most modern languages, has borrowed its word from Tibetan.
2008-03-03
Elements or attributes?
General points:
- Attributes are more restrictive than elements, and all designs have some elements, so an all-element design is simplest -- which is not the same as best.
- In a tree-style data model, elements are typically represented internally as nodes, which use more memory than the strings used to represent attributes. Sometimes the nodes are of different application-specific classes, which in many languages also takes up memory to represent the classes.
- When streaming, elements are processed one at a time (possibly even piece by piece, depending on the XML parser you are using), whereas all the attributes of an element and their values are reported at once, which costs memory, particularly if some attribute values are very long.
- Both element content and attribute values need to be escaped, so escaping should not be a consideration in the design.
- In some programming languages and libraries, processing elements is easier; in others, processing attributes is easier. Beware of using ease of processing as a criterion. In particular, XSLT can handle either with equal facility.
- If a piece of data should usually be shown to the user, use an element; if not, use an attribute. (This rule is often violated for one reason or another.)
- If you are extending an existing schema, do things by analogy to how things are done in that schema.
- Sensible schema languages, meaning RELAX NG, treat elements and attributes symmetrically. Older and cruder schema languages tend to have better support for elements.
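The streaming point above can be seen with a small sketch. It uses Python's standard xml.parsers.expat module; the function name trace_events and the sample document are made up for illustration, and how the text gets split into pieces depends on the parser.

```python
from xml.parsers import expat


def trace_events(xml_text):
    """Parse xml_text and record the events the parser reports (hypothetical helper)."""
    events = []
    parser = expat.ParserCreate()

    def on_start(name, attrs):
        # Every attribute of the element arrives together, in a single call.
        events.append(("start", name, dict(attrs)))

    def on_text(data):
        # Element content may arrive in several pieces, one call per piece.
        events.append(("text", data))

    parser.StartElementHandler = on_start
    parser.CharacterDataHandler = on_text
    parser.Parse(xml_text, True)
    return events


doc = '<note id="n1" priority="high">first line\nsecond line</note>'
events = trace_events(doc)
for event in events:
    print(event)
```

A very long attribute value therefore has to be held in memory whole, while long element content can be handled as it comes.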
Using elements:
- If something might appear more than once in a data model, use an element rather than introducing attributes with names like part1, part2, part3 ....
- If order matters between two pieces of data, use elements for them: attributes are inherently unordered.
- If a piece of data has, or might have, its own substructure, put it in an element: getting substructure into an attribute is always messy. Similarly, if the data is a constituent part of some larger piece of data, put it in an element.
- An exception to the previous rule: multiple whitespace-separated tokens can safely be put in an attribute. In principle, the separator can be anything, but schema-language validators are currently only able to handle whitespace, so it's best to stick with that.
- If a piece of data extends across multiple lines, use an element: XML parsers will change newlines in attribute values into spaces.
- If a piece of data is in a natural language, put it in an element so you can use the xml:lang attribute to label the language being used. Some kinds of natural-language text, like Japanese, also require annotations that are conventionally represented using child elements; right-to-left languages like Hebrew and Arabic may similarly require child elements to manage bidirectionality properly.
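Two of the points above, the newline rule and the whitespace-separated-tokens exception, are easy to check. This is a minimal sketch using Python's standard xml.etree.ElementTree; the element and attribute names (poem, title, tags) are invented for the example.

```python
import xml.etree.ElementTree as ET

# The same two-line text appears as an attribute value and as element content.
doc = (
    '<poem title="Line one\nLine two" tags="draft  short\tpublic">'
    "Line one\nLine two"
    "</poem>"
)
root = ET.fromstring(doc)

title = root.get("title")        # the literal newline is normalized to a space
body = root.text                 # element content keeps its newline
tags = root.get("tags").split()  # whitespace-separated tokens split cleanly

print(repr(title))
print(repr(body))
print(tags)
```

A newline written as a character reference (&#10;) does survive in an attribute value, but relying on that tends to be fragile.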
Using attributes:
- If the data is a code from an enumeration, code list, or controlled vocabulary, put it in an attribute if possible. For example, language tags, currency codes, medical diagnostic codes, etc. are best handled as attributes.
- If a piece of data is really metadata on some other piece of data (for example, representing a class or role that the main data serves, or specifying a method of processing it), put it in an attribute if possible.
- In particular, if a piece of data is an ID (either a label or a reference to a label elsewhere in the document) for some other piece of data, put the identifying piece in an attribute. When it's a label, use the name xml:id for the attribute.
- Hypertext references (hrefs) are conventionally put in attributes.
- If a piece of data is applicable to an element and any descendant elements unless it is overridden in some of them, it is conventional to put it in an attribute. Well-known examples are xml:lang, xml:space, xml:base, and namespace declarations.
- If terseness is really the most important thing, use attributes, but consider gzip compression instead -- it works very well on documents with highly repetitive structures.
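To get a feel for the gzip point, here is a rough sketch that builds the same made-up records in an element-heavy and an attribute-heavy style and compresses both with Python's standard gzip module. The record layout and the function names are assumptions for illustration only, and the actual sizes will vary with real data.

```python
import gzip


def element_style(n):
    """n hypothetical records, with every field as a child element."""
    rows = "".join(
        f"<item><id>{i}</id><name>item{i}</name></item>" for i in range(n)
    )
    return f"<items>{rows}</items>".encode("utf-8")


def attribute_style(n):
    """The same n records, with every field as an attribute."""
    rows = "".join(f'<item id="{i}" name="item{i}"/>' for i in range(n))
    return f"<items>{rows}</items>".encode("utf-8")


elem_raw = element_style(1000)
attr_raw = attribute_style(1000)
elem_gz = gzip.compress(elem_raw)
attr_gz = gzip.compress(attr_raw)

print("elements:  ", len(elem_raw), "bytes raw,", len(elem_gz), "bytes gzipped")
print("attributes:", len(attr_raw), "bytes raw,", len(attr_gz), "bytes gzipped")
```

On repetitive data like this, the compressed element-style document should come out smaller than the uncompressed attribute-style one, which is the reason to reach for compression before bending the design.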
Michael Kay says:
Those with a little experience express their opinions passionately.
Experts tell you there is no right answer.
I say:
Newbies always ask:
"Elements or attributes?
Which will serve me best?"
Those who know roar like lions;
Wise hackers smile like tigers.
--a tanka, or extended haiku
Final words:
Break any or all of these rules rather than create a crude, arbitrary, disgusting mess of a design if that's what following them slavishly would give you. In particular, random mixtures of attributes and child elements are hard to follow and hard to use, though it often makes good sense to use both when the data clearly fall into two different groups such as simple/complex or metadata/data.
2008-02-07
Which characters are excluded in XML 5th Edition names?
The list of allowed name characters in the XML 1.0 Fifth Edition looks pretty miscellaneous. The clue to what's really going on is that unlike the rule of earlier XML 1.0 versions, where everything not permitted was forbidden, now everything that is not forbidden is permitted. (I emphasize that this is only about name characters: every character is and always has been permitted in running text and attribute values except the ASCII controls.)
So what's forbidden, and why?
- The ASCII control characters and their 8-bit counterparts. Obviously.
- The ASCII and Latin-1 symbolic characters, with the exceptions of hyphen, period, colon, underscore, and middle dot, which have always been permitted in XML names. These characters are commonly used as syntax delimiters either in XML itself or in other languages, and so are excluded.
- The Greek question mark, which looks like a semicolon and is canonically equivalent to a regular semicolon.
- The General Punctuation block of Unicode, with the exceptions of the zero-width joiner, zero-width non-joiner, undertie, and character-tie characters, which are required in certain languages to spell words correctly. Various kinds of blank spaces and assorted punctuation don't make sense in names.
- The various Unicode symbols blocks reserved for "pattern syntax", from U+2190 to U+2BFF. These characters should never appear in identifiers of any sort, as they are reserved for use as syntactic delimiters in future languages that exploit non-ASCII syntax. Many are assigned, some are not.
- The Ideographic Description Characters block, which is used to describe (not create) uncoded Chinese characters.
- The surrogate code units (which don't correspond to Unicode characters anyhow) and private-use characters. Using the latter, in names or otherwise, is very bad for interoperability.
- The Plane 0 non-characters at U+FDD0 to U+FDEF, U+FFFE, and U+FFFF. The non-characters on the other planes are allowed, not because they are a good idea, but to simplify implementation.
Note that the undertie and character tie, the European digits 0-9, and the diacritics in the Combining Characters block are not permitted at the start of a name. Other characters could have sensibly been excluded, particularly combining characters that don't happen to be in the Combining Characters block, but it simplifies implementation to permit them.
This list is intentionally sparse. The new Appendix J gives a simplified set of non-binding suggestions for choosing names that are actually sensible.
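For anyone who wants to test a name against these rules, here is a small sketch. The ranges are my transcription of the NameStartChar and NameChar productions in the Fifth Edition, so check them against the spec before relying on them; the function names are hypothetical.

```python
# Code point ranges allowed at the start of a name (NameStartChar).
NAME_START_RANGES = [
    (0x3A, 0x3A), (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF), (0x370, 0x37D),
    (0x37F, 0x1FFF), (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF),
    (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]

# Additional ranges allowed only after the first character (NameChar):
# hyphen and period, digits, middle dot, combining diacritics,
# undertie and character tie.
NAME_EXTRA_RANGES = [
    (0x2D, 0x2E), (0x30, 0x39), (0xB7, 0xB7),
    (0x300, 0x36F), (0x203F, 0x2040),
]


def _in_ranges(cp, ranges):
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_name_start_char(ch):
    return _in_ranges(ord(ch), NAME_START_RANGES)


def is_name_char(ch):
    cp = ord(ch)
    return _in_ranges(cp, NAME_START_RANGES) or _in_ranges(cp, NAME_EXTRA_RANGES)


def is_xml_name(s):
    """True if s is a non-empty string matching the Name production."""
    return (
        bool(s)
        and is_name_start_char(s[0])
        and all(is_name_char(c) for c in s[1:])
    )
```

For example, is_xml_name("a\u037eb") is False because of the Greek question mark, while is_xml_name("a\u200db") is True because the zero-width joiner is one of the General Punctuation exceptions.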
2008-02-06
Who do I work for?
Well, a company that provides an email service with about 10⁷ users, and a calendar service with about 10⁶ users, and a news syndicate with about 10⁴ sources, and a video sharing facility that displays about 10⁸ video views a day, and an image index with about 10⁹ images. And it connects about 10⁵ advertisers with about 10⁵ online publishers and 10³ offline ones, and provides online wallets for about 10⁶ buyers and 10⁵ sellers, and is localized in about 10² interface languages, and employs about 10⁴ people, and is rated 10⁰ in the list of best companies to work for. And it is not best known for any of these things.
Who are they?
10¹⁰⁰.