2006-11-12

An annoying ambiguity about which nothing can be done now

The phrase "COMBINING DOUBLE" in a Unicode character name can mean either of two things. Sometimes the diacritical mark itself is doubled, relative to some other single mark:

  • U+030B COMBINING DOUBLE ACUTE ACCENT  ̋
  • U+030E COMBINING DOUBLE VERTICAL LINE ABOVE  ̎
  • U+030F COMBINING DOUBLE GRAVE ACCENT  ̏
  • U+0333 COMBINING DOUBLE LOW LINE  ̳
  • U+033F COMBINING DOUBLE OVERLINE  ̿
  • U+0348 COMBINING DOUBLE VERTICAL LINE BELOW  ͈
  • U+035A COMBINING DOUBLE RING BELOW  ͚
  • U+20E6 COMBINING DOUBLE VERTICAL STROKE OVERLAY  ⃦

But sometimes it means that the mark extends over two characters, the one it applies to and the following one:

  • U+035D COMBINING DOUBLE BREVE  ͝ 
  • U+035C COMBINING DOUBLE BREVE BELOW  ͜ 
  • U+035E COMBINING DOUBLE MACRON  ͞ 
  • U+035F COMBINING DOUBLE MACRON BELOW  ͟ 
  • U+0360 COMBINING DOUBLE TILDE  ͠ 
  • U+0361 COMBINING DOUBLE INVERTED BREVE  ͡ 
  • U+0362 COMBINING DOUBLE RIGHTWARDS ARROW BELOW  ͢ 

Of course U+1D18A MUSICAL SYMBOL COMBINING DOUBLE TONGUE  𝆊 is something else again.
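The names don't separate the two groups, but the character properties do: the marks that extend over two characters have canonical combining class 233 (Double Below) or 234 (Double Above). Here is a minimal sketch of telling the groups apart in Python with the standard unicodedata module. The function name double_kind and its return labels are my own inventions for illustration, not anything defined by Unicode.

```python
import unicodedata

# Canonical combining classes 233 (Double Below) and 234 (Double Above)
# are the ones assigned to marks that extend over two base characters.
SPANNING_CLASSES = {233, 234}

def double_kind(ch):
    """Classify one character (hypothetical helper, labels are made up).

    Returns "spanning" for a COMBINING DOUBLE mark that covers two base
    characters, "single base" for any other COMBINING DOUBLE mark, and
    "other" for everything else.
    """
    name = unicodedata.name(ch, "")
    if "COMBINING DOUBLE" not in name:
        return "other"
    if unicodedata.combining(ch) in SPANNING_CLASSES:
        return "spanning"
    return "single base"

# Print the name, combining class, and classification for a few samples.
for cp in (0x030B, 0x0333, 0x20E6, 0x035C, 0x0361, 0x1D18A):
    ch = chr(cp)
    print(f"U+{cp:04X} {unicodedata.name(ch)}: "
          f"ccc={unicodedata.combining(ch)}, {double_kind(ch)}")
```

Note that U+1D18A falls into the "single base" group only because its combining class is neither 233 nor 234; the sketch knows nothing about what a double tongue is.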

Thank you. I feel much better now.

3 comments:

Anonymous said...

Would it be possible to keep the codes, characters, and properties but clarify the names? Character names are meant to be displayed to humans; most applications should have no trouble updating them, and legacy documentation with obviously similar old names is no big deal.
For example, COMBINING DOUBLE FOO could be renamed to COMBINING DOUBLE FOO MARK or COMBINING TWIN FOO in your first set of examples and COMBINING WIDTH 2 FOO in the second set.

John Cowan said...

The names of Unicode characters cannot be changed, although clarifying information can be added outside the name.

Michael Everson said...

I thought I was the only one irked by that.