When I was helping to develop the spec for URIs beginning with urn:publicid:, which are the URI equivalent of SGML and XML public identifiers (either formal or not), I worked out this table of what was and was not legal according to publicid syntax, URN syntax, and URI syntax. (I'm referencing an older RFC for URI syntax rather than the current one, because I used the older one and some terminology has changed.)
Character Name(s) | Pubid | URN | URI | Status |
---|---|---|---|---|
LATIN CAPITAL LETTER ? | yes | upper | upalpha | NORM |
LATIN SMALL LETTER ? | yes | lower | lowalpha | NORM |
DIGIT * | yes | number | digit | NORM |
HYPHEN-MINUS | yes | other | mark | NORM |
LEFT PARENTHESIS | yes | other | mark | NORM |
RIGHT PARENTHESIS | yes | other | mark | NORM |
FULL STOP | yes | other | mark | NORM |
EXCLAMATION MARK | yes | other | mark | NORM |
ASTERISK | yes | other | mark | NORM |
LOW LINE | yes | other | mark | NORM |
PLUS SIGN | yes | other | reserved | AVAIL |
COMMA | yes | other | reserved | AVAIL |
COLON | yes | other | reserved | AVAIL |
EQUALS SIGN | yes | other | reserved | AVAIL |
SEMICOLON | yes | other | reserved | AVAIL |
COMMERCIAL AT | yes | other | reserved | AVAIL |
DOLLAR SIGN | yes | other | reserved | AVAIL |
QUESTION MARK | yes | reserved | reserved | ENCODE |
SOLIDUS | yes | reserved | reserved | ENCODE |
NUMBER SIGN | yes | reserved | delims | ENCODE |
PERCENT SIGN | yes | reserved | delims | ENCODE |
SPACE | yes | excluded | space | ENCODE |
APOSTROPHE | yes | excluded | mark | ENCODE |
AMPERSAND | no | excluded | reserved | AVAIL |
TILDE | no | excluded | mark | NULL |
REVERSE SOLIDUS | no | excluded | unwise | NULL |
QUOTATION MARK | no | excluded | delims | NULL |
LESS-THAN SIGN | no | excluded | delims | NULL |
GREATER-THAN SIGN | no | excluded | delims | NULL |
LEFT SQUARE BRACKET | no | excluded | unwise | NULL |
RIGHT SQUARE BRACKET | no | excluded | unwise | NULL |
CIRCUMFLEX ACCENT | no | excluded | unwise | NULL |
GRAVE ACCENT | no | excluded | unwise | NULL |
LEFT CURLY BRACKET | no | excluded | unwise | NULL |
VERTICAL LINE | no | excluded | unwise | NULL |
RIGHT CURLY BRACKET | no | excluded | unwise | NULL |
What the codewords in the table mean:
Column | Codeword | Meaning |
---|---|---|
URN | upper, lower, number, other | MAY be used without %-encoding. |
URN | reserved | SHOULD NOT be used without %-encoding. |
URN | excluded | MUST NOT be used without %-encoding. |
URI | lowalpha, upalpha, digit, mark | MAY be used without %-encoding; %-encoding MUST NOT affect semantics. |
URI | reserved | MAY be used without %-encoding; %-encoding MAY affect semantics. |
URI | space, delims, unwise | MUST NOT be used without %-encoding. |
Status | NORM | No encoding needed, can't be used as syntax. |
Status | ENCODE | MUST be encoded (%-encoded or privately). |
Status | AVAIL | Available for use as syntax character if literal use is %-encoded (AMPERSAND has no literal use). |
Status | NULL | Not usable in pubids, included for completeness. |
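
To make the Status column concrete, here is a minimal Python sketch that turns the table into a lookup and %-encodes every literal character that is not NORM. The names `STATUS`, `NOT_PUBID`, and `encode_literal` are mine, chosen for illustration. The sketch deliberately ignores the "privately" option for ENCODE characters and assigns no syntactic role to the AVAIL characters, so it should not be read as the actual urn:publicid transcription rules.

```python
import string

# Status codewords from the table above, keyed by character.
STATUS = {}
for ch in string.ascii_letters + string.digits + "-().!*_":
    STATUS[ch] = "NORM"
for ch in "+,:=;@$&":
    STATUS[ch] = "AVAIL"
for ch in "?/#% '":
    STATUS[ch] = "ENCODE"
for ch in '~\\"<>[]^`{|}':
    STATUS[ch] = "NULL"

# Characters marked "no" in the Pubid column: AMPERSAND plus every NULL row.
NOT_PUBID = set('&~\\"<>[]^`{|}')


def encode_literal(pubid: str) -> str:
    """%-encode each literal character whose status is not NORM.

    Assumption: no character is being used as syntax, so literal AVAIL
    characters are %-encoded exactly like ENCODE characters.
    """
    out = []
    for ch in pubid:
        if ch not in STATUS or ch in NOT_PUBID:
            raise ValueError(f"{ch!r} is not a legal public identifier character")
        if STATUS[ch] == "NORM":
            out.append(ch)
        else:
            out.append(f"%{ord(ch):02X}")
    return "".join(out)


print(encode_literal("ISO/IEC 10744:1997"))
```

The four status groups together cover all 95 printable ASCII characters, which is a quick sanity check that no row of the table was dropped from the lookup.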