Convert-Translit
lib/Convert/rfc1345
are due to demands of the compositional methods of these techniques.
2.2 ISO Official Long Descriptive Character Name
For all mnemonics, the character for which it stands is indicated in
the following table by a long descriptive name. This name is
identical to the ISO name of the character as given in reference (2).
For a few characters that are not included there, descriptive names
of the same kind are introduced in this memo. The source of each
character is stated in the table after the name and should be
consulted for a reliable identification of the character.
These long descriptive names consist only of the capital Latin
letters of the invariant part of ISO 646, the digits, "-", and SPACE.
Digits are only used in names of ideographic and Hangul characters
and never as the first character.
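The character restrictions above can be checked mechanically. The following Python sketch is an illustration only and is not part of this memo; the function name `is_valid_long_name` is hypothetical. It checks the allowed repertoire and the rule that a digit is never the first character. It does not check that digits occur solely in names of ideographic and Hangul characters.

```python
import re

# Allowed repertoire: capital Latin letters A-Z, digits, "-", and SPACE.
# The first character may be anything from that repertoire except a digit.
_LONG_NAME = re.compile(r"[A-Z \-][A-Z0-9 \-]*")


def is_valid_long_name(name: str) -> bool:
    """Return True if `name` uses only the permitted characters
    and does not begin with a digit (hypothetical helper)."""
    return _LONG_NAME.fullmatch(name) is not None
```

For example, `is_valid_long_name("LATIN SMALL LETTER A WITH GRAVE")` is accepted, while a name in lower case or one starting with a digit is rejected.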
2.3 The 2-character Mnemonics
The two-character mnemonics include various accented Latin letters,
Greek, Cyrillic, Hebrew, Arabic, Hiragana and Katakana. A fair
number of special characters are also included. Almost all ISO or
ISO-registered 7- and 8-bit graphical coded character sets are covered
with these two-character mnemonics.
The two characters are chosen so that their graphical appearance in
the reference set resembles, as closely as possible within the
possibilities available, the graphical appearance of the character. The basic
character set of ISO 646 is used as the reference set, as mentioned
above.
The characters in the reference character set are chosen to represent
themselves.
For control characters from ISO 646 the two-character acronyms of ISO
2047 are used as mnemonics. For the other control characters of ISO
6429, two-character mnemonics have been selected based on the
variable-length acronyms used in that standard.
Letters, including Greek, Cyrillic, Arabic and Hebrew, are
represented with the base letter as the first letter, and the second
letter represents an accent or relation to a non-Latin script. Non-
Latin letters are transliterated to Latin letters, following
transliteration standards as closely as possible. This is also done
with Latin letters such as ETH and THORN; the
Danish/Norwegian/Swedish letter A WITH RING ABOVE is transliterated
into "aa".
Simonsen [Page 3]
RFC 1345 Character Mnemonics & Character Sets June 1992
After a letter, the second character signifies the following:
    Exclamation mark     !   Grave
    Apostrophe           '   Acute accent
    Greater-Than sign    >   Circumflex accent
    Question Mark        ?   Tilde
    Hyphen-Minus         -   Macron
    Left parenthesis     (   Breve
    Full Stop            .   Dot Above
    Colon                :   Diaeresis
    Comma                ,   Cedilla
    Underline            _   Underline
    Solidus              /   Stroke
    Quotation mark       "   Double acute accent
    Semicolon            ;   Ogonek
    Less-Than sign       <   Caron
    Zero                 0   Ring above
    Two                  2   Hook
    Nine                 9   Horn
    Equals               =   Cyrillic
    Asterisk             *   Greek
    Percent sign         %   Greek/Cyrillic special
    Plus                 +   smalls: Arabic, capitals: Hebrew
    Three                3   some Latin/Greek/Cyrillic letters
    Four                 4   Bopomofo
    Five                 5   Hiragana
    Six                  6   Katakana
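As an illustration of how the table can be applied, the following Python sketch transcribes it into a lookup and splits a two-character mnemonic into its base letter and the meaning of its second character. The names `SECOND_CHAR` and `describe_mnemonic` are hypothetical and not defined by this memo, and the sketch assumes the first character is an ASCII letter. It does not resolve a mnemonic to an actual character; that requires the full mnemonic table.

```python
# Second-character meanings, transcribed from the table above.
SECOND_CHAR = {
    "!": "Grave",
    "'": "Acute accent",
    ">": "Circumflex accent",
    "?": "Tilde",
    "-": "Macron",
    "(": "Breve",
    ".": "Dot Above",
    ":": "Diaeresis",
    ",": "Cedilla",
    "_": "Underline",
    "/": "Stroke",
    '"': "Double acute accent",
    ";": "Ogonek",
    "<": "Caron",
    "0": "Ring above",
    "2": "Hook",
    "9": "Horn",
    "=": "Cyrillic",
    "*": "Greek",
    "%": "Greek/Cyrillic special",
    "+": "smalls: Arabic, capitals: Hebrew",
    "3": "some Latin/Greek/Cyrillic letters",
    "4": "Bopomofo",
    "5": "Hiragana",
    "6": "Katakana",
}


def describe_mnemonic(mnemonic: str):
    """Return (base_letter, meaning) for a letter followed by a listed
    second character, or None if the mnemonic does not fit that pattern."""
    if len(mnemonic) != 2:
        return None
    base, second = mnemonic[0], mnemonic[1]
    # Assumption of this sketch: the base letter is an ASCII letter.
    if not (base.isascii() and base.isalpha()):
        return None
    meaning = SECOND_CHAR.get(second)
    if meaning is None:
        return None
    return (base, meaning)
```

With this lookup, `describe_mnemonic("a!")` yields the base letter `a` together with `Grave`, and a pair such as `"ab"`, whose second character is not in the table, yields `None`.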
In designing the mnemonics the following special characters were
reserved: The ampersand is reserved as an intro character, indicating
that the following string is in the mnemonic character set. The
underline character is reserved for the variable-length mnemonics.
This use does not eliminate usage as an accent or language
identifier.
Special characters are encoded with some mnemonic value. These are
not systematic throughout, but most mnemonics start with a related
special character of the reference set.
2.4 The Variable-length Character Mnemonics
The Variable-length Character Mnemonics are primarily meant for the
ideographic characters in larger Asian character sets, but are also
used for accented characters with several accents and some special
characters. To keep the mnemonics as short as possible, which both
saves storage and eases input, a quite short name is
preferred. The Chinese standard GB 2312-1980, the
Japanese standards JIS X0208 and JIS X0212, and the Korean standard
KS C 5601 all identify characters by row and column numbers between 1
and 94. Two positions each for row and column, plus a character set
identifier of one character, would therefore be almost as short as
possible.
The following character set identifiers are defined:
c GB 2312-1980
j JIS X0208-1990
J JIS X0212-1990
k KS C 5601-1987
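A minimal Python sketch of the row/column scheme follows. It is an illustration under stated assumptions, not the memo's definition: the function name `row_col_name` is hypothetical, and the layout (identifier first, then the row and the column each zero-padded to two decimal digits) is an assumption drawn from the description of "two positions for row and column". Any surrounding delimiters used for variable-length mnemonics are left out.

```python
# Character set identifiers, transcribed from the list above.
CHARSET_IDS = {
    "c": "GB 2312-1980",
    "j": "JIS X0208-1990",
    "J": "JIS X0212-1990",
    "k": "KS C 5601-1987",
}


def row_col_name(charset_id: str, row: int, col: int) -> str:
    """Build a short name from a character set identifier and a
    row/column position (hypothetical helper)."""
    if charset_id not in CHARSET_IDS:
        raise ValueError(f"unknown character set identifier: {charset_id!r}")
    if not (1 <= row <= 94 and 1 <= col <= 94):
        raise ValueError("row and column must each be between 1 and 94")
    # ASSUMPTION: identifier, then two-digit row, then two-digit column.
    return f"{charset_id}{row:02d}{col:02d}"
```

Under these assumptions every name is exactly five characters long: one for the identifier and two each for row and column.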
This system for the representation of ideographic characters and
Hangul characters is not truly mnemonic, but it provides short
representations that are easy to connect to the corresponding