Convert-Translit
lib/Convert/rfc1345
characters are listed with their mnemonic in ascending order. A
character mnemonic of "??" indicates that the position is unused. A
character mnemonic of "__" indicates that the character set is not
completely defined with the specifications in this memo.
"&code2" has 2 parameters specifying the row and column in certain
16-bit character sets. The value 32 must be added to obtain the
first and second byte respectively. Mnemonics can be specified after
the "&code2" specification as mentioned for the "&code"
specification.
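As an illustration of the "&code2" arithmetic, here is a minimal Python sketch, assuming the row and column are given as decimal integers. The function name `code2_bytes` is hypothetical; it is not defined by this memo.

```python
def code2_bytes(row: int, column: int) -> bytes:
    """Return the two octets for a "&code2" row/column pair.

    32 is added to the row to obtain the first byte and to the
    column to obtain the second byte.
    """
    first, second = row + 32, column + 32
    # Each result must still fit in a single octet.
    if not (0 <= first <= 255 and 0 <= second <= 255):
        raise ValueError("row or column out of range for one octet")
    return bytes([first, second])


# Row 16, column 1 -> octets 0x30 0x21.
print(code2_bytes(16, 1).hex())  # prints 3021
```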
"&codex" has 5 parameters, specifying the character set prefix
string, the start row number, the end row number, the start column
number and the end column number respectively. This is equivalent to
specifying a series of mnemonics of the form "nrrcc" where "n" is the
character set name prefix string, "rr" is the row number running from
the specified start row number to the end row number, and "cc" is the
column number running from the specified start column number to the
end column number. The series of mnemonics thereby created is
allocated to code positions whose row and column octets are obtained
by adding 32 to the row and column numbers, respectively.
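The "&codex" expansion can be sketched the same way. This is a minimal Python illustration under stated assumptions: `expand_codex` is a hypothetical name, rows are assumed to vary slowest (the text above does not fix an order), and the prefix "j" in the usage lines is made up.

```python
from typing import Iterator, Tuple


def expand_codex(prefix: str, row_start: int, row_end: int,
                 col_start: int, col_end: int) -> Iterator[Tuple[str, bytes]]:
    """Yield (mnemonic, code) pairs for one "&codex" specification.

    Mnemonics have the form prefix + rr + cc, with the row and column
    each written as two decimal digits. The code position is the pair
    of octets (row + 32, column + 32). Rows are assumed to vary slowest.
    """
    for row in range(row_start, row_end + 1):
        for col in range(col_start, col_end + 1):
            yield f"{prefix}{row:02d}{col:02d}", bytes([row + 32, col + 32])


# "j" is a made-up prefix used only for this illustration.
for mnemonic, code in expand_codex("j", 16, 16, 1, 3):
    print(mnemonic, code.hex())
# j1601 3021
# j1602 3022
# j1603 3023
```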
"&duplicate" has a special meaning indicating that a position is
being used for more than one character. This is an ugly convention,
but it is a sad fact of life that the same code in one coded
character set can mean different characters. "&duplicate" takes two
parameters
Simonsen [Page 43]
RFC 1345 Character Mnemonics & Character Sets June 1992
- the first is the code to be duplicated, the other is the new
mnemonic.
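One way to hold "&duplicate" entries is a table that maps each code to a list of mnemonics. The Python sketch below is hypothetical (the class and method names are not from this memo), and it assumes that "&duplicate" may only refer to a code that is already defined. Position 35 and the mnemonics "Nb" and "Pd" are used purely for illustration.

```python
from collections import defaultdict
from typing import Dict, List


class CodeTable:
    """Minimal code table in which one code may carry several mnemonics."""

    def __init__(self) -> None:
        self.by_code: Dict[int, List[str]] = defaultdict(list)
        self.by_mnemonic: Dict[str, int] = {}

    def add(self, code: int, mnemonic: str) -> None:
        """Record an ordinary table entry."""
        self.by_code[code].append(mnemonic)
        self.by_mnemonic[mnemonic] = code

    def duplicate(self, code: int, mnemonic: str) -> None:
        """Apply "&duplicate code mnemonic": reuse an existing position."""
        if code not in self.by_code:  # membership test does not create keys
            raise KeyError(f"code {code} is not yet defined")
        self.add(code, mnemonic)


table = CodeTable()
table.add(35, "Nb")             # illustrative: number sign at position 35
table.duplicate(35, "Pd")       # illustrative: pound sign sharing position 35
print(table.by_code[35])        # ['Nb', 'Pd']
print(table.by_mnemonic["Pd"])  # 35
```

Lookup by mnemonic stays unambiguous, while lookup by code returns every candidate; choosing among them is left to the caller, since no rule for that is given here.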
"&rem" is followed by text to explain something in the table to a
human reader. All lines in such a remark has to start with this
keyword.
"&comb2" specifies a combination of two characters which signifies a
third character. All characters in the specification are given by
their mnemonic. The two combining characters must be specified
previously in the code table. The first combining character is
specified as the first character after the keyword, and then the
following pairs of characters are the second combining character and
the result, respectively. The specification can be repeated,
terminated by an occurrence of a keyword.
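A parser for one "&comb2" entry might look like the following Python sketch. It assumes the operands are whitespace-separated mnemonics that the caller has already collected up to the next keyword; `parse_comb2` and the sample mnemonics in the usage line are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def parse_comb2(operands: List[str],
                defined: Iterable[str]) -> Dict[Tuple[str, str], str]:
    """Parse the operands of one "&comb2" entry.

    operands: the first combining character, then (second, result) pairs.
    defined:  mnemonics already present earlier in the code table.
    Returns a mapping (first, second) -> result mnemonic.
    """
    known = set(defined)
    # One first character plus k pairs gives an odd count of at least 3.
    if len(operands) < 3 or len(operands) % 2 == 0:
        raise ValueError("expected a first character plus (second, result) pairs")
    first, rest = operands[0], operands[1:]
    if first not in known:
        raise ValueError(f"combining character {first!r} not defined earlier")
    combos: Dict[Tuple[str, str], str] = {}
    for second, result in zip(rest[0::2], rest[1::2]):
        if second not in known:
            raise ValueError(f"combining character {second!r} not defined earlier")
        combos[(first, second)] = result
    return combos


print(parse_comb2(["'", "a", "a'", "e", "e'"], ["'", "a", "e"]))
```

Only the two combining characters are checked against earlier entries; the result mnemonic is not, because the text above places no such requirement on it.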
4.3 Mnemonic charsets
The following is compatible with current practice on the Internet
within EUnet, the European not-for-profit networking organisation,
currently operating in 24 countries in Europe and North Africa.
The mnemonic charsets are a family of charsets in which the relevant
parts of a message, encoded in an ordinary coded character set, may
contain the following sequence: an intro character sequence followed
by a string of characters that represents a character mnemonic, as
described below. In addition, the intro character sequence may be
doubled to indicate a single literal occurrence of the intro
characters in the decoded text.
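Decoding a mnemonic charset can be sketched as a single left-to-right scan. This Python example is a minimal illustration, not a complete decoder: it assumes the transport bytes have already been decoded to text, it handles only two-character mnemonics, its three-entry `MNEMONICS` table is an excerpt to be checked against the table in this memo, and `decode_mnemonic` is a hypothetical name. An intro that is not followed by a known mnemonic is kept as-is.

```python
from typing import Dict

# Illustrative excerpt of the mnemonic table (two-character mnemonics only).
MNEMONICS: Dict[str, str] = {"a'": "\u00e1", "o:": "\u00f6", "ss": "\u00df"}


def decode_mnemonic(text: str, intro: str = "&",
                    table: Dict[str, str] = MNEMONICS) -> str:
    """Replace intro + mnemonic sequences in already-decoded text."""
    out = []
    i = 0
    while i < len(text):
        if text.startswith(intro, i):
            after = i + len(intro)
            # A doubled intro stands for one literal intro sequence.
            if text.startswith(intro, after):
                out.append(intro)
                i = after + len(intro)
                continue
            mnemonic = text[after:after + 2]
            if mnemonic in table:
                out.append(table[mnemonic])
                i = after + 2
                continue
        # Ordinary character, or an intro not followed by a known mnemonic.
        out.append(text[i])
        i += 1
    return "".join(out)


print(decode_mnemonic("Gr&o:&sse"))  # Größe
print(decode_mnemonic("R&&D"))       # R&D
```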
Note that many characters within a mnemonic character set may be
represented in two different ways. Normally the character itself is
used, but it is also possible to use the mnemonic allocated to the
character in a mnemonic sequence.
In this way, all characters with assigned mnemonics can be represented
without information loss in any character set that contains the
invariant ISO 646 characters as a subset. As a consequence, with a
mnemonic character set all these characters can be generated
uniformly on all keyboards and presented uniformly on all terminal
equipment whenever the real character is not available.
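The reverse direction, writing a character as a mnemonic only when the transport charset lacks it, can be sketched as follows. Assumptions: charset names are Python codec names rather than the names used in this memo, the intro is a single character, `REVERSE` is a tiny illustrative excerpt, and `encode_mnemonic` is a hypothetical name. Every occurrence of the intro is doubled so the output cannot be misread as a mnemonic sequence.

```python
from typing import Dict

# Illustrative excerpt: character -> two-character mnemonic.
REVERSE: Dict[str, str] = {"\u00e1": "a'", "\u00f6": "o:", "\u00df": "ss"}


def encode_mnemonic(text: str, charset: str = "ascii", intro: str = "&",
                    table: Dict[str, str] = REVERSE) -> bytes:
    """Encode text in `charset`, writing unavailable characters as mnemonics.

    Assumes a one-character intro that is itself encodable in `charset`.
    """
    parts = []
    for ch in text:
        if ch == intro:
            parts.append(intro + intro)  # doubled intro = literal intro
            continue
        try:
            ch.encode(charset)
            parts.append(ch)             # real character is available
        except UnicodeEncodeError:
            if ch not in table:
                raise ValueError(f"no mnemonic known for {ch!r}")
            parts.append(intro + table[ch])
    return "".join(parts).encode(charset)


print(encode_mnemonic("Größe"))             # b'Gr&o:&sse'
print(encode_mnemonic("Größe", "latin-1"))  # b'Gr\xf6\xdfe'
```

With Latin-1 as transport, the same text needs no mnemonics at all, which matches the point above that mnemonics are only a fallback.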
Data encoded in a mnemonic charset is intended to be read by the end
user possibly without further treatment. If the transport encoding
and the presentation encoding for the user differ, it is recommended
that the data be translated into a mnemonic representation in the
presentation encoding.
A mnemonic charset is specified with the name
"mnemonic+charset+intro", where "mnemonic" is written as given and
"charset" and "intro" are specified as described below. The mnemonic
charset "mnemonic" is a shorthand for "mnemonic+ascii+38". The
mnemonic charset "mnem" is a shorthand for "mnemonic+ascii+8200".
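Splitting a mnemonic charset name into its parts is straightforward. The Python sketch below assumes that charset names never contain "+" and that names are compared case-insensitively; `parse_mnemonic_name` and `MnemonicCharset` are hypothetical names, and "ISO_8859-1:1987" is just an example transport charset.

```python
from typing import NamedTuple


class MnemonicCharset(NamedTuple):
    charset: str  # transport charset name, as written in the label
    intro: int    # decimal intro value


# The two shorthands defined above.
SHORTHANDS = {
    "mnemonic": MnemonicCharset("ascii", 38),
    "mnem": MnemonicCharset("ascii", 8200),
}


def parse_mnemonic_name(name: str) -> MnemonicCharset:
    """Split "mnemonic+charset+intro" (or a shorthand) into its parts."""
    key = name.lower()  # assumption: names compared case-insensitively
    if key in SHORTHANDS:
        return SHORTHANDS[key]
    parts = name.split("+")
    if len(parts) != 3 or parts[0].lower() != "mnemonic":
        raise ValueError(f"not a mnemonic charset name: {name!r}")
    _, charset, intro = parts
    # The transport charset cannot itself be a mnemonic charset.
    if charset.lower() in SHORTHANDS:
        raise ValueError("the transport charset cannot be a mnemonic charset")
    if not (intro.isascii() and intro.isdigit()):
        raise ValueError(f"intro must be a decimal number: {intro!r}")
    return MnemonicCharset(charset, int(intro))


print(parse_mnemonic_name("mnem"))
# MnemonicCharset(charset='ascii', intro=8200)
print(parse_mnemonic_name("mnemonic+ISO_8859-1:1987+29"))
# MnemonicCharset(charset='ISO_8859-1:1987', intro=29)
```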
The use of mnemonics for Chinese characters of Chinese, Japanese or
Korean origin is discouraged: the end user equipment of the intended
receiver is very likely to handle the original encoding, and the
mnemonics this memo describes for such Chinese characters convey
very little meaning to humans.
4.3.1 charset
The charset is given as one of the charset names in this memo and is
the encoding used for the transport. It cannot be a mnemonic
charset.
4.3.2 Intro
The intro character sequence is given as the decimal value of the
intro characters in the transport character set. The intro character
sequence may consist of up to two characters. The decimal value of a
two-character intro sequence is the first character's value
multiplied by 256 raised to the power of the number of octets per
character in the character set, plus the second character's value.
The recommended value is 38, for the ampersand (&) character in ASCII.
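The intro value formula can be checked with a short Python sketch. The helper names `intro_value` and `intro_codes` are hypothetical, and character values are given as plain integers in the transport character set. For ASCII, with one octet per character, space (32) followed by backspace (8) gives 32 * 256 + 8 = 8200.

```python
from typing import List


def intro_value(codes: List[int], octets_per_char: int = 1) -> int:
    """Decimal intro value for one or two character codes."""
    if len(codes) == 1:
        return codes[0]
    if len(codes) == 2:
        first, second = codes
        return first * 256 ** octets_per_char + second
    raise ValueError("an intro sequence has one or two characters")


def intro_codes(value: int, octets_per_char: int = 1) -> List[int]:
    """Inverse of intro_value; a zero first code is read as 'absent'."""
    first, second = divmod(value, 256 ** octets_per_char)
    return [second] if first == 0 else [first, second]


print(intro_value([ord("&")]))     # 38
print(intro_value([ord(" "), 8]))  # 8200 (space, then backspace)
print(intro_codes(8200))           # [32, 8]
```

Reading a zero first code as "no first character" is a design choice of this sketch; it is ambiguous only for an intro that begins with NUL.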
Other common values are 29, for the control character "Group
Separator", and 8200, for "space" followed by "backspace"; these may
be convenient in some operating environments, and they leave ordinary
text unchanged. Only the ampersand character may be chosen as intro
from the invariant ISO 646 character set, but any character not in
the invariant ISO 646 character set can be used as intro. The intro
character sequence is used for introducing character mnemonics when a