Encode
view release on metacpan or search on metacpan
lib/Encode/Supported.pod view on Meta::CPAN
MacRomanian
MacRumanian
Latin3[1] iso-8859-3
Latin4[2] iso-8859-4
Cyrillics iso-8859-5 cp855 cp1251 MacCyrillic
(See also next section) cp866 MacUkrainian
Arabic iso-8859-6 cp864 cp1256 MacArabic
cp1006 MacFarsi
Greek iso-8859-7 cp737 cp1253 MacGreek
cp869 (DOSGreek2)
Hebrew iso-8859-8 cp862 cp1255 MacHebrew
Turkish iso-8859-9 cp857 cp1254 MacTurkish
Nordics iso-8859-10 cp865
cp861 MacIcelandic
MacSami
Thai iso-8859-11[3] cp874 MacThai
(iso-8859-12 is nonexistent. Reserved for Indics?)
Baltics iso-8859-13 cp775 cp1257
Celtics iso-8859-14
Latin9 [4] iso-8859-15
Latin10 iso-8859-16
Vietnamese viscii cp1258 MacVietnamese
----------------------------------------------------------------
[1] Esperanto, Maltese, and Turkish. Turkish is now on 8859-9.
[2] Baltics. Now on 8859-10, except for Latvian.
[3] TIS 620 + Non-Breaking Space (0xA0 / U+00A0)
[4] Nicknamed Latin0; the Euro sign as well as French and Finnish
letters that are missing from 8859-1 were added.
All cp* are also available as ibm-*, ms-*, and windows-* . See also
L<http://czyborra.com/charsets/codepages.html>.
Macintosh encodings don't seem to be registered in such entities as
IANA. "Canonical" names in Encode are based upon Apple's Tech Note
1150. See L<http://developer.apple.com/technotes/tn/tn1150.html>
for details.
=item KOI8 - De Facto Standard for the Cyrillic world
Though ISO-8859 does have ISO-8859-5, the KOI8 series is far more
popular in the Net. L<Encode> comes with the following KOI charsets.
For gory details, see L<http://czyborra.com/charsets/cyrillic.html>
----------------------------------------------------------------
koi8-f
koi8-r cp878 [RFC1489]
koi8-u [RFC2319]
----------------------------------------------------------------
=back
=head2 gsm0338 - Hentai Latin 1
GSM0338 is for GSM handsets. Though it shares alphanumerals with
ASCII, control character ranges and other parts are mapped very
differently, mainly to store Greek characters. There are also escape
sequences (starting with 0x1B) to cover e.g. the Euro sign.
This was once handled by L<Encode::Bytes> but because of all those
unusual specifications, Encode 2.20 has relocated the support to
L<Encode::GSM0338>. See L<Encode::GSM0338> for details.
=over 2
=item gsm0338 support before 2.19
Some special cases like a trailing 0x00 byte or a lone 0x1B byte are not
well-defined and decode() will return an empty string for them.
One possible workaround is
$gsm =~ s/\x00\z/\x00\x00/;
$uni = decode("gsm0338", $gsm);
$uni .= "\xA0" if $gsm =~ /\x1B\z/;
Note that the Encode implementation of GSM0338 does not implement the
reuse of Latin capital letters as Greek capital letters (for example,
the 0x5A is U+005A (LATIN CAPITAL LETTER Z), not U+0396 (GREEK CAPITAL
LETTER ZETA).
The GSM0338 is also covered in Encode::Byte even though it is not
an "extended ASCII" encoding.
=back
=head2 CJK: Chinese, Japanese, Korean (Multibyte)
Note that Vietnamese is listed above. Also read "Encoding vs Charset"
below. Also note that these are implemented in distinct modules by
countries, due to the size concerns (simplified Chinese is mapped
to 'CN', continental China, while traditional Chinese is mapped to
'TW', Taiwan). Please refer to their respective documentation pages.
=over 2
=item Encode::CN -- Continental China
Standard DOS/Win Macintosh Comment/Reference
----------------------------------------------------------------
euc-cn [1] MacChineseSimp
(gbk) cp936 [2]
gb12345-raw { GB12345 without CES }
gb2312-raw { GB2312 without CES }
hz
iso-ir-165
----------------------------------------------------------------
[1] GB2312 is aliased to this. See L<Microsoft-related naming mess>
[2] gbk is aliased to this. See L<Microsoft-related naming mess>
=item Encode::JP -- Japan
Standard DOS/Win Macintosh Comment/Reference
----------------------------------------------------------------
euc-jp
shiftjis cp932 macJapanese
7bit-jis
iso-2022-jp [RFC1468]
iso-2022-jp-1 [RFC2237]
jis0201-raw { JIS X 0201 (roman + halfwidth kana) without CES }
jis0208-raw { JIS X 0208 (Kanji + fullwidth kana) without CES }
( run in 0.646 second using v1.01-cache-2.11-cpan-5511b514fd6 )