BOM results from the CPAN

Encode
falls into this category.  See L<perlunicode/"Unicode Encodings"> to
find out how UTF-8 maps Unicode to a byte sequence.

You may also have found out by now why 7bit ISO-2022 cannot comprise
a CCS.  If you look at a byte sequence \x21\x21, you can't tell if
it is two !'s or IDEOGRAPHIC SPACE.  EUC maps the latter to \xA1\xA1
so you have no trouble differentiating between "!!". and S<"  ">.

=head1 Encoding Classification (by Anton Tagunov and Dan Kogai)

This section tries to classify the supported encodings by their 
applicability for information exchange over the Internet and to 
choose the most suitable aliases to name them in the context of 
such communication.

=over 2

=item * 

To (en|de)code encodings marked by C<(**)>, you need 
C<Encode::HanExtra>, available from CPAN.

=back

Encoding names

  US-ASCII    UTF-8    ISO-8859-*  KOI8-R
  Shift_JIS   EUC-JP   ISO-2022-JP ISO-2022-JP-1
  EUC-KR      Big5     GB2312

are registered with IANA as preferred MIME names and may
be used over the Internet.

C<Shift_JIS> has been officialized by JIS X 0208:1997.
L<Microsoft-related naming mess> gives details.

C<GB2312> is the IANA name for C<EUC-CN>.
See L<Microsoft-related naming mess> for details.

C<GB_2312-80> I<raw> encoding is available as C<gb2312-raw>
with Encode. See L<Encode::CN> for details.

  EUC-CN
  KOI8-U        [RFC2319]

have not been registered with IANA (as of March 2002) but
seem to be supported by major web browsers. 
The IANA name for C<EUC-CN> is C<GB2312>.

  KS_C_5601-1987

is heavily misused.
See L<Microsoft-related naming mess> for details.

C<KS_C_5601-1987> I<raw> encoding is available as C<kcs5601-raw>
with Encode. See L<Encode::KR> for details.

  UTF-16 UTF-16BE UTF-16LE

are IANA-registered C<charset>s. See [RFC 2781] for details.
Jungshik Shin reports that UTF-16 with a BOM is well accepted
by MS IE 5/6 and NS 4/6. Beware however that

=over 2

=item *

C<UTF-16> support in any software you're going to be
using/interoperating with has probably been less tested
then C<UTF-8> support

=item *

C<UTF-8> coded data seamlessly passes traditional
command piping (C<cat>, C<more>, etc.) while C<UTF-16> coded
data is likely to cause confusion (with its zero bytes,
for example)

=item *

it is beyond the power of words to describe the way HTML browsers
encode non-C<ASCII> form data. To get a general impression, visit
L<http://www.alanflavell.org.uk/charset/form-i18n.html>.
While encoding of form data has stabilized for C<UTF-8> encoded pages
(at least IE 5/6, NS 6, and Opera 6 behave consistently), be sure to
expect fun (and cross-browser discrepancies) with C<UTF-16> encoded
pages!

=back

The rule of thumb is to use C<UTF-8> unless you know what
you're doing and unless you really benefit from using C<UTF-16>.

  ISO-IR-165    [RFC1345]
  VISCII
  GB 12345
  GB 18030 (**)  (see links below)
  EUC-TW   (**)

are totally valid encodings but not registered at IANA.
The names under which they are listed here are probably the
most widely-known names for these encodings and are recommended
names.

  BIG5PLUS (**)

is a proprietary name. 

=head2 Microsoft-related naming mess

Microsoft products misuse the following names:

=over 2

=item KS_C_5601-1987

Microsoft extension to C<EUC-KR>.

Proper names: C<CP949>, C<UHC>, C<x-windows-949> (as used by Mozilla).

See L<http://lists.w3.org/Archives/Public/ietf-charsets/2001AprJun/0033.html>
( run in 0.504 second using v1.01-cache-2.11-cpan-c966e8aa7e8 )