ShiftJIS-CP932-MapUTF
Currently, only coderefs are allowed as C<UNICODE_CALLBACK>.
A string returned from the coderef is inserted
in place of the unmapped character.
A coderef as C<UNICODE_CALLBACK> is called with one or more arguments.
If the unmapped character is a partial character (an illegal byte),
the first argument is C<undef>
and the second argument is an unsigned integer representing the byte.
If not partial, the first argument is an unsigned integer
representing a Unicode code point.
For example, the following callback converts characters that have
no mapping in CP-932 to numeric character references for HTML 4.01.
    sub toHexNCR {
        my ($char, $byte) = @_;
        return sprintf("&#x%x;", $char) if defined $char;
        die sprintf "illegal byte 0x%02x was found", $byte;
    }

    $cp932 = utf8_to_cp932   (\&toHexNCR, $utf8_string);
    $cp932 = unicode_to_cp932(\&toHexNCR, $unicode_string);
    $cp932 = utf16le_to_cp932(\&toHexNCR, $utf16le_string);
The return value of C<UNICODE_CALLBACK> must be legal in CP-932.
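As a variation on the example above, the following sketch returns decimal references and substitutes a question mark for an illegal byte instead of dying. The name C<toDecNCR> is hypothetical, and the call into the module is guarded in case the module is not installed. Both return values are plain ASCII, so they are legal in CP-932.

```perl
use strict;
use warnings;

# Hypothetical callback: decimal NCR for an unmapped code point,
# '?' for an illegal byte (instead of dying).
sub toDecNCR {
    my ($char, $byte) = @_;
    return sprintf("&#%d;", $char) if defined $char;
    warn sprintf("illegal byte 0x%02x replaced with '?'\n", $byte);
    return '?';
}

# The callback can be exercised on its own, without the module.
print toDecNCR(0x20AC), "\n";          # unmapped code point U+20AC
print toDecNCR(undef, 0xFF), "\n";     # illegal byte 0xFF

# Guarded use with the module, if it is installed.
if (eval { require ShiftJIS::CP932::MapUTF; 1 }) {
    my $utf8  = "\xE2\x82\xAC";        # U+20AC as UTF-8 bytes
    my $cp932 = ShiftJIS::CP932::MapUTF::utf8_to_cp932(\&toDecNCR, $utf8);
    print "$cp932\n";
}
```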
C<UNICODE_OPTION> may be specified after C<STRING>. Options can be combined,
like C<'fg'> and C<'gsf'> (the order does not matter).
    'g'     add mappings of Gaiji (user-defined characters)
            [0xF040 to 0xF9FC (rows 95 to 114) in CP-932]
            from Unicode's PUA [0xE000 to 0xE757] (1880 characters).

    's'     add mappings of undefined Single-byte characters:
            U+0080 => 0x80, U+F8F0 => 0xA0,
            U+F8F1 => 0xFD, U+F8F2 => 0xFE, U+F8F3 => 0xFF.

    'f'     add some Fallback mappings from Unicode to CP-932.
            The characters additionally mapped are
            some characters in the latin-1 region [U+00A0..U+00FF], and
            HIRAGANA LETTER VU [U+3094], which is mapped
            to KATAKANA LETTER VU [0x8394].
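The arithmetic behind the C<'g'> range can be sketched as follows. This is an illustration only, assuming the 1880 code points are assigned sequentially over lead bytes 0xF0 to 0xF9 with trail bytes 0x40 to 0x7E and 0x80 to 0xFC (10 x 188 = 1880), which agrees with the endpoints given above. The helper C<pua_to_gaiji> is hypothetical and is not part of the module.

```perl
use strict;
use warnings;

# Sketch of the 'g' arithmetic: PUA code point -> CP-932 Gaiji code.
# Assumes sequential assignment; hypothetical helper, not the module's code.
sub pua_to_gaiji {
    my ($cp) = @_;
    return if $cp < 0xE000 || $cp > 0xE757;   # outside the Gaiji range
    my $index = $cp - 0xE000;
    my $lead  = 0xF0 + int($index / 188);     # 188 trail bytes per lead byte
    my $trail = 0x40 + $index % 188;
    $trail++ if $trail >= 0x7F;               # 0x7F is not used as a trail byte
    return ($lead << 8) | $trail;
}

printf "U+%04X => 0x%04X\n", $_, pua_to_gaiji($_)
    for 0xE000, 0xE03E, 0xE03F, 0xE757;
```

The first and last code points land on 0xF040 and 0xF9FC, the bounds stated for option C<'g'>.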
=over 4
=item C<utf8_to_cp932([UNICODE_CALLBACK,] STRING [, UNICODE_OPTION])>
Converts UTF-8 to CP-932.
=item C<unicode_to_cp932([UNICODE_CALLBACK,] STRING [, UNICODE_OPTION])>
Converts Unicode to CP-932.
This B<Unicode> is coded in Perl's internal format (see F<perlunicode>).
If the string is not flagged with C<SVf_UTF8>, it is upgraded
as an ISO 8859-1 (latin1) string.
B<This function is provided only for Perl 5.6.1 or later, and via XS.>
=item C<utf16_to_cp932([UNICODE_CALLBACK,] STRING [, UNICODE_OPTION])>
Converts UTF-16 (with or without a C<BOM>) to CP-932.
=item C<utf16le_to_cp932([UNICODE_CALLBACK,] STRING [, UNICODE_OPTION])>
Converts UTF-16LE to CP-932.
=item C<utf16be_to_cp932([UNICODE_CALLBACK,] STRING [, UNICODE_OPTION])>
Converts UTF-16BE to CP-932.
=item C<utf32_to_cp932([UNICODE_CALLBACK,] STRING [, UNICODE_OPTION])>
Converts UTF-32 (with or without a C<BOM>) to CP-932.
=item C<utf32le_to_cp932([UNICODE_CALLBACK,] STRING [, UNICODE_OPTION])>
Converts UTF-32LE to CP-932.
=item C<utf32be_to_cp932([UNICODE_CALLBACK,] STRING [, UNICODE_OPTION])>
Converts UTF-32BE to CP-932.
=back
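To compare the input encodings these functions expect, the sketch below builds the same character, HIRAGANA LETTER A (U+3042), as a byte string in each encoding with C<pack>. The hash C<%input> is only an illustration device. The calls into the module are guarded in case it is not installed. The CP-932 encoding of this character is 0x82A0, so each conversion would be expected to yield those two bytes.

```perl
use strict;
use warnings;

my $cp = 0x3042;                       # HIRAGANA LETTER A

# The same character in each input encoding, as byte strings.
my %input = (
    utf8_to_cp932    => "\xE3\x81\x82",
    utf16le_to_cp932 => pack('v', $cp),
    utf16be_to_cp932 => pack('n', $cp),
    utf16_to_cp932   => "\xFE\xFF" . pack('n', $cp),   # big-endian BOM
    utf32le_to_cp932 => pack('V', $cp),
    utf32be_to_cp932 => pack('N', $cp),
);

# Guarded use with the module, if it is installed.
if (eval { require ShiftJIS::CP932::MapUTF; 1 }) {
    for my $name (sort keys %input) {
        my $func  = ShiftJIS::CP932::MapUTF->can($name) or next;
        my $cp932 = $func->($input{$name});
        printf "%-17s => %s\n", $name, unpack('H*', $cp932);
    }
}
```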
=head2 Export
B<By default:>
    cp932_to_utf8     utf8_to_cp932
    cp932_to_utf16le  utf16le_to_cp932
    cp932_to_utf16be  utf16be_to_cp932
    cp932_to_unicode  unicode_to_cp932 (only for XS)
B<On request:>
    cp932_to_utf32le  utf32le_to_cp932
    cp932_to_utf32be  utf32be_to_cp932
                      utf16_to_cp932 [*]
                      utf32_to_cp932 [*]
[*] Their counterparts C<cp932_to_utf16()> and C<cp932_to_utf32()>
are not implemented yet. How to handle return values
from C<SJIS_CALLBACK> needs more investigation
(concatenation requires recognizing and coping with a C<BOM>).
=head1 CAVEAT
The pure Perl edition of this module does not understand
any logically wide characters (see F<perlunicode>).
Use C<utf8::decode>/C<utf8::encode> (see F<utf8>) on Perl 5.7 or later
if necessary.
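A minimal sketch of that workaround, assuming the input is a character string holding wide characters: a copy is converted to UTF-8 bytes with C<utf8::encode> before it is passed to C<utf8_to_cp932>. The variable names are illustrative, and the call into the module is guarded in case it is not installed.

```perl
use strict;
use warnings;

my $chars = "\x{3042}\x{3044}";        # two wide characters (character string)

# Work on a copy: utf8::encode converts its argument in place to UTF-8 bytes.
my $bytes = $chars;
utf8::encode($bytes);

printf "characters: %d, bytes: %d\n", length($chars), length($bytes);

# Guarded use with the module, if it is installed.
if (eval { require ShiftJIS::CP932::MapUTF; 1 }) {
    my $cp932 = ShiftJIS::CP932::MapUTF::utf8_to_cp932($bytes);
    print unpack('H*', $cp932), "\n";
}
```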
=head1 AUTHOR
SADAHIRO Tomoyuki <SADAHIRO@cpan.org>
Copyright(C) 2001-2007, SADAHIRO Tomoyuki. Japan. All rights reserved.
This module is free software; you can redistribute it and/or modify
it under the same terms as Perl itself.
=head1 SEE ALSO
=over 4
=item Microsoft PRB, Article ID: Q170559
Conversion Problem Between Shift-JIS and Unicode
=item cp932 to Unicode table
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit932.txt
http://www.microsoft.com/globaldev/reference/dbcs/932.htm
=back
=cut