Encode-ZapCP1252
0.40 2020-02-04T15:30:50Z
- Added link to Encoding::FixLatin. Suggested by Jonas Smedegaard.
- Removed Pod tests from the distribution.
- Fixed an issue discovered on Perl 5.31 that incorrectly resulted in
the creation of malformed UTF-8 when fixing unicode strings. Thanks to
Karl Williamson for the continuing improvement of Unicode support in
Perl and for the fix to this module.
0.33 2011-11-23T05:19:36Z
- Require Test::Pod 1.41 to support `L<text|url>` syntax in the Pod.
- Decoded strings are supported only in 5.8.8 and higher due
to bugs in the handling of UTF-8 characters in regular expressions
prior to that release.
0.32 2010-06-16T22:45:06
- Fixed bug where `zap_cp1252` was failing to return a value when called
in a list or scalar context.
0.31 2010-06-14T03:09:06
- Fixed test failures on Perls less than 5.10.
without doing anything.
- Strings are no longer modified in-place unless the conversion
subroutines are called in a void context.
- The conversion functions may optionally be called with no argument
when run in Perl 5.10 or higher, in which case they will instead act
on `$_`.
0.20 2010-06-12T00:39:35
- Added `local` to examples of changing the mapping tables.
- When the Encode module is installed, zapping and fixing CP1252
gremlins now works in decoded strings, too.
- For convenience, the functions now return the strings they've
modified.
- Shipping with a traditional `Makefile.PL` rather than one that
passes through to Module::Build.
- Moved repository to [GitHub](https://github.com/theory/encode-zapcp1252).
0.12 2008-06-23T17:48:04
- Fixed pasto in the "Support" section of the docs.
- Fixed a typo in the "Synopsis" section of the docs, thanks to David
Beaudet.
Build.PL
Changes
lib/Encode/ZapCP1252.pm
Makefile.PL
MANIFEST This list of files
META.json
META.yml
README.md
t/base.t
t/decoded.t
t/perl-510.t
lib/Encode/ZapCP1252.pm
In Perl 5.10 and higher, the functions may optionally be called with no
arguments, in which case C<$_> will be converted, instead:
zap_cp1252; # Modify $_ in-place.
fix_cp1252; # Modify $_ in-place.
my $zapped = zap_cp1252; # Copy $_ and return zapped
my $fixed = fix_cp1252; # Copy $_ and return fixed
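The in-place versus copy behaviour shown above depends on the calling context. As a rough illustration of that technique only, and not the module's actual implementation, the sketch below uses a hypothetical function `zap_ellipsis` that checks `wantarray` and relies on `$_[0]` aliasing the caller's variable. It handles a single gremlin (0x85) and does not implement the `$_` default.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a conversion function (an assumption, not
# part of Encode::ZapCP1252): replaces the CP1252 ellipsis byte 0x85
# with three ASCII dots.
sub zap_ellipsis {
    if (defined wantarray) {
        # Scalar or list context: work on a copy and return it.
        (my $copy = $_[0]) =~ s/\x85/.../g;
        return $copy;
    }
    # Void context: $_[0] aliases the caller's variable, so this
    # substitution changes the caller's string in place.
    $_[0] =~ s/\x85/.../g;
    return;
}

my $text = "wait\x85";

my $copy      = zap_ellipsis($text);       # scalar context: $text is left alone
my $untouched = ($text eq "wait\x85");

zap_ellipsis($text);                       # void context: $text is modified

print "copy:      $copy\n";
print "untouched: ", ($untouched ? 'yes' : 'no'), "\n";
print "in place:  $text\n";
```

Checking `defined wantarray` is what lets one function serve both calling styles without a separate "in-place" variant.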
In Perl 5.8.8 and higher, the conversion will work even when the string is
decoded to Perl's internal form (usually via C<decode 'ISO-8859-1', $text>) or
the string is encoded (and thus simply processed by Perl as a series of
bytes). The conversion will even work on a string that has not been decoded
but has had its C<utf8> flag flipped anyway (usually by an injudicious use of
C<Encode::_utf8_on()>). This is to enable the highest possible likelihood of
removing those CP1252 gremlins no matter what kind of processing has already
been executed on the string.
That said, although C<fix_cp1252()> takes a conservative approach to replacing
text in Unicode strings, it should be used as a very last option. Really,
avoid that situation if you can.
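The three string states described above can be told apart with core Perl alone. The following is a minimal sketch, assuming only the core Encode module and the built-in `utf8::is_utf8` and `utf8::valid` functions; the variable names and the `describe` helper are illustrative. The final call to `fix_cp1252` is an assumption about the environment and runs only if Encode::ZapCP1252 happens to be installed.

```perl
use strict;
use warnings;
use Encode ();

# Bytes 0x93 and 0x94 are curly quotes in CP1252 but control
# characters in ISO-8859-1.
my $bytes = "\x93quoted\x94";

# 1. Encoded: Perl processes the string as a plain series of bytes.
my $encoded = $bytes;

# 2. Decoded to Perl's internal form via ISO-8859-1.
my $decoded = Encode::decode('ISO-8859-1', $bytes);

# 3. Not decoded, but with the utf8 flag flipped on anyway.
my $flipped = $bytes;
Encode::_utf8_on($flipped);

# Illustrative helper: report the flag and whether the internal
# representation is consistent. $_[1] is used directly to avoid copying.
sub describe {
    printf "%-8s flag=%-3s consistent=%s\n", $_[0],
        (utf8::is_utf8($_[1]) ? 'on'  : 'off'),
        (utf8::valid($_[1])   ? 'yes' : 'no');
}

describe(encoded => $encoded);
describe(decoded => $decoded);
describe(flipped => $flipped);

# Optional: only if the module is available in this environment.
if (eval { require Encode::ZapCP1252; 1 }) {
    my $fixed = Encode::ZapCP1252::fix_cp1252($decoded);
    printf "first character after fix_cp1252: U+%04X\n", ord($fixed);
}
```

The flipped string is the only one of the three that `utf8::valid` reports as inconsistent, which is why that case deserves the extra caution the text recommends.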
=head1 Conversion Table
t/decoded.t
unless $] >= 5.008_008;
plan tests => 10;
}
BEGIN { use_ok 'Encode::ZapCP1252' or die; }
use utf8;
my $ascii = q{e , f ,, ... + ++ ^ % S < OE Z ' ' " " * - -- ~ (tm) s > oe z Y};
my $utf8 = q{€ , ƒ „ … † ‡ ˆ ‰ Š ‹ Œ Ž ‘ ’ “ ” • – — ˜ ™ š › œ ž Ÿ};
# Test conversion of text decoded from ISO-8859-1.
my $fix_me = Encode::decode(
'ISO-8859-1',
join ' ', map { chr } 0x80, 0x82 .. 0x8c, 0x8e, 0x91 .. 0x9c, 0x9e, 0x9f
);
fix_cp1252 $fix_me;
is $fix_me, $utf8, 'Convert decoded from Latin-1 to utf-8';
# Try ascii.
$fix_me = Encode::decode(
'ISO-8859-1',
join ' ', map { chr } 0x80, 0x82 .. 0x8c, 0x8e, 0x91 .. 0x9c, 0x9e, 0x9f
);
zap_cp1252 $fix_me;
is $fix_me, $ascii, 'Convert decoded from Latin-1 to ascii';
# Test conversion with utf8 bit flipped.
$fix_me = join ' ', map { chr } 0x80, 0x82 .. 0x8c, 0x8e, 0x91 .. 0x9c, 0x9e, 0x9f;
Encode::_utf8_on($fix_me);
fix_cp1252 $fix_me;
is $fix_me, $utf8, 'Convert utf8-bit-flipped to utf-8';
# Try it with ascii.
$fix_me = join ' ', map { chr } 0x80, 0x82 .. 0x8c, 0x8e, 0x91 .. 0x9c, 0x9e, 0x9f;
Encode::_utf8_on($fix_me);
zap_cp1252 $fix_me;
is $fix_me, $ascii, 'Convert utf8-bit-flipped to ascii';
# Test conversion to decoded with modified table.
my $euro = $Encode::ZapCP1252::utf8_for{"\x80"};
$Encode::ZapCP1252::utf8_for{"\x80"} = 'E';
$utf8 =~ s/€/E/;
$fix_me = Encode::decode(
'ISO-8859-1',
join ' ', map { chr } 0x80, 0x82 .. 0x8c, 0x8e, 0x91 .. 0x9c, 0x9e, 0x9f
);
fix_cp1252 $fix_me;
is $fix_me, $utf8, 'Convert decoded from Latin-1 with modified table';
# Test it with the valid use of one of the gremlins (π is [0xcf,0x80]) in UTF-8.
is fix_cp1252 'π', 'π', 'Should not convert valid use of 0x80';
is zap_cp1252 'π', 'π', 'Should not zap valid use of 0x80';
# But it should convert it if it's not UTF-8.
my $utf8_euro = Encode::encode_utf8($euro);
$Encode::ZapCP1252::utf8_for{"\x80"} = $utf8_euro;
is fix_cp1252 "\xCF\x80", "\xCF" . $utf8_euro,
'Should convert 0x80 when not parsing UTF-8';
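The last two tests hinge on the same byte pair meaning different things depending on how it is read. A minimal sketch of that ambiguity, assuming only the core Encode module (variable names are illustrative, and nothing here calls Encode::ZapCP1252):

```perl
use strict;
use warnings;
use Encode ();

# The two bytes 0xCF 0x80 are the UTF-8 encoding of U+03C0 (Greek small pi).
my $octets = "\xCF\x80";

# Read as UTF-8, 0x80 is a continuation byte, so there is one character
# and nothing to fix.
my $as_utf8 = Encode::decode('UTF-8', $octets);
printf "UTF-8:  %d char(s): %s\n", length($as_utf8),
    join ' ', map { sprintf 'U+%04X', ord } split //, $as_utf8;

# Read as CP1252, the same bytes are two characters, and 0x80 is the
# euro sign that the conversion table maps.
my $as_cp1252 = Encode::decode('cp1252', $octets);
printf "CP1252: %d char(s): %s\n", length($as_cp1252),
    join ' ', map { sprintf 'U+%04X', ord } split //, $as_cp1252;
```

This prints one character (U+03C0) for the UTF-8 reading and two (U+00CF U+20AC) for the CP1252 reading.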