Encode-ZapCP1252

0.40  2020-02-04T15:30:50Z
      - Added link to Encoding::FixLatin. Suggested by Jonas Smedegaard.
      - Removed Pod tests from the distribution.
      - Fixed an issue discovered on Perl 5.31 that incorrectly resulted in
        the creation of malformed UTF-8 when fixing unicode strings. Thanks to
        Karl Williamson for the continuing improvement of Unicode support in
        Perl and for the fix to this module.

0.33  2011-11-23T05:19:36Z
      - Require Test::Pod 1.41 to support `L<text|url>` syntax in the Pod.
      - Decoded strings are supported only in Perl 5.8.8 and higher due
        to bugs in the handling of UTF-8 characters in regular expressions
        prior to that release.

0.32  2010-06-16T22:45:06
      - Fixed bug where `zap_cp1252` was failing to return a value when called
        in a list or scalar context.

0.31  2010-06-14T03:09:06
      - Fixed test failures on Perls less than 5.10.

Changes view on Meta::CPAN

        without doing anything.
      - Strings are no longer modified in-place unless the conversion
        subroutines are called in a void context.
      - The conversion functions may optionally be called with no argument
        when run in Perl 5.10 or higher, in which case they will instead act
        on `$_`.

0.20  2010-06-12T00:39:35
      - Added `local` to examples of changing the mapping tables.
      - When the Encode module is installed, zapping and fixing CP1252
        gremlins now works in decoded strings, too.
      - For convenience, the functions now return the strings they've
        modified.
      - Shipping with a traditional `Makefile.PL` rather than one that
        passes through to Module::Build.
      - Moved repository to [GitHub](https://github.com/theory/encode-zapcp1252).

0.12  2008-06-23T17:48:04
      - Fixed pasto in the "Support" section of the docs.
      - Fixed a typo in the "Synopsis" section of the docs, thanks to David
        Beaudet.

MANIFEST view on Meta::CPAN

Build.PL
Changes
lib/Encode/ZapCP1252.pm
Makefile.PL
MANIFEST			This list of files
META.json
META.yml
README.md
t/base.t
t/decoded.t
t/perl-510.t

lib/Encode/ZapCP1252.pm view on Meta::CPAN


In Perl 5.10 and higher, the functions may optionally be called with no
arguments, in which case C<$_> will be converted, instead:

  zap_cp1252; # Modify $_ in-place.
  fix_cp1252; # Modify $_ in-place.
  my $zapped = zap_cp1252; # Copy $_ and return zapped
  my $fixed = fix_cp1252; # Copy $_ and return fixed
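
The split between in-place modification and returning a copy can be driven by calling context. The following is a minimal sketch of that idea, not the module's own code: `dash_to_underscore` is a hypothetical stand-in for a conversion function, and it assumes `wantarray` is used to detect void context.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a conversion function.
sub dash_to_underscore {
    # Refer to the caller's argument if one was given, otherwise to $_.
    my $target = @_ ? \$_[0] : \$_;

    if (defined wantarray) {
        # Scalar or list context: leave the original alone, return a copy.
        (my $copy = $$target) =~ tr/-/_/;
        return $copy;
    }

    # Void context: modify the caller's variable in place.
    $$target =~ tr/-/_/;
    return;
}

my $name = 'a-b-c';
my $copy = dash_to_underscore($name);  # $name is untouched here
dash_to_underscore($name);             # void context: $name is modified

print "$copy $name\n";
```

Because `$_[0]` is an alias to the caller's variable, taking a reference to it is enough to write back through in the void-context branch.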

In Perl 5.8.8 and higher, the conversion will work even when the string is
decoded to Perl's internal form (usually via C<decode 'ISO-8859-1', $text>) or
the string is encoded (and thus simply processed by Perl as a series of
bytes). The conversion will even work on a string that has not been decoded
but has had its C<utf8> flag flipped anyway (usually by an injudicious use of
C<Encode::_utf8_on()>). This is to enable the highest possible likelihood of
removing those CP1252 gremlins no matter what kind of processing has already
been executed on the string.

That said, although C<fix_cp1252()> takes a conservative approach to replacing
text in Unicode strings, it should be used as a very last option. Really,
avoid that situation if you can.
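
The three string states described above can be reproduced with the core Encode module alone. This sketch does not use Encode::ZapCP1252; it only shows, under the assumption of a two-gremlin sample string, how the same bytes look when left encoded, when decoded as ISO-8859-1, and when the C<utf8> flag is flipped on without decoding.

```perl
use strict;
use warnings;
use Encode ();

# In CP1252, bytes 0x93 and 0x94 are curly double quotes.
# 1. Encoded: a plain byte string with the UTF-8 flag off.
my $bytes = "\x93hi\x94";

# 2. Decoded as Latin-1: now character data, but the code points are
#    still 0x93 and 0x94, so the gremlins survive decoding.
my $decoded = Encode::decode('ISO-8859-1', $bytes);

# 3. Flag flipped without decoding: the raw bytes are relabeled as UTF-8,
#    which they are not.
my $flipped = $bytes;
Encode::_utf8_on($flipped);

printf "bytes:   flag=%d\n", utf8::is_utf8($bytes) ? 1 : 0;
printf "decoded: flag=%d first=0x%02X\n",
    utf8::is_utf8($decoded) ? 1 : 0, ord $decoded;
printf "flipped: flag=%d well-formed=%d\n",
    utf8::is_utf8($flipped) ? 1 : 0, utf8::valid($flipped) ? 1 : 0;
```

The decoded string still compares equal to the original bytes, which is why decoding CP1252 input as ISO-8859-1 does not by itself remove the gremlins.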

=head1 Conversion Table

t/decoded.t view on Meta::CPAN

        unless $] >= 5.008_008;
    plan tests => 10;
}

BEGIN { use_ok 'Encode::ZapCP1252' or die; }

use utf8;
my $ascii = q{e , f ,, ... + ++ ^ % S < OE Z ' ' " " * - -- ~ (tm) s > oe z Y};
my $utf8  = q{€ , ƒ „ … † ‡ ˆ ‰ Š ‹ Œ Ž ‘ ’ “ ” • – — ˜ ™ š › œ ž Ÿ};

# Test conversion of text decoded from ISO-8859-1.
my $fix_me = Encode::decode(
    'ISO-8859-1',
    join ' ', map { chr } 0x80, 0x82 .. 0x8c, 0x8e, 0x91 .. 0x9c, 0x9e, 0x9f
);

fix_cp1252 $fix_me;
is $fix_me, $utf8, 'Convert decoded from Latin-1 to utf-8';

# Try ascii.
$fix_me = Encode::decode(
    'ISO-8859-1',
    join ' ', map { chr } 0x80, 0x82 .. 0x8c, 0x8e, 0x91 .. 0x9c, 0x9e, 0x9f
);
zap_cp1252 $fix_me;
is $fix_me, $ascii, 'Convert decoded from Latin-1 to ascii';

# Test conversion with utf8 bit flipped.
$fix_me = join ' ', map { chr } 0x80, 0x82 .. 0x8c, 0x8e, 0x91 .. 0x9c, 0x9e, 0x9f;
Encode::_utf8_on($fix_me);
fix_cp1252 $fix_me;
is $fix_me, $utf8, 'Convert utf8-bit-flipped to utf-8';

# Try it with ascii.
$fix_me = join ' ', map { chr } 0x80, 0x82 .. 0x8c, 0x8e, 0x91 .. 0x9c, 0x9e, 0x9f;
Encode::_utf8_on($fix_me);
zap_cp1252 $fix_me;
is $fix_me, $ascii, 'Convert utf8-bit-flipped to ascii';

# Test conversion to decoded with modified table.
my $euro = $Encode::ZapCP1252::utf8_for{"\x80"};
$Encode::ZapCP1252::utf8_for{"\x80"} = 'E';
$utf8 =~ s/€/E/;

$fix_me = Encode::decode(
    'ISO-8859-1',
    join ' ', map { chr } 0x80, 0x82 .. 0x8c, 0x8e, 0x91 .. 0x9c, 0x9e, 0x9f
);

fix_cp1252 $fix_me;
is $fix_me, $utf8, 'Convert decoded from Latin-1 with modified table';

# Test it with the valid use of one of the gremlins (π is [0xcf,0x80]) in UTF-8.
is fix_cp1252 'π', 'π', 'Should not convert valid use of 0x80';
is zap_cp1252 'π', 'π', 'Should not zap valid use of 0x80';

# But it should convert it if it's not UTF-8.
my $utf8_euro = Encode::encode_utf8($euro);
$Encode::ZapCP1252::utf8_for{"\x80"} = $utf8_euro;
is fix_cp1252 "\xCF\x80", "\xCF" . $utf8_euro,
    'Should convert 0x80 when not parsing UTF-8';
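
The last three tests turn on the difference between characters and bytes. The sketch below uses only the core Encode module, not Encode::ZapCP1252, and the variable names are illustrative. It shows why a decoded π contains nothing in the gremlin range while its UTF-8 byte form does.

```perl
use strict;
use warnings;
use Encode ();

my $pi_chars = "\x{3C0}";                       # one character: U+03C0
my $pi_bytes = Encode::encode_utf8($pi_chars);  # two bytes: 0xCF 0x80

# Character-level matching finds no code point in the 0x80-0x9F range.
my $hit_in_chars = $pi_chars =~ /[\x80-\x9F]/ ? 1 : 0;

# Byte-level matching finds the trailing 0x80 byte.
my $hit_in_bytes = $pi_bytes =~ /[\x80-\x9F]/ ? 1 : 0;

print "decoded string matches: $hit_in_chars\n";
print "byte string matches:    $hit_in_bytes\n";
```

This is consistent with the tests above: the decoded π is left alone, while the raw byte pair `"\xCF\x80"` has its second byte treated as a CP1252 euro sign.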
