ODF-lpOD_Helper
=head1 NAME
ODF::lpOD_Helper::Unicode - Unicode and ODF::lpOD (and XML::Twig)
=head1 SYNOPSIS
  use ODF::lpOD;
  use ODF::lpOD_Helper;
  use feature 'unicode_strings';
=head1 INTRODUCTION
We once thought Unicode forced us
to fiddle with bytes to handle "international" characters.
That thinking came from low-level languages like C.
Perl saved us, but it took years before everyone believed it,
and more years before Perl's Unicode paradigm clearly emerged.
Meanwhile, lots of pod and code were written which in hindsight
were confused or misleading.
=head1 THE PERL UNICODE PARADIGM
=over
=item 1.
"B<Decode>" input from binary into Perl characters as soon as possible
after getting it from the outside world (e.g. from a disk file).
=item 2.
As much of the application as possible works with Perl characters,
I<paying absolutely no attention to encoding>.
=item 3.
"B<Encode>" Perl characters into binary data as late as possible,
just before sending the data out.
=back
See "Not always so tidy" below for more discussion.
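The three steps above can be sketched with the core C<Encode> module.
This is only an illustration: the sample text and variable names are
made up for the example, and a real program might instead use I/O layers
on its file handles to decode and encode.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Octets as they might arrive from a file or socket: "café" in UTF-8.
my $octets_in = "caf\xC3\xA9";

# 1. Decode as early as possible: octets -> Perl characters.
my $chars = decode('UTF-8', $octets_in);

# 2. Work with characters, paying no attention to encoding.
my $shouted = uc $chars;            # "CAFÉ", four characters
my $n_chars = length $shouted;

# 3. Encode as late as possible: Perl characters -> octets.
my $octets_out = encode('UTF-8', $shouted);

printf "%d characters, %d octets\n", $n_chars, length $octets_out;
# prints "4 characters, 5 octets"
```

Only the first and last steps know that UTF-8 is involved; everything in
between sees abstract characters.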
=head1 ODF::lpOD
For historical reasons,
ODF::lpOD is by default incompatible with the above paradigm:
every method encodes result strings into UTF-8
before returning them to you, and attempts to decode strings you pass in before
using them. You must therefore work with binary octets rather than
abstract characters, so regex matching, substr(), length(), and comparison
with string literals do not work correctly on text containing
non-ASCII characters.
Also, you can't print such already-encoded binary to STDOUT if that
file handle auto-encodes because the data will be encoded twice.
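The effect of holding octets instead of characters can be seen without
ODF::lpOD at all. The sketch below uses only the core C<Encode> module;
the string C<$octets> merely stands in for what an octet-mode method
might return, and all names are made up for the example.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $octets = "na\xC3\xAFve";              # "naïve" as UTF-8 octets
my $chars  = decode('UTF-8', $octets);    # the same text as characters

my $len_octets = length $octets;          # 6: the "ï" counts as two octets
my $len_chars  = length $chars;           # 5

# "." matches one octet in the encoded string but one character
# in the decoded string.
my $match_octets = ($octets =~ /^na.ve$/) ? 1 : 0;    # 0
my $match_chars  = ($chars  =~ /^na.ve$/) ? 1 : 0;    # 1

# Encoding data that is already encoded treats each octet as a
# separate character, so the non-ASCII octets are expanded again.
my $twice = encode('UTF-8', $octets);     # 8 octets, now mangled

print "length: $len_octets octets vs $len_chars characters\n";
print "regex:  $match_octets vs $match_chars\n";
print "double-encoded length: ", length($twice), "\n";
```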
B<< use ODF::lpOD_Helper >>
disables ODF::lpOD's internal encoding and decoding.
Methods then speak and listen in characters, not octets.
You should also B<< use feature 'unicode_strings'; >> to
avoid problematic aspects of legacy Perl behavior.
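One example of the legacy behavior that C<unicode_strings> avoids is
case mapping of characters in the range 128-255 when Perl happens to
store the string as native bytes. The following sketch is independent of
ODF::lpOD and its variable names are illustrative only.

```perl
use strict;
use warnings;

my $e_acute = "\xE9";    # U+00E9 (e with acute), held as one native byte

my ($legacy, $unicode);
{
    no feature 'unicode_strings';
    $legacy = uc $e_acute;     # unchanged: only ASCII letters are case-mapped
}
{
    use feature 'unicode_strings';
    $unicode = uc $e_acute;    # "\xC9": Unicode rules apply
}

printf "legacy: %02X, unicode: %02X\n", ord $legacy, ord $unicode;
# prints "legacy: E9, unicode: C9"
```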
=head1 IF LEGACY BEHAVIOR IS NEEDED
The import tag B<:bytes> will tell ODF::lpOD_Helper to not disable
internal encoding & decoding, i.e. retain the original ODF::lpOD behavior.
It is also possible to toggle between the old and new behavior at
run-time:
I<< lpod->Huse_octet_strings() >> will re-enable implicit
decoding/encoding of method arguments (using UTF-8 encoding by default)
and I<< lpod->Huse_character_strings() >> will disable the old behavior
and restore transparent Unicode support.
Prior to version 6.000 character mode was not the default, but
required a B<:chars> import tag. That tag is now deprecated and
produces a warning.