CCCP-Encode
    print $data;
    # output:
    # если в слове 'хлеб' поменять 4 буквы, то получится ? ПИВО
    # (in English: "if you change 4 letters in the word 'bread', you get ? BEER")
The C<from_to> function from the C<Encode> module replaces each unknown character with 'B<?>', and this damaged data is what gets saved in your database.
Then you write some guano-magic code to fix the problem.
Every developer whose database is not in UTF-8 knows this problem.
Another case:
fetching data from RSS feeds in UTF-8 and saving it to a C<cyrillic> database
(for example, MySQL with the default charset C<koi8-r> or C<windows-1251>).
B<CCCP::Encode> fixes this problem.
=head2 METHODS
=head3 utf2cyrillic($str,$to)
C<$str> is the string to convert. C<$to> is the target encoding name, the same as C<$to> in C<Encode::from_to($str,'utf-8',$to)>.
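For comparison, here is a minimal sketch of what the plain C<Encode::from_to($str,'utf-8',$to)> call does with the same C<$to> argument. It uses only the core C<Encode> module; the helper name C<plain_from_to> is hypothetical and is not part of C<CCCP::Encode>.

```perl
use strict;
use warnings;
use utf8;                       # this source file contains UTF-8 literals
use Encode ();

# Hypothetical helper (not part of CCCP::Encode): shows what a bare
# Encode::from_to call does for a given target encoding $to.
sub plain_from_to {
    my ($str, $to) = @_;
    my $octets = Encode::encode('UTF-8', $str);   # characters -> UTF-8 octets
    Encode::from_to($octets, 'utf-8', $to);       # converts $octets in place
    return $octets;
}

# The em dash (U+2014) has no mapping in koi8-r, so from_to
# substitutes '?' (byte 3F) for it.
my $koi8 = plain_from_to("хлеб \x{2014} пиво", 'koi8-r');
print join(' ', map { sprintf '%02X', ord($_) } split //, $koi8), "\n";
```

The C<$to> value is any encoding name that C<Encode> recognizes, such as C<koi8-r> or C<cp1251>.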
=head2 PACKAGE VARIABLES
=head3 $CCCP::Encode::Entities
Ignored if $CCCP::Encode::ToText is true.
Default value 'xml'.
'xml' mode - replaces every character that is unknown in the target charset with a valid XML numeric entity (for example, the entity for —).
'html' mode - replaces every character that is unknown in the target charset with an HTML numeric entity (for example, the entity for —).
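The core C<Encode> module has two comparable fallback modes, which give an idea of what numeric entities look like in each style. This is only an illustration with core functions: whether C<CCCP::Encode> produces exactly these forms, or uses these constants internally, is an assumption not confirmed by this document.

```perl
use strict;
use warnings;
use Encode ();

my $str = "a \x{2014} b";    # U+2014 (em dash) has no mapping in koi8-r

# Encode::FB_XMLCREF writes unmappable characters as hexadecimal references.
my $xml  = Encode::encode('koi8-r', $str, Encode::FB_XMLCREF);

# Encode::FB_HTMLCREF writes them as decimal references.
my $html = Encode::encode('koi8-r', $str, Encode::FB_HTMLCREF);

print "xml:  $xml\n";     # a &#x2014; b
print "html: $html\n";    # a &#8212; b
```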
=head3 $CCCP::Encode::ToText
Default is false.
If C<$CCCP::Encode::ToText> is false, C<utf2cyrillic>
returns the converted string with unknown characters replaced according to your definition (see C<$CCCP::Encode::CharMap>)
or with HTML entities from C<HTML::Entities>.
If C<$CCCP::Encode::ToText> is true, C<utf2cyrillic>
returns the converted string as plain text, with unknown characters replaced according to your definition (see C<$CCCP::Encode::CharMap>)
or transliterated with C<Text::Unidecode>.
=head3 $CCCP::Encode::CharMap
Default is an empty hashref.
You can define a custom map for any characters.
This is very flexible if you need custom replacements (different from those of C<HTML::Entities> or C<Text::Unidecode>).
Example:
    $CCCP::Encode::CharMap = {
        "\x{2014}" => '-',
        "\x{2015}" => 'foo',
    };
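The idea of "custom map first, generic replacement otherwise" can be sketched with the core C<Encode> module alone, by passing a code reference as the fallback. This is not the module's actual implementation: the names C<%char_map> and C<encode_with_map> are hypothetical, and the generic replacement used here (a hexadecimal numeric entity) is an assumption chosen for the example.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical map with the same shape as the $CCCP::Encode::CharMap example.
my %char_map = (
    "\x{2014}" => '-',
    "\x{2015}" => 'foo',
);

# Hypothetical helper: encode $str to $to. For each character the target
# charset cannot represent, use %char_map if it has an entry, otherwise
# fall back to a hexadecimal numeric entity.
sub encode_with_map {
    my ($str, $to) = @_;
    return Encode::encode($to, $str, sub {
        my $char = chr shift;    # Encode passes the code point of the unmappable character
        return exists $char_map{$char}
            ? $char_map{$char}
            : sprintf('&#x%X;', ord $char);
    });
}

print encode_with_map("a \x{2014} b \x{2015} c \x{2122}", 'koi8-r'), "\n";
```

Note that this sketch decides what is "unknown" from the target charset itself, whereas C<CCCP::Encode> uses the expression in C<$CCCP::Encode::Regexp>.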
=head3 $CCCP::Encode::Regexp
The default value is C<[^\p{Cyrillic}|\p{IsLatin}|\p{InBasic_Latin}]>: any character that is not Cyrillic, Latin, or Basic Latin is replaced.
You can override this expression.
For more information, see C<http://www.regular-expressions.info/unicode.html>
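To see which characters such a class treats as unknown, here is a small sketch in plain Perl. The class below is an assumed equivalent of the default, not the module's own value: inside square brackets C<|> is a literal pipe rather than alternation, so it is left out, and Basic Latin is written as the explicit range U+0000 to U+007F. The function name C<find_unknown> is hypothetical.

```perl
use strict;
use warnings;
use utf8;    # this source file contains UTF-8 literals

# Assumed equivalent of the default class: anything that is not
# Cyrillic, not Latin, and not in the Basic Latin (ASCII) range.
my $unknown = qr/[^\p{Cyrillic}\p{Latin}\x{00}-\x{7F}]/;

# Hypothetical helper: list the code points the class would flag.
sub find_unknown {
    my ($str) = @_;
    return [ map { sprintf 'U+%04X', ord($_) } $str =~ /($unknown)/g ];
}

print join(', ', @{ find_unknown("хлеб — beer ™ é") }), "\n";
```

Here the em dash and the trademark sign are flagged, while the Cyrillic letters, ASCII text, and the Latin letter C<é> are not.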
=head1 OVERHEAD
CCCP::Encode with $CCCP::Encode::Entities eq "html":
2 wallclock secs ( 1.63 usr + 0.01 sys = 1.64 CPU) @ 60975.61/s (n=100000)
CCCP::Encode with $CCCP::Encode::Entities eq "xml":
3 wallclock secs ( 2.49 usr + 0.00 sys = 2.49 CPU) @ 40160.64/s (n=100000)
CCCP::Encode with $CCCP::Encode::ToText eq "1":
4 wallclock secs ( 3.85 usr + 0.02 sys = 3.87 CPU) @ 25839.79/s (n=100000)
Encode::from_to(...) :
2 wallclock secs ( 1.93 usr + 0.01 sys = 1.94 CPU) @ 51546.39/s (n=100000)
=head1 SEE ALSO
=over 4
=item *
C<Encode>
=item *
C<Text::Unidecode>
=back
=head1 AUTHOR
Ivan Sivirinov
=cut