Big5
lib/Big5.pm
Example: the Japanese Katakana "SO", which looks like [ `/ ], is encoded as "\x83\x5C" in SJIS.

                see     hex dump
-----------------------------------------
source script   "`/"    [83 5c]
-----------------------------------------

Here, with "use SJIS;":

                        hex dump
-----------------------------------------
escaped script  "`\/"   [83 [5c] 5c]
-----------------------------------------
                  ^--- escape by SJIS software

By the way, compare what you see with what perl sees:

                see     hex dump
-----------------------------------------
your eye's      "`/\"   [83 5c] [5c]
-----------------------------------------
perl eye's      "`\/"   [83] \[5c]
-----------------------------------------

                        hex dump
-----------------------------------------
in the perl     "`/"    [83] [5c]
-----------------------------------------
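The hex dumps above can be reproduced in plain Perl. The following is only a sketch, assuming that the octets 83 5C stand for the SJIS character; it evaluates a one-character string literal with and without the extra escape octet. The variable names and the hexdump helper are hypothetical and are not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper: render a byte string as a hex dump.
sub hexdump { return join ' ', map { sprintf '%02x', $_ } unpack 'C*', $_[0] }

my $so = "\x83\x5C";                      # Katakana SO in SJIS: octets 83 5c

# Source script text: a double-quoted literal holding the raw octets.
my $source  = '"' . $so . '"';            # " 83 5c "
# Escaped script text: an escape octet 5c inserted before the second octet.
my $escaped = '"' . "\x83\x5C\x5C" . '"'; # " 83 5c 5c "

# Perl reads the 5c of the raw literal as an escape of the closing quote,
# so the literal never terminates and the string eval fails.
my $raw_value  = eval $source;
my $raw_failed = defined $raw_value ? 0 : 1;

# In the escaped literal perl reads 83, then "\\" (one backslash octet).
my $escaped_value = eval $escaped;

print "raw literal failed: $raw_failed\n";
print "escaped literal   : ", hexdump($escaped_value), "\n";
```

The raw literal does not compile, while the escaped literal yields the two octets of the original character.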
=head1 Multiple-Octet Anchoring of Regular Expression (Big5 software provides)
Big5 software applies multiple-octet anchoring at the beginning of a regular expression.
--------------------------------------------------------------------------------
Before                   After
--------------------------------------------------------------------------------
m/regexp/                m/${Ebig5::anchor}(?:regexp).../
--------------------------------------------------------------------------------
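Why the anchor matters can be shown with a small sketch. It copies the ${Ebig5::anchor} definition listed later in this document into a local variable (the name $anchor and the sample octets are assumptions made for this illustration) and compares a plain match with an anchored one.

```perl
use strict;
use warnings;

# Copied from the ${Ebig5::anchor} definition in this document;
# the variable name $anchor exists only in this sketch.
my $anchor = qr{\G(?>[^\x81-\xFE]|[\x81-\xFE][\x00-\xFF])*?};

# Assumed sample: A4 41 is one double-octet character whose second octet
# has the same value as the ASCII letter "A".
my $one_char    = "\xA4\x41";
my $char_then_a = "\xA4\x41" . "A";

# A plain byte match finds "A" inside the double-octet character.
my $naive_hit    = ($one_char    =~ m/A/)                ? 1 : 0;
# The anchor only steps over whole characters, so no "A" is found here.
my $anchored_hit = ($one_char    =~ m/${anchor}(?:A)/)   ? 1 : 0;
# A real "A" after the character is still found.
my $anchored_ok  = ($char_then_a =~ m/${anchor}(?:A)/)   ? 1 : 0;

print "naive: $naive_hit, anchored: $anchored_hit, anchored on real A: $anchored_ok\n";
```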
=head1 Escaping Second Octet (Big5 software provides)
Big5 software escapes the second octet of a multiple-octet character in a regular
expression.
--------------------------------------------------------------------------------
Before                   After
--------------------------------------------------------------------------------
m<...`/...>              m<...`/\...>
--------------------------------------------------------------------------------
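The effect of this escape inside a regular expression can be sketched as follows. The example assumes a double-octet character whose second octet is 5C, followed by the ASCII letter "d"; the variable names are hypothetical.

```perl
use strict;
use warnings;

# Assumption: octets 83 5C form one double-octet character whose second
# octet equals the backslash; the text is that character followed by "d".
my $text = "\x83\x5C" . "d";

# Unescaped pattern text: the regex compiler sees 83, then "\d" (a digit).
my $unescaped = "\x83\x5C" . "d";
# Escaped pattern text: the regex compiler sees 83, "\\" (a literal
# backslash octet), then the letter "d".
my $escaped   = "\x83\x5C\x5C" . "d";

my $unescaped_matches = ($text =~ /\A$unescaped\z/) ? 1 : 0;
my $escaped_matches   = ($text =~ /\A$escaped\z/)   ? 1 : 0;

print "unescaped: $unescaped_matches, escaped: $escaped_matches\n";
```

Without the escape the pattern does not even match its own text, because the second octet swallows the following letter as a metasymbol.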
=head1 Multiple-Octet Character Regular Expression (Big5 software provides)
Big5 software clusters a multiple-octet character that is followed by a quantifier,
and builds a cluster from a multiple-octet custom character class. It also builds
multiple-octet versions of the metasymbols from the classic Perl character class
shortcuts and the POSIX-style character classes.
--------------------------------------------------------------------------------
Before                   After
--------------------------------------------------------------------------------
m/...MULTIOCT+.../       m/...(?:MULTIOCT)+.../
m/...[AN-EM].../         m/...(?:A[N-Z]|[B-D][A-Z]|E[A-M]).../
m/...\D.../              m/...(?:${Ebig5::eD}).../
m/...[[:^digit:]].../    m/...(?:${Ebig5::not_digit}).../
--------------------------------------------------------------------------------
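The first two rows of the table can be tried out in plain Perl. This is a sketch under assumptions: A4 40 is taken as one double-octet character, and in the range example the ASCII letters stand in for octets exactly as they do in the table.

```perl
use strict;
use warnings;

# Assumption: A4 40 is one double-octet character; the text repeats it 3 times.
my $text = "\xA4\x40" x 3;

# Written naively, "+" binds only to the last octet (40).
my $naive     = ($text =~ m/\A\xA4\x40+\z/)     ? 1 : 0;
# Clustered, as in the "After" column, "+" repeats the whole character.
my $clustered = ($text =~ m/\A(?:\xA4\x40)+\z/) ? 1 : 0;

# The range [AN-EM] expanded into alternatives over octet pairs,
# with letters standing in for octets as in the table.
my $range = qr/\A(?:A[N-Z]|[B-D][A-Z]|E[A-M])\z/;
my %in_range = map { $_ => ($_ =~ $range ? 1 : 0) } qw(AM AN CQ EM EN);

print "naive: $naive, clustered: $clustered\n";
print "$_: $in_range{$_}\n" for sort keys %in_range;
```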
=head1 Calling 'Ebig5::ignorecase()' (Big5 software provides)
Big5 software calls 'Ebig5::ignorecase()' instead of applying the /i modifier.
--------------------------------------------------------------------------------
Before                   After
--------------------------------------------------------------------------------
m/...$var.../i           m/...@{[Ebig5::ignorecase($var)]}.../
--------------------------------------------------------------------------------
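The reason for avoiding /i can be illustrated with a sketch. The function ignorecase_sketch below is a hypothetical stand-in written for this example; it is not the implementation of Ebig5::ignorecase() and it assumes that the pattern is literal text without regexp metasymbols. It folds case only for ASCII letters that are not part of a double-octet character.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the idea behind Ebig5::ignorecase();
# NOT the module's implementation. Assumes a literal-text pattern.
sub ignorecase_sketch {
    my ($pattern) = @_;
    my $out = '';
    # Walk the pattern one character (one or two octets) at a time.
    while ($pattern =~ /\G([\x81-\xFE][\x00-\xFF]|[\x00-\xFF])/g) {
        my $char = $1;
        if (length($char) == 1 and $char =~ /\A[A-Za-z]\z/) {
            $out .= '[' . uc($char) . lc($char) . ']';   # fold ASCII letters only
        }
        else {
            $out .= quotemeta $char;                     # keep octets literal
        }
    }
    return $out;
}

# Assumed sample: A4 61 is one double-octet character (second octet "a"),
# followed by the ASCII letter "b".
my $var = "\xA4\x61" . "b";
my $re  = ignorecase_sketch($var);

my $same_char  = ("\xA4\x61" . "B" =~ /\A$re\z/)        ? 1 : 0;
my $other_char = ("\xA4\x41" . "B" =~ /\A$re\z/)        ? 1 : 0;
# With /i, the different character A4 41 is wrongly accepted.
my $naive_i    = ("\xA4\x41" . "B" =~ /\A\Q$var\E\z/i)  ? 1 : 0;

print "same: $same_char, other: $other_char, naive /i: $naive_i\n";
```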
=head1 Character-Oriented Regular Expression
A regular expression that has no /b modifier works as character-oriented.
--------------------------------------------------------------------------------
Before                   After
--------------------------------------------------------------------------------
/regexp/                 /ditto$Ebig5::matched/
m/regexp/                m/ditto$Ebig5::matched/
?regexp?                 m?ditto$Ebig5::matched?
m?regexp?                m?ditto$Ebig5::matched?

$_ =~                    ($_ =~ m/ditto$Ebig5::matched/) ?
s/regexp/replacement/    CORE::eval{ Ebig5::s_matched(); local $^W=0; my $__r=qq/replacement/; $_="${1}$__r$'"; 1 } :
                         undef

$_ !~                    ($_ !~ m/ditto$Ebig5::matched/) ?
s/regexp/replacement/    1 :
                         CORE::eval{ Ebig5::s_matched(); local $^W=0; my $__r=qq/replacement/; $_="${1}$__r$'"; undef }

split(/regexp/)          Ebig5::split(qr/regexp/)
split(m/regexp/)         Ebig5::split(qr/regexp/)
split(qr/regexp/)        Ebig5::split(qr/regexp/)
qr/regexp/               qr/ditto$Ebig5::matched/
--------------------------------------------------------------------------------
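The substitution rows rebuild $_ from a captured prefix, the replacement, and the rest of the string. A simplified sketch of that idea follows. It leaves out $Ebig5::matched and Ebig5::s_matched(), whose behaviour is not described here, and it uses a local copy of the anchor definition listed later in this document; the sample octets are assumptions.

```perl
use strict;
use warnings;

# Local copy of the ${Ebig5::anchor} definition; the name is for this sketch only.
my $anchor = qr{\G(?>[^\x81-\xFE]|[\x81-\xFE][\x00-\xFF])*?};

# Assumed sample: the double-octet character A4 41 followed by ASCII "A".
my $naive = "\xA4\x41" . "A";
$naive =~ s/A/X/;                 # byte-level: hits the second octet of A4 41

my $aware = "\xA4\x41" . "A";
if ($aware =~ m/($anchor)(?:A)/) {
    # ${1} holds the whole characters stepped over by the anchor,
    # $' holds the rest of the string after the match.
    $aware = "${1}X$'";
}

printf "naive: %s\n", unpack 'H*', $naive;
printf "aware: %s\n", unpack 'H*', $aware;
```

The byte-level substitution damages the double-octet character, while the anchored form replaces the real "A" that follows it.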
=head1 Byte-Oriented Regular Expression
A regular expression that has the /b modifier works as byte-oriented.
--------------------------------------------------------------------------------
Before                   After
--------------------------------------------------------------------------------
/regexp/b                /(?:regexp)$Ebig5::matched/
m/regexp/b               m/(?:regexp)$Ebig5::matched/
?regexp?b                m?regexp$Ebig5::matched?
m?regexp?b               m?regexp$Ebig5::matched?

$_ =~                    ($_ =~ m/(\G[\x00-\xFF]*?)(?:regexp)$Ebig5::matched/) ?
s/regexp/replacement/b   CORE::eval{ Ebig5::s_matched(); local $^W=0; my $__r=qq/replacement/; $_="${1}$__r$'"; 1 } :
                         undef

$_ !~                    ($_ !~ m/(\G[\x00-\xFF]*?)(?:regexp)$Ebig5::matched/) ?
s/regexp/replacement/b   1 :
                         CORE::eval{ Ebig5::s_matched(); local $^W=0; my $__r=qq/replacement/; $_="${1}$__r$'"; undef }

split(/regexp/b)         split(qr/regexp/)
split(m/regexp/b)        split(qr/regexp/)
split(qr/regexp/b)       split(qr/regexp/)
qr/regexp/b              qr/(?:regexp)$Ebig5::matched/
--------------------------------------------------------------------------------
=head1 Escaping Character Classes (Ebig5.pm provides)
The character classes are redefined as follows for backward compatibility.
---------------------------------------------------------------
Before        After
---------------------------------------------------------------
.             ${Ebig5::dot}
              ${Ebig5::dot_s}    (/s modifier)
\d            [0-9]              (universally)
\s            \s
\w            [0-9A-Z_a-z]       (universally)
\D            ${Ebig5::eD}
\S            ${Ebig5::eS}
\W            ${Ebig5::eW}
\h            [\x09\x20]
\v            [\x0A\x0B\x0C\x0D]
\H            ${Ebig5::eH}
\V            ${Ebig5::eV}
\C            [\x00-\xFF]
\X            X                  (so, just 'X')
\R            ${Ebig5::eR}
\N            ${Ebig5::eN}
---------------------------------------------------------------
The POSIX-style character classes are redefined as well.
---------------------------------------------------------------
Before        After
---------------------------------------------------------------
[:alnum:]     [\x30-\x39\x41-\x5A\x61-\x7A]
[:alpha:]     [\x41-\x5A\x61-\x7A]
[:ascii:]     [\x00-\x7F]
[:blank:]     [\x09\x20]
[:cntrl:]     [\x00-\x1F\x7F]
[:digit:]     [\x30-\x39]
[:graph:]     [\x21-\x7F]
[:lower:]     [\x61-\x7A]
              [\x41-\x5A\x61-\x7A]     (/i modifier)
[:print:]     [\x20-\x7F]
[:punct:]     [\x21-\x2F\x3A-\x3F\x40\x5B-\x5F\x60\x7B-\x7E]
[:space:]     [\s\x0B]
[:upper:]     [\x41-\x5A]
              [\x41-\x5A\x61-\x7A]     (/i modifier)
[:word:]      [\x30-\x39\x41-\x5A\x5F\x61-\x7A]
[:xdigit:]    [\x30-\x39\x41-\x46\x61-\x66]
[:^alnum:]    ${Ebig5::not_alnum}
[:^alpha:]    ${Ebig5::not_alpha}
[:^ascii:]    ${Ebig5::not_ascii}
[:^blank:]    ${Ebig5::not_blank}
[:^cntrl:]    ${Ebig5::not_cntrl}
[:^digit:]    ${Ebig5::not_digit}
[:^graph:]    ${Ebig5::not_graph}
[:^lower:]    ${Ebig5::not_lower}
              ${Ebig5::not_lower_i}    (/i modifier)
[:^print:]    ${Ebig5::not_print}
[:^punct:]    ${Ebig5::not_punct}
[:^space:]    ${Ebig5::not_space}
[:^upper:]    ${Ebig5::not_upper}
              ${Ebig5::not_upper_i}    (/i modifier)
[:^word:]     ${Ebig5::not_word}
[:^xdigit:]   ${Ebig5::not_xdigit}
---------------------------------------------------------------
\b and \B are redefined as follows for backward compatibility.
---------------------------------------------------------------
Before        After
---------------------------------------------------------------
\b            ${Ebig5::eb}
\B            ${Ebig5::eB}
---------------------------------------------------------------
The variables above are defined in Ebig5.pm as follows.
---------------------------------------------------------------------------------------------------------------------------------------------------------
After                     Definition
---------------------------------------------------------------------------------------------------------------------------------------------------------
${Ebig5::anchor}          qr{\G(?>[^\x81-\xFE]|[\x81-\xFE][\x00-\xFF])*?}
                          for strings over 32766 octets on ActivePerl5.6 and Perl5.10 or later:
                          qr{\G(?(?=.{0,32766}\z)\G(?>[^\x81-\xFE]|[\x81-\xFE][\x00-\xFF])*?|(?(?=[$sbcs]+\z).*?|(?:.*?[$sbcs](?:[^$sbcs][^$sbcs])*?)))}oxms
${Ebig5::dot}             qr{(?>[^\x81-\xFE\x0A]|[\x81-\xFE][\x00-\xFF])};
${Ebig5::dot_s}           qr{(?>[^\x81-\xFE]|[\x81-\xFE][\x00-\xFF])};
${Ebig5::eD}              qr{(?>[^\x81-\xFE0-9]|[\x81-\xFE][\x00-\xFF])};
${Ebig5::eS}              qr{(?>[^\x81-\xFE\s]|[\x81-\xFE][\x00-\xFF])};
${Ebig5::eW}              qr{(?>[^\x81-\xFE0-9A-Z_a-z]|[\x81-\xFE][\x00-\xFF])};
${Ebig5::eH}              qr{(?>[^\x81-\xFE\x09\x20]|[\x81-\xFE][\x00-\xFF])};
${Ebig5::eV}              qr{(?>[^\x81-\xFE\x0A\x0B\x0C\x0D]|[\x81-\xFE][\x00-\xFF])};
${Ebig5::eR}              qr{(?>\x0D\x0A|[\x0A\x0D])};
${Ebig5::eN}              qr{(?>[^\x81-\xFE\x0A]|[\x81-\xFE][\x00-\xFF])};
${Ebig5::not_alnum}       qr{(?>[^\x81-\xFE\x30-\x39\x41-\x5A\x61-\x7A]|[\x81-\xFE][\x00-\xFF])};
${Ebig5::not_alpha}       qr{(?>[^\x81-\xFE\x41-\x5A\x61-\x7A]|[\x81-\xFE][\x00-\xFF])};
${Ebig5::not_ascii}       qr{(?>[^\x81-\xFE\x00-\x7F]|[\x81-\xFE][\x00-\xFF])};
${Ebig5::not_blank}       qr{(?>[^\x81-\xFE\x09\x20]|[\x81-\xFE][\x00-\xFF])};
${Ebig5::not_cntrl}       qr{(?>[^\x81-\xFE\x00-\x1F\x7F]|[\x81-\xFE][\x00-\xFF])};
${Ebig5::not_digit}       qr{(?>[^\x81-\xFE\x30-\x39]|[\x81-\xFE][\x00-\xFF])};
${Ebig5::not_graph}       qr{(?>[^\x81-\xFE\x21-\x7F]|[\x81-\xFE][\x00-\xFF])};
${Ebig5::not_lower}       qr{(?>[^\x81-\xFE\x61-\x7A]|[\x81-\xFE][\x00-\xFF])};
${Ebig5::not_lower_i}     qr{(?>[^\x81-\xFE\x41-\x5A\x61-\x7A]|[\x81-\xFE][\x00-\xFF])}; # Perl 5.16 compatible
# ${Ebig5::not_lower_i}   qr{(?>[^\x81-\xFE]|[\x81-\xFE][\x00-\xFF])}; # older Perl compatible
${Ebig5::not_print}       qr{(?>[^\x81-\xFE\x20-\x7F]|[\x81-\xFE][\x00-\xFF])};
${Ebig5::not_punct}       qr{(?>[^\x81-\xFE\x21-\x2F\x3A-\x3F\x40\x5B-\x5F\x60\x7B-\x7E]|[\x81-\xFE][\x00-\xFF])};
${Ebig5::not_space}       qr{(?>[^\x81-\xFE\s\x0B]|[\x81-\xFE][\x00-\xFF])};
${Ebig5::not_upper}       qr{(?>[^\x81-\xFE\x41-\x5A]|[\x81-\xFE][\x00-\xFF])};
${Ebig5::not_upper_i}     qr{(?>[^\x81-\xFE\x41-\x5A\x61-\x7A]|[\x81-\xFE][\x00-\xFF])}; # Perl 5.16 compatible
# ${Ebig5::not_upper_i}   qr{(?>[^\x81-\xFE]|[\x81-\xFE][\x00-\xFF])}; # older Perl compatible
${Ebig5::not_word}        qr{(?>[^\x81-\xFE\x30-\x39\x41-\x5A\x5F\x61-\x7A]|[\x81-\xFE][\x00-\xFF])};
${Ebig5::not_xdigit}      qr{(?>[^\x81-\xFE\x30-\x39\x41-\x46\x61-\x66]|[\x81-\xFE][\x00-\xFF])};
# This solution is not perfect. I would welcome a better solution from anyone reading this.
${Ebig5::eb}              qr{(?:\A(?=[0-9A-Z_a-z])|(?<=[\x00-\x2F\x40\x5B-\x5E\x60\x7B-\xFF])(?=[0-9A-Z_a-z])|(?<=[0-9A-Z_a-z])(?=[\x00-\x2F\x40\x5B-\x5E\x60\x7B-\xFF]|\z))};
${Ebig5::eB}              qr{(?:(?<=[0-9A-Z_a-z])(?=[0-9A-Z_a-z])|(?<=[\x00-\x2F\x40\x5B-\x5E\x60\x7B-\xFF])(?=[\x00-\x2F\x40\x5B-\x5E\x60\x7B-\xFF]))};
---------------------------------------------------------------------------------------------------------------------------------------------------------
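Two of the definitions above can be exercised directly. The sketch below copies the ${Ebig5::dot_s} and ${Ebig5::eD} rows into local variables (the names $dot_s and $eD and the sample text are assumptions for this illustration) and compares them with a byte-level \D.

```perl
use strict;
use warnings;

# Copied from the ${Ebig5::dot_s} and ${Ebig5::eD} rows of the table above.
my $dot_s = qr{(?>[^\x81-\xFE]|[\x81-\xFE][\x00-\xFF])};
my $eD    = qr{(?>[^\x81-\xFE0-9]|[\x81-\xFE][\x00-\xFF])};

# Assumed sample: "a", one double-octet character (A4 40), then "7".
my $text = "a" . "\xA4\x40" . "7";

# Character-oriented: each element is one whole character.
my @chars      = $text =~ /\G($dot_s)/g;
# Leading run of non-digit characters; the run stops at "7".
my @non_digits = $text =~ /\G($eD)/g;
# Byte-level \D: the double-octet character is split into two octets.
my @naive      = $text =~ /(\D)/g;

printf "characters: %d, non-digit characters: %d, byte-level \\D hits: %d\n",
    scalar @chars, scalar @non_digits, scalar @naive;
```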
=head1 Un-Escaping \ of \b{}, \B{}, \N{}, \p{}, \P{}, and \X (Big5 software provides)
Big5 software removes the '\' at the head of the alphanumeric regexp metasymbols \b{},
\B{}, \N{}, \p{}, \P{} and \X. In this way, you can avoid the trap of the abstraction.
See also:
Deprecate literal unescaped "{" in regexes.
http://perl5.git.perl.org/perl.git/commit/2a53d3314d380af5ab5283758219417c6dfa36e9
------------------------------------
Before           After
------------------------------------
\b{...}          b\{...}
\B{...}          B\{...}
\N{CHARNAME}     N\{CHARNAME}
\p{L}            p\{L}
\p{^L}           p\{^L}
\p{\^L}          p\{\^L}
\pL              pL
\P{L}            P\{L}
\P{^L}           P\{^L}
\P{\^L}          P\{\^L}
\PL              PL
\X               X
------------------------------------
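The rewrite in the table can be approximated with a single substitution. The function unescape_metasymbols below is a hypothetical sketch written for this illustration, not the code used by the Big5 software; it only covers the rows of the table and leaves an escaped backslash untouched.

```perl
use strict;
use warnings;

# Hypothetical sketch of the table above; not the module's own code.
# Alternatives, in order:
#   1. an escaped backslash, kept as it is
#   2. \b{ \B{ \N{ \p{ \P{   rewritten to   b\{ B\{ N\{ p\{ P\{
#   3. \p \P \X (no brace)   rewritten to   p P X
sub unescape_metasymbols {
    my ($regexp) = @_;
    $regexp =~ s/(\\\\)|\\([bBNpP])\{|\\([pPX])/
        defined $1 ? $1 : defined $2 ? $2 . '\\{' : $3
    /ge;
    return $regexp;
}

for my $before ('\b{wb}', '\N{CHARNAME}', '\p{L}', '\p{\^L}', '\pL', '\P{^L}', '\PL', '\X') {
    printf "%-14s %s\n", $before, unescape_metasymbols($before);
}
```

Metasymbols that are not listed in the table, such as \d or a plain \b, pass through unchanged.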
=head1 Escaping Built-in Functions (Big5 software provides)
Big5 software inserts 'Ebig5::' at the head of the function name. Ebig5.pm provides
the Ebig5::* subroutines to your script.
-------------------------------------------
Before      After              Works as
-------------------------------------------
length      length             Byte
substr      substr             Byte
pos         pos                Byte
split       Ebig5::split       Character
tr///       Ebig5::tr          Character
tr///b      tr///              Byte
tr///B      tr///              Byte
y///        Ebig5::tr          Character
y///b       tr///              Byte
y///B       tr///              Byte
chop        Ebig5::chop        Character
index       Ebig5::index       Character
rindex      Ebig5::rindex      Character
lc          Ebig5::lc          Character
lcfirst     Ebig5::lcfirst     Character
uc          Ebig5::uc          Character
ucfirst     Ebig5::ucfirst     Character
fc          Ebig5::fc          Character
chr         Ebig5::chr         Character
glob        Ebig5::glob        Character