Big5HKSCS
lib/Big5HKSCS.pm
  ex. Japanese Katakana "SO" looks like [ `/ ]; its code is "\x83\x5C" in SJIS

                  see     hex dump
  -----------------------------------------
  source script   "`/"    [83 5c]
  -----------------------------------------

  Here, use SJIS;
                          hex dump
  -----------------------------------------
  escaped script  "`\/"   [83 [5c] 5c]
  -----------------------------------------
                    ^--- escaped by SJIS software

  by the way      see     hex dump
  -----------------------------------------
  your eyes       "`/\"   [83 5c] [5c]
  -----------------------------------------
  perl's eyes     "`\/"   [83] \[5c]
  -----------------------------------------

                          hex dump
  -----------------------------------------
  in the perl     "`/"    [83] [5c]
  -----------------------------------------
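The hex dumps above can be reproduced with a small sketch. The helper name escape_trail_backslash is hypothetical and is not part of the module; the lead-octet range 0x81-0xFE is taken from the definitions given for Ebig5hkscs.pm, and the string eval only stands in for perl reading the escaped script.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's code): insert a backslash before a
# trail octet 0x5C of a double-octet character. Lead octets are assumed to
# lie in 0x81-0xFE.
sub escape_trail_backslash {
    my ($bytes) = @_;
    my $out = '';
    while ($bytes =~ /\G([\x81-\xFE][\x00-\xFF]|[\x00-\xFF])/g) {
        my $char = $1;
        if (length($char) == 2 && substr($char, 1, 1) eq "\x5C") {
            $out .= substr($char, 0, 1) . "\x5C\x5C";   # [83 [5c] 5c]
        }
        else {
            $out .= $char;
        }
    }
    return $out;
}

my $source       = "\x83\x5C";                       # source script octets
my $escaped      = escape_trail_backslash($source);  # escaped script octets
my $seen_by_perl = eval qq{"$escaped"};              # perl reads \x5C\x5C as one backslash

printf "%s -> %s -> %s\n",
    map { unpack 'H*', $_ } $source, $escaped, $seen_by_perl;
```

After the round trip the string inside perl holds the same two octets as the source script, which is the point of the escape.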
=head1 Multiple-Octet Anchoring of Regular Expression (Big5HKSCS software provides)
Big5HKSCS software applies multiple-octet anchoring at the beginning of each regular
expression.

  --------------------------------------------------------------------------------
  Before                  After
  --------------------------------------------------------------------------------
  m/regexp/               m/${Ebig5hkscs::anchor}(?:regexp).../
  --------------------------------------------------------------------------------
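A minimal sketch of why the anchoring matters. The anchor pattern is copied from the short ${Ebig5hkscs::anchor} definition listed for Ebig5hkscs.pm; the sample octets are an assumption chosen so that a trail octet equals the code of ASCII "A".

```perl
use strict;
use warnings;

# Copied from the short ${Ebig5hkscs::anchor} definition in this document.
my $anchor = qr{\G(?>[^\x81-\xFE]|[\x81-\xFE][\x00-\xFF])*?};

# One double-octet character whose trail octet is 0x41 (the code of "A"),
# followed by ASCII "B".
my $text = "\xA4\x41" . "B";

# Without anchoring, "A" is found in the middle of the double-octet character.
my $unanchored = ($text =~ m/A/) ? 1 : 0;

# With anchoring, the match position can only advance by whole characters.
my $anchored = ($text =~ m/${anchor}(?:A)/) ? 1 : 0;
my $finds_b  = ($text =~ m/${anchor}(?:B)/) ? 1 : 0;

print "unanchored=$unanchored anchored=$anchored finds_b=$finds_b\n";
```

The unanchored match reports a hit that does not correspond to any character in the text, while the anchored form still finds the real "B".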
=head1 Escaping Second Octet (Big5HKSCS software provides)
Big5HKSCS software escapes the second octet of a multiple-octet character in a regular
expression.

  --------------------------------------------------------------------------------
  Before                  After
  --------------------------------------------------------------------------------
  m<...`/...>             m<...`/\...>
  --------------------------------------------------------------------------------
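The effect of an unescaped trail octet inside a regular expression can be sketched as follows. The octets 0xA5 0x5C are assumed here as an example of a double-octet character whose second octet is 0x5C; the escaped pattern is written by hand rather than produced by the module.

```perl
use strict;
use warnings;

my $char = "\xA5\x5C";                    # double-octet character, trail octet 0x5C
my $text = $char . 'd';                   # the character followed by ASCII "d"

my $raw_pattern = $char . 'd';            # octets A5 5C 64: perl reads 5C 64 as "\d"
my $esc_pattern = "\xA5\x5C\x5C" . 'd';   # trail octet escaped by hand

my $raw_hit   = ($text =~ /$raw_pattern/)           ? 1 : 0;  # intended text is missed
my $false_hit = (("\xA5" . '7') =~ /$raw_pattern/)  ? 1 : 0;  # 0xA5 plus a digit matches
my $esc_hit   = ($text =~ /$esc_pattern/)           ? 1 : 0;  # escaped pattern matches

print "raw=$raw_hit false=$false_hit escaped=$esc_hit\n";
```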
=head1 Multiple-Octet Character Regular Expression (Big5HKSCS software provides)
Big5HKSCS software clusters a multiple-octet character that is followed by a quantifier,
and builds a cluster from a multiple-octet custom character class. It also builds
multiple-octet versions of the classic Perl character class shortcuts and the POSIX-style
character classes.

  --------------------------------------------------------------------------------
  Before                  After
  --------------------------------------------------------------------------------
  m/...MULTIOCT+.../      m/...(?:MULTIOCT)+.../
  m/...[AN-EM].../        m/...(?:A[N-Z]|[B-D][A-Z]|E[A-M]).../
  m/...\D.../             m/...(?:${Ebig5hkscs::eD}).../
  m/...[[:^digit:]].../   m/...(?:${Ebig5hkscs::not_digit}).../
  --------------------------------------------------------------------------------
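The first row of the table can be illustrated with a short sketch. The sample octets are an assumption; the point is only that a bare quantifier binds to the last octet, while the clustered form repeats the whole character.

```perl
use strict;
use warnings;

my $char = "\xA4\x41";        # one double-octet character (assumed sample octets)
my $text = $char x 3;         # the same character three times

# Bare quantifier: "+" applies to the trail octet 0x41 only.
my $bare       = ($text =~ /\A${char}+\z/)           ? 1 : 0;
my $bare_false = ("\xA4\x41\x41" =~ /\A${char}+\z/)  ? 1 : 0;

# Clustered quantifier: "+" applies to the whole character.
my $clustered  = ($text =~ /\A(?:${char})+\z/)       ? 1 : 0;

print "bare=$bare bare_false=$bare_false clustered=$clustered\n";
```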
=head1 Calling 'Ebig5hkscs::ignorecase()' (Big5HKSCS software provides)
Big5HKSCS software calls 'Ebig5hkscs::ignorecase()' instead of applying the /i modifier.

  --------------------------------------------------------------------------------
  Before                  After
  --------------------------------------------------------------------------------
  m/...$var.../i          m/...@{[Ebig5hkscs::ignorecase($var)]}.../
  --------------------------------------------------------------------------------
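The following sketch shows one way such a replacement for /i could work. It is not the implementation of Ebig5hkscs::ignorecase(): the helper ignorecase_sketch is hypothetical, it accepts a literal string rather than regexp source, and it assumes lead octets in 0x81-0xFE.

```perl
use strict;
use warnings;

# Hypothetical helper: build a pattern in which single-octet ASCII letters
# match either case, while double-octet characters are matched exactly.
sub ignorecase_sketch {
    my ($literal) = @_;
    my $out = '';
    while ($literal =~ /\G([\x81-\xFE][\x00-\xFF]|[\x00-\xFF])/g) {
        my $char = $1;
        if ($char =~ /\A[A-Za-z]\z/) {
            $out .= '[' . uc($char) . lc($char) . ']';
        }
        else {
            $out .= quotemeta $char;
        }
    }
    return $out;
}

my $literal = "a" . "\xA4\x41";          # ASCII "a" plus a double-octet character
my $pattern = ignorecase_sketch($literal);

my $same_char  = "A" . "\xA4\x41";       # differs only in the ASCII letter
my $other_char = "A" . "\xA4\x61";       # a different double-octet character

my $sketch_same  = ($same_char  =~ /\A$pattern\z/)       ? 1 : 0;
my $sketch_other = ($other_char =~ /\A$pattern\z/)       ? 1 : 0;
my $plain_i      = ($other_char =~ /\A\Q$literal\E\z/i)  ? 1 : 0;

print "sketch_same=$sketch_same sketch_other=$sketch_other plain_i=$plain_i\n";
```

The plain /i modifier folds the trail octet 0x41 to 0x61 and so accepts a different character, which the per-character approach avoids.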
=head1 Character-Oriented Regular Expression
A regular expression that has no /b modifier works as character-oriented.

  --------------------------------------------------------------------------------
  Before                     After
  --------------------------------------------------------------------------------
  /regexp/                   /ditto$Ebig5hkscs::matched/
  m/regexp/                  m/ditto$Ebig5hkscs::matched/
  ?regexp?                   m?ditto$Ebig5hkscs::matched?
  m?regexp?                  m?ditto$Ebig5hkscs::matched?

  $_ =~                      ($_ =~ m/ditto$Ebig5hkscs::matched/) ?
  s/regexp/replacement/      CORE::eval{ Ebig5hkscs::s_matched(); local $^W=0; my $__r=qq/replacement/; $_="${1}$__r$'"; 1 } :
                             undef

  $_ !~                      ($_ !~ m/ditto$Ebig5hkscs::matched/) ?
  s/regexp/replacement/      1 :
                             CORE::eval{ Ebig5hkscs::s_matched(); local $^W=0; my $__r=qq/replacement/; $_="${1}$__r$'"; undef }

  split(/regexp/)            Ebig5hkscs::split(qr/regexp/)
  split(m/regexp/)           Ebig5hkscs::split(qr/regexp/)
  split(qr/regexp/)          Ebig5hkscs::split(qr/regexp/)
  qr/regexp/                 qr/ditto$Ebig5hkscs::matched/
  --------------------------------------------------------------------------------
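To illustrate what character-oriented means for the split rows, here is a minimal sketch. The helper char_split_sketch is hypothetical and far simpler than Ebig5hkscs::split: it takes a single ASCII character as separator, keeps trailing empty fields, and assumes lead octets in 0x81-0xFE.

```perl
use strict;
use warnings;

# Hypothetical helper: split on a one-octet separator, but never inside a
# double-octet character.
sub char_split_sketch {
    my ($sep, $bytes) = @_;
    my @fields = ('');
    while ($bytes =~ /\G([\x81-\xFE][\x00-\xFF]|[\x00-\xFF])/g) {
        if ($1 eq $sep) {
            push @fields, '';       # separator seen as a whole character
        }
        else {
            $fields[-1] .= $1;      # any other character joins the current field
        }
    }
    return @fields;
}

# "x", a double-octet character with trail octet 0x41, "y", ASCII "A", "z"
my $text = "x" . "\xA4\x41" . "yAz";

my @byte_fields = split /A/, $text;                 # also splits inside the character
my @char_fields = char_split_sketch('A', $text);    # splits at the real "A" only

print scalar(@byte_fields), " byte fields, ", scalar(@char_fields), " character fields\n";
```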
=head1 Byte-Oriented Regular Expression
A regular expression that has the /b modifier works as byte-oriented.

  --------------------------------------------------------------------------------
  Before                     After
  --------------------------------------------------------------------------------
  /regexp/b                  /(?:regexp)$Ebig5hkscs::matched/
  m/regexp/b                 m/(?:regexp)$Ebig5hkscs::matched/
  ?regexp?b                  m?regexp$Ebig5hkscs::matched?
  m?regexp?b                 m?regexp$Ebig5hkscs::matched?

  $_ =~                      ($_ =~ m/(\G[\x00-\xFF]*?)(?:regexp)$Ebig5hkscs::matched/) ?
  s/regexp/replacement/b     CORE::eval{ Ebig5hkscs::s_matched(); local $^W=0; my $__r=qq/replacement/; $_="${1}$__r$'"; 1 } :
                             undef

  $_ !~                      ($_ !~ m/(\G[\x00-\xFF]*?)(?:regexp)$Ebig5hkscs::matched/) ?
  s/regexp/replacement/b     1 :
                             CORE::eval{ Ebig5hkscs::s_matched(); local $^W=0; my $__r=qq/replacement/; $_="${1}$__r$'"; undef }

  split(/regexp/b)           split(qr/regexp/)
  split(m/regexp/b)          split(qr/regexp/)
  split(qr/regexp/b)         split(qr/regexp/)
  qr/regexp/b                qr/(?:regexp)$Ebig5hkscs::matched/
  --------------------------------------------------------------------------------
=head1 Escaping Character Classes (Ebig5hkscs.pm provides)
The character classes are redefined as follows for backward compatibility.

  ---------------------------------------------------------------
  Before        After
  ---------------------------------------------------------------
  .             ${Ebig5hkscs::dot}
                ${Ebig5hkscs::dot_s}    (/s modifier)
  \d            [0-9]                   (universally)
  \s            \s
  \w            [0-9A-Z_a-z]            (universally)
  \D            ${Ebig5hkscs::eD}
  \S            ${Ebig5hkscs::eS}
  \W            ${Ebig5hkscs::eW}
  \h            [\x09\x20]
  \v            [\x0A\x0B\x0C\x0D]
  \H            ${Ebig5hkscs::eH}
  \V            ${Ebig5hkscs::eV}
  \C            [\x00-\xFF]
  \X            X                       (so, just 'X')
  \R            ${Ebig5hkscs::eR}
  \N            ${Ebig5hkscs::eN}
  ---------------------------------------------------------------
POSIX-style character classes are redefined as well.

  ---------------------------------------------------------------
  Before        After
  ---------------------------------------------------------------
  [:alnum:]     [\x30-\x39\x41-\x5A\x61-\x7A]
  [:alpha:]     [\x41-\x5A\x61-\x7A]
  [:ascii:]     [\x00-\x7F]
  [:blank:]     [\x09\x20]
  [:cntrl:]     [\x00-\x1F\x7F]
  [:digit:]     [\x30-\x39]
  [:graph:]     [\x21-\x7F]
  [:lower:]     [\x61-\x7A]
                [\x41-\x5A\x61-\x7A]     (/i modifier)
  [:print:]     [\x20-\x7F]
  [:punct:]     [\x21-\x2F\x3A-\x3F\x40\x5B-\x5F\x60\x7B-\x7E]
  [:space:]     [\s\x0B]
  [:upper:]     [\x41-\x5A]
                [\x41-\x5A\x61-\x7A]     (/i modifier)
  [:word:]      [\x30-\x39\x41-\x5A\x5F\x61-\x7A]
  [:xdigit:]    [\x30-\x39\x41-\x46\x61-\x66]
  [:^alnum:]    ${Ebig5hkscs::not_alnum}
  [:^alpha:]    ${Ebig5hkscs::not_alpha}
  [:^ascii:]    ${Ebig5hkscs::not_ascii}
  [:^blank:]    ${Ebig5hkscs::not_blank}
  [:^cntrl:]    ${Ebig5hkscs::not_cntrl}
  [:^digit:]    ${Ebig5hkscs::not_digit}
  [:^graph:]    ${Ebig5hkscs::not_graph}
  [:^lower:]    ${Ebig5hkscs::not_lower}
                ${Ebig5hkscs::not_lower_i}    (/i modifier)
  [:^print:]    ${Ebig5hkscs::not_print}
  [:^punct:]    ${Ebig5hkscs::not_punct}
  [:^space:]    ${Ebig5hkscs::not_space}
  [:^upper:]    ${Ebig5hkscs::not_upper}
                ${Ebig5hkscs::not_upper_i}    (/i modifier)
  [:^word:]     ${Ebig5hkscs::not_word}
  [:^xdigit:]   ${Ebig5hkscs::not_xdigit}
  ---------------------------------------------------------------
\b and \B are redefined as follows for backward compatibility.

  ---------------------------------------------------------------
  Before        After
  ---------------------------------------------------------------
  \b            ${Ebig5hkscs::eb}
  \B            ${Ebig5hkscs::eB}
  ---------------------------------------------------------------
Definitions in Ebig5hkscs.pm.

  ---------------------------------------------------------------------------------------------------------------------------------------------------------
  After                         Definition
  ---------------------------------------------------------------------------------------------------------------------------------------------------------
  ${Ebig5hkscs::anchor}         qr{\G(?>[^\x81-\xFE]|[\x81-\xFE][\x00-\xFF])*?}
                                for strings over 32766 octets on ActivePerl5.6 and Perl5.10 or later
                                qr{\G(?(?=.{0,32766}\z)\G(?>[^\x81-\xFE]|[\x81-\xFE][\x00-\xFF])*?|(?(?=[$sbcs]+\z).*?|(?:.*?[$sbcs](?:[^$sbcs][^$sbcs])*?)))}oxms
  ${Ebig5hkscs::dot}            qr{(?>[^\x81-\xFE\x0A]|[\x81-\xFE][\x00-\xFF])};
  ${Ebig5hkscs::dot_s}          qr{(?>[^\x81-\xFE]|[\x81-\xFE][\x00-\xFF])};
  ${Ebig5hkscs::eD}             qr{(?>[^\x81-\xFE0-9]|[\x81-\xFE][\x00-\xFF])};
  ${Ebig5hkscs::eS}             qr{(?>[^\x81-\xFE\s]|[\x81-\xFE][\x00-\xFF])};
  ${Ebig5hkscs::eW}             qr{(?>[^\x81-\xFE0-9A-Z_a-z]|[\x81-\xFE][\x00-\xFF])};
  ${Ebig5hkscs::eH}             qr{(?>[^\x81-\xFE\x09\x20]|[\x81-\xFE][\x00-\xFF])};
  ${Ebig5hkscs::eV}             qr{(?>[^\x81-\xFE\x0A\x0B\x0C\x0D]|[\x81-\xFE][\x00-\xFF])};
  ${Ebig5hkscs::eR}             qr{(?>\x0D\x0A|[\x0A\x0D])};
  ${Ebig5hkscs::eN}             qr{(?>[^\x81-\xFE\x0A]|[\x81-\xFE][\x00-\xFF])};
  ${Ebig5hkscs::not_alnum}      qr{(?>[^\x81-\xFE\x30-\x39\x41-\x5A\x61-\x7A]|[\x81-\xFE][\x00-\xFF])};
  ${Ebig5hkscs::not_alpha}      qr{(?>[^\x81-\xFE\x41-\x5A\x61-\x7A]|[\x81-\xFE][\x00-\xFF])};
  ${Ebig5hkscs::not_ascii}      qr{(?>[^\x81-\xFE\x00-\x7F]|[\x81-\xFE][\x00-\xFF])};
  ${Ebig5hkscs::not_blank}      qr{(?>[^\x81-\xFE\x09\x20]|[\x81-\xFE][\x00-\xFF])};
  ${Ebig5hkscs::not_cntrl}      qr{(?>[^\x81-\xFE\x00-\x1F\x7F]|[\x81-\xFE][\x00-\xFF])};
  ${Ebig5hkscs::not_digit}      qr{(?>[^\x81-\xFE\x30-\x39]|[\x81-\xFE][\x00-\xFF])};
  ${Ebig5hkscs::not_graph}      qr{(?>[^\x81-\xFE\x21-\x7F]|[\x81-\xFE][\x00-\xFF])};
  ${Ebig5hkscs::not_lower}      qr{(?>[^\x81-\xFE\x61-\x7A]|[\x81-\xFE][\x00-\xFF])};
  ${Ebig5hkscs::not_lower_i}    qr{(?>[^\x81-\xFE\x41-\x5A\x61-\x7A]|[\x81-\xFE][\x00-\xFF])}; # Perl 5.16 compatible
  # ${Ebig5hkscs::not_lower_i}  qr{(?>[^\x81-\xFE]|[\x81-\xFE][\x00-\xFF])}; # older Perl compatible
  ${Ebig5hkscs::not_print}      qr{(?>[^\x81-\xFE\x20-\x7F]|[\x81-\xFE][\x00-\xFF])};
  ${Ebig5hkscs::not_punct}      qr{(?>[^\x81-\xFE\x21-\x2F\x3A-\x3F\x40\x5B-\x5F\x60\x7B-\x7E]|[\x81-\xFE][\x00-\xFF])};
  ${Ebig5hkscs::not_space}      qr{(?>[^\x81-\xFE\s\x0B]|[\x81-\xFE][\x00-\xFF])};
  ${Ebig5hkscs::not_upper}      qr{(?>[^\x81-\xFE\x41-\x5A]|[\x81-\xFE][\x00-\xFF])};
  ${Ebig5hkscs::not_upper_i}    qr{(?>[^\x81-\xFE\x41-\x5A\x61-\x7A]|[\x81-\xFE][\x00-\xFF])}; # Perl 5.16 compatible
  # ${Ebig5hkscs::not_upper_i}  qr{(?>[^\x81-\xFE]|[\x81-\xFE][\x00-\xFF])}; # older Perl compatible
  ${Ebig5hkscs::not_word}       qr{(?>[^\x81-\xFE\x30-\x39\x41-\x5A\x5F\x61-\x7A]|[\x81-\xFE][\x00-\xFF])};
  ${Ebig5hkscs::not_xdigit}     qr{(?>[^\x81-\xFE\x30-\x39\x41-\x46\x61-\x66]|[\x81-\xFE][\x00-\xFF])};
  # This solution is not perfect. I would welcome a better solution from anyone reading this.
  ${Ebig5hkscs::eb}             qr{(?:\A(?=[0-9A-Z_a-z])|(?<=[\x00-\x2F\x40\x5B-\x5E\x60\x7B-\xFF])(?=[0-9A-Z_a-z])|(?<=[0-9A-Z_a-z])(?=[\x00-\x2F\x40\x5B-\x5E\x60\x7B-\xFF]|\z))};
  ${Ebig5hkscs::eB}             qr{(?:(?<=[0-9A-Z_a-z])(?=[0-9A-Z_a-z])|(?<=[\x00-\x2F\x40\x5B-\x5E\x60\x7B-\xFF])(?=[\x00-\x2F\x40\x5B-\x5E\x60\x7B-\xFF]))};
  ---------------------------------------------------------------------------------------------------------------------------------------------------------
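A short sketch of how two of these definitions behave on a byte string. The patterns are copied from the ${Ebig5hkscs::dot_s} and ${Ebig5hkscs::eD} rows; the local variable names and the sample octets are assumptions made for illustration.

```perl
use strict;
use warnings;

# Copied from the dot_s and eD definitions in the table above.
my $dot_s = qr{(?>[^\x81-\xFE]|[\x81-\xFE][\x00-\xFF])};
my $eD    = qr{(?>[^\x81-\xFE0-9]|[\x81-\xFE][\x00-\xFF])};

# ASCII "1", one double-octet character (trail octet 0x41), ASCII "2".
my $text = "1" . "\xA4\x41" . "2";

my @chars           = $text =~ /$dot_s/g;   # one element per character
my @byte_nondigits  = $text =~ /\D/g;       # built-in \D sees two octets
my @char_nondigits  = $text =~ /$eD/g;      # redefined \D sees one character

printf "%d octets, %d characters, %d vs %d non-digits\n",
    length($text), scalar(@chars), scalar(@byte_nondigits), scalar(@char_nondigits);
```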
=head1 Un-Escaping \ of \b{}, \B{}, \N{}, \p{}, \P{}, and \X (Big5HKSCS software provides)
Big5HKSCS software removes the '\' at the head of the alphanumeric regexp metasymbols
\b{}, \B{}, \N{}, \p{}, \P{} and \X. By this method, you can avoid the trap of the abstraction.
See also,
Deprecate literal unescaped "{" in regexes.
http://perl5.git.perl.org/perl.git/commit/2a53d3314d380af5ab5283758219417c6dfa36e9
  ------------------------------------
  Before            After
  ------------------------------------
  \b{...}           b\{...}
  \B{...}           B\{...}
  \N{CHARNAME}      N\{CHARNAME}
  \p{L}             p\{L}
  \p{^L}            p\{^L}
  \p{\^L}           p\{\^L}
  \pL               pL
  \P{L}             P\{L}
  \P{^L}            P\{^L}
  \P{\^L}           P\{\^L}
  \PL               PL
  \X                X
  ------------------------------------
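The rewriting in the table can be approximated with three substitutions. This is only a sketch under simplifying assumptions: the helper unescape_sketch is hypothetical, it works on regexp source held in a plain string, and it does not try to recognise an escaped backslash in front of the metasymbol.

```perl
use strict;
use warnings;

# Hypothetical helper: move the backslash from the metasymbol letter to the
# opening brace, and drop it for the brace-less forms \pL, \PL and \X.
sub unescape_sketch {
    my ($re) = @_;
    $re =~ s/\\([bBNpP])\{/$1\\{/g;          # \p{L}  ->  p\{L}
    $re =~ s/\\([pP])([A-Za-z])/$1$2/g;      # \pL    ->  pL
    $re =~ s/\\X/X/g;                        # \X     ->  X
    return $re;
}

my $before = '\p{L}\PL\X\b{wb}';
my $after  = unescape_sketch($before);

print "$before  ->  $after\n";
```

A plain \b or \N without braces is left untouched, which matches the table: only the brace forms, \pL, \PL and \X are listed there.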
=head1 Escaping Built-in Functions (Big5HKSCS software provides)
'Ebig5hkscs::' is inserted at the head of the function name. Ebig5hkscs.pm provides the
Ebig5hkscs::* subroutines to your script.

  -------------------------------------------
  Before   After                  Works as
  -------------------------------------------
  length   length                 Byte
  substr   substr                 Byte
  pos      pos                    Byte
  split    Ebig5hkscs::split      Character
  tr///    Ebig5hkscs::tr         Character
  tr///b   tr///                  Byte
  tr///B   tr///                  Byte
  y///     Ebig5hkscs::tr         Character
  y///b    tr///                  Byte
  y///B    tr///                  Byte
  chop     Ebig5hkscs::chop       Character
  index    Ebig5hkscs::index      Character
  rindex   Ebig5hkscs::rindex     Character
  lc       Ebig5hkscs::lc         Character
  lcfirst  Ebig5hkscs::lcfirst    Character
  uc       Ebig5hkscs::uc         Character
  ucfirst  Ebig5hkscs::ucfirst    Character
  fc       Ebig5hkscs::fc         Character
  chr      Ebig5hkscs::chr        Character
  glob     Ebig5hkscs::glob       Character
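The difference between the "Byte" and "Character" rows can be sketched with length and chop. The helpers chars_of and chop_sketch are hypothetical and are not the module's subroutines; they assume a character is either a lead octet in 0x81-0xFE plus one more octet, or a single octet.

```perl
use strict;
use warnings;

# Assumed character shape: lead octet 0x81-0xFE plus any octet, else one octet.
my $char_re = qr{[\x81-\xFE][\x00-\xFF]|[\x00-\xFF]};

# Hypothetical helper: list the characters of a byte string.
sub chars_of {
    my ($bytes) = @_;
    my @chars = $bytes =~ /$char_re/g;
    return @chars;
}

# Hypothetical character-oriented chop: returns (rest, removed character).
sub chop_sketch {
    my @chars = chars_of($_[0]);
    my $last  = pop @chars;
    return (join('', @chars), $last);
}

my $text = "ab" . "\xA4\x41";            # two ASCII letters, one double-octet character

my $byte_length = length $text;          # length stays byte-oriented
my @chars       = chars_of($text);
my $char_length = scalar @chars;

my $byte_chopped = $text;
chop $byte_chopped;                      # built-in chop removes one octet only

my ($rest, $removed) = chop_sketch($text);

printf "%d octets, %d characters, removed %s\n",
    $byte_length, $char_length, unpack('H*', $removed);
```

The built-in chop leaves a dangling lead octet at the end of the string, while the character-oriented version removes the whole double-octet character.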