Unicode-LineBreak
lib/POD2/JA/Unicode/LineBreak.pod
SELF is a Unicode::LineBreak object, and STR is a Unicode string that matched REGEX and is to be broken.
The subroutine must return an array containing the pieces obtained by breaking the string STR.
For example, the following code breaks HTTP URLs using the [CMOS] rules.
    my $url = qr{http://[\x21-\x7E]+}i;
    sub breakurl {
        my $self = shift;
        my $str = shift;
        return split m{(?<=[/]) (?=[^/]) |
                       (?<=[^-.]) (?=[-~.,_?\#%=&]) |
                       (?<=[=&]) (?=.)}x, $str;
    }
    my $lb = Unicode::LineBreak->new(Prep => [$url, \&breakurl]);
    $output = $lb->break($string);
=head3 Preserving State
A Unicode::LineBreak object behaves as a hash reference.
Arbitrary items can be stored in it and are kept for the lifetime of the object.
lib/Unicode/LineBreak.pod
STR is a Unicode string that matched REGEX.
The subroutine should return an array of the pieces into which STR is broken.
For example, the following code breaks HTTP URLs using the [CMOS] rules.
    my $url = qr{http://[\x21-\x7E]+}i;
    sub breakurl {
        my $self = shift;
        my $str = shift;
        return split m{(?<=[/]) (?=[^/]) |
                       (?<=[^-.]) (?=[-~.,_?\#%=&]) |
                       (?<=[=&]) (?=.)}x, $str;
    }
    my $lb = Unicode::LineBreak->new(Prep => [$url, \&breakurl]);
    $output = $lb->break($string);
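To see which break opportunities the pattern above produces, the split can be tried on its own with core Perl only. This is a minimal sketch: the helper name `breakurl_pieces` and the sample URL are assumptions made for illustration and are not part of the module.

```perl
use strict;
use warnings;

# Same pattern as in breakurl above, applied outside Unicode::LineBreak.
# breakurl_pieces is a hypothetical helper used only for this illustration.
sub breakurl_pieces {
    my ($str) = @_;
    return split m{(?<=[/]) (?=[^/]) |
                   (?<=[^-.]) (?=[-~.,_?\#%=&]) |
                   (?<=[=&]) (?=.)}x, $str;
}

# Sample URL chosen for illustration.
my @pieces = breakurl_pieces('http://example.com/a?b=c');
print join(' | ', @pieces), "\n";
# prints: http:// | example | .com/ | a | ?b | = | c
```

Each boundary between two pieces is a position where the subroutine allows a line break: after a slash that is not followed by another slash, before the listed punctuation characters, and after `=` or `&`.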
=head3 Preserving State
A Unicode::LineBreak object can behave as a hash reference.
Any items stored in it are preserved throughout the life of the object.
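As a rough sketch of how such state might be used, the callback below counts how many times it is invoked by storing a counter in the object. It assumes Unicode::LineBreak is installed; the key name `url_count`, the subroutine name `count_and_break`, and the sample text are hypothetical and chosen only for illustration.

```perl
use strict;
use warnings;
use Unicode::LineBreak;

my $url = qr{http://[\x21-\x7E]+}i;

# Prep callback: receives the Unicode::LineBreak object and the matched string.
sub count_and_break {
    my ($self, $str) = @_;
    $self->{url_count}++;    # hypothetical key, kept in the object itself
    # Simplified rule for this sketch: allow a break after a slash
    # that is not followed by another slash.
    return split m{(?<=[/]) (?=[^/])}x, $str;
}

my $lb = Unicode::LineBreak->new(Prep => [$url, \&count_and_break]);
$lb->{url_count} = 0;        # initialize the stored item

my $output = $lb->break(
    "See http://example.com/a/b and http://example.org/c for details.\n");
print "Callback invocations: $lb->{url_count}\n";
```

Keeping the counter in the object rather than in a global variable means that each Unicode::LineBreak instance carries its own state.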
t/16regex.t
# Breaking URIs according to some CMoS rules.
sub breakURI {
    # 17.11 1.1: [/] ÷ [^/]
    # 17.11 2:   [-] ×
    # 6.17 2:    [.] ×
    # 17.11 1.2: ÷ [-~.,_?#%]
    # 17.11 1.3: ÷ [=&]
    # 17.11 1.3: [=&] ÷
    # Default:   ALL × ALL
    my @c = split m{$splitre}, $_[1];
    # Don't break before punctuation at the end of a match:
    # merge a trailing punctuation-only piece into the preceding piece.
    while (2 <= scalar @c and $c[$#c] =~ /^[\".:;,>]+$/) {
        my $c = pop @c;
        $c[$#c] .= $c;
    }
    @c;
}
# [REGEX, SUB] pair
dotest('uri', 'uri.break', ColumnsMax => 1,