Unicode-LineBreak


lib/POD2/JA/Unicode/LineBreak.pod

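SELF is the Unicode::LineBreak object. STR is the Unicode string, matched by REGEX, that is to be broken.

The subroutine must return an array of the pieces that result from breaking STR.

For example, the following code breaks HTTP URLs using the [CMOS] rules.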

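    use Unicode::LineBreak;
    my $url = qr{http://[\x21-\x7E]+}i;
    sub breakurl {
        my $self = shift;
        my $str = shift;
        # Break after "/" unless another "/" follows; before "-~.,_?#%=&"
        # unless preceded by "-" or "."; and after "=" or "&".
        return split m{(?<=[/]) (?=[^/]) |
                       (?<=[^-.]) (?=[-~.,_?\#%=&]) |
                       (?<=[=&]) (?=.)}x, $str;
    }
    my $lb = Unicode::LineBreak->new(Prep => [$url, \&breakurl]);
    my $output = $lb->break($string);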
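The following sketch makes the roles of SELF, STR and the returned list concrete. It is a rough, hypothetical model of how a [REGEX, SUB] pair could be applied to a string. The helper `apply_prep_pair` and the `$after_slash` callback are illustrative assumptions and are not part of Unicode::LineBreak. The module's real processing is more involved. For instance, it breaks text outside the matches by its ordinary rules, and this sketch does not attempt that.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT Unicode::LineBreak's internals: every substring of
# $text that matches $regex is passed to $sub as ($self, $str), and the pieces
# $sub returns stand in for that substring. Text outside the matches is kept
# as single, unbroken chunks here.
sub apply_prep_pair {
    my ($self, $regex, $sub, $text) = @_;
    my @chunks;
    my $pos = 0;
    while ($text =~ /$regex/g) {
        my ($start, $end) = ($-[0], $+[0]);    # bounds of this match
        push @chunks, substr($text, $pos, $start - $pos) if $start > $pos;
        push @chunks, $sub->($self, substr($text, $start, $end - $start));
        $pos = $end;
    }
    push @chunks, substr($text, $pos) if $pos < length $text;
    return @chunks;
}

# Hypothetical callback: break only after a "/" that is not followed by "/".
my $after_slash = sub {
    my ($self, $str) = @_;
    return split m{(?<=/)(?=[^/])}, $str;
};

my $url    = qr{http://[\x21-\x7E]+}i;
my @chunks = apply_prep_pair({}, $url, $after_slash, 'see http://a.example/x/y now');
print join('|', @chunks), "\n";    # see |http://|a.example/|x/|y| now
```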

=head3 Preserving State

A Unicode::LineBreak object behaves as a hash reference.
Arbitrary items can be stored in it and are kept for the lifetime of the object.

lib/Unicode/LineBreak.pod

STR is the Unicode string matched by REGEX.

The subroutine should return an array of the pieces into which STR is broken.

For example, the following code breaks HTTP URLs using the [CMOS] rules.

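    use Unicode::LineBreak;
    my $url = qr{http://[\x21-\x7E]+}i;
    sub breakurl {
        my $self = shift;
        my $str = shift;
        # Break after "/" unless another "/" follows; before "-~.,_?#%=&"
        # unless preceded by "-" or "."; and after "=" or "&".
        return split m{(?<=[/]) (?=[^/]) |
                       (?<=[^-.]) (?=[-~.,_?\#%=&]) |
                       (?<=[=&]) (?=.)}x, $str;
    }
    my $lb = Unicode::LineBreak->new(Prep => [$url, \&breakurl]);
    my $output = $lb->break($string);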
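`breakurl` never uses `$self`, so it can be called directly without a Unicode::LineBreak object. That makes it easy to see exactly which pieces the regex produces. The regex below is copied from the example above. The sample URL is an arbitrary assumption.

```perl
use strict;
use warnings;

# Same callback as in the POD example; $self is unused.
sub breakurl {
    my ($self, $str) = @_;
    # Break after "/" unless another "/" follows; before "-~.,_?#%=&"
    # unless preceded by "-" or "."; and after "=" or "&".
    return split m{(?<=[/]) (?=[^/]) |
                   (?<=[^-.]) (?=[-~.,_?\#%=&]) |
                   (?<=[=&]) (?=.)}x, $str;
}

my @pieces = breakurl(undef, 'http://example.com/search?q=perl&lang=en');
print join(' | ', @pieces), "\n";
# http:// | example | .com/ | search | ?q | = | perl | & | lang | = | en
```

Because every alternative is zero-width, `split` only inserts break positions and never drops characters. Joining the pieces therefore gives back the original URL.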

=head3 Preserving State

A Unicode::LineBreak object can behave as a hash reference.
Arbitrary items may be stored in it and are preserved throughout the object's lifetime.
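Here is a minimal sketch of the idea. It assumes a Prep-style callback that records how many URLs it has seen. A plain hash reference stands in for the Unicode::LineBreak object so that the snippet runs without the module. The function `breakurl_counting` and the keys `url_count` and `urls_seen` are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical callback that keeps state in $self between calls.
sub breakurl_counting {
    my ($self, $str) = @_;
    $self->{url_count}++;                  # counter survives across calls
    push @{ $self->{urls_seen} }, $str;    # so does this list
    return split m{(?<=/)(?=[^/])}, $str;  # simplified: break after "/" only
}

# Stand-in for a Unicode::LineBreak object (assumption: a plain hashref).
my $lb = {};
breakurl_counting($lb, 'http://example.com/a/b');
breakurl_counting($lb, 'http://example.org/');
print "$lb->{url_count} URLs: @{ $lb->{urls_seen} }\n";
# 2 URLs: http://example.com/a/b http://example.org/
```

With a real object, the same keys would be read and written through the `$self` argument the module passes to the callback.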

t/16regex.t

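# Breaking URIs according to some CMOS rules.
# Prep callback: $_[0] is the Unicode::LineBreak object, $_[1] the matched URI.
sub breakURI {
    # Notation: ÷ = break allowed, × = no break.
    # 17.11 1.1: [/] ÷ [^/]
    # 17.11 2:   [-] ×
    # 6.17 2:    [.] ×
    # 17.11 1.2: ÷ [-~.,_?#%]
    # 17.11 1.3: ÷ [=&]
    # 17.11 1.3: [=&] ÷
    # Default:   ALL × ALL
    # $splitre, not shown in this excerpt, holds the split regex.
    my @c = split m{$splitre}, $_[1];
    # Don't break off punctuation at the end of a match: glue any
    # punctuation-only trailing pieces back onto the preceding piece.
    while (2 <= scalar @c and $c[-1] =~ /^[\".:;,>]+$/) {
        my $c = pop @c;
        $c[-1] .= $c;
    }
    @c;
}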

# [REGEX, SUB] pair
dotest('uri', 'uri.break', ColumnsMax => 1,
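
The trailing-punctuation loop in `breakURI` can be tried on its own. The test's own `$splitre` is not shown, so this sketch assumes the regex from the POD `breakurl` example instead. The name `break_uri_sketch` and the sample URI are hypothetical.

```perl
use strict;
use warnings;

# Assumed split regex (borrowed from the POD breakurl example, not the
# actual $splitre of t/16regex.t).
my $splitre = qr{(?<=[/]) (?=[^/]) |
                 (?<=[^-.]) (?=[-~.,_?\#%=&]) |
                 (?<=[=&]) (?=.)}x;

sub break_uri_sketch {
    my (undef, $str) = @_;    # first argument ($self) is unused here
    my @c = split m{$splitre}, $str;
    # Glue punctuation-only trailing pieces onto the piece before them.
    while (@c >= 2 and $c[-1] =~ /^[".:;,>]+$/) {
        my $tail = pop @c;
        $c[-1] .= $tail;
    }
    return @c;
}

my @pieces = break_uri_sketch(undef, 'http://example.com/path.');
print join(' | ', @pieces), "\n";    # http:// | example | .com/ | path.
```

Without the loop, the final "." would be a separate piece. That would leave a break opportunity between "path" and a sentence-ending period.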
