Unicode-LineBreak

lib/Unicode/LineBreak.pod

Note that a modification made in the context of C<"sot">, C<"sop"> or C<"sol"> may
affect the decision of subsequent breaking positions, while a modification in any other context will not.

B<Note>:
String arguments are actually sequences of grapheme clusters.
See L<Unicode::GCString>.

For example, the following code folds lines and removes trailing spaces:

    sub fmt {
        if ($_[1] =~ /^eo/) {
            return "\n";
        }
        return undef;
    }
    my $lb = Unicode::LineBreak->new(Format => \&fmt);
    $output = $lb->break($text);

=head2 User-Defined Breaking Behaviors

When a line generated by an arbitrary break is expected to go beyond the measure
of CharMax, ColMax or ColMin, an B<urgent break> may be
performed on the string that follows.
If you specify a subroutine reference as the value of the L</Urgent> option,
the subroutine should accept two arguments:

    @BROKEN = &subroutine(SELF, STR);

SELF is a Unicode::LineBreak object and STR is a Unicode string to be broken.

The subroutine should return an array of the pieces into which STR was broken.

B<Note>:
String argument is actually a sequence of grapheme clusters.
See L<Unicode::GCString>.

For example, the following code inserts hyphens into the names of some chemical substances (such as Titin) so that they may be folded:

    sub hyphenize {
        return map {$_ =~ s/yl$/yl-/; $_} split /(\w+?yl(?=\w))/, $_[1];
    }
    my $lb = Unicode::LineBreak->new(Urgent => \&hyphenize);
    $output = $lb->break("Methionylthreonylthreonylglutaminylarginyl...");
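To see what such a callback hands back, it can be called directly, outside of Unicode::LineBreak. The following is a minimal sketch under stated assumptions: C<hyphenize_nonempty> is a hypothetical variant that also drops the empty strings C<split> produces between adjacent matches, SELF is passed as C<undef>, and STR is a plain Perl string rather than a grapheme cluster string.

```perl
use strict;
use warnings;

# Hypothetical variant of the callback above: same split pattern,
# but empty pieces are discarded before the hyphens are added.
sub hyphenize_nonempty {
    my ($self, $str) = @_;
    # The capturing group keeps each "...yl" unit in the result list.
    my @pieces = grep { length } split /(\w+?yl(?=\w))/, $str;
    # Append a hyphen to every piece that ends with "yl".
    s/yl$/yl-/ for @pieces;
    return @pieces;
}

# SELF is not used by the callback, so undef stands in for the object.
my @result = hyphenize_nonempty(undef, 'Methionylthreonylalanine');
print join(' ', @result), "\n";   # prints "Methionyl- threonyl- alanine"
```

The last piece is returned unchanged because it does not end with C<yl>.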

If you specify a [REGEX, SUBREF] array reference as a value of the L</Prep> option,
the subroutine should accept two arguments:

    @BROKEN = &subroutine(SELF, STR);

SELF is a Unicode::LineBreak object and
STR is a Unicode string matched with REGEX.

The subroutine should return an array of the pieces into which STR was broken.

For example, the following code breaks HTTP URLs using the [CMOS] rule:

    my $url = qr{http://[\x21-\x7E]+}i;
    sub breakurl {
        my $self = shift;
        my $str = shift;
        return split m{(?<=[/]) (?=[^/]) |
                       (?<=[^-.]) (?=[-~.,_?\#%=&]) |
                       (?<=[=&]) (?=.)}x, $str;
    }
    my $lb = Unicode::LineBreak->new(Prep => [$url, \&breakurl]);
    $output = $lb->break($string);
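The effect of the split pattern is easier to follow on a concrete input. The sketch below calls C<breakurl> directly on a sample URL. It assumes that SELF may be C<undef> (the callback never uses it) and that STR is a plain Perl string; the URL itself is only an illustration.

```perl
use strict;
use warnings;

sub breakurl {
    my ($self, $str) = @_;
    # Break after a slash (unless another slash follows), before
    # punctuation such as "." "?" "=" "&" (unless it follows "-" or "."),
    # and after "=" or "&".
    return split m{(?<=[/]) (?=[^/]) |
                   (?<=[^-.]) (?=[-~.,_?\#%=&]) |
                   (?<=[=&]) (?=.)}x, $str;
}

my @parts = breakurl(undef, 'http://example.com/a/b?x=1');
print join(' | ', @parts), "\n";
# prints "http:// | example | .com/ | a/ | b | ?x | = | 1"
```

Each element marks a place where the URL is allowed to break. Whether a break is actually taken there is still decided by Unicode::LineBreak.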

=head3 Preserving State

A Unicode::LineBreak object can behave as a hash reference.
Any items may be stored in it and are preserved throughout its life.

For example, the following code separates paragraphs with empty lines:

    sub paraformat {
        my $self = shift;
        my $action = shift;
        my $str = shift;
        
        if ($action eq 'sot' or $action eq 'sop') {
            $self->{'line'} = '';
        } elsif ($action eq '') {
            $self->{'line'} = $str;
        } elsif ($action eq 'eol') {
            return "\n";
        } elsif ($action eq 'eop') {
            if (length $self->{'line'}) {
                return "\n\n";
            } else {
                return "\n";
            }
        } elsif ($action eq 'eot') {
            return "\n";
        }
        return undef;
    }
    my $lb = Unicode::LineBreak->new(Format => \&paraformat);
    $output = $lb->break($string);
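The same mechanism can carry any other per-object state. As a minimal sketch, the hypothetical callback C<countlines> below keeps a running count of finished lines in the object's hash. To stay self-contained it is driven by hand: a plain hash reference stands in for the Unicode::LineBreak object, and the event sequence is written out by hand rather than captured from the library.

```perl
use strict;
use warnings;

# Hypothetical Format callback: counts finished lines in the object's hash.
sub countlines {
    my ($self, $action, $str) = @_;
    # Reset the counter at the start of the text.
    $self->{lines} = 0 if $action eq 'sot';
    if ($action =~ /^eo/) {          # "eol", "eop" or "eot"
        $self->{lines}++;
        return "\n";
    }
    return undef;                    # leave other contexts unmodified
}

# Stand-in for the object: a plain hash reference (assumption).
my $self = {};
# Hand-written event sequence, for illustration only.
countlines($self, $_, '') for qw(sot sol eol sol eop sop eot);
my $count = $self->{lines};
print "$count\n";                    # prints "3"
```

In real use the subroutine would be passed as the L</Format> option, and the count could be read from the object after C<break> returns.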

=head2 Calculating String Size

If you specify a subroutine reference as the value of the L</Sizing> option,
it will be called with five arguments:

    $COLS = &subroutine(SELF, LEN, PRE, SPC, STR);

SELF is a Unicode::LineBreak object, LEN is the size of the preceding string,
PRE is the preceding Unicode string, SPC is the additional SPACEs and STR is the
Unicode string to be processed.

The subroutine should return the calculated number of columns of C<PRE.SPC.STR>.
The number of columns need not be an integer: its unit may be chosen freely, but it should be the same as that of the L</ColMin> and L</ColMax> options.

B<Note>:
String arguments are actually sequences of grapheme clusters.
See L<Unicode::GCString>.
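As a minimal sketch of this calling convention, the hypothetical callback C<flatsizing> below counts every character as one column. It assumes that LEN already accounts for PRE, so only SPC and STR are measured, and that the string arguments stringify to the text they hold. Here it is called directly with plain Perl strings and C<undef> for SELF.

```perl
use strict;
use warnings;

# Hypothetical Sizing callback: one column per character (code point),
# regardless of East Asian width.
sub flatsizing {
    my ($self, $len, $pre, $spc, $str) = @_;
    # LEN is the size of PRE, so add only the sizes of SPC and STR.
    # Quoting forces stringification of the arguments (assumption).
    return $len + length("$spc") + length("$str");
}

# PRE is 10 characters wide, followed by 2 spaces and a 3-character string.
my $cols = flatsizing(undef, 10, 'x' x 10, '  ', 'abc');
print "$cols\n";   # prints "15"
```

The returned value is in the same unit as L</ColMin> and L</ColMax>, which in this sketch is one column per character.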

For example, the following code processes lines with tab stops at every eight columns:

    sub tabbedsizing {
        my ($self, $cols, $pre, $spc, $str) = @_;


