Unicode-LineBreak
Note that modifications made in the C<"sot">, C<"sop"> or C<"sol"> context may
affect the decision of subsequent breaking positions, while modifications made in the other contexts won't.
B<Note>:
String arguments are actually sequences of grapheme clusters.
See L<Unicode::GCString>.
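For example, the following code folds lines and removes trailing spaces:

    sub fmt {
        # In the "eol", "eop" and "eot" contexts, replace the string
        # passed at the break with a single newline.
        if ($_[1] =~ /^eo/) {
            return "\n";
        }
        # In all other contexts, report that nothing was modified.
        return undef;
    }
    my $lb = Unicode::LineBreak->new(Format => \&fmt);
    $output = $lb->break($text);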
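To see which contexts trigger the replacement, the callback can be called by hand. This is only a sketch: the loop and the C<%result> hash are illustrative, the first argument is a placeholder for the Unicode::LineBreak object, and the sequence of context names is taken from this document rather than produced by the module.

```perl
use strict;
use warnings;

# Same callback as in the example above; SELF and STR are ignored.
sub fmt {
    if ($_[1] =~ /^eo/) {
        return "\n";
    }
    return undef;
}

# Call the callback directly for each context name.
# undef stands in for the Unicode::LineBreak object (SELF).
my %result;
for my $event ('sot', 'sop', 'sol', '', 'eol', 'eop', 'eot') {
    my $ret = fmt(undef, $event, "   ");
    $result{$event} = defined $ret ? 'newline' : 'unchanged';
    printf "%-5s => %s\n", ($event eq '' ? '""' : $event), $result{$event};
}
```

Only the three contexts whose names start with C<eo> return a newline; the others return undef.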
=head2 User-Defined Breaking Behaviors
When a line generated by an arbitrary break is expected to exceed the measure of
any of CharMax, ColMax or ColMin, an B<urgent break> may be
performed on the subsequent string.
If you specify a subroutine reference as the value of the L</Urgent> option,
it should accept two arguments:

    @BROKEN = &subroutine(SELF, STR);

SELF is a Unicode::LineBreak object and STR is a Unicode string to be broken.
The subroutine should return an array of the fragments into which STR is broken.
B<Note>:
The string argument is actually a sequence of grapheme clusters.
See L<Unicode::GCString>.
For example, the following code inserts hyphens into the names of certain chemical substances (such as Titin) so that they may be folded:

    sub hyphenize {
        return map {$_ =~ s/yl$/yl-/; $_} split /(\w+?yl(?=\w))/, $_[1];
    }
    my $lb = Unicode::LineBreak->new(Urgent => \&hyphenize);
    $output = $lb->break("Methionylthreonylthreonylglutaminylarginyl...");
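The list returned by such a callback can be inspected by calling it directly. In this sketch the shortened sample string and the variables C<@pieces> and C<$shown> are illustrative, and a plain Perl string is passed where the module would pass a grapheme cluster string.

```perl
use strict;
use warnings;

# Same callback as above; only the second argument (STR) is used.
sub hyphenize {
    return map {$_ =~ s/yl$/yl-/; $_} split /(\w+?yl(?=\w))/, $_[1];
}

# Call it by hand; undef stands in for the Unicode::LineBreak object (SELF).
my @pieces = hyphenize(undef, "Methionylthreonylglutaminyl...");

# split also yields empty strings between adjacent matches;
# drop them here for display only.
my $shown = join '|', grep { length } @pieces;
print "$shown\n";   # Methionyl-|threonyl-|glutaminyl...
```

Each fragment ending in "yl" that is followed by another word character gets a hyphen, which gives the line breaker a place to fold.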
If you specify a [REGEX, SUBREF] array reference as a value of the L</Prep> option,
the subroutine should accept two arguments:

    @BROKEN = &subroutine(SELF, STR);

SELF is a Unicode::LineBreak object and
STR is a Unicode string matched by REGEX.
The subroutine should return an array of the fragments into which STR is broken.
For example, the following code breaks HTTP URLs according to the [CMOS] rule.

    my $url = qr{http://[\x21-\x7E]+}i;
    sub breakurl {
        my $self = shift;
        my $str = shift;
        return split m{(?<=[/]) (?=[^/]) |
                       (?<=[^-.]) (?=[-~.,_?\#%=&]) |
                       (?<=[=&]) (?=.)}x, $str;
    }
    my $lb = Unicode::LineBreak->new(Prep => [$url, \&breakurl]);
    $output = $lb->break($string);
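The fragments produced by the splitting pattern can be checked without the module by calling the callback directly. This is a sketch under assumptions: the sample URL and the variables C<$sample> and C<@pieces> are hypothetical, and a plain Perl string is passed as STR.

```perl
use strict;
use warnings;

my $url = qr{http://[\x21-\x7E]+}i;

# Same callback as above: split after a slash, before certain
# punctuation, and after "=" or "&".
sub breakurl {
    my $self = shift;
    my $str = shift;
    return split m{(?<=[/]) (?=[^/]) |
                   (?<=[^-.]) (?=[-~.,_?\#%=&]) |
                   (?<=[=&]) (?=.)}x, $str;
}

# Hypothetical sample; undef stands in for the Unicode::LineBreak object.
my $sample = 'http://example.com/a?b=c';
my @pieces = $sample =~ /\A$url\z/ ? breakurl(undef, $sample) : ($sample);
print join('|', @pieces), "\n";   # http://|example|.com/|a|?b|=|c
```

The double slash after the scheme stays together because the first alternative only splits after a slash that is not followed by another slash.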
=head3 Preserving State
A Unicode::LineBreak object can be used as a hash reference.
Any items stored in it are preserved throughout its lifetime.
For example, the following code separates paragraphs with empty lines.

    sub paraformat {
        my $self = shift;
        my $action = shift;
        my $str = shift;

        if ($action eq 'sot' or $action eq 'sop') {
            # A new paragraph starts: forget the previous line.
            $self->{'line'} = '';
        } elsif ($action eq '') {
            # Remember the line that is about to be broken.
            $self->{'line'} = $str;
        } elsif ($action eq 'eol') {
            return "\n";
        } elsif ($action eq 'eop') {
            # Add an empty line only after a non-empty paragraph.
            if (length $self->{'line'}) {
                return "\n\n";
            } else {
                return "\n";
            }
        } elsif ($action eq 'eot') {
            return "\n";
        }
        return undef;
    }
    my $lb = Unicode::LineBreak->new(Format => \&paraformat);
    $output = $lb->break($string);
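The state handling can be traced by driving the callback by hand. In this sketch a plain hash reference (C<$state>) stands in for the Unicode::LineBreak object, and the event sequence in C<@events> is a hypothetical one chosen for illustration, not one recorded from the module.

```perl
use strict;
use warnings;

# Same callback as above.
sub paraformat {
    my $self = shift;
    my $action = shift;
    my $str = shift;

    if ($action eq 'sot' or $action eq 'sop') {
        $self->{'line'} = '';
    } elsif ($action eq '') {
        $self->{'line'} = $str;
    } elsif ($action eq 'eol') {
        return "\n";
    } elsif ($action eq 'eop') {
        return length $self->{'line'} ? "\n\n" : "\n";
    } elsif ($action eq 'eot') {
        return "\n";
    }
    return undef;
}

my $state = {};   # stands in for the Unicode::LineBreak object
my @events = (
    ['sot', ''], ['', 'First paragraph.'], ['eop', "\n"],   # non-empty paragraph
    ['sop', ''], ['eop', "\n"],                             # empty paragraph
);

# Record the length of each returned string, or 'undef' for no change.
my @trace;
for my $ev (@events) {
    my $ret = paraformat($state, @$ev);
    push @trace, defined $ret ? length($ret) : 'undef';
}
print join(',', @trace), "\n";   # undef,undef,2,undef,1
```

The first C<eop> returns two newlines because a line was stored beforehand; the second returns one because C<sop> reset the stored line.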
=head2 Calculating String Size
If you specify a subroutine reference as the value of the L</Sizing> option,
it will be called with five arguments:

    $COLS = &subroutine(SELF, LEN, PRE, SPC, STR);

SELF is a Unicode::LineBreak object, LEN is the size of the preceding string,
PRE is the preceding Unicode string, SPC is additional SPACEs and STR is the
Unicode string to be processed.
The subroutine should return the calculated number of columns of C<PRE.SPC.STR>.
The number of columns need not be an integer: its unit may be chosen freely, but it should be the same as that of the L</ColMin> and L</ColMax> options.
B<Note>:
String arguments are actually sequences of grapheme clusters.
See L<Unicode::GCString>.
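As a minimal sketch of this calling convention, the hypothetical callback below counts every character as one column. It assumes that LEN already holds the size of PRE and that the arguments are plain Perl strings; the name C<fixedsizing> is illustrative and is not part of the module.

```perl
use strict;
use warnings;

# Hypothetical sizing callback: one column per character.
# LEN is trusted as the size of PRE, so only SPC and STR are measured.
sub fixedsizing {
    my ($self, $len, $pre, $spc, $str) = @_;
    return $len + length($spc) + length($str);
}

# Called by hand; undef stands in for the Unicode::LineBreak object.
my $cols = fixedsizing(undef, 5, 'Hello', ' ', 'world');
print "$cols\n";   # 11
```

Such a subroutine would be passed as C<< Sizing => \&fixedsizing >>, with L</ColMax> and L</ColMin> expressed in the same unit (here, characters).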
For example, the following code processes lines with tab stops at every eight columns.

    sub tabbedsizing {
        my ($self, $cols, $pre, $spc, $str) = @_;