Acme-Lingua-ZH-Remix
0.98
- Made to be Moo-based.
0.97
- re-packaging with mbtiny. no feature changes.
0.96
- Try to be faster by adding fuzzy exit conditions.
0.95
- improve: range_sentence does not stop generating when the sentence length just passes the lower bound.
0.94
- fix: range constraint and infinite loop
0.93
- fix dup phrases in a particular edge case.
0.92
- random_sentence now accepts a length constraint
0.91
- prevent a div-by-zero exception
- use AnyMoose instead
0.90
- Make it a Moose-based module. Deprecate all old usages.
0.14
- Add c9s as corpus author
This is a utility method that does not change the internal state of the
topic object.
feed($corpus_text)
Instance method. Takes a scalar and returns the topic object.
Merge $corpus_text into the internal phrases corpus of the object.
random_sentence( min => $min, max => $max )
Instance method. Optionally takes "min" and "max" parameters that
constrain the sentence length (number of characters).
Both min and max must be integers greater than or equal to zero, and
max should be greater than min. If any of these values is invalid, it
is treated as if it had not been passed.
The default values of min, max are 0 and 140, respectively.
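The rules above can be sketched as a small standalone routine. This is only an illustration: normalize_range is a hypothetical helper, not part of the module, and falling back to both defaults when max is not greater than min is an assumption about how "treated as if not passed" plays out.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Acme::Lingua::ZH::Remix): applies the
# documented rules for the "min" and "max" options.
sub normalize_range {
    my (%options) = @_;

    # Drop any value that is not a non-negative integer.
    for my $key (qw(min max)) {
        my $value = $options{$key};
        delete $options{$key}
            unless defined $value && $value =~ /\A[0-9]+\z/;
    }

    # Documented defaults for missing (or dropped) values.
    $options{min} //= 0;
    $options{max} //= 140;

    # Assumption: if max is not greater than min, fall back to both defaults.
    @options{qw(min max)} = (0, 140) if $options{max} <= $options{min};

    return @options{qw(min max)};
}

my @valid    = normalize_range(min => 5,  max => 8);    # kept as given
my @invalid  = normalize_range(min => -1, max => 2.5);  # both dropped
my @reversed = normalize_range(min => 10, max => 3);    # max <= min
print "@valid | @invalid | @reversed\n";
```

With these inputs the three calls yield (5, 8), (0, 140) and (0, 140).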
The implementation is based on a randomized algorithm, so the time it
needs to generate a result is not bounded in advance. If it takes more
than 1000 iterations, it aborts and returns the result anyway,
regardless of the length constraint. This can happen when the lengths
of phrases from the corpus do not add up to a value within the given range.
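To see why an iteration cap is needed, here is a simplified sketch of a bounded random search. It is not the module's actual code: build_within_range and its return values are hypothetical, and the phrases are plain ASCII for brevity.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the generator: append random phrases until the
# total length reaches $min, never exceeding $max, and give up after
# $max_iterations attempts.
sub build_within_range {
    my ($phrases, $min, $max, $max_iterations) = @_;
    $max_iterations //= 1000;

    my $sentence = '';
    for (1 .. $max_iterations) {
        last if length($sentence) >= $min;    # lower bound reached
        my $phrase = $phrases->[ int rand @$phrases ];
        # Accept the phrase only if it does not overshoot the upper bound.
        $sentence .= $phrase
            if length($sentence) + length($phrase) <= $max;
    }

    my $ok = (length($sentence) >= $min && length($sentence) <= $max) ? 1 : 0;
    return ($sentence, $ok);
}

# Phrases of length 3 can add up to 6, which lies within 5..8.
my ($s1, $ok1) = build_within_range([ 'abc', 'xyz' ], 5, 8);

# Phrases of length 4 can only give 0, 4, 8, ...: nothing lies within 5..7,
# so the loop runs out of iterations and returns what it has.
my ($s2, $ok2) = build_within_range([ 'abcd' ], 5, 7);

print length($s1), " (ok=$ok1), ", length($s2), " (ok=$ok2)\n";
```

The second call shows the abort case described above: the result is returned even though its length is outside the requested range.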
The returned scalar is the generated sentence, a string of wide
characters (one for which Encode::is_utf8 returns true).
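In practice this means the result is a character string that should be encoded before it is written to a byte-oriented handle. A minimal sketch using only core modules, with a literal string standing in for a generated sentence:

```perl
use strict;
use warnings;
use utf8;        # the literal below is read as characters, not bytes
use Encode ();

my $sentence = "中文句子";    # stand-in for a value from random_sentence
my $chars    = length $sentence;                     # counts characters
my $is_wide  = Encode::is_utf8($sentence) ? 1 : 0;   # UTF8 flag is on

# Encode to UTF-8 octets before printing or writing to a file.
my $bytes    = Encode::encode('UTF-8', $sentence);
my $octets   = length $bytes;                        # counts bytes
my $is_bytes = Encode::is_utf8($bytes) ? 0 : 1;      # UTF8 flag is off

print "$chars characters, $octets bytes\n";
```

The length constraint of random_sentence is expressed in characters, so it corresponds to the first count, not the second.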
AUTHOR
Kang-min Liu <gugod@gugod.org>
COPYRIGHT
Copyright 2010- by Kang-min Liu, <gugod@gugod.org>
lib/Acme/Lingua/ZH/Remix.pm
sub random_phrase {
    my $self = shift;
    my $type = shift;
    return ${ random(@{ $self->phrases->{$type}||=[] }) || \'' };
}
=head2 random_sentence( min => $min, max => $max )
Instance method. Optionally takes "min" and "max" parameters that constrain the
sentence length (number of characters).
Both min and max must be integers greater than or equal to zero, and max
should be greater than min. If any of these values is invalid, it is treated
as if it had not been passed.
The default values of min, max are 0 and 140, respectively.
The implementation is based on a randomized algorithm, so the time it needs to
generate a result is not bounded in advance. If it takes more than 1000
iterations, it aborts and returns the result anyway, regardless of the length
constraint. This can happen when the lengths of phrases from the corpus do not
add up to a value within the given range.
The returned scalar is the generated sentence, a string of wide characters
(one for which Encode::is_utf8 returns true).
=cut
sub random_sentence {
    my ($self, %options) = @_;
    for my $p (qw(min max)) {
        # ...
    }
    $options{min} ||= 0;
    $options{max} ||= 140;
    my $str = "";
    my @phrases;
    my $ending = $self->random_phrase(random(qw/。 ！ ？/)) || "…";
    while ( length($ending) > $options{max} ) {
        $ending = $self->random_phrase(random(qw/。 ！ ？/)) || "…";
    }
    unshift @phrases, $ending;
    my $l = length($ending);
    my $iterations = 0;
    my $max_iterations = 1000;
    my $average = ($options{min} + $options{max}) / 2;
    my $desired = int(rand($options{max} - $options{min}) + $options{min}) || $average || $options{max};
    while ($iterations++ < $max_iterations) {
        my $x;
        do {
            $x = random('，', '」', '）', '/')
        } while ($self->phrase_ratio($x) == 0);
        my $p = $self->random_phrase($x);
        if ($l + length($p) < $options{max}) {
            unshift @phrases, $p;
            $l += length($p);
        }
        my $r = abs(1 - $l/$desired);
        last if $r < 0.1;
        last if $r < 0.2 && $iterations >= $max_iterations/2;
    }
    $str = join "", @phrases;
    $str =~ s/，$//;
    $str =~ s/^「(.+)」$/$1/;
t/03-constrain.t
use utf8;
use Test::More 0.98;
use Acme::Lingua::ZH::Remix;
my $r = Acme::Lingua::ZH::Remix->new;
my ($min, $max) = (5, 8);
for (1..100) {
    my $s = $r->random_sentence(min => $min, max => $max);
    my $l = length($s);
    utf8::encode($s);
    ok($l >= $min && $l <= $max, "length: $min <= $l <= $max");
}
done_testing;