Acme-Lingua-ZH-Remix

Changes

0.98
	- Made to be Moo-based.

0.97
	- re-packaging with mbtiny. no feature changes.

0.96
	- Try to be faster by adding fuzzy exit conditions.

0.95
	- improve: range_sentence does not stop generating when the sentence length just passes the lower-bound.

0.94
	- fix: range constraint and infinite loop

0.93
	- fix dup phrases in a particular edge case.

0.92
	- random_sentence now accepts a length constraint

0.91
	- prevent a div-by-zero exception
	- use AnyMoose instead

0.90
	- Make it a Moose-based module. Deprecate all old usages.

0.14
	- Add c9s as corpus author

README

    This is a utility method that does not change the internal state of the
    topic object.

  feed($corpus_text)
    Instance method. Takes a scalar and returns the topic object.

    Merges $corpus_text into the object's internal phrase corpus.

  random_sentence( min => $min, max => $max )
    Instance method. Optionally takes "min" and/or "max" parameters as
    constraints on the sentence length (number of characters).

    Both min and max values are required to be integers greater than or
    equal to zero. The value of max should be greater than the value of min.
    If any of these values is invalid, it is treated as if it was not passed.

    The default values of min, max are 0 and 140, respectively.
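
    The validation rules above can be sketched as a small standalone
    function. This is only an illustration: normalize_length_options is a
    hypothetical helper that is not part of the module, and falling back to
    both defaults when max is not greater than min is an assumption about
    how "treated as if not passed" applies to that case.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Acme::Lingua::ZH::Remix.
# Returns a (min => ..., max => ...) list following the documented rules.
sub normalize_length_options {
    my (%options) = @_;

    # Discard any value that is not an integer greater than or equal to zero.
    for my $key (qw(min max)) {
        my $value = $options{$key};
        delete $options{$key}
            unless defined $value && $value =~ /\A[0-9]+\z/;
    }

    # Apply the documented defaults: min = 0, max = 140.
    my $min = exists $options{min} ? $options{min} : 0;
    my $max = exists $options{max} ? $options{max} : 140;

    # Assumption: if max is not greater than min, ignore both values.
    ($min, $max) = (0, 140) unless $max > $min;

    return (min => $min, max => $max);
}

my %opts = normalize_length_options(min => 5, max => 8);
print "min=$opts{min} max=$opts{max}\n";
```

    Keeping the checks in one place means the generating loop can rely on
    min and max being sane integers.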

    The implementation is based on a random algorithm, so the time it needs
    to generate the result is indefinite. If it takes more than 1000
    iterations, it aborts and returns the result anyway, regardless of the
    length constraint. This can happen when the lengths of phrases from the
    corpus do not add up to a value within the given range.

    The returned scalar is the generated sentence, a string of wide characters.
    (Which makes Encode::is_utf8 return true.)
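
    The iteration cap described above can be illustrated with a simplified,
    standalone sketch. assemble_sentence is a hypothetical function, not the
    module's code: it draws from a flat list of phrases and stops once the
    lower bound is reached, whereas the module picks phrases by punctuation
    type and aims at a randomly chosen target length.

```perl
use strict;
use warnings;

# Hypothetical, simplified stand-in for the module's generating loop.
# $phrases is an array reference of candidate strings.
sub assemble_sentence {
    my ($phrases, $min, $max, $max_iterations) = @_;
    $max_iterations //= 1000;    # same cap as the documented limit
    return '' unless @$phrases;

    my @picked;
    my $length = 0;

    for (1 .. $max_iterations) {
        my $phrase = $phrases->[ int rand @$phrases ];

        # Accept the phrase only if the total stays within the upper bound.
        if ($length + length($phrase) <= $max) {
            push @picked, $phrase;
            $length += length $phrase;
        }

        # Stop as soon as the lower bound is satisfied.
        last if $length >= $min;
    }

    # After the cap is hit, return what was collected, even if too short.
    return join '', @picked;
}

print assemble_sentence([qw(ab cd e)], 5, 8), "\n";
```

    With a single three-character phrase and a range of 4 to 5, no
    combination fits: the sketch accepts the phrase once, rejects every
    further draw, and returns a three-character string after the cap is
    reached, which is the situation the paragraph above warns about.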

AUTHOR
    Kang-min Liu <gugod@gugod.org>

COPYRIGHT
    Copyright 2010- by Kang-min Liu, <gugod@gugod.org>

lib/Acme/Lingua/ZH/Remix.pm

sub random_phrase {
    my $self = shift;
    my $type = shift;

    return ${ random(@{ $self->phrases->{$type}||=[] }) || \'' };
}

=head2 random_sentence( min => $min, max => $max )

Instance method. Optionally takes "min" and/or "max" parameters as constraints
on the sentence length (number of characters).

Both min and max values are required to be integers greater than or equal to
zero. The value of max should be greater than the value of min. If any of these
values is invalid, it is treated as if it was not passed.

The default values of min, max are 0 and 140, respectively.

The implementation is based on a random algorithm, so the time it needs to
generate the result is indefinite. If it takes more than 1000 iterations, it
aborts and returns the result anyway, regardless of the length constraint. This
can happen when the lengths of phrases from the corpus do not add up to a value
within the given range.

The returned scalar is the generated sentence, a string of wide characters.
(Which makes Encode::is_utf8 return true.)

=cut

sub random_sentence {
    my ($self, %options) = @_;

    for my $p (qw(min max)) {

lib/Acme/Lingua/ZH/Remix.pm

    }

    $options{min} ||= 0;
    $options{max} ||= 140;

    my $str = "";
    my @phrases;

    my $ending = $self->random_phrase(random(qw/。 ! ?/)) || "…";

    while ( length($ending) > $options{max} ) {
        $ending = $self->random_phrase(random(qw/。 ! ?/)) || "…";
    }

    unshift @phrases, $ending;

    my $l = length($ending);

    my $iterations = 0;
    my $max_iterations = 1000;
    my $average = ($options{min} + $options{max}) / 2;
    my $desired = int(rand($options{max} - $options{min}) + $options{min}) || $average || $options{max};

    while ($iterations++ < $max_iterations) {
        my $x;
        do {
            $x = random(',', '」', ')', '/')
        } while ($self->phrase_ratio($x) == 0);

        my $p = $self->random_phrase($x);

        if ($l + length($p) < $options{max}) {
            unshift @phrases, $p;
            $l += length($p);
        }

        my $r = abs(1 - $l/$desired);
        last if $r < 0.1;
        last if $r < 0.2 && $iterations >= $max_iterations/2;
    }

    $str = join "", @phrases;
    $str =~ s/,$//;
    $str =~ s/^「(.+)」$/$1/;

t/03-constrain.t

use utf8;
use Test::More 0.98;
use Acme::Lingua::ZH::Remix;

my $r = Acme::Lingua::ZH::Remix->new;

my ($min, $max) = (5, 8);

for (1..100) {
    my $s = $r->random_sentence(min => $min, max => $max);
    my $l = length($s);

    utf8::encode($s);
    ok($l >= $min && $l <= $max, "length: $min <= $l <= $max");
}


done_testing;


