Acme-Wabby

Wabby.pm  view on Meta::CPAN


            # If we're not at the end yet, add this word to our collected
            # string, increment our word count, and do it all again. 
            if ($next != -1) {
                $final .= " " . ${$self->{'data'}{'list'}}[$next]{'word'};
                $count++;
            }
        }

        # If we failed to make a long enough sentence, we need to do something.
        if ($count < $self->{'conf'}{'min_len'}) {

            # If we haven't yet passed our max number of attempts, try again.
            if ($attempts < $self->{'conf'}{'max_attempts'}) {
                $attempts++;
                next;
            }
            # If we passed our max number of attempts, we can take one of two
            # courses of action.
            else {
                # If we're trying to talk about something in particular, we're
                # always going to be stuck with the same starting point.  Thus,
                # there's not the best chance for continued success, so just
                # give up and bail.
                if ($directed) {
                    return undef;
                }
                # If we're talking about random things, we likely just got
                # a bad starting point, so we'll pick a new random starting
                # point, and do the whole thing over again.
                else {
                    $attempts = 0;
                    $start = $self->{'data'}{'list'}[0]{'num'}[int rand @{$self->{'data'}{'list'}[0]{'num'}}];
                    $text = $self->{'data'}{'list'}[$start]{'word'};
                    next;
                }
            }
        }
    }

    # If we're not case sensitive, make sure any I's by themselves are
    # capitalized, for aesthetic purposes.  If we are, they probably want
    # things to come out the way they are.
    # FIXME - Need to be able to configure this so that persons with
    # non-english texts can pick values that make sense.
    unless ($self->{'conf'}{'case_sensitive'}) {
        $final =~ s/(?<![\w-])i(?![\w-])/I/g;
    }

    # Pick a random piece of punctuation to add to the end of the sentence.
    $final .= $self->{'conf'}{'punctuation'}[int rand @{$self->{'conf'}{'punctuation'}}];

    return $final;
}

# A method for getting some basic information about the current state.
# Arguments: None.
# Returns:   In a scalar context, this function returns a string describing the
#            current state.  In a list context, this function returns a list
#            containing two numbers -- the first one is the number of words
#            that this object knows about, and the second one is the average
#            number of transitions between words.
sub stats {
    my $self = shift;
    die "Invalid object" unless (ref($self) eq __PACKAGE__);

    # Get the number of words in our hash.
    my $word_count = scalar keys %{$self->{'data'}{'hash'}};

    # If we've got no words, just quit now.
    if ($word_count == 0) {
        return wantarray ? (0,0) : "I don't know anything!";
    }

    # Iterate over the list, adding up the number of transitions for each word.
    my $average = 0;
    foreach (@{$self->{'data'}{'list'}}) {
        $average += scalar @{$_->{'num'}} if defined($_->{'num'});
    }

    # Calculate the average, round it to two decimal places, and return it.
    $average /= $word_count;
    $average = sprintf "%.2f", $average;
    return wantarray ? ($word_count, $average) : "Wabby knows $word_count "
        ."words, with an average of $average connections between each word.";
}

1;

__END__

=head1 NAME

Acme::Wabby - Create semi-random sentences based upon a body of text.

=head1 SYNOPSIS

  use Acme::Wabby qw(:errors);

  # Use the default options
  my $wabby = Acme::Wabby->new;

  # Pass in explicit options. (All options below are defaults)
  my $wabby = Acme::Wabby->new( min_len => 3, max_len => 30,
      punctuation => [".","?","!","..."], case_sensitive => 1,
      hash_file => "./wabbyhash.dat", list_file => "./wabbylist.dat",
      autosave_on_destroy => 0, max_attempts => 1000 );

  # Save the current state to the configured files
  $wabby->save;

  # Load a saved state from the configured files
  $wabby->load;

  # Add some text to the current state
  $wabby->add($the_complete_works_of_shakespeare);

  # Generate a random sentence
  print $wabby->spew, "\n";

  # Generate a random sentence, beginning with "The"
  print $wabby->spew("The"), "\n";

To have an amusing experience, you will need to feed the object a body of text.
This text can come from virtually any source, although I enjoy using e-Texts
from the good folks at Project Gutenberg (http://promo.net/pg).  To add text to
the state, simply call the B<add()> method on the object, passing it a scalar
containing the text.

  $wabby->add($complete_works_of_shakespeare);

It is acceptable for the input text to contain embedded newlines or other such
things.  It is acceptable to call the B<add()> method many times, and at any
point in the object's life-span.  The B<add()> method will return B<undef> upon
error, and true upon success.

=head2 Generating random sentences

Once you have some text loaded into the object, you can generate random
sentences.  To do this, use the B<spew()> method.  The B<spew()> method has
two modes of operation:  If no argument is given, it will generate and return a
random sentence.  If a single string is passed in, it will generate and return
a random sentence beginning with the provided string.

  my $random_sentence = $wabby->spew;
  my $not_so_random_sentence = $wabby->spew("Romeo and Juliet");

The B<spew()> method will return the generated string, or B<undef> upon error.
There are several error conditions which can occur in the B<spew()> method.
None of them are fatal, but they must be taken into account by the calling
program.  They are:

* Fewer than (min_len * 10) words have been added so far. (Must B<add()> more
text before trying again.)

* A string was passed in containing nothing. (Don't do that.)

* We don't know the last word in the string passed in, and therefore cannot
generate a sentence with it. (Either teach us about it with B<add()>, or try
something else.)

* A sentence of at least min_len words could not be generated, even after
max_attempts tries at doing so. (Likely need to B<add()> more text before
trying again.)
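The generation loop and its error cases can be sketched the same way. The stand-alone spew_sketch() below is hypothetical: it walks a hard-coded word-transition table (a hash mapping each word to the list of words seen after it, with made-up START and STOP markers) and returns undef for the last three conditions above. One deliberate difference: the module's random mode keeps restarting from a new random starting word, whereas this sketch gives up after max_attempts in both modes so that it always terminates.

```perl
use strict;
use warnings;

# Hypothetical sentence markers; not Acme::Wabby's real representation.
use constant START => '<start>';
use constant STOP  => '<stop>';

# Return one element of an array reference, chosen uniformly at random.
sub pick {
    my ($list) = @_;
    return $list->[int rand @$list];
}

# Hypothetical stand-in for spew(): walk the table from a starting word until
# STOP or max_len words. Returns undef for the error cases described above.
sub spew_sketch {
    my ($table, $seed, %conf) = @_;
    my ($min, $max, $tries) = @conf{qw(min_len max_len max_attempts)};

    my @seed;
    if (defined $seed) {
        @seed = split ' ', $seed;
        return undef unless @seed;                          # no words given
        return undef unless exists $table->{ $seed[-1] };   # unknown last word
    }

    for (1 .. $tries) {
        # Directed mode reuses the seed; random mode picks a sentence start.
        my @words = @seed ? @seed : (pick($table->{ START() }));
        while (@words < $max) {
            my $choices = $table->{ $words[-1] } or last;
            my $next = pick($choices);
            last if $next eq STOP;
            push @words, $next;
        }
        next if @words < $min;    # too short: try again
        return join(' ', @words) . pick($conf{punctuation});
    }
    return undef;    # still too short after max_attempts tries
}

my %table = (
    START() => ['the'],
    the     => ['cat'],
    cat     => ['sat'],
    sat     => [STOP],
);
my %conf = (min_len => 3, max_len => 30, max_attempts => 10,
            punctuation => ['.']);

my $sentence = spew_sketch(\%table, undef, %conf);
print "$sentence\n";    # the cat sat.
```

With this table, spew_sketch(\%table, 'dog', %conf) returns undef because "dog" never appears in it, which mirrors the unknown-last-word condition.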

=head2 Saving / loading state

Acme::Wabby can save and load state to disk using the Storable module.  To do
this, simply use the B<save()> and/or B<load()> methods.

  $wabby->save;
  $wabby->load;

These methods take no arguments; they simply save or load the state to or from
the file names which were defined when the object was created.  Loading a
saved state is much faster than re-parsing a large body of text.
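Since the state is saved with Storable, the save/load round trip can be sketched with Storable's store() and retrieve() functions (Storable and File::Temp both ship with Perl). The data layout and file names below are illustrative assumptions modeled loosely on the hash_file and list_file options; this is not the module's actual save() or load() code.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical state: a word => index hash and a list of entries, where entry
# 0 holds sentence starts. The real layout inside Acme::Wabby may differ.
my %state = (
    hash => { the => 1, cat => 2 },
    list => [ { num => [1] }, { word => 'the', num => [2] }, { word => 'cat' } ],
);

my $dir       = tempdir(CLEANUP => 1);
my $hash_file = File::Spec->catfile($dir, 'wabbyhash.dat');
my $list_file = File::Spec->catfile($dir, 'wabbylist.dat');

# Save: store() serializes the referenced data and returns true on success.
store($state{hash}, $hash_file) or die "Cannot save $hash_file\n";
store($state{list}, $list_file) or die "Cannot save $list_file\n";

# Load: retrieve() returns a reference to a fresh copy of the saved data.
my $hash = retrieve($hash_file) or die "Cannot load $hash_file\n";
my $list = retrieve($list_file) or die "Cannot load $list_file\n";
my %loaded = (hash => $hash, list => $list);

print "words known: ", scalar(keys %{ $loaded{hash} }), "\n";   # words known: 2
```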

=head2 Getting statistics

Using the B<stats()> method will provide you with some simple statistics about
the current state of an object.  When used in a scalar context, the B<stats()>
method will return a string containing a description of what the object knows.
When used in a list context, it will return a list of two numbers.  The first
entry in the list is the number of words that the object knows.  The second
entry in the list is the average number of connections between words.

  my ($wordcount, $average) = $wabby->stats;
  print "count=$wordcount, average=$average\n";
  print scalar($wabby->stats), "\n";
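The list-context numbers come from simple arithmetic: the total number of recorded transitions divided by the number of known words, rounded to two decimal places with sprintf. The toy data below is made up for illustration. Judging from the code above, the loop in stats() walks every list entry, including entry 0, which spew() uses for sentence starts, so those starts are counted in the total as well; the sketch does the same.

```perl
use strict;
use warnings;

# Toy data in the shape stats() walks over (made up for illustration):
# entry 0 holds sentence starts, the rest hold each word's transitions.
my %hash = (the => 1, cat => 2, sat => 3);
my @list = (
    { num => [1, 1] },               # two sentence starts
    { word => 'the', num => [2] },   # the -> cat
    { word => 'cat', num => [3] },   # cat -> sat
    { word => 'sat' },               # no transitions recorded
);

my $word_count = scalar keys %hash;

# Sum the transition counts, skipping entries without a 'num' array.
my $total = 0;
for my $entry (@list) {
    $total += @{ $entry->{num} } if defined $entry->{num};
}

# Total transitions divided by word count, rounded to two decimal places.
my $average = sprintf '%.2f', $total / $word_count;
print "Wabby knows $word_count words, with an average of $average "
    . "connections between each word.\n";
# prints: Wabby knows 3 words, with an average of 1.33 connections between each word.
```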

=head1 BUGS

 * Uses a lot of memory (not so much a bug as an implementation quirk).

=head1 TODO

 * Be better about normalizing input text.
 * Fix English assumptions about single-letter words besides I and a.
 * See about making the parsing into phrases and words more configurable.
 * Investigate using longer-order chains to improve generation quality.
 * Try to use less memory!

=head1 AUTHOR

Nathan Poznick <kraken@wang-fu.org>

=head1 CREDITS

 nick@misanthropia.nu - for writing the original wabbylegs.pl
 Project Gutenberg - for providing free text to feed to Acme::Wabby.

=head1 COPYRIGHT

Copyright (c) 2004, Nathan Poznick.  All rights reserved.  This program is free
software; you can redistribute it and/or modify it under the terms of the GPL
version 2.

=cut


