Games-Dissociate


lib/Games/Dissociate.pm  view on Meta::CPAN

### The POD is at the end. ###
require 5.000;
package Games::Dissociate;
use strict;
require Exporter;

use vars qw(@ISA @EXPORT @EXPORT_OK $Debug $VERSION);
use Carp;
@ISA = qw(Exporter);
@EXPORT = qw(dissociate_filter dissociate);
$VERSION = 1.0;
$Debug = 0;

###########################################################################

sub dissociate_filter {
  require Text::Wrap;
  require Getopt::Std;
  my %o;
  if(@ARGV) {
    Getopt::Std::getopts('c:w:m:', \%o)
     or die "Options:
  -cNUMBER
      Run a by-character dissociation with that number of
      characters as the group size.
  -wNUMBER
      Run a by-word dissociation with that number of
      words as the group size.
  -mNUMBER
      Specifies how many iterations the dissociator loop should make.
";
  }

  my $o;
  my $max;
  if($_[0]) {
    $o = $_[0];
  } elsif($o{'w'}){
    $o = - abs($o{'w'});
  } elsif($o{'c'}){
    $o =   abs($o{'c'});
  }
  $o ||= 2;

  if($_[1]) {
    $max = $_[1];
  } elsif ($o{'m'}) {
    $max = abs($o{'m'});
  }
  $max ||= 100;

  print "group_length: $o.  max_length: $max\n" if $Debug;
  print Text::Wrap::wrap(   '','', dissociate(join('', <>), $o, $max)  ), "\n";
  return;
}

#==========================================================================
sub dissociate {
  my $in = $_[0];
  my $arg = int($_[1] || 2);
  my $iteration_limit = $_[2] || 100;
  my @out;

  my $by_word = ($arg < 0);
  my $degree = abs($arg);
  my $last_match_point;

  $degree = 2 if $degree == 1;

  use locale;

  $in =~ tr<\cm\cj \t>< >s;
  die "No input\n" unless length $in;
  study $in;

  my $new_matcher;
  if($by_word) {
    $new_matcher = "\\W+(" . join("\\W+", ("\\w+") x $degree) . ")";
  } else {
    $new_matcher = "(" . ('.' x $degree) . ")";
  }

  # In use in the loop.
  my($re, @orig, $matched, $remainder,
     $i, $last_matched, $iteration);
  $iteration = 0;

  $last_match_point = -1;
  while($iteration < $iteration_limit) {
    ++$iteration;



3. Look for the other times the last-matched group occurs in the text,
and randomly select one of them.  (Or: select the I<next> time that
group occurs -- a shortcut I've made in the code, which seems to still
produce random-looking results).  Look at the group of tokens that
occurs right after that.  Make I<that> the last-matched group.  Loop
back to Step 2 until we think we've outputted enough.

4. But if the last-matched group from Step 2 occurred only once in the
text, go back to Step 1.

=back
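The steps above can be sketched, in simplified by-character form, as the Perl below. This is not the module's own loop, which builds a regex and also supports by-word groups. It assumes that Step 1 means "jump to a random position in the text" and Step 2 means "emit the group of characters found there", and it uses the same "take the I<next> occurrence" shortcut described in Step 3. The function name C<dissociate_chars_sketch> is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical, simplified by-character sketch of the steps above.
# NOT Games::Dissociate's own code. Assumes Step 1 = random starting
# position and Step 2 = emit the $n-character group found there.
sub dissociate_chars_sketch {
    my ($text, $n, $iterations) = @_;
    my $len = length $text;
    return '' if $len < $n;                   # too short to form a group

    my $out = '';
    my $pos = int rand($len - $n + 1);        # Step 1 (assumed): random start
    for (1 .. $iterations) {
        my $group = substr($text, $pos, $n);  # Step 2 (assumed): emit group
        $out .= $group;

        # Step 3 shortcut: use the NEXT occurrence of this group after
        # $pos, wrapping around to the start of the text if necessary.
        my $next = index($text, $group, $pos + 1);
        $next = index($text, $group) if $next < 0;

        if ($next == $pos) {
            # Step 4: the group occurs only once, so restart at random.
            $pos = int rand($len - $n + 1);
        } else {
            # The group right after that occurrence becomes the new group.
            $pos = $next + $n;
            # Assumption of this sketch: if no full group follows, restart.
            $pos = int rand($len - $n + 1) if $pos > $len - $n;
        }
    }
    return $out;
}
```

Every iteration emits exactly C<$n> characters, so the output length is the iteration count times the group size; the real module's output size is only roughly related to C<$max>.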

Since the groups of characters or words (at least, when you look at
them as bits of text only group-size tokens long) are all taken from
the input text, you get somewhat natural-looking text -- as opposed to
what you'd get if you just randomly outputted single characters or
single words from the input text.

The process of applying a DissociatedPress algorithm to a bit
of text is called "dissociation".

=head1 PARAMETERS AND USAGE

To use this module after you've installed it, say C<use
Games::Dissociate>.  This imports the function C<dissociate> and the
procedure C<dissociate_filter>.

=over 4

=item dissociate($input, $group_size, $max)

The function C<dissociate> takes three parameters:

  $output = dissociate($input, $group_size, $max);

$input is the input string, hopefully containing a stretch of
(plaintext) text in a human language, encoded either in just plain
US-ASCII, or in a character-encoding your locale settings know about.
$output will be "dissociated text" (charmingly generated gibberish)
based on that input text.  (Note that output will contain no
line-breaks or tabs.  You may wish, as C<dissociate_filter> does,
to pass the output through Text::Wrap's C<wrap>.)
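The absence of line breaks and tabs comes from the way C<dissociate> squeezes its input: it applies C<< tr<\cm\cj \t>< >s >>, turning every run of carriage returns, newlines, spaces and tabs into a single space. The sketch below reproduces that transliteration in a hypothetical helper, C<flatten_whitespace>, and then re-wraps the result with the core Text::Wrap module, as C<dissociate_filter> does.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap $columns);

# Hypothetical helper: the same transliteration dissociate() applies to its
# input. CR, LF, space and tab all become " ", and /s squeezes each run of
# them down to one space, so no line breaks or tabs survive.
sub flatten_whitespace {
    my ($s) = @_;
    $s =~ tr<\cm\cj \t>< >s;
    return $s;
}

my $flat = flatten_whitespace("One line,\r\nanother\tline.\n\nEnd.");
# $flat is now "One line, another line. End."

$columns = 30;                     # Text::Wrap's target line width
print wrap('', '', $flat), "\n";   # re-insert line breaks for display
```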

You'll get strange output if $input contains markup (HTML, LaTeX,
etc.), or is very short, or is not in a human language.

$group_size is the number of tokens (words or characters) that must be
in common between bits of text the dissociation algorithm skips
between.  A positive value means you want to dissociate by character,
with a group-size of that many characters (4 = 4 characters); a
negative value means you want to dissociate by word, with a group size
of that many words (-2 = 2 words).  I suggest values between -3 and 5;
I'm a fan of -2.  A $group_size value of 0 or 1 is invalid, and
currently causes C<dissociate> to use the default value of 2 (2
characters) instead.  A value of -1 is invalid, and currently causes
C<dissociate> to use the value of -2 (2 words) instead.  The
behavior/validity of $group_size values of 0, 1, or -1 may change in
future versions.
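The sign convention and the fallbacks for 0, 1 and -1 can be summarized in a small sketch that mirrors the first lines of C<dissociate> (C<int($_[1] || 2)>, C<abs>, and the "degree 1 becomes 2" rule), plus the pattern it builds for one group of tokens. The helper names C<interpret_group_size> and C<group_pattern> are hypothetical; the module does this inline.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring dissociate()'s handling of $group_size.
# Negative => by word, positive => by character; a magnitude of 0 or 1
# ends up as 2. Returns ($by_word, $degree).
sub interpret_group_size {
    my ($arg) = @_;
    $arg = int($arg || 2);            # undef or 0 => default of 2 characters
    my $by_word = ($arg < 0) ? 1 : 0;
    my $degree  = abs $arg;
    $degree = 2 if $degree == 1;      # 1 => 2 characters, -1 => 2 words
    return ($by_word, $degree);
}

# Hypothetical helper building the same group regex source the module
# builds: $degree words separated by non-word runs, or $degree characters.
sub group_pattern {
    my ($by_word, $degree) = @_;
    return $by_word
        ? '\W+(' . join('\W+', ('\w+') x $degree) . ')'
        : '(' . ('.' x $degree) . ')';
}

for my $g (4, -2, 0, 1, -1) {
    my ($by_word, $degree) = interpret_group_size($g);
    printf "%3d => %d %s, pattern %s\n", $g, $degree,
        ($by_word ? 'words' : 'characters'), group_pattern($by_word, $degree);
}
```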

$max is a parameter used to control the maximum number of iterations
of C<dissociate>'s central loop -- it corresponds roughly to the
number of "chunks" of text you get back, where a chunk is N *
-$group_size words for negative values of $group_size, and N *
$group_size characters for positive values of $group_size.  $max must
be greater than 1.

If you need (!) more precise control over the size of the output text,
try setting $max high and trimming the output to size, and/or try
calling C<dissociate> multiple times until you get the amount of output
you want.  (But be sure to give up if C<dissociate> keeps returning
an empty string, as it will in some strange cases.)
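That retry-and-trim advice can be sketched as a small loop. The helper C<collect_text> is hypothetical, and it takes a code reference rather than calling C<dissociate> directly, so that with the module installed you could pass something like C<sub { dissociate($input, -2, 1000) }>. Joining chunks with a single space and cutting at an exact character count (possibly mid-word) are choices of this sketch, not of the module.

```perl
use strict;
use warnings;

# Hypothetical helper: call $generator until at least $want characters are
# collected, then trim to $want. Gives up after $max_tries empty (or
# undefined) results in total, since a dissociator may return nothing.
sub collect_text {
    my ($generator, $want, $max_tries) = @_;
    my $out   = '';
    my $empty = 0;
    while (length($out) < $want) {
        my $chunk = $generator->();
        if (!defined $chunk || $chunk eq '') {
            last if ++$empty >= $max_tries;   # stop on repeated empty results
            next;
        }
        $out .= ($out eq '' ? '' : ' ') . $chunk;   # join chunks with a space
    }
    return substr($out, 0, $want);                  # may cut mid-word
}

# Usage with a stand-in generator that yields canned chunks:
my @chunks = ('abc', '', 'defg', 'hij');
print collect_text(sub { shift @chunks }, 6, 3), "\n";
```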

C<dissociate> can also be called with the following syntaxes:

  dissociate($input, $group_size);
   # acts like max of 100

  dissociate($input);
   # acts like group size of 2 (characters) and max of 100

=item dissociate_filter()

=item dissociate_filter($group_size)

=item dissociate_filter($group_size, $max)

This library also provides the procedure C<dissociate_filter>, which
pulls input from C<< <> >> (files specified on the command line, or STDIN),
and sends dissociated output to STDOUT.  It can be called with these
syntaxes:

  dissociate_filter($group_size, $max);

  dissociate_filter($group_size);
   # uses a default value for $max

  dissociate_filter();
   # uses a default value for $group_size and $max

These above-mentioned default values can come from command line
switches, if you make a script consisting of:

  #!/usr/local/bin/perl
  use Games::Dissociate;
  dissociate_filter;

and name that script, say, C<dissociate>, then run it as:

  dissociate -c5 -m200 < foo.txt

or

  dissociate -w2 -m70 foo.txt bar.txt | less

and so on.
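The way C<dissociate_filter> turns these switches into arguments can be sketched with the core Getopt::Std module, mirroring the module's own C<getopts('c:w:m:', \%o)> call: C<-w> yields a negative (by-word) group size, C<-c> a positive (by-character) one, C<-w> wins if both are given, and the defaults are 2 and 100. The helper C<parse_switches> is hypothetical, and unlike C<dissociate_filter> it ignores explicitly passed arguments.

```perl
use strict;
use warnings;
use Getopt::Std qw(getopts);

# Hypothetical helper mirroring dissociate_filter()'s switch handling.
# Returns ($group_size, $max, @remaining_args); the remaining args are the
# input files that <> would read.
sub parse_switches {
    local @ARGV = @_;              # getopts() reads and shifts @ARGV
    my %o;
    getopts('c:w:m:', \%o) or die "bad switches\n";
    my $group = $o{w} ? -abs($o{w})    # -wN => N words (negative)
              : $o{c} ?  abs($o{c})    # -cN => N characters (positive)
              :          2;            # default: 2 characters
    my $max   = $o{m} ? abs($o{m}) : 100;
    return ($group, $max, @ARGV);
}

my ($group, $max, @files) = parse_switches('-w2', '-m70', 'foo.txt', 'bar.txt');
print "group $group, max $max, files @files\n";
```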

To explain the switches:

C<-w[number]> specifies a by-word dissociation with that number of
words as the group size, C<-c[number]> specifies a by-character


