Games-Dissociate
lib/Games/Dissociate.pm view on Meta::CPAN
### The POD is at the end. ###
require 5.000;
package Games::Dissociate;
use strict;
require Exporter;
use vars qw(@ISA @EXPORT @EXPORT_OK $Debug $VERSION);
use Carp;
@ISA = qw(Exporter);
@EXPORT = qw(dissociate_filter dissociate);
$VERSION = 1.0;
$Debug = 0;
###########################################################################
sub dissociate_filter {
  require Text::Wrap;
  require Getopt::Std;
  my %o;
  if(@ARGV) {
    Getopt::Std::getopts('c:w:m:', \%o)
     or die "Options:
-cNUMBER
Run a by-character dissociation with that number of
characters as the group size.
-wNUMBER
Run a by-word dissociation with that number of
words as the group size.
-mNUMBER
Specifies how many iterations the dissociator loop should make.
";
  }
  my $o;
  my $max;
  if($_[0]) {
    $o = $_[0];
  } elsif($o{'w'}){
    $o = - abs($o{'w'});
  } elsif($o{'c'}){
    $o = abs($o{'c'});
  }
  $o ||= 2;
  if($_[1]) {
    $max = $_[1];
  } elsif ($o{'m'}) {
    $max = abs($o{'m'});
  }
  $max ||= 100;
  print "group_length: $o. max_length: $max\n" if $Debug;
  print Text::Wrap::wrap( '','', dissociate(join('', <>), $o, $max) ), "\n";
  return;
}
#==========================================================================
sub dissociate {
  my $in = $_[0];
  my $arg = int($_[1] || 2);
  my $iteration_limit = $_[2] || 100;
  my @out;
  my $by_word = ($arg < 0);
  my $degree = abs($arg);
  my $last_match_point;
  $degree = 2 if $degree == 1;
  use locale;
  $in =~ tr<\cm\cj \t>< >s;
  die "No input\n" unless length $in;
  study $in;
  my $new_matcher;
  if($by_word) {
    $new_matcher = "\\W+(" . join("\\W+", ("\\w+") x $degree) . ")";
  } else {
    $new_matcher = "(" . ('.' x $degree) . ")";
  }
  # In use in the loop.
  my($re, @orig, $matched, $remainder,
     $i, $last_matched, $iteration);
  $iteration = 0;
  $last_match_point = -1;
  while($iteration < $iteration_limit) {
    ++$iteration;
lib/Games/Dissociate.pm view on Meta::CPAN
3. Look for the other times the last-matched group occurs in the text,
and randomly select one of them. (Or: select the I<next> time that
group occurs -- a shortcut I've made in the code, which seems to still
produce random-looking results). Look at the group of tokens that
occurs right after that. Make I<that> the last-matched group. Loop
back to Step 2 until we think we've outputted enough.
4. But if the last-matched group from 2 occurred just that once in the
text, go back to step 1.
=back
Since the groups of characters or words (at least, when you look at
them as bits of text only group-size tokens long) are all taken from
the input text, you get somewhat natural-looking text -- as opposed to
what you'd get if you just randomly outputted single characters or
single words from the input text.
The process of applying a DissociatedPress algorithm to a bit
of text is called "dissociation".
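The loop described in the steps above can be sketched in a few lines of Perl. This is only an illustration, not this module's actual code: C<sketch_dissociate> is a hypothetical name, it handles by-word dissociation only, it splits on whitespace rather than using the module's regexes, and it assumes that the steps preceding step 3 amount to "pick a random group" and "output it".

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Games::Dissociate: a simplified
# by-word dissociation over whitespace-separated tokens.
sub sketch_dissociate {
    my ($text, $n, $max_iter) = @_;
    my @tok = split ' ', $text;
    return '' if @tok < $n;

    # Record every position at which each $n-word group starts.
    my %starts;
    for my $i (0 .. @tok - $n) {
        push @{ $starts{ join ' ', @tok[$i .. $i + $n - 1] } }, $i;
    }

    my @out;
    my $pos;    # start of the last-matched group; undef means "restart"
    for (1 .. $max_iter) {
        # Assumed steps 1 and 2: pick a random group, then output it.
        $pos = int rand(@tok - $n + 1) unless defined $pos;
        my @group = @tok[$pos .. $pos + $n - 1];
        push @out, @group;

        # Step 3: find the other occurrences of this group that still
        # have a full group after them, jump to a random one, and make
        # the group right after it the new last-matched group.
        my @others = grep { $_ != $pos && $_ + 2 * $n <= @tok }
                     @{ $starts{ join ' ', @group } };

        # Step 4: no other occurrence, so restart from a random group.
        $pos = @others ? $others[ int rand @others ] + $n : undef;
    }
    return join ' ', @out;
}

print sketch_dissociate(
    "the cat sat on the mat and the cat ran to the mat", 2, 10
), "\n";
```

Unlike the shortcut mentioned in step 3, this sketch really does pick a random other occurrence, which is affordable here because all group positions are indexed up front.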
=head1 PARAMETERS AND USAGE
To use this module after you've installed it, say "use
Games::Dissociate". This imports the function C<dissociate> and the
procedure C<dissociate_filter>.
=over 4
=item dissociate($input, $group_size, $max)
The function C<dissociate> takes three parameters:
  $output = dissociate($input, $group_size, $max);
$input is the input string, hopefully containing a stretch of
(plaintext) text in a human language, encoded either in just plain
US-ASCII, or in a character-encoding your locale settings know about.
$output will be "dissociated text" (charmingly generated gibberish)
based on that input text. (Note that output will contain no
line-breaks or tabs. You may wish, as C<dissociate_filter> does,
to pass the output through Text::Wrap's C<wrap>.)
You'll get strange output if $input contains markup (HTML, LaTeX,
etc.), or is very short, or is not in a human language.
$group_size is the number of tokens (words or characters) that must be
in common between bits of text the dissociation algorithm skips
between. A positive value means you want to dissociate by character,
with a group-size of that many characters (4 = 4 characters); a
negative value means you want to dissociate by word, with a group size
of that many words (-2 = 2 words). I suggest values between -3 and 5;
I'm a fan of -2. A $group_size value of 0 or 1 is invalid, and
currently causes C<dissociate> to use the default value of 2 (2
characters) instead. A value of -1 is invalid, and currently causes
C<dissociate> to use the value of -2 (2 words) instead. The
behavior/validity of $group_size values of 0, 1, or -1 may change in
future versions.
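As a rough illustration of how a $group_size value is interpreted, the following sketch mirrors the normalization at the top of C<dissociate>'s source. The helper name C<decode_group_size> is made up for this example and is not exported by the module.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the module: reports the mode and the
# effective group size that a given $group_size value leads to.
sub decode_group_size {
    my ($group_size) = @_;
    my $arg     = int($group_size || 2);   # undef or 0 falls back to 2
    my $by_word = $arg < 0;                # negative means by-word
    my $degree  = abs $arg;
    $degree = 2 if $degree == 1;           # 1 and -1 are promoted to 2
    return ($by_word ? 'word' : 'character', $degree);
}

for my $gs (4, -2, 0, 1, -1) {
    my ($mode, $degree) = decode_group_size($gs);
    print "$gs => $degree ${mode}s\n";
}
```

Since the treatment of 0, 1, and -1 may change in future versions, callers are probably better off passing an explicit valid value than relying on these fallbacks.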
$max is a parameter used to control the maximum number of iterations
of C<dissociate>'s central loop -- it corresponds roughly to the
number of "chunks" of text you get back, where a chunk is N *
-$group_size words for negative values of $group_size, and N *
$group_size characters for positive values of $group_size. $max must
be greater than 1.
If you need (!) more precise control over the size of the output text,
try setting $max high and trimming the output to size, and/or try
calling C<dissociate> multiple times until you get the amount of output
you want. (But be sure to give up if C<dissociate> keeps returning
an empty string, as it will in some strange cases.)
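One way to combine both suggestions is sketched below. C<text_of_length> is a hypothetical helper, not something this module provides; it takes a code ref so that it can wrap any generator, for instance C<sub { dissociate($input, -2, 500) }> once Games::Dissociate is loaded. The demonstration at the end uses a stand-in generator instead of the real function.

```perl
use strict;
use warnings;

# Hypothetical helper: call $generate until at least $want_chars
# characters have been collected or $max_tries calls have been made,
# then trim the result to size.
sub text_of_length {
    my ($generate, $want_chars, $max_tries) = @_;
    my $text = '';
    for (1 .. $max_tries) {
        my $chunk = $generate->();
        next unless defined $chunk && length $chunk;  # empty: try again
        $text .= ' ' if length $text;
        $text .= $chunk;
        last if length($text) >= $want_chars;
    }
    return substr($text, 0, $want_chars);
}

# Stand-in generator for demonstration purposes only.
my $demo = text_of_length(sub { 'lorem ipsum dolor' }, 40, 5);
print length($demo), ": $demo\n";
```

The C<$max_tries> cap is what makes the helper give up when the generator keeps returning an empty string.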
C<dissociate> can also be called with the following syntaxes:
  dissociate($input, $group_size);
    # acts like max of 100
  dissociate($input);
    # acts like group size of 2 (characters) and max of 100
=item dissociate_filter()
=item dissociate_filter($group_size)
=item dissociate_filter($group_size, $max)
This library also provides the procedure C<dissociate_filter>, which
pulls input from "<>" (files specified on the command line, or STDIN),
and sends dissociated output to STDOUT. It can be called with these
syntaxes:
  dissociate_filter($group_size, $max);
  dissociate_filter($group_size);
    # uses a default value for $max
  dissociate_filter();
    # uses a default value for $group_size and $max
These above-mentioned default values can come from command line
switches, if you make a script consisting of:
  #!/usr/local/bin/perl
  use Games::Dissociate;
  dissociate_filter;
and name that script, say, C<dissociate>, and invoke it as:
  dissociate -c5 -m200 < foo.txt
or
  dissociate -w2 -m70 foo.txt bar.txt | less
and so on.
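To make the mapping from switches to parameters concrete, here is a small sketch based on the C<dissociate_filter> source shown earlier. The helper C<switches_to_args> is hypothetical and exists only for this illustration; it returns the group size and max that the switches would select, followed by whatever is left over as input file names.

```perl
use strict;
use warnings;
use Getopt::Std ();

# Hypothetical helper: parse a switch list the way dissociate_filter
# does when it is called without arguments.
sub switches_to_args {
    local @ARGV = @_;    # getopts reads and consumes @ARGV
    my %o;
    Getopt::Std::getopts('c:w:m:', \%o) or die "bad switches\n";
    my $group_size = $o{w} ? -abs($o{w})    # -w wins over -c
                   : $o{c} ?  abs($o{c})
                   :          2;            # default: 2 characters
    my $max = $o{m} ? abs($o{m}) : 100;     # default: 100 iterations
    return ($group_size, $max, @ARGV);      # leftovers are input files
}

print join(' ', switches_to_args(qw(-c5 -m200))), "\n";
print join(' ', switches_to_args(qw(-w2 -m70 foo.txt bar.txt))), "\n";
```

Note that by-word mode is expressed as a negative group size, so C<-w2> selects a $group_size of -2.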
To explain the switches:
C<-w[number]> specifies a by-word dissociation with that number of
words as the group size, C<-c[number]> specifies a by-character