Algorithm-RandomPointGenerator
lib/Algorithm/RandomPointGenerator.pm
# In this case, the number of points to be generated is set to 2000. These are
# the points kept after the first 500 are discarded to get past the effects of
# the starting state of the generator.
=head1 CHANGES
Version 1.01 downshifts the version of Perl that is required for this module. The
implementation code for the module is unchanged from Version 1.0.
=head1 DESCRIPTION
Several testing protocols for "big data" research projects involving large geographic
areas require a random set of points that are distributed according to a
user-specified probability density function that exists in the form of a 2D
histogram. This module is an implementation of the Metropolis-Hastings algorithm for
generating such a set of points.
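To make the idea concrete, here is a minimal sketch of random-walk Metropolis sampling driven by a small 2D histogram. It is not the module's implementation: the toy counts, the helper names C<density> and C<metropolis_chain>, and the symmetric uniform proposal are all assumptions made for illustration, and the points are expressed in bin coordinates rather than in bounding-box coordinates.

```perl
use strict;
use warnings;

# Toy 3x4 histogram (rows x columns); the counts are made up for illustration.
my @hist = (
    [ 0, 1, 2, 1 ],
    [ 1, 4, 8, 2 ],
    [ 0, 2, 3, 0 ],
);
my ($rows, $cols) = (scalar @hist, scalar @{ $hist[0] });

# Unnormalized target density at a continuous point (x, y), with x in
# [0, cols) and y in [0, rows). Points outside the grid have density zero.
sub density {
    my ($x, $y) = @_;
    return 0 if $x < 0 || $x >= $cols || $y < 0 || $y >= $rows;
    return $hist[ int($y) ][ int($x) ];
}

# Random-walk Metropolis with a symmetric uniform proposal of half-width
# $width (an assumption). Because the proposal is symmetric, the Hastings
# acceptance ratio reduces to density(new) / density(current).
sub metropolis_chain {
    my ($n, $width, $x, $y) = @_;
    my @chain;
    while (@chain < $n) {
        my $nx = $x + (2 * rand() - 1) * $width;
        my $ny = $y + (2 * rand() - 1) * $width;
        my $ratio = density($nx, $ny) / density($x, $y);
        ($x, $y) = ($nx, $ny) if rand() < $ratio;   # otherwise stay put
        push @chain, [ $x, $y ];                    # a rejected move repeats the point
    }
    return \@chain;
}

# The start point must lie in a bin with a nonzero count.
my $chain = metropolis_chain(5000, 1.0, 2.5, 1.5);
```

Since a proposal that lands in an empty bin or outside the grid is never accepted, the current point always has a nonzero density and the division in the ratio is safe.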
=head1 METHODS
The module provides the following methods:
=over 4
=item B<new():>
A call to C<new()> constructs a new instance of the
C<Algorithm::RandomPointGenerator> class. If you wanted to set all the constructor
options, this call would look like:
    my $generator = Algorithm::RandomPointGenerator->new(
        input_histogram_file   => $input_histogram_file,
        bounding_box_file      => $bounding_box_file,
        number_of_points       => 2000,
        how_many_to_discard    => 500,
        proposal_density_width => 0.1,
        y_axis_pos_direction   => 'down',
    );
where C<input_histogram_file> is the name of the file that contains a 2D histogram as
an approximation to the desired probability density function for the random points to
be generated. The data in the histogram file is expected to be in CSV format. Here
is a display of a very small portion of the contents of such a file for an actual
geographic region:
    0,211407,216387,211410,205621,199122,192870, ........
    0,408221,427716,427716,427716,427716,427716,427716, ......
    0,408221,427716,427716,427716,427716,427716,427716, ......
    ....
    ....
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,165,9282,11967,15143, .....
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,....
The C<bounding_box_file> parameter of the constructor should delineate the portion of
the plane to which the input histogram corresponds. Here is an example of the
contents of an actual file supplied for this option:
    -71.772016, -70.431923
    -34.254251, -33.203240
Apart from any comment lines, there must exist exactly two lines in the bounding-box
file, with the first line indicating the left and the right limits of the horizontal
coordinates and the second line indicating the lower and the upper limits of the
vertical coordinates. (The values shown above are the longitude and the latitude
limits for a region in Chile, in case you are curious.)
=back
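The bounding-box format described above is simple enough to check by hand before passing the file to the constructor. The following is a minimal sketch of such a check, not code from the module. The function name C<parse_bounding_box> is hypothetical, and treating lines that start with C<#> as comments is an assumption, since the text does not say which comment syntax is accepted.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Parse bounding-box text into ([xmin, xmax], [ymin, ymax]).
# Assumption: comment lines start with '#'.
sub parse_bounding_box {
    my ($text) = @_;
    my @limits;
    for my $raw (split /\n/, $text) {
        my $line = $raw;
        $line =~ s/^\s+|\s+$//g;                  # trim surrounding whitespace
        next if $line eq '' || $line =~ /^#/;     # skip blank and comment lines
        my @fields = split /\s*,\s*/, $line, -1;
        die "expected two entries, got '$line'\n" unless @fields == 2;
        for my $field (@fields) {
            die "not a number: '$field'\n" unless looks_like_number($field);
        }
        die "first entry must be smaller than the second in '$line'\n"
            unless $fields[0] < $fields[1];
        push @limits, [@fields];
    }
    die "expected exactly two data lines, got " . scalar(@limits) . "\n"
        unless @limits == 2;
    return @limits;
}

# The two data lines are the example values shown above; the comment line
# is added here only to exercise the comment handling.
my $text = "# a region in Chile\n-71.772016, -70.431923\n-34.254251, -33.203240\n";
my ($x_range, $y_range) = parse_bounding_box($text);
printf "x: %s to %s, y: %s to %s\n", @$x_range, @$y_range;
```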
=head2 Constructor Parameters:
=over 8
=item C<input_histogram_file>:
This required parameter supplies the name of the file that contains a 2D histogram as
the desired density function for the random points that the module will generate.
Each line in this file must correspond to one row of the 2D histogram. The
left-to-right direction in the file is taken to be the positive direction for
the x-axis. As for the positive direction for the y-axis, it is common for that to
be from top to bottom when the histogram is written out to a text file as an array of
integers. It is important to bear in mind this orientation of the histogram in light
of the fact that a bounding box is specified typically with its y-axis going upwards
(whereas the x-axis continues to be positive from left to right). This inconsistency
between how a 2D histogram is typically stored in a text file and how a bounding box
is likely to be specified means that if the events occur more frequently in the upper
right hand corner of the bounding box, those high counts would show up in the lower
right hand corner of the histogram in a text file (or in a terminal display of the
contents of such a file). B<You can use the constructor option
C<y_axis_pos_direction> to reverse the positive sense of the y direction for the
histogram. If you set C<y_axis_pos_direction> to the string C<up>, then the
orientation of the y axis in both the histogram and the bounding box will be the
same.>
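The effect of the two orientations can be illustrated with a small sketch that maps a point in the bounding box to a row and column of the histogram file. This is one reading of the paragraph above, not the module's code: it assumes that with C<down> the first row of the file lies along the lower edge of the box, and that with C<up> it lies along the upper edge. The function names C<parse_histogram> and C<bin_for_point> and the toy values are hypothetical.

```perl
use strict;
use warnings;

# Parse CSV text into an array of rows, one histogram row per line.
sub parse_histogram {
    my ($text) = @_;
    my @rows;
    for my $raw (split /\n/, $text) {
        my $line = $raw;
        $line =~ s/^\s+|\s+$//g;             # trim surrounding whitespace
        next if $line eq '';                 # skip blank lines
        push @rows, [ map { $_ + 0 } split /\s*,\s*/, $line ];
    }
    return \@rows;
}

# Map a point inside the bounding box to a (row, column) index into the
# rows as they appear in the file. $box is [xmin, xmax, ymin, ymax].
# Assumption: 'down' puts the first file row at ymin, 'up' puts it at ymax.
sub bin_for_point {
    my ($hist, $box, $direction, $x, $y) = @_;
    my ($xmin, $xmax, $ymin, $ymax) = @$box;
    my $rows = @$hist;
    my $cols = @{ $hist->[0] };
    my $col = int( ($x - $xmin) / ($xmax - $xmin) * $cols );
    my $k   = int( ($y - $ymin) / ($ymax - $ymin) * $rows );  # 0 at the lower edge
    $col = $cols - 1 if $col == $cols;       # point exactly on the right edge
    $k   = $rows - 1 if $k   == $rows;       # point exactly on the upper edge
    my $row = $direction eq 'up' ? $rows - 1 - $k : $k;
    return ($row, $col);
}

my $hist = parse_histogram("1,2,3\n4,5,6\n");   # toy 2x3 histogram
my @box  = (0, 3, 0, 2);                        # made-up limits

# A point near the upper right corner of the bounding box:
my ($r_down, $c_down) = bin_for_point($hist, \@box, 'down', 2.5, 1.5);
my ($r_up,   $c_up)   = bin_for_point($hist, \@box, 'up',   2.5, 1.5);
```

Under this reading, the same point falls in the last row of the file (the lower right corner of the text display) with C<down>, and in the first row (the upper right corner) with C<up>.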
=item C<bounding_box_file>:
This required parameter supplies the name of the file that contains the bounding box
information. The bounding box is the part of the XY-plane that corresponds to
the histogram supplied through the C<input_histogram_file> option. Apart from any
comment lines, this file must contain exactly two lines and each line must contain
exactly two numeric entries. Additionally, the first entry in each of the two lines
must be smaller than the second entry in the same line. The two entries in the first
line define the lower and the upper bounds on the x-axis and the two entries in the
second line do the same for the y-axis.
=item C<number_of_points>:
This parameter specifies the number of random points that you want the module to
generate.
=item C<how_many_to_discard>:
The Metropolis-Hastings algorithm belongs to a category of algorithms known as
random-walk algorithms. Since the random walk carried out by such algorithms must be
initialized with user input, it is necessary to discard the points produced until the
effects of the initial state have died out. This parameter specifies how many of the
generated points will be discarded. This parameter is optional in the sense that it
has a default value of 500.
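The bookkeeping behind this option can be sketched as follows. This is an illustration only, not the module's code: C<sample_with_burn_in> is a hypothetical helper, and C<$next_point> stands for any code reference that advances a chain by one step and returns the new state.

```perl
use strict;
use warnings;

# Return $number_of_points states of a chain after throwing away the first
# $how_many_to_discard states.
sub sample_with_burn_in {
    my ($next_point, $number_of_points, $how_many_to_discard) = @_;
    $how_many_to_discard = 500 unless defined $how_many_to_discard;  # documented default
    $next_point->() for 1 .. $how_many_to_discard;                   # burn-in, thrown away
    return [ map { $next_point->() } 1 .. $number_of_points ];
}

# A counter stands in for a real chain so the bookkeeping is easy to see.
my $step = 0;
my $kept = sample_with_burn_in(sub { return ++$step }, 2000, 500);
```

With C<number_of_points> set to 2000 and C<how_many_to_discard> set to 500, the chain is advanced 2500 times in total and the first state that is kept is the 501st.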
=item C<proposal_density_width>: