Algorithm-ExpectationMaximization
examples/data_generator.pl
use strict;
use Algorithm::ExpectationMaximization;
# The Parameter File:
# How the synthetic data is generated for clustering is
# controlled entirely by the file named through the
# input_parameter_file keyword in the function call shown
# below. The class prior probabilities, the mean vectors, and
# the covariance matrix entries in that file must follow the
# syntax shown in the example param.txt file. It is best to
# edit that file as needed for the purpose of data generation.
#my $parameter_file = "param1.txt"; #2D
#my $parameter_file = "param2.txt"; #2D
#my $parameter_file = "param3.txt"; #2D
#my $parameter_file = "param4.txt"; #3D
#my $parameter_file = "param5.txt"; #3D
#my $parameter_file = "param6.txt"; #3D
my $parameter_file = "param7.txt"; #1D
lib/Algorithm/ExpectationMaximization.pm
}
}
################### Generating Synthetic Data for Clustering ###################
# The data generated corresponds to a mixture of multivariate Gaussians. The mean and
# the covariance of each Gaussian in the mixture are specified individually in a
# parameter file. The parameter file must also state the prior probabilities to be
# associated with each Gaussian. See the example parameter file param1.txt in the
# examples directory. Just edit this file for your own needs.
#
# The multivariate random numbers are generated by calling the Math::Random module.
# As you would expect, that module will insist that the covariance matrix you
# specify be symmetric and positive definite.
sub cluster_data_generator {
    my $class = shift;
    die "illegal call of a class method"
        unless $class eq 'Algorithm::ExpectationMaximization';
    my %args = @_;
    my $input_parameter_file = $args{input_parameter_file};
lib/Algorithm/ExpectationMaximization.pm
# probability distributions is written out to a file named posterior_prob_plot.png
# and the PNG image of the disjoint clusters to a file called cluster_plot.png.
# SYNTHETIC DATA GENERATION:
# The module has been provided with a class method for generating multivariate
# data for experimenting with the EM algorithm. The data generation is controlled
# by the contents of a parameter file that is supplied as an argument to the data
# generator method. The priors, the means, and the covariance matrices in the
# parameter file must follow the syntax shown in the `param1.txt' file in the
# `examples' directory. It is best to edit a copy of this file for your
# synthetic data generation needs.
my $parameter_file = "param1.txt";
my $out_datafile = "mydatafile1.dat";
Algorithm::ExpectationMaximization->cluster_data_generator(
                        input_parameter_file => $parameter_file,
                        output_datafile => $out_datafile,
                        total_number_of_data_points => $N );
# where the value of $N is the total number of data points you would like to see
lib/Algorithm/ExpectationMaximization.pm
input_parameter_file => $parameter_file,
output_datafile => $out_datafile,
total_number_of_data_points => 300
);
for generating multivariate data for clustering if you wish to experiment with the EM
algorithm on synthetic data. The input parameter file must specify the priors to be
used for the Gaussians, their means, and their covariance matrices. The format of the
information in the parameter file must be as shown in the file C<param1.txt> provided
in the C<examples> directory. It will be easiest for you to just edit a copy of this
file for your data generation needs. Apart from the format of the parameter file, the
main constraint you need to observe in specifying the parameters is that the
dimensionality of each covariance matrix must match the dimensionality of the
corresponding mean vector. The multivariate random numbers are generated by calling
the C<Math::Random> module. As you would expect, this module requires that the
covariance matrices you specify in your parameter file be symmetric and positive
definite. Should the covariances in your parameter file not obey this condition, the
C<Math::Random> module will let you know.
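If you would rather catch a bad covariance matrix before calling the generator, the constraints above can be checked with a few lines of plain Perl. The following is only a sketch: C<check_gaussian_params> is a hypothetical helper, not something this module provides, and it assumes you already hold the mean as an array reference and the covariance as a reference to an array of rows. It checks the dimensions, then symmetry, then positive definiteness by attempting a Cholesky factorization.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Algorithm::ExpectationMaximization.
# Returns an empty string when the mean/covariance pair looks usable,
# otherwise a short description of the first problem found.
sub check_gaussian_params {
    my ($mean, $cov, $tol) = @_;
    $tol = 1e-9 unless defined $tol;
    my $d = scalar @$mean;

    # The covariance must be d x d, where d is the length of the mean vector.
    return "covariance has " . scalar(@$cov) . " rows, expected $d"
        unless @$cov == $d;
    for my $i (0 .. $d - 1) {
        return "row $i of covariance has wrong length"
            unless @{ $cov->[$i] } == $d;
    }

    # Symmetry: cov[i][j] must equal cov[j][i] up to the tolerance.
    for my $i (0 .. $d - 1) {
        for my $j ($i + 1 .. $d - 1) {
            return "covariance is not symmetric at ($i,$j)"
                if abs($cov->[$i][$j] - $cov->[$j][$i]) > $tol;
        }
    }

    # Positive definiteness: the Cholesky factorization goes through only
    # if every pivot is strictly positive.
    my @L = map { [ (0) x $d ] } 1 .. $d;
    for my $i (0 .. $d - 1) {
        for my $j (0 .. $i) {
            my $sum = $cov->[$i][$j];
            $sum -= $L[$i][$_] * $L[$j][$_] for 0 .. $j - 1;
            if ($i == $j) {
                return "covariance is not positive definite" if $sum <= $tol;
                $L[$i][$i] = sqrt($sum);
            }
            else {
                $L[$i][$j] = $sum / $L[$j][$j];
            }
        }
    }
    return '';
}

# Usage: an empty string means no problem was detected.
my $problem = check_gaussian_params([0, 0], [[1, 0.5], [0.5, 2]]);
print $problem eq '' ? "parameters look fine\n" : "problem: $problem\n";
```

The tolerance of C<1e-9> is an arbitrary choice for this sketch; a matrix that is only barely positive definite may pass here and still be rejected by C<Math::Random>, or the other way around.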
=item B<visualize_data($data_visualization_mask):>
lib/Algorithm/ExpectationMaximization.pm
Going through the six examples listed above will make you familiar with how to make
the calls to the clustering and the visualization methods. The C<examples> directory
also includes several parameter files with names like
    param1.txt
    param2.txt
    param3.txt
    ...
These were used to generate the synthetic data for which the results are shown in the
C<examples> directory. Just make a copy of one of these files and edit it if you
would like to generate your own multivariate data for clustering. Note that you can
generate data with any dimensionality through appropriate entries in the parameter
file.
=head1 CAVEATS
When you run the scripts in the C<examples> directory, your results will NOT always
look like what I have shown in the PNG image files in the directory. As mentioned
earlier in the Description section, the EM algorithm, when started from randomly
chosen initial guesses for the cluster means, can get stuck in a local maximum.
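A common way to reduce the effect of a poor random start is to run the clustering several times and keep the run with the highest log-likelihood. The sketch below shows only that bookkeeping. C<best_of_restarts> is a hypothetical helper, and the code reference it takes is something you would write yourself: it is assumed to perform one randomly seeded run and to return a hash reference with a C<log_likelihood> key. How to obtain that value from this module is not shown in the excerpt above, so the usage example substitutes made-up numbers for real runs.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Algorithm::ExpectationMaximization.
# Calls $run_once->($attempt) for each attempt and returns the result
# whose 'log_likelihood' entry is largest (undef if no run succeeded).
sub best_of_restarts {
    my ($run_once, $num_restarts) = @_;
    die "need at least one restart"
        unless $num_restarts && $num_restarts > 0;
    my $best;
    for my $attempt (1 .. $num_restarts) {
        my $result = $run_once->($attempt);
        # Skip runs that failed or did not report a log-likelihood.
        next unless defined $result && defined $result->{log_likelihood};
        $best = $result
            if !defined $best
            || $result->{log_likelihood} > $best->{log_likelihood};
    }
    return $best;
}

# Usage with a stand-in for real EM runs: the log-likelihoods below are
# placeholder values chosen for illustration only.
my @fake_log_likelihoods = (-120.5, -98.2, -110.0);
my $best = best_of_restarts(
    sub {
        my $attempt = shift;
        return {
            attempt        => $attempt,
            log_likelihood => $fake_log_likelihoods[ $attempt - 1 ],
        };
    },
    scalar @fake_log_likelihoods,
);
print "best attempt: $best->{attempt}\n";
```

Restarts do not guarantee the global maximum; they only make it less likely that a single unlucky initialization decides the final clustering.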