Algorithm-ExpectationMaximization
examples/canned_example1.pl
examples/canned_example2.pl
examples/canned_example3.pl
examples/canned_example4.pl
examples/canned_example5.pl
examples/canned_example6.pl
examples/cleanup_directory.pl
examples/cluster_plot.png
examples/data_generator.pl
examples/data_scatter_plot.png
examples/datafile_1d.txt
examples/mydatafile1.dat
examples/mydatafile2.dat
examples/mydatafile3.dat
examples/mydatafile4.dat
examples/mydatafile5.dat
examples/mydatafile6.dat
examples/mydatafile7.dat
examples/sphericaldata.csv
examples/param1.txt
examples/param2.txt
examples/param3.txt
examples/param4.txt
examples/param5.txt
examples/param6.txt
examples/param7.txt
examples/posterior_prob_plot.png
examples/README
examples/save_example_1_cluster_plot.png
examples/save_example_1_posterior_prob_plot.png
examples/save_example_2_cluster_plot.png
examples/save_example_2_posterior_prob_plot.png
examples/save_example_3_cluster_plot.png
examples/save_example_3_posterior_prob_plot.png
examples/save_example_4_cluster_plot.png
examples/save_example_4_posterior_prob_plot.png
examples/save_example_5_cluster_plot.png
examples/save_example_5_posterior_prob_plot.png
examples/save_example_6_cluster_plot.png
examples/save_example_6_posterior_prob_plot.png
lib/Algorithm/ExpectationMaximization.pm
Makefile.PL
MANIFEST This list of files
README
t/test.t
META.yml Module YAML meta-data (added by MakeMaker)
META.json Module JSON meta-data (added by MakeMaker)
examples/README
1) canned_example1.pl
This example illustrates 2D clustering of co-located but
overlapping clusters with different covariances.
Unless your run gets trapped in a local maximum, your results
should look like those shown in the following image files:
save_example_1_cluster_plot.png (for hard clustering)
save_example_1_posterior_prob_plot.png (for soft clustering)
If you are using a Linux machine, you can display these image
files with the 'display' utility.
2) canned_example2.pl
This example illustrates 2D clustering involving non-overlapping
clusters.
Unless your run gets trapped in a local maximum, your results
should look like those shown in the following image files:
save_example_2_cluster_plot.png (for hard clustering)
save_example_2_posterior_prob_plot.png (for soft clustering)
3) canned_example3.pl
This example illustrates 2D clustering involving overlapping
clusters whose means are at different locations.
Unless your run gets trapped in a local maximum, your results
should look like those shown in the following image files:
save_example_3_cluster_plot.png (for hard clustering)
save_example_3_posterior_prob_plot.png (for soft clustering)
4) canned_example4.pl
This example illustrates 3D clustering involving non-overlapping
clusters.
Unless your run gets trapped in a local maximum, your results
should look like those shown in the following image files:
save_example_4_cluster_plot.png (for hard clustering)
save_example_4_posterior_prob_plot.png (for soft clustering)
5) canned_example5.pl
This example illustrates 3D clustering involving overlapping
clusters.
Unless your run gets trapped in a local maximum, your results
should look like those shown in the following image files:
save_example_5_cluster_plot.png (for hard clustering)
save_example_5_posterior_prob_plot.png (for soft clustering)
6) canned_example6.pl
This example was added in Version 1.2 to illustrate clustering
of 1-D data.
Unless your run gets trapped in a local maximum, your results
should look like those shown in the following image files:
save_example_6_cluster_plot.png (for hard clustering)
save_example_6_posterior_prob_plot.png (for soft clustering)
========================================================================
Support scripts in the `examples' directory:
1) For generating the data for experiments with clustering
lib/Algorithm/ExpectationMaximization.pm
print OUTPUT "$i $histogram[$i]\n";
}
# $arg_string .= "\"$temp_file\" using 1:2 ti col smooth frequency with boxes lc $cindex, ";
$arg_string .= "\"$temp_file\" using 2:xtic(1) ti col smooth frequency with boxes lc $cindex, ";
close OUTPUT;
}
}
# Strip the trailing comma and spaces left by the last appended plot clause.
$arg_string =~ s/,[ ]+$//;
if ($visualization_data_field_width > 2) {
$plot->gnuplot_cmd( 'set terminal png color',
'set output "cluster_plot.png"');
$plot->gnuplot_cmd( "splot $arg_string" );
} elsif ($visualization_data_field_width == 2) {
$plot->gnuplot_cmd('set terminal png',
'set output "cluster_plot.png"');
$plot->gnuplot_cmd( "plot $arg_string" );
} elsif ($visualization_data_field_width == 1) {
$plot->gnuplot_cmd('set terminal png',
'set output "cluster_plot.png"');
$plot->gnuplot_cmd( "plot $arg_string" );
}
}
# This method is for the visualization of the posterior class distributions. In
# other words, this method allows us to see the soft clustering produced by the EM
# algorithm. While much of the gnuplot logic here is the same as in the
# visualize_clusters() method, there are significant differences in how the data is
# pooled for the purpose of display.
sub visualize_distributions {
lib/Algorithm/ExpectationMaximization.pm
print OUTPUT "$i $histogram[$i]\n";
}
$arg_string .= "\"$temp_file\" using 2:xtic(1) ti col smooth frequency with boxes lc $cindex, ";
close OUTPUT;
}
}
# Strip the trailing comma and spaces left by the last appended plot clause.
$arg_string =~ s/,[ ]+$//;
if ($visualization_data_field_width > 2) {
$plot->gnuplot_cmd( 'set terminal png',
'set output "posterior_prob_plot.png"');
$plot->gnuplot_cmd( "splot $arg_string" );
} elsif ($visualization_data_field_width == 2) {
$plot->gnuplot_cmd( 'set terminal png',
'set output "posterior_prob_plot.png"');
$plot->gnuplot_cmd( "plot $arg_string" );
} elsif ($visualization_data_field_width == 1) {
$plot->gnuplot_cmd( 'set terminal png',
'set output "posterior_prob_plot.png"');
$plot->gnuplot_cmd( "plot $arg_string" );
}
}
# The method shown below should be called only AFTER you have called the method
# read_data_from_file(). The visualize_data() method is meant for the visualization of the
# original data in its various 2D or 3D subspaces.
sub visualize_data {
my $self = shift;
my $v_mask = shift || die "visualization mask missing";
lib/Algorithm/ExpectationMaximization.pm
foreach my $i (0..@all_data-1) {
$histogram[int( ($all_data[$i] - $minval) / $delta )]++;
}
foreach my $i (0..@histogram-1) {
print OUTPUT "$i $histogram[$i]\n";
}
$arg_string = "\"$temp_file\" using 2:xtic(1) ti col smooth frequency with boxes lc rgb 'green'";
close OUTPUT;
}
if ($visualization_data_field_width > 2) {
$plot->gnuplot_cmd( 'set terminal png',
'set output "data_scatter_plot.png"');
$plot->gnuplot_cmd( "splot $arg_string" );
} elsif ($visualization_data_field_width == 2) {
$plot->gnuplot_cmd( 'set terminal png',
'set output "data_scatter_plot.png"');
$plot->gnuplot_cmd( "plot $arg_string" );
} elsif ($visualization_data_field_width == 1) {
$plot->gnuplot_cmd( 'set terminal png',
'set output "data_scatter_plot.png"');
$plot->gnuplot_cmd( "plot $arg_string" );
}
}
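The 1-D branch above bins the data with `int( ($value - $minval) / $delta )`. The following is a minimal standalone sketch of that binning, not the module's own code. The helper name `build_histogram` and the definition of `$delta` as (max - min) / number of bins are assumptions, since the excerpt does not show how `$delta` is computed.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical helper (not part of the module): count how many values fall
# into each of $num_bins equal-width bins spanning [min, max].
sub build_histogram {
    my ($data, $num_bins) = @_;
    my $minval = min(@$data);
    my $maxval = max(@$data);
    # Assumed bin width; the module's own definition is not shown above.
    my $delta = ($maxval - $minval) / $num_bins;
    my @histogram = (0) x $num_bins;
    foreach my $value (@$data) {
        # Guard against a zero bin width when all values are identical.
        my $bin = $delta > 0 ? int( ($value - $minval) / $delta ) : 0;
        # The maximum value would land one past the last bin; clamp it.
        $bin = $num_bins - 1 if $bin >= $num_bins;
        $histogram[$bin]++;
    }
    return @histogram;
}

my @counts = build_histogram([0, 1, 2, 3, 4, 5, 6, 7, 8, 10], 5);
print join(" ", @counts), "\n";    # prints "2 2 2 2 2"
```

Initializing every bin to zero avoids printing undefined entries for empty bins, and the clamp keeps the largest data value inside the last bin.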
################### Generating Synthetic Data for Clustering ###################
# The data generated corresponds to a multivariate distribution. The mean and the
# covariance of each Gaussian in the distribution are specified individually in a
# parameter file. The parameter file must also state the prior probabilities to be
lib/Algorithm/ExpectationMaximization.pm
# the clusters in:
my $visualization_mask = "111";
$clusterer->visualize_clusters($visualization_mask);
$clusterer->visualize_distributions($visualization_mask);
$clusterer->plot_hardcopy_clusters($visualization_mask);
$clusterer->plot_hardcopy_distributions($visualization_mask);
# where the last two invocations are for writing out the PNG plots of the
# visualization displays to disk files. The PNG image of the posterior
# probability distributions is written out to a file named posterior_prob_plot.png
# and the PNG image of the disjoint clusters to a file called cluster_plot.png.
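To illustrate what a visualization mask such as "111" selects, here is a small sketch. It assumes the mask holds one character per data dimension and that '1' means "include this coordinate in the plot". The helper `apply_mask` is hypothetical and is not a method of the module.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the coordinates whose mask character is '1'.
sub apply_mask {
    my ($mask, $point) = @_;
    my @bits = split //, $mask;
    die "mask length must equal data dimensionality\n"
        unless @bits == @$point;
    return map { $point->[$_] } grep { $bits[$_] eq '1' } 0 .. $#bits;
}

# Project a 3-D point onto its first and third coordinates.
my @projected = apply_mask("101", [1.5, 2.5, 3.5]);
print "@projected\n";    # prints "1.5 3.5"
```

Under this reading, "111" keeps all three coordinates of 3-D data, while a mask with a single '0' would give a 2-D projection.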
# SYNTHETIC DATA GENERATION:
# The module has been provided with a class method for generating multivariate
# data for experimenting with the EM algorithm. The data generation is controlled
# by the contents of a parameter file that is supplied as an argument to the data
# generator method. The priors, the means, and the covariance matrices in the
# parameter file must be according to the syntax shown in the `param1.txt' file in
# the `examples' directory. It is best to edit a copy of this file for your
# synthetic data generation needs.
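The idea behind the data generator can be sketched for the 1-D case as follows: pick a component according to the priors, then draw from that component's Gaussian. This is only an illustration. It does not read a parameter file, does not handle covariance matrices, and the mixture parameters and helper names are invented for the example.

```perl
use strict;
use warnings;

# Draw one sample from N(mean, sd^2) using the Box-Muller transform.
sub gaussian_sample {
    my ($mean, $sd) = @_;
    my $u1 = 1 - rand();          # in (0, 1], so log() is safe
    my $u2 = rand();
    my $z  = sqrt(-2 * log($u1)) * cos(2 * 3.141592653589793 * $u2);
    return $mean + $sd * $z;
}

# Pick a component index according to the prior probabilities.
sub pick_component {
    my @priors = @_;
    my $r = rand();
    my $cumulative = 0;
    for my $k (0 .. $#priors) {
        $cumulative += $priors[$k];
        return $k if $r < $cumulative;
    }
    return $#priors;              # guard against rounding error
}

# Hypothetical 1-D mixture parameters, chosen only for illustration.
my @priors = (0.3, 0.7);
my @means  = (0.0, 5.0);
my @sds    = (1.0, 0.5);

srand(42);                        # fixed seed for repeatable runs
my @samples;
for (1 .. 1000) {
    my $k = pick_component(@priors);
    push @samples, [ $k, gaussian_sample($means[$k], $sds[$k]) ];
}
printf "generated %d samples\n", scalar @samples;
```

Each sample keeps its component index alongside the value, which makes it easy to compare the true labels with the clusters found by EM.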
lib/Algorithm/ExpectationMaximization.pm
future versions). You are urged to start by executing the
following six example scripts:
=over 16
=item I<canned_example1.pl>
This example applies the EM algorithm to the data contained in the datafile
C<mydatafile1.dat>. The mixture data in the file corresponds to three overlapping
Gaussian components in a star-shaped pattern. The EM based clustering for this data
is shown in the files C<save_example_1_cluster_plot.png> and
C<save_example_1_posterior_prob_plot.png>, the former displaying the hard clusters
obtained by using the naive Bayes' classifier and the latter showing the soft
clusters obtained on the basis of the posterior class probabilities at the data
points.
=item I<canned_example2.pl>
The datafile used in this example is C<mydatafile2.dat>. This mixture data
corresponds to two well-separated relatively isotropic Gaussians. EM based clustering for this
data is shown in the files C<save_example_2_cluster_plot.png> and
C<save_example_2_posterior_prob_plot.png>, the former displaying the hard clusters
obtained by using the naive Bayes' classifier and the latter showing the soft
clusters obtained by using the posterior class probabilities at the data points.
=item I<canned_example3.pl>
Like the first example, this example again involves three Gaussians, but now their
means are not co-located. Additionally, we now seed the clusters manually by
specifying three selected data points as the initial guesses for the cluster means.
The datafile used for this example is C<mydatafile3.dat>. The EM based clustering
for this data is shown in the files C<save_example_3_cluster_plot.png> and
C<save_example_3_posterior_prob_plot.png>, the former displaying the hard clusters
obtained by using the naive Bayes' classifier and the latter showing the soft
clusters obtained on the basis of the posterior class probabilities at the data
points.
=item I<canned_example4.pl>
Whereas the three previous examples demonstrated EM based clustering of 2D data, we
now present an example of clustering in 3D. The datafile used in this example is
C<mydatafile4.dat>. This mixture data corresponds to three well-separated but highly
anisotropic Gaussians. The EM derived clustering for this data is shown in the files
C<save_example_4_cluster_plot.png> and C<save_example_4_posterior_prob_plot.png>, the
former displaying the hard clusters obtained by using the naive Bayes' classifier and
the latter showing the soft clusters obtained on the basis of the posterior class
probabilities at the data points.
You may also wish to run this example on the data in a CSV file in the C<examples>
directory. The name of the file is C<sphericaldata.csv>.
=item I<canned_example5.pl>
We again demonstrate clustering in 3D but now we have one Gaussian cluster that
"cuts" through the other two Gaussian clusters. The datafile used in this example is
C<mydatafile5.dat>. The three Gaussians in this case are highly overlapping and
highly anisotropic. The EM derived clustering for this data is shown in the files
C<save_example_5_cluster_plot.png> and C<save_example_5_posterior_prob_plot.png>, the
former displaying the hard clusters obtained by using the naive Bayes' classifier and
the latter showing the soft clusters obtained through the posterior class
probabilities at the data points.
=item I<canned_example6.pl>
This example, added in Version 1.2, demonstrates the use of this module for 1-D data.
In order to visualize the clusters for the 1-D case, we show them through their
respective histograms. The datafile used in this example is C<mydatafile7.dat>. The
data consists of two overlapping Gaussians. The EM derived clustering for this data
is shown in the files C<save_example_6_cluster_plot.png> and
C<save_example_6_posterior_prob_plot.png>, the former displaying the hard clusters
obtained by using the naive Bayes' classifier and the latter showing the soft
clusters obtained through the posterior class probabilities at the data points.
=back
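The distinction between hard and soft clusters that runs through the examples above can be sketched for a 1-D two-component mixture. This is a minimal illustration with made-up parameters and hypothetical helper names, not the module's implementation: the soft assignment is the vector of posterior class probabilities at a point, and the hard assignment picks the class with the largest posterior.

```perl
use strict;
use warnings;

my $PI = 3.141592653589793;

# Density of a 1-D Gaussian with the given mean and variance at $x.
sub gaussian_pdf {
    my ($x, $mean, $var) = @_;
    return exp( -($x - $mean)**2 / (2 * $var) ) / sqrt(2 * $PI * $var);
}

# Posterior class probabilities at $x (the "soft" assignment).
sub posteriors {
    my ($x, $priors, $means, $vars) = @_;
    my @joint;
    my $total = 0;
    for my $k (0 .. $#{$priors}) {
        my $p = $priors->[$k] * gaussian_pdf($x, $means->[$k], $vars->[$k]);
        push @joint, $p;
        $total += $p;
    }
    return map { $_ / $total } @joint;
}

# Hard assignment: index of the class with the largest posterior.
sub hard_label {
    my @post = @_;
    my $best = 0;
    for my $k (1 .. $#post) {
        $best = $k if $post[$k] > $post[$best];
    }
    return $best;
}

# Hypothetical mixture parameters, for illustration only.
my @priors = (0.5, 0.5);
my @means  = (0.0, 4.0);
my @vars   = (1.0, 1.0);

for my $x (0.0, 2.0, 3.5) {
    my @post = posteriors($x, \@priors, \@means, \@vars);
    printf "x=%.1f  soft=(%.3f, %.3f)  hard=%d\n", $x, @post, hard_label(@post);
}
```

A point midway between the two means receives equal posteriors, which is where the soft view carries information that a hard label discards.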
Going through the six examples listed above will make you familiar with how to make
the calls to the clustering and the visualization methods. The C<examples> directory
also includes several parameter files with names like
param1.txt