Algorithm-KMeans
examples/data_generator.pl
use strict;
use Algorithm::KMeans;
# The Parameter File:
# The synthetic data generated for clustering is controlled
# entirely by the file named by the input_parameter_file
# keyword in the function call shown below. The mean vector
# and covariance matrix entries in that file must follow the
# syntax shown in the example param.txt file. It is easiest
# to edit that file as needed for your own data
# generation.
#my $parameter_file = "param.txt";
#my $parameter_file = "param3.txt";
my $parameter_file = "param2.txt";
#my $out_datafile = "mydatafile2.dat";
my $out_datafile = "mydatafile3.dat";
Algorithm::KMeans->cluster_data_generator(
input_parameter_file => $parameter_file,
lib/Algorithm/KMeans.pm
} elsif ($visualization_data_field_width == 1) {
croak "No provision for plotting 1-D data\n";
}
}
########################### Generating Synthetic Data for Clustering ##############################
# The generated data is drawn from multivariate Gaussian distributions. The mean and
# the covariance of each Gaussian are specified individually in a parameter file.
# See the example parameter file param.txt in the examples directory. Just edit
# that file for your own needs.
#
# The multivariate random numbers are generated by calling the Math::Random module.
# As you would expect, that module will insist that the covariance matrix you
# specify be symmetric and positive definite.
sub cluster_data_generator {
my $class = shift;
croak "illegal call of a class method" unless $class eq 'Algorithm::KMeans';
my %args = @_;
my $input_parameter_file = $args{input_parameter_file};
my $output_file = $args{output_datafile};
lib/Algorithm/KMeans.pm
$clusterer->visualize_clusters($visualization_mask);
# SYNTHETIC DATA GENERATION:
# The module provides a class method for generating multivariate data for
# experimenting with clustering. The data generation is controlled by the
# contents of the parameter file that is supplied as an argument to the data
# generator method. The mean and covariance matrix entries in the parameter file
# must follow the syntax shown in the param.txt file in the examples
# directory. It is easiest to edit that file as needed:
my $parameter_file = "param.txt";
my $out_datafile = "mydatafile.dat";
Algorithm::KMeans->cluster_data_generator(
input_parameter_file => $parameter_file,
output_datafile => $out_datafile,
number_data_points_per_cluster => $N );
=head1 CHANGES
lib/Algorithm/KMeans.pm
Algorithm::KMeans->cluster_data_generator(
input_parameter_file => $parameter_file,
output_datafile => $out_datafile,
number_data_points_per_cluster => 20 );
for generating multivariate data if you wish to experiment with clustering on synthetic
data. The input parameter file contains the mean vectors and the covariance matrices
for the different Gaussians you wish to use for the synthetic data. See the file
C<param.txt> provided in the examples directory. It will be easiest for you to just
edit this file for your data generation needs. In addition to the format of the
parameter file, the main constraint you need to observe in specifying the parameters
is that the dimensionality of each covariance matrix must match the dimensionality of
its mean vector: an N-dimensional mean requires an N x N covariance matrix. The
multivariate random numbers are generated by calling the C<Math::Random> module. As
you would expect, this module requires that the covariance matrices you specify in
your parameter file be symmetric and positive definite. Should the covariances in your
parameter file not obey this condition, the C<Math::Random> module will let you know.
=back
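The constraints described above (matching dimensionality, symmetry, positive definiteness) can be checked before a parameter file is handed to the generator. The following is only a sketch in core Perl: `covariance_problem` is a hypothetical helper, not part of Algorithm::KMeans or Math::Random, and it takes the mean vector and covariance matrix as array references rather than reading the parameter file format. Positive definiteness is tested by attempting a Cholesky factorization, which succeeds only when every pivot is strictly positive.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Algorithm::KMeans): returns an empty string
# when the covariance matrix is usable with the given mean vector, otherwise a
# short description of the first problem found.
sub covariance_problem {
    my ($mean, $cov) = @_;
    my $dim = scalar @$mean;

    # The covariance matrix must be dim x dim.
    return "covariance must have $dim rows" unless @$cov == $dim;
    for my $row (@$cov) {
        return "each covariance row must have $dim entries" unless @$row == $dim;
    }

    # Symmetry: cov[i][j] must equal cov[j][i] (within a small tolerance).
    for my $i (0 .. $dim - 1) {
        for my $j ($i + 1 .. $dim - 1) {
            return "not symmetric at ($i,$j)"
                if abs($cov->[$i][$j] - $cov->[$j][$i]) > 1e-12;
        }
    }

    # Positive definiteness: attempt a Cholesky factorization and give up
    # as soon as a pivot is zero or negative.
    my @L = map { [ (0) x $dim ] } 1 .. $dim;
    for my $i (0 .. $dim - 1) {
        for my $j (0 .. $i) {
            my $sum = $cov->[$i][$j];
            $sum -= $L[$i][$_] * $L[$j][$_] for 0 .. $j - 1;
            if ($i == $j) {
                return "not positive definite" if $sum <= 0;
                $L[$i][$i] = sqrt($sum);
            }
            else {
                $L[$i][$j] = $sum / $L[$j][$j];
            }
        }
    }
    return '';
}

# Example: an identity covariance passes; a symmetric matrix with a
# negative eigenvalue does not.
my @mean = (0, 0, 0);
my @good = ([1, 0, 0], [0, 1, 0], [0, 0, 1]);
my @bad  = ([1, 2, 0], [2, 1, 0], [0, 0, 1]);
print covariance_problem(\@mean, \@good) || 'ok', "\n";
print covariance_problem(\@mean, \@bad), "\n";
```

Running such a check first gives a more specific message than waiting for the random number module to reject the matrix, but it is optional: the text above states that C<Math::Random> reports the problem itself.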
lib/Algorithm/KMeans.pm
For generating the data for experiments with clustering:
data_generator.pl
For cleaning up the examples directory:
cleanup_directory.pl
The examples directory also includes a parameter file, C<param.txt>, for generating
synthetic data for clustering. Just edit this file if you would like to generate
your own multivariate data for clustering. The parameter file is for the 3D case,
but you can generate data with any dimensionality through appropriate entries in the
parameter file.
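To illustrate why the dimensionality is not fixed at three, here is a minimal sketch of drawing points from one Gaussian with a given mean vector and covariance matrix. It is not the module's implementation: the module delegates to C<Math::Random>, whereas this stand-in uses a Box-Muller transform and a Cholesky factor in core Perl. The names `std_normal`, `cholesky`, and `sample_gaussian` are hypothetical, and the covariance matrix is assumed to be symmetric and positive definite.

```perl
use strict;
use warnings;

my $two_pi = 8 * atan2(1, 1);

# Standard normal deviate via the Box-Muller transform.
sub std_normal {
    my $u1 = 1 - rand();    # in (0, 1], so log() is safe
    my $u2 = rand();
    return sqrt(-2 * log($u1)) * cos($two_pi * $u2);
}

# Lower-triangular Cholesky factor L with L * L^T = cov.
# Assumes cov is symmetric and positive definite.
sub cholesky {
    my ($cov) = @_;
    my $dim = @$cov;
    my @L = map { [ (0) x $dim ] } 1 .. $dim;
    for my $i (0 .. $dim - 1) {
        for my $j (0 .. $i) {
            my $sum = $cov->[$i][$j];
            $sum -= $L[$i][$_] * $L[$j][$_] for 0 .. $j - 1;
            $L[$i][$j] = $i == $j ? sqrt($sum) : $sum / $L[$j][$j];
        }
    }
    return \@L;
}

# Draw $n points from N(mean, cov) as x = mean + L z, with z standard normal.
sub sample_gaussian {
    my ($mean, $cov, $n) = @_;
    my $L   = cholesky($cov);
    my $dim = @$mean;
    my @points;
    for (1 .. $n) {
        my @z = map { std_normal() } 1 .. $dim;
        my @x;
        for my $i (0 .. $dim - 1) {
            my $v = $mean->[$i];
            $v += $L->[$i][$_] * $z[$_] for 0 .. $i;
            push @x, $v;
        }
        push @points, \@x;
    }
    return \@points;
}

# A 2-D example; the same code works for any dimensionality as long as
# the mean vector and covariance matrix agree in size.
my $points = sample_gaussian([10, -5], [[4, 1], [1, 2]], 20);
printf "%.3f %.3f\n", @$_ for @$points;
```

Only the sizes of the mean vector and covariance matrix determine the dimensionality of the output, which mirrors how the entries in the parameter file determine it for the module's generator.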
=head1 EXPORT
None by design.
=head1 CAVEATS