examples/cleanup_directory.pl
examples/example1.pl
examples/example2.pl
examples/example3.pl
examples/example4.pl
examples/generate_data_on_a_sphere.pl
examples/3_clusters_on_a_sphere_3000_samples.csv
examples/3_clusters_on_a_sphere_498_samples.csv
examples/4_clusters_on_a_sphere_1000_samples.csv
examples/data_visualizer.pl
examples/example1_initial_clustering.png
examples/example1_clustering_after_3_iterations.png
examples/example1_clustering_after_graph_partitioning.png
examples/example1_final_clustering.png
examples/example2_initial_clustering.png
examples/example2_clustering_after_3_iterations.png
examples/example2_clustering_after_8_iterations.png
examples/example2_clustering_after_graph_partitioning.png
examples/example2_final_clustering.png
examples/example3_initial_clustering.png
examples/example3_clustering_after_3_iterations.png
examples/example3_clustering_after_7_iterations.png
examples/example3_clustering_after_graph_partitioning.png
examples/example3_final_clustering.png
examples/README
lib/Algorithm/LinearManifoldDataClusterer.pm
Makefile.PL
MANIFEST This list of files
README
t/test.t
META.yml Module YAML meta-data (added by MakeMaker)
META.json Module JSON meta-data (added by MakeMaker)
examples/README view on Meta::CPAN
For visualizing the data generated by the above script:
data_visualizer.pl
For cleaning up the examples directory:
cleanup_directory.pl
Invoking the cleanup_directory.pl script will get rid of all the PNG image files that
are generated by the module when you run it with the constructor option
make_png_for_each_iteration set to 1.
examples/cleanup_directory.pl view on Meta::CPAN
# files will also be deleted automatically if the program
# terminates properly.
# Should these files not be deleted automatically, you can
# execute this script for cleaning up the directory.
unlink glob "cluster*.txt";
unlink glob "__temp_*";
unlink glob "clustering*at*iteration*.png";
unlink "initial_clusters.png";
unlink "final_clustering.png";
unlink glob "*produced_by_graph_partitioning.png";
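The cleanup step above can be tried safely in a scratch directory. The following is a minimal sketch, not part of the distribution: the helper name `cleanup_clusterer_output` and the demo file names are made up for illustration, while the glob patterns are the ones listed in cleanup_directory.pl. It uses `bsd_glob` so that a directory name containing spaces is not split into several patterns.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);
use File::Temp qw(tempdir);

# Delete clusterer output files in $dir and return how many were removed.
# The patterns are copied from cleanup_directory.pl.
sub cleanup_clusterer_output {
    my ($dir) = @_;
    my @patterns = (
        'cluster*.txt',
        '__temp_*',
        'clustering*at*iteration*.png',
        'initial_clusters.png',
        'final_clustering.png',
        '*produced_by_graph_partitioning.png',
    );
    my $removed = 0;
    for my $pattern (@patterns) {
        # unlink returns the number of files it actually deleted.
        $removed += unlink(bsd_glob("$dir/$pattern"));
    }
    return $removed;
}

# Demo in a throwaway directory (the file names below are made up).
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(cluster0.txt cluster1.txt clustering_at_iteration_2.png
                 initial_clusters.png keep_me.csv)) {
    open my $fh, '>', "$dir/$name" or die "cannot create $name: $!";
    close $fh;
}
my $removed = cleanup_clusterer_output($dir);
print "removed $removed files\n";
```

Counting the return value of `unlink` makes it easy to confirm that only generated files were touched; the CSV data file in the demo is left in place.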
examples/example1.pl view on Meta::CPAN
datafile => $datafile,
mask => $mask,
K => 3, # number of clusters
P => 2, # manifold dimensionality
max_iterations => 15,
cluster_search_multiplier => 2,
delta_reconstruction_error => 0.001,
terminal_output => 1,
visualize_each_iteration => 1,
show_hidden_in_3D_plots => 1,
make_png_for_each_iteration => 1,
);
$clusterer->get_data_from_csv();
my $clusters = $clusterer->linear_manifold_clusterer();
$clusterer->display_reconstruction_errors_as_a_function_of_iterations();
$clusterer->write_clusters_to_files($clusters);
$clusterer->visualize_clusters_on_sphere("final clustering", $clusters);
# Now make a png image file that shows the final clusters:
$clusterer->visualize_clusters_on_sphere("final_clustering", $clusters, "png");
examples/example2.pl view on Meta::CPAN
datafile => $datafile,
mask => $mask,
K => 3, # number of clusters
P => 2, # manifold dimensionality
max_iterations => 15,
cluster_search_multiplier => 2,
delta_reconstruction_error => 0.012,
terminal_output => 1,
visualize_each_iteration => 1,
show_hidden_in_3D_plots => 0,
make_png_for_each_iteration => 1,
);
$clusterer->get_data_from_csv();
my $clusters = $clusterer->linear_manifold_clusterer();
$clusterer->display_reconstruction_errors_as_a_function_of_iterations();
$clusterer->write_clusters_to_files($clusters);
$clusterer->visualize_clusters_on_sphere("final_clustering", $clusters);
# Now make a png image file that shows the final clusters:
$clusterer->visualize_clusters_on_sphere("final_clustering", $clusters, "png");
examples/example3.pl view on Meta::CPAN
datafile => $datafile,
mask => $mask,
K => 4, # number of clusters
P => 2, # manifold dimensionality
cluster_search_multiplier => 2,
max_iterations => 15,
delta_reconstruction_error => 0.002,
terminal_output => 1,
visualize_each_iteration => 1,
show_hidden_in_3D_plots => 1,
make_png_for_each_iteration => 1,
);
$clusterer->get_data_from_csv();
my $clusters = $clusterer->linear_manifold_clusterer();
$clusterer->display_reconstruction_errors_as_a_function_of_iterations();
$clusterer->write_clusters_to_files($clusters);
$clusterer->visualize_clusters_on_sphere("final clustering", $clusters);
# Now make a png image file that shows the final clusters:
$clusterer->visualize_clusters_on_sphere("final_clustering", $clusters, "png");
examples/example4.pl view on Meta::CPAN
datafile => $datafile,
mask => $mask,
K => 3, # number of clusters
P => 2, # manifold dimensionality
max_iterations => 15,
cluster_search_multiplier => 1,
delta_reconstruction_error => 0.001,
terminal_output => 1,
visualize_each_iteration => 1,
show_hidden_in_3D_plots => 1,
make_png_for_each_iteration => 1,
);
$clusterer->get_data_from_csv();
my $clusters = $clusterer->auto_retry_clusterer();
$clusterer->display_reconstruction_errors_as_a_function_of_iterations();
$clusterer->write_clusters_to_files($clusters);
$clusterer->visualize_clusters_on_sphere("final clustering", $clusters);
# Now make a png image file that shows the final clusters:
$clusterer->visualize_clusters_on_sphere("final_clustering", $clusters, "png");
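example4.pl calls auto_retry_clusterer() where the other examples call linear_manifold_clusterer(). The listing does not show how the module implements the retry, so the following is only a sketch of the general idea under stated assumptions: `retry_until_success` and `$flaky_clusterer` are hypothetical names, and the stand-in clusterer simply dies twice before returning three clusters.

```perl
use strict;
use warnings;

# Call $attempt->() up to $max_tries times and return its first defined
# result together with the number of tries that were needed.
sub retry_until_success {
    my ($attempt, $max_tries) = @_;
    my $last_error;
    for my $try (1 .. $max_tries) {
        my $result = eval { $attempt->() };
        return ($result, $try) if defined $result;
        $last_error = $@ || "attempt returned undef\n";
    }
    die "giving up after $max_tries tries: $last_error";
}

# Hypothetical stand-in for a clustering run that bails out twice and
# then succeeds. It is NOT the module's clusterer.
my $calls = 0;
my $flaky_clusterer = sub {
    $calls++;
    die "Bailing out!\n" if $calls < 3;
    return [ [qw(a b)], [qw(c)], [qw(d e)] ];
};

my ($clusters, $tries) = retry_until_success($flaky_clusterer, 5);
print "got ", scalar(@$clusters), " clusters after $tries tries\n";
```

Wrapping each attempt in `eval` keeps a `die` from one run from ending the whole program, and the hard cap on tries prevents an endless loop on data that never clusters.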
lib/Algorithm/LinearManifoldDataClusterer.pm view on Meta::CPAN
_mask => $args{mask} || croak("mask required"),
_K => $args{K} || 0,
_P => $args{P} || 0,
_terminal_output => $args{terminal_output} || 0,
_max_iterations => $args{max_iterations} || 0,
_delta_reconstruction_error => $args{delta_reconstruction_error} || 0.001,
_delta_normalized_error => undef,
_cluster_search_multiplier => $args{cluster_search_multiplier} || 1,
_visualize_each_iteration => $args{visualize_each_iteration} == 0 ? 0 : 1,
_show_hidden_in_3D_plots => $args{show_hidden_in_3D_plots} == 0 ? 0 : 1,
_make_png_for_each_iteration => $args{make_png_for_each_iteration} == 0 ? 0 : 1,
_debug => $args{debug} || 0,
_N => 0,
_KM => $args{K} * $args{cluster_search_multiplier},
_data_hash => {},
_data_tags => [],
_data_dimensions => 0,
_final_clusters => [],
_auto_retry_flag => 0,
_num_iterations_actually_used => undef,
_scale_factor => undef,
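The `||` defaults in the constructor excerpt have one consequence worth seeing in isolation: any false value, including an explicit 0, is replaced by the default. The sketch below uses a hypothetical helper, `resolve_options`, that is not part of the module. One deliberate difference is labeled in the code: it computes KM from the defaulted multiplier, whereas the excerpt multiplies the raw constructor arguments.

```perl
use strict;
use warnings;

# Hypothetical helper that resolves options the way the excerpt's `||`
# defaults do. It is not part of the module.
sub resolve_options {
    my (%args) = @_;
    my $K          = $args{K} || 0;
    my $multiplier = $args{cluster_search_multiplier} || 1;
    return {
        K     => $K,
        # ASSUMPTION: use the defaulted multiplier; the excerpt uses the
        # raw arguments here.
        KM    => $K * $multiplier,
        delta => $args{delta_reconstruction_error} || 0.001,
    };
}

my $defaults = resolve_options(K => 3);
my $explicit = resolve_options(
    K                          => 3,
    cluster_search_multiplier  => 2,
    delta_reconstruction_error => 0,    # false, so the default wins
);
printf "defaults: KM=%d delta=%s\n", $defaults->{KM}, $defaults->{delta};
printf "explicit: KM=%d delta=%s\n", $explicit->{KM}, $explicit->{delta};
```

If an explicit 0 ever needed to be honored, the defined-or operator `//` (Perl 5.10 and later) would be the usual alternative to `||`.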
lib/Algorithm/LinearManifoldDataClusterer.pm view on Meta::CPAN
if ($self->{_debug}) {
foreach my $centroid (@initial_cluster_center_coords) {
print "Initial cluster center coords: @{$centroid}\n";
}
}
my $initial_clusters = $self->assign_data_to_clusters_initial(\@initial_cluster_center_coords);
if ($self->{_data_dimensions} == 3) {
$visualization_msg = "initial_clusters";
$self->visualize_clusters_on_sphere($visualization_msg, $initial_clusters)
if $self->{_visualize_each_iteration};
$self->visualize_clusters_on_sphere($visualization_msg, $initial_clusters, "png")
if $self->{_make_png_for_each_iteration};
}
foreach my $cluster (@$initial_clusters) {
my ($mean, $covariance) = $self->estimate_mean_and_covariance($cluster);
display_mean_and_covariance($mean, $covariance) if $self->{_debug};
}
my @clusters = @$initial_clusters;
display_clusters(\@clusters) if $self->{_debug};
my $iteration_index = 0;
my $unimodal_correction_flag;
my $previous_min_value_for_unimodality_quotient;
lib/Algorithm/LinearManifoldDataClusterer.pm view on Meta::CPAN
}
display_clusters(\@newclusters) if $self->{_terminal_output};
@clusters = grep {@$_ != 0} @newclusters;
die "linear manifold based algorithm does not appear to work in this case $!"
unless @clusters == $self->{_KM};
}# end of foreach my $cluster (@clusters) ... loop followed by if clause for empty clusters
if ($self->{_data_dimensions} == 3) {
$visualization_msg = "clustering_at_iteration_$iteration_index";
$self->visualize_clusters_on_sphere($visualization_msg, \@clusters)
if $self->{_visualize_each_iteration};
$self->visualize_clusters_on_sphere($visualization_msg, \@clusters, "png")
if $self->{_make_png_for_each_iteration};
}
my @cluster_unimodality_quotients = map {$self->cluster_unimodality_quotient($clusters[$_],
$reference_vecs_for_all_subspaces[$_])} 0..@clusters-1;
my $min_value_for_unimodality_quotient = List::Util::min @cluster_unimodality_quotients;
print "\nCluster unimodality quotients: @cluster_unimodality_quotients\n" if $self->{_terminal_output};
die "\n\nBailing out!\n" .
"It does not look like these iterations will lead to a good clustering result.\n" .
"Program terminating. Try running again.\n"
if defined($previous_min_value_for_unimodality_quotient)
&& ($min_value_for_unimodality_quotient < 0.4)
lib/Algorithm/LinearManifoldDataClusterer.pm view on Meta::CPAN
grep {$_} map {exists $data_clustered{$_} ? undef : $_} @{$self->{_data_tags}};
if ($self->{_terminal_output}) {
print "\n\nNot all data clustered. The most reliable clusters found by graph partitioning:\n";
display_clusters(\@final_clusters);
print "\n\nData not yet clustered:\n\n@data_tags_not_clustered\n";
}
if ($self->{_data_dimensions} == 3) {
$visualization_msg = "$self->{_K}_best_clusters_produced_by_graph_partitioning";
$self->visualize_clusters_on_sphere($visualization_msg, \@final_clusters)
if $self->{_visualize_each_iteration};
$self->visualize_clusters_on_sphere($visualization_msg, \@final_clusters, "png")
if $self->{_make_png_for_each_iteration};
}
my %data_tags_to_cluster_label_hash;
foreach my $i (0..@final_clusters-1) {
map {$data_tags_to_cluster_label_hash{$_} = $i} @{$final_clusters[$i]};
}
$self->{_data_tags_to_cluster_label_hash} = \%data_tags_to_cluster_label_hash;
foreach my $tag (@data_tags_not_clustered) {
my $which_cluster = $self->which_cluster_for_new_element($tag);
$self->{_data_tags_to_cluster_label_hash}->{$tag} = $which_cluster;
}
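The bookkeeping in the excerpt above (label every clustered tag, then assign each leftover tag to a cluster) can be sketched on toy data. Everything below is illustrative: the one-dimensional coordinates are made up, and `nearest_cluster` is a hypothetical nearest-mean stand-in for which_cluster_for_new_element(), whose actual rule is not shown in this listing.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Made-up 1-D coordinates for seven data tags.
my %coord = (a => 0.0, b => 0.2, c => 5.0, d => 9.8, e => 10.0,
             f => 0.5, g => 9.0);
my @final_clusters = ( [qw(a b)], [qw(c)], [qw(d e)] );

# Map each clustered tag to the index of its cluster.
my %label_for;
for my $i (0 .. $#final_clusters) {
    $label_for{$_} = $i for @{ $final_clusters[$i] };
}

# Tags that no cluster has claimed yet.
my @unclustered = grep { !exists $label_for{$_} } sort keys %coord;

# Hypothetical stand-in for which_cluster_for_new_element():
# pick the cluster whose mean is closest to the value $x.
sub nearest_cluster {
    my ($x, $means) = @_;
    my ($best, $best_dist);
    for my $i (0 .. $#$means) {
        my $dist = abs($x - $means->[$i]);
        ($best, $best_dist) = ($i, $dist)
            if !defined $best_dist || $dist < $best_dist;
    }
    return $best;
}

my @means = map { my $c = $_; sum(map { $coord{$_} } @$c) / @$c }
            @final_clusters;
$label_for{$_} = nearest_cluster($coord{$_}, \@means) for @unclustered;
print "$_ -> $label_for{$_}\n" for sort keys %label_for;
```

Filtering with `grep { !exists ... }` keeps a tag whose name is a false string such as "0", which a `grep {$_}` filter would drop.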
lib/Algorithm/LinearManifoldDataClusterer.pm view on Meta::CPAN
$plot->gnuplot_cmd( "set hidden3d" ) unless $self->{_show_hidden_in_3D_plots};
$plot->gnuplot_cmd( "set title \"$visualization_msg\"" );
$plot->gnuplot_cmd( "set noclip" );
$plot->gnuplot_cmd( "set pointsize 2" );
$plot->gnuplot_cmd( "set parametric" );
$plot->gnuplot_cmd( "set size ratio 1" );
$plot->gnuplot_cmd( "set xlabel \"X\"" );
$plot->gnuplot_cmd( "set ylabel \"Y\"" );
$plot->gnuplot_cmd( "set zlabel \"Z\"" );
if ($hardcopy_format) {
$plot->gnuplot_cmd( "set terminal png" );
my $image_file_name = "$visualization_msg\.$hardcopy_format";
$plot->gnuplot_cmd( "set output \"$image_file_name\"" );
$plot->gnuplot_cmd( "unset hidden3d" );
}
# set the range for azimuth angles:
$plot->gnuplot_cmd( "set urange [0:2*pi]" );
# set the range for the elevation angles:
$plot->gnuplot_cmd( "set vrange [-pi/2:pi/2]" );
# Parametric functions for the sphere
# $plot->gnuplot_cmd( "r=1" );
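The urange and vrange settings above treat u as an azimuth angle in [0, 2*pi] and v as an elevation angle in [-pi/2, pi/2]. The module's own parametric expressions are cut off in this listing, so the sketch below uses the standard spherical parametrization for those two angles; the function name `sphere_point` and the grid size are assumptions made for illustration.

```perl
use strict;
use warnings;

my $PI = 4 * atan2(1, 1);

# Map (azimuth, elevation) in radians to a point on a sphere of radius $r.
sub sphere_point {
    my ($azimuth, $elevation, $r) = @_;
    $r = 1 unless defined $r;
    return (
        $r * cos($elevation) * cos($azimuth),
        $r * cos($elevation) * sin($azimuth),
        $r * sin($elevation),
    );
}

# Sample a coarse grid over the same angle ranges as urange and vrange.
my @points;
for my $i (0 .. 7) {
    my $azimuth = $i * 2 * $PI / 8;
    for my $j (0 .. 4) {
        my $elevation = -$PI / 2 + $j * $PI / 4;
        push @points, [ sphere_point($azimuth, $elevation) ];
    }
}
printf "%d points, first = (%.3f, %.3f, %.3f)\n",
    scalar(@points), @{ $points[0] };
```

Every generated point has unit distance from the origin, which is a quick sanity check that the two angle ranges cover the sphere rather than some other surface.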
lib/Algorithm/LinearManifoldDataClusterer.pm view on Meta::CPAN
my @legal_params = qw / datafile
mask
K
P
terminal_output
cluster_search_multiplier
max_iterations
delta_reconstruction_error
visualize_each_iteration
show_hidden_in_3D_plots
make_png_for_each_iteration
debug
/;
my $found_match_flag;
foreach my $param (@params) {
foreach my $legal (@legal_params) {
$found_match_flag = 0;
if ($param eq $legal) {
$found_match_flag = 1;
last;
}
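The excerpt above checks each supplied option name against the list of legal names with a nested loop. A hash lookup is a common way to express the same membership test. This is a sketch only: `illegal_params` is a hypothetical helper, not a function in the module, and it reports the offending names instead of returning a flag.

```perl
use strict;
use warnings;

# Return the option names that are not in the legal list from the excerpt.
sub illegal_params {
    my (@params) = @_;
    my %is_legal = map { $_ => 1 } qw(
        datafile mask K P terminal_output cluster_search_multiplier
        max_iterations delta_reconstruction_error visualize_each_iteration
        show_hidden_in_3D_plots make_png_for_each_iteration debug
    );
    return grep { !$is_legal{$_} } @params;
}

# "max_iteration" is a deliberately misspelled, made-up option name.
my @bad = illegal_params(qw(datafile mask K P max_iteration));
print @bad ? "illegal: @bad\n" : "all options are legal\n";
```

Returning the list of unknown names lets the caller put them in an error message, which is usually more helpful than a bare "illegal option" failure.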
lib/Algorithm/LinearManifoldDataClusterer.pm view on Meta::CPAN
datafile => $datafile,
mask => $mask,
K => 3,
P => 2,
max_iterations => 15,
cluster_search_multiplier => 2,
delta_reconstruction_error => 0.001,
terminal_output => 1,
visualize_each_iteration => 1,
show_hidden_in_3D_plots => 1,
make_png_for_each_iteration => 1,
);
# where the parameter K specifies the number of clusters you expect to find in
# your data and the parameter P is the dimensionality of the manifold on which the
# data resides. The parameter cluster_search_multiplier is for increasing the
# odds that the random seeds chosen initially for clustering will populate all the
# clusters. Set this parameter to a low number like 2 or 3. The parameter
# max_iterations places a hard limit on the number of iterations that the
# algorithm is allowed. The actual number of iterations is controlled by the
# parameter delta_reconstruction_error. The iterations stop when the change in
lib/Algorithm/LinearManifoldDataClusterer.pm view on Meta::CPAN
mask => $mask,
K => $K,
P => $P,
cluster_search_multiplier => $C,
max_iterations => $max_iter,
delta_reconstruction_error => 0.001,
terminal_output => 1,
visualize_each_iteration => 1,
show_hidden_in_3D_plots => 1,
make_png_for_each_iteration => 1,
);
A call to C<new()> constructs a new instance of the
C<Algorithm::LinearManifoldDataClusterer> class.
=back
=head2 Constructor Parameters
=over 8
lib/Algorithm/LinearManifoldDataClusterer.pm view on Meta::CPAN
=item C<show_hidden_in_3D_plots>:
This parameter controls how the clusters are visualized on the surface of a sphere.
If the clusters are widely spread out, seeing all of them at once can be visually
confusing. When you set this parameter to 0, the clusters on the back side of the
sphere are hidden in the interactive plot. Regardless of the setting, you can rotate
the 3D plot with your mouse pointer to see all of the data that is output by the
clustering code.
=item C<make_png_for_each_iteration>:
If you set this option to 1, the module will write a Gnuplot plot as a PNG image for
each iteration in Phase 1 of the algorithm. In Phase 2, the module will write a PNG
image of the clustering result produced by the graph partitioning algorithm.
=back
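The PNG files written under this option take their names from the plot titles that appear in the source excerpts ("initial_clusters", "clustering_at_iteration_N", and "K_best_clusters_produced_by_graph_partitioning"), with the hardcopy format appended as the extension. The sketch below lists the names one run might leave behind. `expected_png_names` is a hypothetical helper, and the numbering of iterations from 0 is an assumption that the excerpts do not confirm.

```perl
use strict;
use warnings;

# List the PNG names one run might leave behind, based on the plot titles
# in the excerpts. ASSUMPTION: iterations are numbered from 0 and every
# iteration is plotted; the excerpts do not confirm either point.
sub expected_png_names {
    my ($K, $iterations) = @_;
    my @titles = (
        'initial_clusters',
        (map { "clustering_at_iteration_$_" } 0 .. $iterations - 1),
        "${K}_best_clusters_produced_by_graph_partitioning",
    );
    return map { "$_.png" } @titles;
}

print "$_\n" for expected_png_names(3, 2);
```

These names line up with the glob patterns in cleanup_directory.pl, which is why that script can remove them without listing each file.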
=over
=item B<get_data_from_csv()>:
lib/Algorithm/LinearManifoldDataClusterer.pm view on Meta::CPAN
Before the clusters are written to these files, the module destroys all files with
such names in the directory in which you call the module.
=item B<visualize_clusters_on_sphere()>:
$clusterer->visualize_clusters_on_sphere("final clustering", $clusters);
or
$clusterer->visualize_clusters_on_sphere("final_clustering", $clusters, "png");
If your data is 3-dimensional and it resides on the surface of a sphere (or in the
vicinity of such a surface), you may be able to use this method to visualize the
clusters produced by the algorithm. The first invocation displays an interactive
Gnuplot plot that you can rotate with your mouse pointer. The second invocation
writes a C<.png> image of the plot.
=item B<auto_retry_clusterer()>:
$clusterer->auto_retry_clusterer();
or
my $clusters = $clusterer->auto_retry_clusterer();
As mentioned earlier, the module is programmed in such a way that it is more likely
lib/Algorithm/LinearManifoldDataClusterer.pm view on Meta::CPAN
For visualizing the data generated by the above script:
data_visualizer.pl
For cleaning up the examples directory:
cleanup_directory.pl
Invoking the C<cleanup_directory.pl> script will get rid of all the PNG image files
that are generated by the module when you run it with the constructor option
C<make_png_for_each_iteration> set to 1.
=head1 EXPORT
None by design.
=head1 CAVEATS
The performance of the algorithm depends heavily on the values you choose for the
constructor parameters. Even with the best choice of parameters, the algorithm is
not theoretically guaranteed to return the best results.
t/test.t view on Meta::CPAN
my $clusterer = Algorithm::LinearManifoldDataClusterer->new(
datafile => $datafile,
mask => "N111",
K => 3,
P => 2,
max_iterations => 1,
cluster_search_multiplier => 1,
terminal_output => 0,
visualize_each_iteration => 0,
show_hidden_in_3D_plots => 0,
make_png_for_each_iteration => 0,
);
$clusterer->get_data_from_csv();
my $clusters = $clusterer->auto_retry_clusterer();
ok( @$clusters == 3, 'Clustering works' );
# Test 3 (Data Visualization)
eval {
my $pause_time = 1;
$clusterer->visualize_clusters_on_sphere("", $clusters, "", $pause_time);