Algorithm-LinearManifoldDataClusterer


MANIFEST  view on Meta::CPAN

examples/cleanup_directory.pl
examples/example1.pl
examples/example2.pl
examples/example3.pl
examples/example4.pl
examples/generate_data_on_a_sphere.pl
examples/3_clusters_on_a_sphere_3000_samples.csv
examples/3_clusters_on_a_sphere_498_samples.csv
examples/4_clusters_on_a_sphere_1000_samples.csv
examples/data_visualizer.pl
examples/example1_initial_clustering.png
examples/example1_clustering_after_3_iterations.png
examples/example1_clustering_after_graph_partitioning.png
examples/example1_final_clustering.png
examples/example2_initial_clustering.png
examples/example2_clustering_after_3_iterations.png
examples/example2_clustering_after_8_iterations.png
examples/example2_clustering_after_graph_partitioning.png
examples/example2_final_clustering.png
examples/example3_initial_clustering.png
examples/example3_clustering_after_3_iterations.png
examples/example3_clustering_after_7_iterations.png
examples/example3_clustering_after_graph_partitioning.png
examples/example3_final_clustering.png
examples/README
lib/Algorithm/LinearManifoldDataClusterer.pm
Makefile.PL
MANIFEST			This list of files
README
t/test.t
META.yml                                 Module YAML meta-data (added by MakeMaker)
META.json                                Module JSON meta-data (added by MakeMaker)

examples/README  view on Meta::CPAN

For visualizing the data generated by the above script:

    data_visualizer.pl

For cleaning up the examples directory:

    cleanup_directory.pl

Invoking the cleanup_directory.pl script will get rid of all the PNG image files that
are generated by the module when you run it with the constructor option
make_png_for_each_iteration set to 1.

examples/cleanup_directory.pl  view on Meta::CPAN

#  files will also be deleted automatically if the program
#  terminates properly.

#  Should these files not be deleted automatically, you can
#  execute this script for cleaning up the directory.

unlink glob "cluster*.txt";

unlink glob "__temp_*";

unlink glob "clustering*at*iteration*.png";

unlink "initial_clusters.png";
unlink "final_clustering.png";
unlink glob "*produced_by_graph_partitioning.png";

examples/example1.pl  view on Meta::CPAN

                                    datafile => $datafile,
                                    mask     => $mask,
                                    K        => 3,     # number of clusters
                                    P        => 2,     # manifold dimensionality
                                    max_iterations => 15,
                                    cluster_search_multiplier => 2,
                                    delta_reconstruction_error => 0.001,
                                    terminal_output => 1,
                                    visualize_each_iteration => 1,
                                    show_hidden_in_3D_plots => 1,
                                    make_png_for_each_iteration => 1,
                );

$clusterer->get_data_from_csv();

my $clusters = $clusterer->linear_manifold_clusterer();

$clusterer->display_reconstruction_errors_as_a_function_of_iterations();

$clusterer->write_clusters_to_files($clusters);

$clusterer->visualize_clusters_on_sphere("final clustering", $clusters);

# Now make a png image file that shows the final clusters:
$clusterer->visualize_clusters_on_sphere("final_clustering", $clusters, "png");

examples/example2.pl  view on Meta::CPAN

                                    datafile => $datafile,
                                    mask     => $mask,
                                    K        => 3,     # number of clusters
                                    P        => 2,     # manifold dimensionality
                                    max_iterations => 15,
                                    cluster_search_multiplier => 2,
                                    delta_reconstruction_error => 0.012,
                                    terminal_output => 1,
                                    visualize_each_iteration => 1,
                                    show_hidden_in_3D_plots => 0,
                                    make_png_for_each_iteration => 1,
                );

$clusterer->get_data_from_csv();

my $clusters = $clusterer->linear_manifold_clusterer();

$clusterer->display_reconstruction_errors_as_a_function_of_iterations();

$clusterer->write_clusters_to_files($clusters);

$clusterer->visualize_clusters_on_sphere("final_clustering", $clusters);

# Now make a png image file that shows the final clusters:
$clusterer->visualize_clusters_on_sphere("final_clustering", $clusters, "png");

examples/example3.pl  view on Meta::CPAN

                                    datafile => $datafile,
                                    mask     => $mask,
                                    K        => 4,     # number of clusters
                                    P        => 2,     # manifold dimensionality
                                    cluster_search_multiplier => 2,
                                    max_iterations => 15,
                                    delta_reconstruction_error => 0.002,
                                    terminal_output => 1,
                                    visualize_each_iteration => 1,
                                    show_hidden_in_3D_plots => 1,
                                    make_png_for_each_iteration => 1,
                );

$clusterer->get_data_from_csv();

my $clusters = $clusterer->linear_manifold_clusterer();

$clusterer->display_reconstruction_errors_as_a_function_of_iterations();

$clusterer->write_clusters_to_files($clusters);

$clusterer->visualize_clusters_on_sphere("final clustering", $clusters);

# Now make a png image file that shows the final clusters:
$clusterer->visualize_clusters_on_sphere("final_clustering", $clusters, "png");

examples/example4.pl  view on Meta::CPAN

                                    datafile => $datafile,
                                    mask     => $mask,
                                    K        => 3,     # number of clusters
                                    P        => 2,     # manifold dimensionality
                                    max_iterations => 15,
                                    cluster_search_multiplier => 1,
                                    delta_reconstruction_error => 0.001,
                                    terminal_output => 1,
                                    visualize_each_iteration => 1,
                                    show_hidden_in_3D_plots => 1,
                                    make_png_for_each_iteration => 1,
                );

$clusterer->get_data_from_csv();

my $clusters = $clusterer->auto_retry_clusterer();

$clusterer->display_reconstruction_errors_as_a_function_of_iterations();

$clusterer->write_clusters_to_files($clusters);

$clusterer->visualize_clusters_on_sphere("final clustering", $clusters);

# Now make a png image file that shows the final clusters:
$clusterer->visualize_clusters_on_sphere("final_clustering", $clusters, "png");

lib/Algorithm/LinearManifoldDataClusterer.pm  view on Meta::CPAN

        _mask                         =>   $args{mask}     || croak("mask required"),
        _K                            =>   $args{K}        || 0,
        _P                            =>   $args{P}        || 0,
        _terminal_output              =>   $args{terminal_output} || 0,
        _max_iterations               =>   $args{max_iterations} || 0,
        _delta_reconstruction_error   =>   $args{delta_reconstruction_error} || 0.001,
        _delta_normalized_error       =>   undef,
        _cluster_search_multiplier    =>   $args{cluster_search_multiplier} || 1,
        _visualize_each_iteration     =>   $args{visualize_each_iteration} ? 1 : 0,
        _show_hidden_in_3D_plots      =>   $args{show_hidden_in_3D_plots} ? 1 : 0,
        _make_png_for_each_iteration  =>   $args{make_png_for_each_iteration} ? 1 : 0,
        _debug                        =>   $args{debug} || 0,
        _N                            =>   0,
        _KM                           =>   ($args{K} || 0) * ($args{cluster_search_multiplier} || 1),
        _data_hash                    =>   {},
        _data_tags                    =>   [],
        _data_dimensions              =>   0,
        _final_clusters               =>   [],
        _auto_retry_flag              =>   0,
        _num_iterations_actually_used =>   undef,
        _scale_factor                 =>   undef,

lib/Algorithm/LinearManifoldDataClusterer.pm  view on Meta::CPAN

    if ($self->{_debug}) {
        foreach my $centroid (@initial_cluster_center_coords) {
            print "Initial cluster center coords:  @{$centroid}\n";
        }
    }
    my $initial_clusters = $self->assign_data_to_clusters_initial(\@initial_cluster_center_coords);
    if ($self->{_data_dimensions} == 3) {
        $visualization_msg = "initial_clusters";
        $self->visualize_clusters_on_sphere($visualization_msg, $initial_clusters) 
            if $self->{_visualize_each_iteration};
        $self->visualize_clusters_on_sphere($visualization_msg, $initial_clusters, "png")
            if $self->{_make_png_for_each_iteration};
    }
    foreach my $cluster (@$initial_clusters) {
        my ($mean, $covariance) = $self->estimate_mean_and_covariance($cluster);
        display_mean_and_covariance($mean, $covariance) if $self->{_debug};
    }
    my @clusters = @$initial_clusters;
    display_clusters(\@clusters) if $self->{_debug};
    my $iteration_index = 0;
    my $unimodal_correction_flag;
    my $previous_min_value_for_unimodality_quotient;

lib/Algorithm/LinearManifoldDataClusterer.pm  view on Meta::CPAN

            }
            display_clusters(\@newclusters) if $self->{_terminal_output};
            @clusters = grep {@$_ != 0} @newclusters;
            die "linear manifold based algorithm does not appear to work in this case $!" 
                unless @clusters == $self->{_KM};
        }# end of foreach my $cluster (@clusters) ... loop  followed by if clause for empty clusters
        if ($self->{_data_dimensions} == 3) {
            $visualization_msg = "clustering_at_iteration_$iteration_index";
            $self->visualize_clusters_on_sphere($visualization_msg, \@clusters)
                if $self->{_visualize_each_iteration};
            $self->visualize_clusters_on_sphere($visualization_msg, \@clusters, "png")
                if $self->{_make_png_for_each_iteration};
        }
        my @cluster_unimodality_quotients = map {$self->cluster_unimodality_quotient($clusters[$_], 
                                                      $reference_vecs_for_all_subspaces[$_])} 0..@clusters-1;
        my $min_value_for_unimodality_quotient = List::Util::min @cluster_unimodality_quotients;
        print "\nCluster unimodality quotients: @cluster_unimodality_quotients\n" if $self->{_terminal_output};
        die "\n\nBailing out!\n" .
            "It does not look like these iterations will lead to a good clustering result.\n" .
            "Program terminating.  Try running again.\n" 
            if defined($previous_min_value_for_unimodality_quotient)
               && ($min_value_for_unimodality_quotient < 0.4)

lib/Algorithm/LinearManifoldDataClusterer.pm  view on Meta::CPAN

                   grep {$_} map {exists $data_clustered{$_} ? undef : $_} @{$self->{_data_tags}};
            if ($self->{_terminal_output}) {
                print "\n\nNot all data clustered.  The most reliable clusters found by graph partitioning:\n";
                display_clusters(\@final_clusters);
                print "\n\nData not yet clustered:\n\n@data_tags_not_clustered\n";
            }
            if ($self->{_data_dimensions} == 3) {
                $visualization_msg = "$self->{_K}_best_clusters_produced_by_graph_partitioning";
                $self->visualize_clusters_on_sphere($visualization_msg, \@final_clusters)
                    if $self->{_visualize_each_iteration};
                $self->visualize_clusters_on_sphere($visualization_msg, \@final_clusters, "png")
                    if $self->{_make_png_for_each_iteration};
            }
            my %data_tags_to_cluster_label_hash;
            foreach my $i (0..@final_clusters-1) {
                map {$data_tags_to_cluster_label_hash{$_} = $i} @{$final_clusters[$i]};
            }
            $self->{_data_tags_to_cluster_label_hash} = \%data_tags_to_cluster_label_hash;
            foreach my $tag (@data_tags_not_clustered) {
                my $which_cluster = $self->which_cluster_for_new_element($tag);
                $self->{_data_tags_to_cluster_label_hash}->{$tag} = $which_cluster;
            }

lib/Algorithm/LinearManifoldDataClusterer.pm  view on Meta::CPAN

    $plot->gnuplot_cmd( "set hidden3d" ) unless $self->{_show_hidden_in_3D_plots};
    $plot->gnuplot_cmd( "set title \"$visualization_msg\"" );
    $plot->gnuplot_cmd( "set noclip" );
    $plot->gnuplot_cmd( "set pointsize 2" );
    $plot->gnuplot_cmd( "set parametric" );
    $plot->gnuplot_cmd( "set size ratio 1" );
    $plot->gnuplot_cmd( "set xlabel \"X\"" );
    $plot->gnuplot_cmd( "set ylabel \"Y\"" );
    $plot->gnuplot_cmd( "set zlabel \"Z\"" );
    if ($hardcopy_format) {
        $plot->gnuplot_cmd( "set terminal png" );
        my $image_file_name = "$visualization_msg\.$hardcopy_format";
        $plot->gnuplot_cmd( "set output \"$image_file_name\"" );
        $plot->gnuplot_cmd( "unset hidden3d" );
    }
    # set the range for azimuth angles:
    $plot->gnuplot_cmd( "set urange [0:2*pi]" );
    # set the range for the elevation angles:
    $plot->gnuplot_cmd( "set vrange [-pi/2:pi/2]" );
    # Parametric functions for the sphere
#    $plot->gnuplot_cmd( "r=1" );

lib/Algorithm/LinearManifoldDataClusterer.pm  view on Meta::CPAN

    my @legal_params = qw / datafile
                            mask
                            K
                            P
                            terminal_output
                            cluster_search_multiplier
                            max_iterations
                            delta_reconstruction_error
                            visualize_each_iteration
                            show_hidden_in_3D_plots
                            make_png_for_each_iteration
                            debug
                          /;
    my $found_match_flag;
    foreach my $param (@params) {
        foreach my $legal (@legal_params) {
            $found_match_flag = 0;
            if ($param eq $legal) {
                $found_match_flag = 1;
                last;
            }

lib/Algorithm/LinearManifoldDataClusterer.pm  view on Meta::CPAN

                                    datafile => $datafile,
                                    mask     => $mask,
                                    K        => 3,     
                                    P        => 2,     
                                    max_iterations => 15,
                                    cluster_search_multiplier => 2,
                                    delta_reconstruction_error => 0.001,
                                    terminal_output => 1,
                                    visualize_each_iteration => 1,
                                    show_hidden_in_3D_plots => 1,
                                    make_png_for_each_iteration => 1,
                  );

  #  where the parameter K specifies the number of clusters you expect to find in
  #  your data and the parameter P is the dimensionality of the manifold on which the
  #  data resides.  The parameter cluster_search_multiplier is for increasing the
  #  odds that the random seeds chosen initially for clustering will populate all the
  #  clusters.  Set this parameter to a low number like 2 or 3. The parameter
  #  max_iterations places a hard limit on the number of iterations that the
  #  algorithm is allowed.  The actual number of iterations is controlled by the
  #  parameter delta_reconstruction_error.  The iterations stop when the change in

lib/Algorithm/LinearManifoldDataClusterer.pm  view on Meta::CPAN

                                        mask                        => $mask,
                                        K                           => $K,
                                        P                           => $P,     
                                        cluster_search_multiplier   => $C,
                                        max_iterations              => $max_iter,
                                        delta_reconstruction_error  => 0.001,
                                        terminal_output             => 1,
                                        visualize_each_iteration    => 1,
                                        show_hidden_in_3D_plots     => 1,
                                        make_png_for_each_iteration => 1,
                    );

A call to C<new()> constructs a new instance of the
C<Algorithm::LinearManifoldDataClusterer> class.

=back

=head2 Constructor Parameters

=over 8

lib/Algorithm/LinearManifoldDataClusterer.pm  view on Meta::CPAN


=item C<show_hidden_in_3D_plots>:

This parameter controls how the clusters are drawn on the surface of a sphere.  If
the clusters are too spread out, seeing all of them at once can be visually
confusing.  When you set this parameter to 0, the clusters on the back side of the
sphere are hidden; when you set it to 1, they remain visible.  No matter how you set
this parameter, you can interact with the 3D plot and rotate it with your mouse
pointer to see all of the data that is output by the clustering code.

=item C<make_png_for_each_iteration>:

If you set this option to 1, the module will write a Gnuplot plot as a PNG image for
each iteration in Phase 1 of the algorithm.  In Phase 2, the module will write a PNG
image of the clustering result produced by the graph partitioning algorithm.

=back
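
As a rough illustration of what C<make_png_for_each_iteration> leaves on disk, the
sketch below lists the PNG file names one might expect after a run on 3-dimensional
data.  The plot titles are taken from the module source shown in this listing; the
helper name C<expected_png_names> is hypothetical, and the iteration indices are
passed in explicitly because the exact numbering used by the module is an
assumption here.  The graph-partitioning image may not be produced on every run.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): list the PNG files a run is
# expected to leave behind, given K and the iteration indices that were used.
sub expected_png_names {
    my ($K, @iteration_indices) = @_;
    my @titles = (
        "initial_clusters",
        (map { "clustering_at_iteration_$_" } @iteration_indices),
        "${K}_best_clusters_produced_by_graph_partitioning",
    );
    # The module builds each file name as "<plot title>.<format>".
    return map { "$_.png" } @titles;
}

my @names = expected_png_names(3, 1 .. 3);
print "$_\n" for @names;
```

These names match the patterns that C<cleanup_directory.pl> removes, which is one
way to check that a cleanup pass has caught everything.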

=over

=item B<get_data_from_csv()>:

lib/Algorithm/LinearManifoldDataClusterer.pm  view on Meta::CPAN


Before the clusters are written to these files, the module deletes any existing files
with such names in the directory in which you call the module.
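
The delete-then-write behavior described above can be sketched as follows.  This
is only an illustration under assumptions: the function C<write_clusters_fresh> is
hypothetical, the file names C<cluster0.txt>, C<cluster1.txt>, and so on are
guessed from the C<cluster*.txt> pattern used by C<cleanup_directory.pl>, and the
sketch works in a temporary directory rather than the current one.

```perl
use strict;
use warnings;
use File::Glob ();
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical stand-in for the module's file-writing step: remove any stale
# cluster*.txt files in $dir, then write one file per cluster.
sub write_clusters_fresh {
    my ($dir, $clusters) = @_;
    # Remove leftovers from earlier runs so stale clusters are never mixed in.
    unlink File::Glob::bsd_glob(File::Spec->catfile($dir, "cluster*.txt"));
    my @written;
    foreach my $i (0 .. $#$clusters) {
        my $path = File::Spec->catfile($dir, "cluster$i.txt");
        open my $fh, '>', $path or die "Cannot write $path: $!";
        print {$fh} "$_\n" for @{ $clusters->[$i] };
        close $fh or die "Cannot close $path: $!";
        push @written, $path;
    }
    return @written;
}

my $dir = tempdir(CLEANUP => 1);
write_clusters_fresh($dir, [ [qw(a b)], [qw(c)], [qw(d e)] ]);    # first run
my @files = write_clusters_fresh($dir, [ [qw(a b c)], [qw(d e)] ]);  # second run
print scalar(@files), " cluster files written by the second run\n";
```

Without the C<unlink> step, the third file from the first run would survive the
second run and could be mistaken for a current cluster.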

=item B<visualize_clusters_on_sphere()>:

    $clusterer->visualize_clusters_on_sphere("final clustering", $clusters);

or

    $clusterer->visualize_clusters_on_sphere("final_clustering", $clusters, "png");

If your data is 3-dimensional and it resides on the surface of a sphere (or in the
vicinity of such a surface), you may be able to use this method to visualize the
clusters produced by the algorithm.  The first invocation produces an interactive
Gnuplot plot that you can rotate with your mouse pointer.  The second invocation
writes a C<.png> image of the plot, named after the first argument.
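
To make the difference between the two invocations concrete, here is a minimal
sketch of the Gnuplot setup commands each one would issue.  It mirrors the plot
setup in the module source shown in this listing, but the helper
C<gnuplot_setup_commands> is hypothetical and only the commands that vary with the
arguments are included.

```perl
use strict;
use warnings;

# Hypothetical helper that mirrors the module's plot setup: return the
# Gnuplot commands that depend on the title, the show_hidden flag, and the
# optional hardcopy format.
sub gnuplot_setup_commands {
    my ($title, $show_hidden, $hardcopy_format) = @_;
    my @cmds;
    push @cmds, 'set hidden3d' unless $show_hidden;
    push @cmds, qq{set title "$title"};
    if ($hardcopy_format) {
        # The output file is named after the plot title.
        push @cmds, 'set terminal png',
                    qq{set output "$title.$hardcopy_format"},
                    'unset hidden3d';
    }
    return @cmds;
}

print "$_\n" for gnuplot_setup_commands("final clustering", 0);
print "\n";
print "$_\n" for gnuplot_setup_commands("final_clustering", 1, "png");
```

Because the image file name is derived from the title, a title without spaces
(such as C<final_clustering>) gives a more convenient file name for the PNG call.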

=item B<auto_retry_clusterer()>:

    $clusterer->auto_retry_clusterer();

or

    my $clusters = $clusterer->auto_retry_clusterer();

As mentioned earlier, the module is programmed in such a way that it is more likely

lib/Algorithm/LinearManifoldDataClusterer.pm  view on Meta::CPAN

For visualizing the data generated by the above script:

    data_visualizer.pl

For cleaning up the examples directory:

    cleanup_directory.pl

Invoking the C<cleanup_directory.pl> script will get rid of all the PNG image files
that are generated by the module when you run it with the constructor option
C<make_png_for_each_iteration> set to 1.

=head1 EXPORT

None by design.

=head1 CAVEATS

The performance of the algorithm depends heavily on the values you choose for the
constructor parameters.  Even with the best choices for the parameters, the
algorithm is not theoretically guaranteed to return the best results.
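
Since a run can bail out with C<die> when the iterations are not converging to a
good result, one way to cope is to retry the clustering call a bounded number of
times.  The sketch below shows that pattern with a stand-in code reference; the
wrapper C<retry_until_success> is hypothetical and is not how the module itself
implements C<auto_retry_clusterer()>.

```perl
use strict;
use warnings;

# Hypothetical wrapper (not part of the module): call $attempt up to
# $max_tries times, treating any die() as a failed attempt.
sub retry_until_success {
    my ($attempt, $max_tries) = @_;
    for my $try (1 .. $max_tries) {
        my $result = eval { $attempt->() };
        return ($result, $try) if defined $result;
        warn "Attempt $try failed: $@" if $@;
    }
    die "No attempt succeeded after $max_tries tries\n";
}

# Stand-in for a clustering call that bails out twice before succeeding.
my $calls = 0;
my $fake_clusterer = sub {
    die "Bailing out!\n" if ++$calls < 3;
    return [ [qw(a b)], [qw(c d)], [qw(e f)] ];
};

my ($clusters, $tries) = retry_until_success($fake_clusterer, 5);
print "Got ", scalar(@$clusters), " clusters after $tries tries\n";
```

With the real module, the code reference would wrap a call such as
C<< $clusterer->linear_manifold_clusterer() >>; in most cases calling
C<auto_retry_clusterer()> directly is the simpler choice.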

t/test.t  view on Meta::CPAN

my $clusterer = Algorithm::LinearManifoldDataClusterer->new( 
                                     datafile => $datafile,
                                     mask     => "N111",
                                     K        => 3,
                                     P        => 2,
                                     max_iterations => 1,
                                     cluster_search_multiplier => 1,
                                     terminal_output => 0,
                                     visualize_each_iteration => 0,
                                     show_hidden_in_3D_plots => 0,
                                     make_png_for_each_iteration => 0,
                );
$clusterer->get_data_from_csv();
my $clusters = $clusterer->auto_retry_clusterer();
ok( @$clusters == 3,  'Clustering works' );

# Test 3 (Data Visualization)

eval {
    my $pause_time = 1;
    $clusterer->visualize_clusters_on_sphere("", $clusters, "", $pause_time);


