Algorithm-LinearManifoldDataClusterer
examples/cleanup_directory.pl
# When the clusterer creates new cluster files, it automatically
# deletes all previously created such files. Such files are named
# clusterX.txt, with X starting at 0. The files named __temp_*
# are created by the visualization script. Those files will also
# be deleted automatically if the program terminates properly.
# Should these files not be deleted automatically, you can execute
# this script to clean up the directory.
unlink glob "cluster*.txt";
unlink glob "__temp_*";
unlink glob "clustering*at*iteration*.png";
unlink "initial_clusters.png";
unlink "final_clustering.png";
unlink glob "*produced_by_graph_partitioning.png";
lib/Algorithm/LinearManifoldDataClusterer.pm
        # accumulate the squared difference between corresponding coordinates
        $squared_sum += ($ele1[$i] - $ele2[$i])**2;
    }
    # the Euclidean distance is the square root of the accumulated sum
    my $dist = sqrt $squared_sum;
    return $dist;
}
sub write_clusters_to_files {
    my $self = shift;
    my $clusters = shift;
    my @clusters = @$clusters;
    # delete the cluster files left over from any previous run
    unlink glob "cluster*.txt";
    foreach my $i (0..@clusters-1) {
        my $filename = "cluster" . $i . ".txt";
        print "\nWriting cluster $i to file $filename\n" if $self->{_terminal_output};
        # pipe the cluster members through sort on their way into the file
        open my $filehandle, '|-', "sort > $filename" or die "Unable to open file: $!";
        foreach my $ele (@{$clusters[$i]}) {
            print $filehandle "$ele ";
        }
        close $filehandle;
    }
}
lib/Algorithm/LinearManifoldDataClusterer.pm
That brings us to the problem of how to actually discover the best set of hyperplane
approximations to the data. What is probably the most popular algorithm today for
that purpose is based on the following key idea: Given a set of subspaces to which a
data element can be assigned, you assign it to that subspace for which the
B<reconstruction error> is the least. But what do we mean by a B<subspace> and what
is B<reconstruction error>?
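Taking a subspace, for the moment, to be whatever is spanned by a set of orthonormal basis vectors (both notions are made precise in the next paragraph), the following minimal pure-Perl sketch shows the assignment rule in action. None of this code comes from the module itself; the functions C<reconstruction_error> and C<best_subspace_for> and the array-ref representation of the basis vectors are inventions of this illustration. Each data vector is projected onto every candidate subspace, the norm of the leftover residual is taken as the reconstruction error, and the vector is assigned to the subspace with the smallest residual.

  use strict;
  use warnings;
  use List::Util qw(sum);

  sub dot { my ($u, $w) = @_; return sum map { $u->[$_] * $w->[$_] } 0..$#$u; }

  sub reconstruction_error {
      my ($x, $basis) = @_;                    # $basis: array ref of orthonormal vectors
      my @residual = @$x;
      for my $v (@$basis) {
          my $coeff = dot($x, $v);             # coordinate of $x along this basis vector
          $residual[$_] -= $coeff * $v->[$_] for 0..$#residual;
      }
      return sqrt dot(\@residual, \@residual); # norm of what the subspace cannot represent
  }

  sub best_subspace_for {
      my ($x, @subspaces) = @_;
      my @errors = map { reconstruction_error($x, $_) } @subspaces;
      my ($best) = sort { $errors[$a] <=> $errors[$b] } 0..$#errors;
      return $best;                            # index of the subspace with least error
  }

  # a point that hugs the xy-plane is assigned to the xy-plane, not to the z-axis:
  my $point    = [1.0, 2.0, 0.1];
  my $xy_plane = [ [1,0,0], [0,1,0] ];
  my $z_axis   = [ [0,0,1] ];
  print best_subspace_for($point, $xy_plane, $z_axis), "\n";   # prints 0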
To understand the notions of B<subspace> and B<reconstruction error>, let's revisit
the traditional approach of dimensionality reduction by the PCA algorithm. The PCA
algorithm consists of: (1) Subtracting from each data element the global mean of the
data; (2) Calculating the covariance matrix of the data; (3) Carrying out an
eigendecomposition of the covariance matrix and ordering the eigenvectors according
to decreasing values of the corresponding eigenvalues; (4) Forming a B<subspace> by
discarding the trailing eigenvectors whose corresponding eigenvalues are relatively
small; and, finally, (5) projecting all the data elements into the subspace so
formed. The error incurred in representing a data element by its projection into the
subspace is known as the B<reconstruction error>. This error is the projection of
the data element into the space spanned by the discarded trailing eigenvectors.
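As a concrete, self-contained illustration of steps (1) through (5), the pure-Perl sketch below (again not code from this module; the toy data set and all variable names are invented for the illustration) reduces a small 2-dimensional data set to a 1-dimensional subspace, using the closed-form eigendecomposition of a 2x2 symmetric covariance matrix, and prints the reconstruction error incurred by each data element.

  use strict;
  use warnings;

  # (1) subtract the global mean from every data element
  my @data = ( [2, 1.0], [4, 2.2], [6, 2.9], [8, 4.1] );   # toy 2-D data set
  my $n = scalar @data;
  my @mean = (0, 0);
  for my $p (@data) { $mean[0] += $p->[0] / $n; $mean[1] += $p->[1] / $n; }
  my @centered = map { [ $_->[0] - $mean[0], $_->[1] - $mean[1] ] } @data;

  # (2) covariance matrix of the centered data:  [ cxx cxy ; cxy cyy ]
  my ($cxx, $cxy, $cyy) = (0, 0, 0);
  for my $p (@centered) {
      $cxx += $p->[0] * $p->[0] / $n;
      $cxy += $p->[0] * $p->[1] / $n;
      $cyy += $p->[1] * $p->[1] / $n;
  }

  # (3) eigendecomposition of the 2x2 symmetric matrix in closed form,
  #     with the eigenvalues ordered so that $lambda1 >= $lambda2
  my $half_trace = ($cxx + $cyy) / 2;
  my $disc       = sqrt($half_trace**2 - ($cxx * $cyy - $cxy**2));
  my ($lambda1, $lambda2) = ($half_trace + $disc, $half_trace - $disc);
  my @e1   = ($cxy, $lambda1 - $cxx);                      # leading eigenvector
  my $norm = sqrt($e1[0]**2 + $e1[1]**2);
  @e1 = map { $_ / $norm } @e1;

  # (4) discard the trailing eigenvector: the subspace is the line spanned by @e1
  # (5) project each centered element into that subspace; whatever is lost in the
  #     projection is the reconstruction error
  for my $p (@centered) {
      my $coeff    = $p->[0] * $e1[0] + $p->[1] * $e1[1];  # coordinate along @e1
      my @residual = ( $p->[0] - $coeff * $e1[0], $p->[1] - $coeff * $e1[1] );
      printf "element (%5.2f, %5.2f)   reconstruction error = %.4f\n",
             @$p, sqrt($residual[0]**2 + $residual[1]**2);
  }

Because the toy data lie almost exactly on a straight line, the leading eigenvector captures nearly all of the variance and the printed reconstruction errors come out small.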
I<In linear-manifold based machine learning, instead of constructing a single
subspace for the entire dataset in the manner described above, you construct a
separate subspace for each data cluster.>