Algorithm-ExpectationMaximization
lib/Algorithm/ExpectationMaximization.pm view on Meta::CPAN
naive_bayes_cluster1.txt
naive_bayes_cluster2.txt
...
# In the same manner, you can write out the posterior-probability-based, possibly
# overlapping clusters to files by calling:
$clusterer->write_posterior_prob_clusters_above_threshold_to_files($theta1);
# where the threshold $theta1 sets the probability threshold for deciding which
# data elements to place in a cluster. These clusters are placed in files with
# names like
posterior_prob_cluster1.txt
posterior_prob_cluster2.txt
...
# CLUSTER VISUALIZATION:
# You must first set the mask for cluster visualization. This mask tells the
lib/Algorithm/ExpectationMaximization.pm view on Meta::CPAN
my $parameter_file = "param1.txt";
my $out_datafile = "mydatafile1.dat";
Algorithm::ExpectationMaximization->cluster_data_generator(
input_parameter_file => $parameter_file,
output_datafile => $out_datafile,
total_number_of_data_points => $N );
# where the value of $N is the total number of data points you would like to see
# generated for all of the Gaussians. How this total number is divided up amongst
# the Gaussians is decided by the prior probabilities for the Gaussian components
# as declared in the input parameter file. The synthetic data may be visualized in
# a terminal window and the visualization written out as a PNG image to a disk
# file by
my $data_visualization_mask = "11";
$clusterer->visualize_data($data_visualization_mask);
$clusterer->plot_hardcopy_data($data_visualization_mask);
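
The division of the total of $N data points among the Gaussians according to their prior probabilities, as described above, can be illustrated with a minimal sketch. The helper C<allocate_points> is hypothetical and not part of the module, and the rule used here (truncate each share, then hand out any leftover points in order) is an assumption; the excerpt does not say how the module itself rounds the shares.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Algorithm::ExpectationMaximization):
# split $total points among Gaussian components in proportion to their priors.
sub allocate_points {
    my ($total, @priors) = @_;

    # Truncate each component's proportional share to a whole number of points.
    my @counts = map { int($_ * $total) } @priors;

    # Truncation can leave a few points unassigned; hand them out one at a
    # time, starting with the first component (an arbitrary choice).
    my $assigned = 0;
    $assigned += $_ for @counts;
    for my $i (0 .. $total - $assigned - 1) {
        $counts[ $i % @counts ]++;
    }
    return @counts;
}

# Priors 0.25, 0.25 and 0.5 are made-up values for illustration.
my @counts = allocate_points(100, 0.25, 0.25, 0.5);
print "points per component: @counts\n";
```

With priors that sum to 1, the counts always add up to the requested total, whatever the individual shares are.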
=head1 CHANGES
lib/Algorithm/ExpectationMaximization.pm view on Meta::CPAN
c20 0 10.7087017086940 9.63528386251712 10.9512155258108 ...
c7 0 12.8025925026787 10.6126270065785 10.5228482095349 ...
b9 0 7.60118206283120 5.05889245193079 5.82841781759102 ...
....
....
where the first column contains the symbolic ID tag for each
data record and the rest of the columns the numerical
information. Which columns are actually used for clustering
is decided by the string value of the mask. For
example, if we wanted to cluster on the basis of the entries
in just the 3rd, the 4th, and the 5th columns above, the
mask value would be C<N0111> where the character C<N>
indicates that the ID tag is in the first column, the
character C<0> that the second column is to be ignored, and
the C<1>'s that follow that the 3rd, the 4th, and the 5th
columns are to be used for clustering.
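
The effect of a mask such as C<N0111> on a single data record can be sketched as follows. This is only an illustration of the convention described above: the helper C<apply_mask> is hypothetical, it is not how the module parses its input, and the requirement that the mask have exactly one character per column is an assumption made for the sketch.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): apply a mask such as "N0111"
# to one whitespace-separated data record.
sub apply_mask {
    my ($mask, $record) = @_;
    my @flags  = split //, $mask;      # one mask character per column
    my @fields = split ' ', $record;   # split the record on whitespace

    # Assumption for this sketch: the mask covers every column of the record.
    die "mask length and column count differ\n" unless @flags == @fields;

    my ($tag, @values);
    for my $i (0 .. $#flags) {
        if    ($flags[$i] eq 'N') { $tag = $fields[$i] }        # symbolic ID tag
        elsif ($flags[$i] eq '1') { push @values, $fields[$i] } # used for clustering
        # a '0' column is ignored
    }
    return ($tag, \@values);
}

# A five-column record modeled on the sample data shown above.
my ($tag, $values) = apply_mask(
    'N0111',
    'c20 0 10.7087017086940 9.63528386251712 10.9512155258108',
);
print "$tag: @$values\n";
```

Here the tag C<c20> is taken from the first column, the second column is dropped, and the remaining three values form the vector used for clustering.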
If, instead of random seeding, you wish to use kmeans-based
seeding, just replace the option C<random> supplied
lib/Algorithm/ExpectationMaximization.pm view on Meta::CPAN
"@{$posterior_prob_clusters->[$index]}\n\n"
}
=item B<write_posterior_prob_clusters_above_threshold_to_files($theta1):>
$clusterer->write_posterior_prob_clusters_above_threshold_to_files($theta1);
This call writes out the posterior-probability-based soft
clusters to disk files. As in the previous method, the
threshold C<$theta1> sets the probability threshold for
deciding which data elements belong to a cluster. These
clusters are placed in files with names like
posterior_prob_cluster1.txt
posterior_prob_cluster2.txt
...
=item B<return_individual_class_distributions_above_given_threshold($theta):>
my $theta2 = 0.00001;
my $class_distributions =
lib/Algorithm/ExpectationMaximization.pm view on Meta::CPAN
the cluster corresponding to that file. The clusters are placed in files with names
like
naive_bayes_cluster1.txt
naive_bayes_cluster2.txt
...
The soft clusters on the other hand are created by calling
C<return_clusters_with_posterior_probs_above_threshold($theta1)>
on an instance of the module, where the argument C<$theta1>
is the threshold for deciding whether a data element belongs
in a soft cluster. The posterior class probability at a
data element must exceed the threshold for the element to
belong to the corresponding cluster. The soft clusters can
be written out to disk files by calling
C<write_posterior_prob_clusters_above_threshold_to_files($theta1)>.
As with the hard clusters, each cluster is placed in a separate
file. The filenames for such clusters look like:
posterior_prob_cluster1.txt
posterior_prob_cluster2.txt
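
The thresholding rule for soft clusters described above can be sketched in a few lines. The posterior table, the tags, and the helper C<soft_clusters> are all made up for illustration; this is not the module's implementation, only the rule that an element joins every cluster whose posterior class probability at that element exceeds the threshold.

```perl
use strict;
use warnings;

# Made-up posterior class probabilities: tag => [ P(class 1), P(class 2) ].
my %posteriors = (
    a1 => [0.95, 0.05],
    b2 => [0.40, 0.60],
    c3 => [0.02, 0.98],
);

# Hypothetical helper (not part of the module): an element joins cluster k
# whenever its posterior probability for class k exceeds the threshold.
sub soft_clusters {
    my ($posteriors, $theta) = @_;
    my @clusters;
    for my $tag (sort keys %$posteriors) {
        my $probs = $posteriors->{$tag};
        for my $k (0 .. $#$probs) {
            push @{ $clusters[$k] }, $tag if $probs->[$k] > $theta;
        }
    }
    return \@clusters;
}

my $clusters = soft_clusters(\%posteriors, 0.2);
for my $k (0 .. $#$clusters) {
    my @members = @{ $clusters->[$k] || [] };
    print "cluster ", $k + 1, ": @members\n";
}
```

With a threshold of 0.2, the element C<b2> lands in both clusters, which is what makes these clusters soft: unlike the hard clusters, they may overlap.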