Algorithm-Cluster
perl/Record.pm
}
$self->{geneid} = [];
$self->{data} = [];
$self->{mask} = [];
my $needmask = 0;
while ($line = <$handle>) {
my $count = ($line =~ tr/\t//);
@words = split(/\t/, $line);
chomp @words;
scalar @words == $n or die "Line with " . scalar(@words) . " columns found (expected $n)";
my $start = 0;
for my $key (keys %cols) {
if ($key > $start) {
$start = $key;
}
}
if ($words[0] eq 'EWEIGHT') {
@{$self->{eweight}} = @words[$start+1..$n-1];
}
elsif ($words[0] eq 'EORDER') {
@{$self->{eorder}} = @words[$start+1..$n-1];
}
else {
my @rowdata = ();
my @rowmask = ();
for ($i = 0; $i < $n; $i++) {
my $word = $words[$i];
if (defined $cols{$i}) {
if ($cols{$i} eq 'GENEID') {
push(@{$self->{geneid}}, $word);
}
src/cluster.c
kcluster(int nclusters, int nrows, int ncolumns, double** data, int** mask,
double weight[], int transpose, int npass, char method, char dist,
int clusterid[], double* error, int* ifound)
/*
Purpose
=======
The kcluster routine performs k-means or k-median clustering on a given set of
elements, using the specified distance measure. The number of clusters is given
by the user. Multiple passes are made to find the optimal clustering
solution, each time starting from a different initial clustering.
Arguments
=========
nclusters (input) int
The number of clusters to be found.
data (input) double[nrows][ncolumns]
The array containing the data of the elements to be clustered (i.e., the gene
[...]
The weights that are used to calculate the distance. This is equivalent
to including the jth data point weight[j] times in the calculation. The
weights can be non-integer.
transpose (input) int
If transpose == 0, the rows of the matrix are clustered. Otherwise, columns
of the matrix are clustered.
npass (input) int
The number of times clustering is performed, each time starting from a
different (random) initial assignment of genes to clusters. The clustering
solution with the lowest within-cluster sum of distances is chosen.
If npass == 0, the clustering algorithm runs once, taking the initial
assignment of elements to clusters from the clusterid array.
method (input) char
Defines whether the arithmetic mean (method == 'a') or the median
(method == 'm') is used to calculate the cluster center.
dist (input) char
[...]
dist == 'u': uncentered correlation
dist == 'x': absolute uncentered correlation
dist == 's': Spearman's rank correlation
dist == 'k': Kendall's tau
For other values of dist, the default (Euclidean distance) is used.
clusterid (output; input) int[nrows] if transpose == 0
int[ncolumns] otherwise
The cluster number to which a gene or microarray was assigned. If npass == 0,
then on input clusterid contains the initial clustering assignment from which
the clustering algorithm starts. On output, it contains the clustering solution
that was found.
error (output) double*
The sum of distances to the cluster center of each item in the optimal k-means
clustering solution that was found.
ifound (output) int*
The number of times the optimal clustering solution was
found. The value of ifound is at least 1; its maximum value is npass. If the
number of clusters is larger than the number of elements being clustered,
[...]
void
kmedoids(int nclusters, int nelements, double** distmatrix, int npass,
int clusterid[], double* error, int* ifound)
/*
Purpose
=======
The kmedoids routine performs k-medoids clustering on a given set of elements,
using the distance matrix and the number of clusters passed by the user.
Multiple passes are made to find the optimal clustering solution, each
time starting from a different initial clustering.
Arguments
=========
nclusters (input) int
The number of clusters to be found.
nelements (input) int
The number of elements to be clustered.
distmatrix (input) double array, ragged
(number of rows is nelements, number of columns is equal to the row number)
The distance matrix. To save space, the distance matrix is given in the
form of a ragged array. The distance matrix is symmetric and has zeros
on the diagonal. See distancematrix for a description of the content.
npass (input) int
The number of times clustering is performed, each time starting from a
different (random) initial assignment of elements to clusters. The clustering
solution with the lowest within-cluster sum of distances is chosen.
If npass == 0, the clustering algorithm runs once, taking the initial
assignment of elements to clusters from the clusterid array.
clusterid (output; input) int[nelements]
On input, if npass == 0, then clusterid contains the initial clustering
assignment from which the clustering algorithm starts; all numbers in clusterid
should be between zero and nelements-1 inclusive. If npass != 0, clusterid is
ignored on input.
On output, clusterid contains the clustering solution that was found: the
number of the cluster to which each item was assigned. The number of a cluster
is defined as the item number of the centroid (the medoid) of the cluster.
error (output) double*
The sum of distances to the cluster center of each item in the optimal
k-medoids clustering solution that was found.
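
The ragged distance matrix and the error value described above can be
illustrated with a minimal sketch. It is not taken from src/cluster.c: the
helper names ragged_alloc, ragged_free, ragged_get and within_cluster_sum are
hypothetical and exist only for this example. The sketch assumes that row i
of the matrix stores i entries (columns 0 to i-1), as the description of
distmatrix suggests, and that each clusterid entry holds the item number of
the cluster's medoid, as described for the output of kmedoids.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocate a ragged lower-triangular matrix in which
   row i holds i entries (columns 0..i-1); row 0 is therefore empty.
   Returns NULL if an allocation fails. */
double **ragged_alloc(int nelements)
{
    assert(nelements > 0);
    double **m = malloc((size_t)nelements * sizeof *m);
    if (m == NULL) return NULL;
    m[0] = NULL;
    for (int i = 1; i < nelements; i++) {
        m[i] = malloc((size_t)i * sizeof **m);
        if (m[i] == NULL) {
            while (--i > 0) free(m[i]);   /* release rows allocated so far */
            free(m);
            return NULL;
        }
    }
    return m;
}

/* Hypothetical helper: release a matrix created by ragged_alloc. */
void ragged_free(double **m, int nelements)
{
    if (m == NULL) return;
    for (int i = 1; i < nelements; i++) free(m[i]);
    free(m);
}

/* Distance between items i and j. The diagonal is zero by definition;
   symmetry lets every other pair be read from the stored lower triangle. */
double ragged_get(double **m, int i, int j)
{
    if (i == j) return 0.0;
    return (i > j) ? m[i][j] : m[j][i];
}

/* Sum of distances from each item to the medoid of its cluster, assuming
   clusterid[i] is the item number of that medoid. */
double within_cluster_sum(double **m, int nelements, const int clusterid[])
{
    double total = 0.0;
    for (int i = 0; i < nelements; i++) {
        assert(clusterid[i] >= 0 && clusterid[i] < nelements);
        total += ragged_get(m, i, clusterid[i]);
    }
    return total;
}
```

As a usage example, take four items at positions 0, 1, 4 and 5 on a line. The
stored entries are m[1][0] = 1, m[2][0] = 4, m[2][1] = 3, m[3][0] = 5,
m[3][1] = 4 and m[3][2] = 1. With clusterid = {0, 0, 2, 2}, items 0 and 2 are
the medoids, and within_cluster_sum adds the distances 0, 1, 0 and 1.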