Algorithm-Cluster

 view release on metacpan or  search on metacpan

src/cluster.c  view on Meta::CPAN


ip         (output) int*
A pointer to the integer that is to receive the first index of the pair with
the shortest distance.

jp         (output) int*
A pointer to the integer that is to receive the second index of the pair with
the shortest distance.
*/
{
    int i, j;
    double temp;
    double distance = distmatrix[1][0];

    *ip = 1;
    *jp = 0;
    for (i = 1; i < n; i++) {
        for (j = 0; j < i; j++) {
            temp = distmatrix[i][j];
            if (temp<distance) {
                distance = temp;
                *ip = i;
                *jp = j;
            }
        }
    }
    return distance;
}

/* ********************************************************************* */

static int
svd(int m, int n, double** u, double w[], double** vt)
/*
 * This subroutine is a translation of the Algol procedure svd, described in
 * G.H. Golub and C. Reinsch:
 * "Singular Value Decomposition and Least Squares Solutions",
 * published in
 * Numerische Mathematik 14(5): page 403-420 (1970)
 * and
 * Handbook for Automatic Computation, Volume II: Linear Algebra,
 * (John H. Wilkinson and C. Reinsch; edited by Friedrich L. Bauer and
 * Alston S. Householder): page 134-151 (1971).
 *                                                                  t
 * This subroutine determines the singular value decomposition A=usv  of a
 * real m by n rectangular matrix, where m is greater * than or equal to n.
 * Householder bidiagonalization and a variant of the QR algorithm are used.
 *
 * On input:
 *
 * m  is the number of rows of A (and u).
 *
 * n  is the number of columns of A (and u) and the order of v.
 *
 * u  contains the rectangular input matrix A to be decomposed.
 *
 * On output:
 *
 * the routine returns an integer ierr equal to
 *  0:  to indicate a normal return,
 *  k:  if the k-th singular value has not been determined after 30 iterations,
 * -1:  if memory allocation fails.
 *
 * w  contains the n (non-negative) singular values of a (the
 *    diagonal elements of s). they are unordered.
 *    If an error exit is made, the singular values should be correct for
 *    indices ierr+1, ierr+2, ..., n.
 *
 * u  contains the matrix u (orthogonal column vectors) of the decomposition.
 *    If an error exit is made, the columns of u corresponding to indices of
 *    correct singular values should be correct.
 *                         t
 * vt contains the matrix v (orthogonal) of the decomposition.
 *    If an error exit is made, the columns of v corresponding to indices of
 *    correct singular values should be correct.
 *
 *
 * Questions and comments should be directed to B. S. Garbow,
 * Applied Mathematics division, Argonne National Laboratory
 *
 * Modified to eliminate machep
 *
 * Translated to C by Michiel de Hoon, Human Genome Center,
 * University of Tokyo, for inclusion in the C Clustering Library.
 * This routine is less general than the original svd routine, as
 * it focuses on the singular value decomposition as needed for
 * clustering. In particular,
 *  - We calculate both u and v in all cases
 *  - We pass the input array A via u; this array is subsequently overwritten.
 *  - We allocate for the array rv1, used as a working space, internally in
 *    this routine, instead of passing it as an argument.
 *    If the allocation fails, svd returns -1.
 * 2003.06.05
 */
{
    int i, j, k, i1, k1, l1, its;
    double c, f, h, s, x, y, z;
    int l = 0;
    int ierr = 0;
    double g = 0.0;
    double scale = 0.0;
    double anorm = 0.0;
    double* rv1;

    rv1 = malloc(n*sizeof(double));
    if (!rv1) return -1;
    if (m >= n) {
        /* Householder reduction to bidiagonal form */
        for (i = 0; i < n; i++) {
            l = i + 1;
            rv1[i] = scale * g;
            g = 0.0;
            s = 0.0;
            scale = 0.0;
            for (k = i; k < m; k++) scale += fabs(u[k][i]);
            if (scale != 0.0) {
                for (k = i; k < m; k++) {
                    u[k][i] /= scale;
                    s += u[k][i]*u[k][i];
                }
                f = u[i][i];

src/cluster.c  view on Meta::CPAN

            clusterid[i][1] = iybest;
        }
        free(celldatavector);
        for (i = 0; i < nrows; i++) free(dummymask[i]);
        free(dummymask);
    }
}

/* ******************************************************************* */

void
somcluster(int nrows, int ncolumns, double** data, int** mask,
    const double weight[], int transpose, int nxgrid, int nygrid,
    double inittau, int niter, char dist, double*** celldata,
    int clusterid[][2])
/*

Purpose
=======

The somcluster routine implements a self-organizing map (Kohonen) on a
rectangular grid, using a given set of vectors. The distance measure to be
used to find the similarity between genes and nodes is given by dist.

Arguments
=========

nrows     (input) int
The number of rows in the data matrix, equal to the number of genes.

ncolumns  (input) int
The number of columns in the data matrix, equal to the number of samples.

data      (input) double[nrows][ncolumns]
The array containing the gene expression data.

mask      (input) int[nrows][ncolumns]
This array shows which data values are missing. If
mask[i][j] == 0, then data[i][j] is missing.

weight    (input) double[ncolumns] if transpose == 0;
                  double[nrows]    otherwise
The weights that are used to calculate the distance. This is equivalent
to including the jth data point weight[j] times in the calculation. The
weights can be non-integer.

transpose (input) int
If transpose == 0, the rows (genes) of the matrix are clustered. Otherwise,
columns (samples) of the matrix are clustered.

nxgrid    (input) int
The number of grid cells horizontally in the rectangular topology of clusters.

nygrid    (input) int
The number of grid cells horizontally in the rectangular topology of clusters.

inittau   (input) double
The initial value of tau, representing the neighborhood function.

niter     (input) int
The number of iterations to be performed.

dist      (input) char
Defines which distance measure is used, as given by the table:
dist == 'e': Euclidean distance
dist == 'b': City-block distance
dist == 'c': correlation
dist == 'a': absolute value of the correlation
dist == 'u': uncentered correlation
dist == 'x': absolute uncentered correlation
dist == 's': Spearman's rank correlation
dist == 'k': Kendall's tau
For other values of dist, the default (Euclidean distance) is used.

celldata  (output) double[nxgrid][nygrid][ncolumns] if transpose == 0;
                   double[nxgrid][nygrid][nrows]    otherwise
The gene expression data for each node (cell) in the 2D grid. This can be
interpreted as the centroid for the cluster corresponding to that cell. If
celldata is NULL, then the centroids are not returned. If celldata is not
NULL, enough space should be allocated to store the centroid data before
calling somcluster.

clusterid (output) int[nrows][2]        if transpose == 0;
                   int[ncolumns][2]     otherwise
For each item (gene or microarray) that is clustered, the coordinates of the
cell in the 2D grid to which the item was assigned. If clusterid is NULL, the
cluster assignments are not returned. If clusterid is not NULL, enough memory
should be allocated to store the clustering information before calling
somcluster.

========================================================================
*/
{
    const int nobjects = (transpose == 0) ? nrows : ncolumns;
    const int ndata = (transpose == 0) ? ncolumns : nrows;
    int i, j;
    const int lcelldata = (celldata == NULL) ? 0 : 1;

    if (nobjects < 2) return;

    if (lcelldata == 0) {
        celldata = malloc(nxgrid*nygrid*ndata*sizeof(double**));
        for (i = 0; i < nxgrid; i++) {
            celldata[i] = malloc(nygrid*ndata*sizeof(double*));
            for (j = 0; j < nygrid; j++)
                celldata[i][j] = malloc(ndata*sizeof(double));
        }
    }

    somworker(nrows, ncolumns, data, mask, weight, transpose, nxgrid, nygrid,
        inittau, celldata, niter, dist);
    if (clusterid)
        somassign(nrows, ncolumns, data, mask, weight, transpose,
            nxgrid, nygrid, celldata, dist, clusterid);
    if (lcelldata == 0) {
        for (i = 0; i < nxgrid; i++)
            for (j = 0; j < nygrid; j++)
                free(celldata[i][j]);
        for (i = 0; i < nxgrid; i++)
            free(celldata[i]);
        free(celldata);



( run in 0.555 second using v1.01-cache-2.11-cpan-96521ef73a4 )