PDL-Cluster


GENERATED/PDL/Cluster.pm  view on Meta::CPAN




#line 951 "/usr/lib/x86_64-linux-gnu/perl5/5.36/PDL/PP.pm"

*clusterdistance = \&PDL::clusterdistance;
#line 362 "Cluster.pm"



#line 949 "/usr/lib/x86_64-linux-gnu/perl5/5.36/PDL/PP.pm"



=head2 distancematrix

=for sig

  Signature: (
   double data(d,n);
   int    mask(d,n);
   double weight(d);
   double [o]dists(n,n);
   ; char *distFlag;
)

=for ref

Compute a triangular distance matrix over all data points.

=for bad

distancematrix does not process bad values.
It will set the bad-value flag of all output ndarrays if the flag is set for any of the input ndarrays.
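
For illustration only, the following plain-Perl sketch (no PDL required) shows the shape of a triangular distance matrix. It makes several assumptions:

- The distance is a masked, weighted squared-Euclidean distance: weighted squared differences are summed over features unmasked in both rows, then divided by the total weight used. The distance actually selected by $distFlag, and its normalization in the underlying C library, may differ.
- Rows are stored as $data->[$n][$d], which is transposed relative to the PDL signature data(d,n).
- The helper names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (assumed distance): masked, weighted squared-Euclidean
# distance between rows $i and $j. A feature counts only when both masks are
# nonzero; the weighted sum is divided by the total weight actually used.
sub masked_dist {
    my ($data, $mask, $weight, $i, $j) = @_;
    my ($sum, $tw) = (0, 0);
    for my $f (0 .. $#$weight) {
        next unless $mask->[$i][$f] && $mask->[$j][$f];
        my $diff = $data->[$i][$f] - $data->[$j][$f];
        $sum += $weight->[$f] * $diff * $diff;
        $tw  += $weight->[$f];
    }
    return $tw > 0 ? $sum / $tw : 0;
}

# Hypothetical helper: fill only the strict lower triangle (i > j), since the
# matrix is symmetric with a zero diagonal; other entries stay 0.
sub triangular_distances {
    my ($data, $mask, $weight) = @_;
    my $n = scalar @$data;
    my @dists = map { [ (0) x $n ] } 1 .. $n;
    for my $i (1 .. $n - 1) {
        for my $j (0 .. $i - 1) {
            $dists[$i][$j] = masked_dist($data, $mask, $weight, $i, $j);
        }
    }
    return \@dists;
}

my @data   = ([0, 0], [3, 4], [6, 8]);
my @mask   = ([1, 1], [1, 1], [1, 1]);
my @weight = (1, 1);
my $dists  = triangular_distances(\@data, \@mask, \@weight);
printf "%.1f %.1f %.1f\n", $dists->[1][0], $dists->[2][0], $dists->[2][1];
# prints "12.5 50.0 12.5"
```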


=cut
#line 393 "Cluster.pm"



#line 951 "/usr/lib/x86_64-linux-gnu/perl5/5.36/PDL/PP.pm"

*distancematrix = \&PDL::distancematrix;
#line 400 "Cluster.pm"



#line 949 "/usr/lib/x86_64-linux-gnu/perl5/5.36/PDL/PP.pm"



=head2 getclustercentroids

=for sig

  Signature: (
   double data(d,n);
   int    mask(d,n);
   int    clusterids(n);
   double [o]cdata(d,k);
   int    [o]cmask(d,k);
   ; char *ctrMethodFlag;
)

=for ref

Find cluster centroids by arithmetic mean (C<ctrMethodFlag="a">) or median over each dimension (C<ctrMethodFlag="m">).

=for bad

getclustercentroids does not process bad values.
It will set the bad-value flag of all output ndarrays if the flag is set for any of the input ndarrays.
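
The following hypothetical plain-Perl sketch spells out the two centroid methods one dimension at a time. None of the following details is confirmed by the documentation above; they are assumptions:

- A centroid coordinate is computed only from members whose mask is set for that feature.
- $cmask is 0 when no member contributes.
- The median of an even number of values is the mean of the two middle values.

Rows are stored as $data->[$n][$d], transposed relative to data(d,n).

```perl
use strict;
use warnings;

# Assumed convention: the median of an even-sized list averages the two middle values.
sub median {
    my @v = sort { $a <=> $b } @_;
    my $m = int(@v / 2);
    return @v % 2 ? $v[$m] : ($v[$m - 1] + $v[$m]) / 2;
}

# Hypothetical helper; $method is 'a' (arithmetic mean) or 'm' (median),
# mirroring ctrMethodFlag. Returns (cdata, cmask) as [cluster][feature].
sub cluster_centroids {
    my ($data, $mask, $cids, $k, $method) = @_;
    my $ndims = scalar @{ $data->[0] };
    my (@cdata, @cmask);
    for my $c (0 .. $k - 1) {
        for my $f (0 .. $ndims - 1) {
            # Unmasked values of feature $f for members of cluster $c.
            my @vals = map  { $data->[$_][$f] }
                       grep { $cids->[$_] == $c && $mask->[$_][$f] } 0 .. $#$data;
            if (@vals) {
                my $sum = 0;
                $sum += $_ for @vals;
                $cdata[$c][$f] = $method eq 'm' ? median(@vals) : $sum / @vals;
                $cmask[$c][$f] = 1;
            } else {
                # Assumption: no valid member values -> coordinate 0, mask 0.
                $cdata[$c][$f] = 0;
                $cmask[$c][$f] = 0;
            }
        }
    }
    return (\@cdata, \@cmask);
}

my @data = ([1, 10], [3, 20], [8, 0], [11, 30]);
my @mask = ([1, 1], [1, 1], [1, 0], [1, 1]);
my @cids = (0, 0, 1, 0);
my ($mean, $mmask) = cluster_centroids(\@data, \@mask, \@cids, 2, 'a');
my ($med)          = cluster_centroids(\@data, \@mask, \@cids, 2, 'm');
print "mean: @{ $mean->[0] }  median: @{ $med->[0] }\n";
# prints "mean: 5 20  median: 3 20"
```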


=cut
#line 432 "Cluster.pm"



#line 951 "/usr/lib/x86_64-linux-gnu/perl5/5.36/PDL/PP.pm"

*getclustercentroids = \&PDL::getclustercentroids;
#line 439 "Cluster.pm"



#line 589 "Cluster.pd"


=pod

=head2 getclustermean

=for sig

  Signature: (
   double data(d,n);
   int    mask(d,n);
   int    clusterids(n);
   double [o]cdata(d,k);
   int    [o]cmask(d,k);
   )

Really just a wrapper for getclustercentroids(...,"a").

=cut

sub getclustermean {
  my ($data,$mask,$cids,$cdata,$cmask) = @_;
  return getclustercentroids($data,$mask,$cids,$cdata,$cmask,'a');
}
#line 468 "Cluster.pm"



#line 620 "Cluster.pd"


=pod

=head2 getclustermedian

=for sig

  Signature: (
   double data(d,n);
   int    mask(d,n);
   int    clusterids(n);
   double [o]cdata(d,k);
   int    [o]cmask(d,k);
   )

Really just a wrapper for getclustercentroids(...,"m").

=cut

sub getclustermedian {
  my ($data,$mask,$cids,$cdata,$cmask) = @_;
  return getclustercentroids($data,$mask,$cids,$cdata,$cmask,'m');
}
#line 497 "Cluster.pm"



#line 949 "/usr/lib/x86_64-linux-gnu/perl5/5.36/PDL/PP.pm"



=head2 getclustermedoids

=for sig

  Signature: (
   double distance(n,n);
   int    clusterids(n);
   int    [o]centroids(k);
   double [o]errors(k);
   )

The getclustermedoids routine calculates the cluster centroids (medoids), given
the cluster to which each element belongs. The centroid of a cluster is defined
as the element with the smallest sum of distances to the other elements of that
cluster.


=for bad

getclustermedoids does not process bad values.
It will set the bad-value flag of all output ndarrays if the flag is set for any of the input ndarrays.
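
As a sketch of the medoid definition above, and not the library's implementation, the following plain Perl picks, for each cluster, the member with the smallest summed distance to the other members. It assumes a full symmetric distance matrix and that $errors() holds that minimal sum. Ties go to the lowest index, which is also an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper. $dist is a full symmetric n x n matrix (array of
# arrays), $cids gives each element's cluster, $k is the number of clusters.
sub cluster_medoids {
    my ($dist, $cids, $k) = @_;
    my (@medoids, @errors);
    for my $i (0 .. $#$cids) {
        my $c = $cids->[$i];
        # Sum of distances from candidate $i to all members of its cluster
        # (the self-distance is 0, so including $i itself is harmless).
        my $sum = 0;
        for my $j (0 .. $#$cids) {
            $sum += $dist->[$i][$j] if $cids->[$j] == $c;
        }
        # Strict "<" keeps the lowest index on ties.
        if (!defined $errors[$c] || $sum < $errors[$c]) {
            $errors[$c]  = $sum;
            $medoids[$c] = $i;
        }
    }
    return (\@medoids, \@errors);
}

# 1-D points with absolute difference as the distance.
my @x    = (0, 1, 5, 10, 12);
my @dist = map { my $i = $_; [ map { abs($x[$i] - $x[$_]) } 0 .. $#x ] } 0 .. $#x;
my ($med, $err) = cluster_medoids(\@dist, [0, 0, 0, 1, 1], 2);
print "@$med / @$err\n";    # prints "1 3 / 5 2"
```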


=cut
#line 528 "Cluster.pm"



#line 951 "/usr/lib/x86_64-linux-gnu/perl5/5.36/PDL/PP.pm"

*getclustermedoids = \&PDL::getclustermedoids;
#line 535 "Cluster.pm"



#line 949 "/usr/lib/x86_64-linux-gnu/perl5/5.36/PDL/PP.pm"



=head2 kcluster

=for sig

  Signature: (
   int    nclusters();
   double data(d,n);
   int    mask(d,n);
   double weight(d);
   int    npass();
   int    [o]clusterids(n);
   double [o]error();
   int    [o]nfound();


=head2 clusterdistancesenc

=for sig

  Signature: (
   double data(d,n);
   int    mask(d,n);
   double weight(d);
   int    coffsets1(k1);
   int    crowids1(nc1);
   int    cwhich1(ncmps);
   int    coffsets2(k2);
   int    crowids2(nc2);
   int    cwhich2(ncmps);
   double [o]dists(ncmps);
   ; 
   char *distFlag;
   char *methodFlag;
   )


Computes the cluster distance between selected pairs of clusters, co-indexed by ($cwhich1,$cwhich2).
Cluster contents are passed as pairs ($coffsetsX(),$crowidsX()) as returned
by the clusteroffsets() function.

$distFlag and $methodFlag are interpreted as for clusterdistance().

See also clusterenc(), clusterdistancematrixenc().


=for bad

clusterdistancesenc does not process bad values.
It will set the bad-value flag of all output ndarrays if the flag is set for any of the input ndarrays.
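
The offset/row-id encoding is not spelled out here, so the following plain-Perl sketch makes these assumptions, with hypothetical function names:

- The layout is CSR-style: rows are sorted by cluster, and the members of cluster $c are $rowids->[$offsets->[$c] .. $offsets->[$c+1]-1]. This may not match clusteroffsets() exactly.
- For brevity, it takes a precomputed full distance matrix instead of data/mask/weight.
- It uses single linkage (minimum pairwise distance) in place of whatever $methodFlag selects.

```perl
use strict;
use warnings;

# Hypothetical encoding (assumed layout): members of cluster $c are
# $rowids->[ $offsets->[$c] .. $offsets->[$c+1]-1 ].
sub encode_clusters {
    my ($cids, $k) = @_;
    my @rowids  = sort { $cids->[$a] <=> $cids->[$b] || $a <=> $b } 0 .. $#$cids;
    my @offsets = (0) x ($k + 1);
    $offsets[ $cids->[$_] + 1 ]++ for 0 .. $#$cids;    # member count per cluster
    $offsets[$_] += $offsets[$_ - 1] for 1 .. $k;      # running (prefix) sums
    return (\@offsets, \@rowids);
}

# Hypothetical helper: single-linkage distance for each co-indexed pair
# ($which1->[$p], $which2->[$p]); undef if either cluster is empty.
sub pair_distances {
    my ($dist, $off1, $row1, $which1, $off2, $row2, $which2) = @_;
    my @out;
    for my $p (0 .. $#$which1) {
        my ($c1, $c2) = ($which1->[$p], $which2->[$p]);
        my $best;
        for my $i1 ($off1->[$c1] .. $off1->[$c1 + 1] - 1) {
            for my $i2 ($off2->[$c2] .. $off2->[$c2 + 1] - 1) {
                my $d = $dist->[ $row1->[$i1] ][ $row2->[$i2] ];
                $best = $d if !defined $best || $d < $best;
            }
        }
        push @out, $best;
    }
    return \@out;
}

my @x    = (5, 0, 1, 10, 12);
my @dist = map { my $i = $_; [ map { abs($x[$i] - $x[$_]) } 0 .. $#x ] } 0 .. $#x;
my ($off, $rows) = encode_clusters([1, 0, 0, 2, 2], 3);
my $pd = pair_distances(\@dist, $off, $rows, [0, 0, 1], $off, $rows, [1, 2, 2]);
print "@$off | @$rows | @$pd\n";    # prints "0 2 3 5 | 1 2 0 3 4 | 4 9 5"
```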


=cut
#line 1352 "Cluster.pm"



#line 951 "/usr/lib/x86_64-linux-gnu/perl5/5.36/PDL/PP.pm"

*clusterdistancesenc = \&PDL::clusterdistancesenc;
#line 1359 "Cluster.pm"



#line 949 "/usr/lib/x86_64-linux-gnu/perl5/5.36/PDL/PP.pm"



=head2 getclusterwsum

=for sig

  Signature: (
   double data(d,n);
   int    mask(d,n);
   double clusterwts(k,n);
   double [o]cdata(d,k);
   int    [o]cmask(d,k);
   )


Find cluster centroids by weighted sum.  This can be considered an
expensive generalization of the getclustermean() and getclustermedian()
functions.  Here, the input PDLs $data() and $mask(), as well as the
output PDL $cdata() are as for getclustermean().  The matrix $clusterwts()
determines the relative weight of each data row in determining the
centroid of each cluster, potentially useful for "fuzzy" clustering.
The equation used to compute cluster means is:

 $cdata(d,k) = sum_{n} $clusterwts(k,n) * $data(d,n) * $mask(d,n)

For centroids in the same range as data elements, $clusterwts()
should sum to 1 over each column (k):

 all($clusterwts->xchg(0,1)->sumover == 1)

getclustermean() can be simulated by instantiating $clusterwts() with
a uniform distribution over cluster elements:

 $clusterwts = zeroes($k,$n);
 $clusterwts->indexND(cat($clusterids, xvals($clusterids))->xchg(0,1)) .= 1;
 $clusterwts /= $clusterwts->xchg(0,1)->sumover;
 getclusterwsum($data,$mask, $clusterwts, $cdata=zeroes($d,$k));

Similarly, getclustermedian() can be simulated by setting $clusterwts() to
1 for cluster medians and otherwise to 0.  More sophisticated centroid
discovery methods can be implemented with this function by setting
$clusterwts(k,n) to some estimate of the conditional probability
of the datum at row $n given the cluster with index $k,
p(Elt==n|Cluster==k).  One way to achieve such an estimate is to use
(normalized inverses of) the singleton-row-to-cluster distances as
output by clusterdistancematrix().



=for bad

getclusterwsum does not process bad values.
It will set the bad-value flag of all output ndarrays if the flag is set for any of the input ndarrays.
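
The weighted-sum equation above can be transcribed directly into plain Perl. This sketch is an illustration, not the PDL implementation. It stores the weights as $wts->[$k][$n] and assumes 0/1 mask values. It reproduces the getclustermean() simulation by normalizing a 0/1 membership matrix so that each cluster's weights sum to 1.

```perl
use strict;
use warnings;

# Hypothetical helper: direct transcription of
#   cdata[k][d] = sum_n wts[k][n] * data[n][d] * mask[n][d]
# with arrays laid out as $data->[$n][$d], $mask->[$n][$d], $wts->[$k][$n].
sub cluster_wsum {
    my ($data, $mask, $wts) = @_;
    my $ndims = scalar @{ $data->[0] };
    my @cdata;
    for my $k (0 .. $#$wts) {
        for my $f (0 .. $ndims - 1) {
            my $sum = 0;
            for my $n (0 .. $#$data) {
                # Assuming 0/1 masks, a zero mask multiplies the term by 0, i.e. skips it.
                next unless $mask->[$n][$f];
                $sum += $wts->[$k][$n] * $data->[$n][$f];
            }
            $cdata[$k][$f] = $sum;
        }
    }
    return \@cdata;
}

# Uniform weights over each cluster's members reproduce the cluster mean.
my @data = ([1, 2], [3, 4], [10, 20]);
my @mask = ([1, 1], [1, 1], [1, 1]);
my @cids = (0, 0, 1);
my @wts  = map { my $k = $_; [ map { $cids[$_] == $k ? 1 : 0 } 0 .. $#cids ] } 0 .. 1;
for my $row (@wts) {
    my $tot = 0;
    $tot += $_ for @$row;
    $_ /= $tot for @$row;    # normalize so each cluster's weights sum to 1
}
my $cdata = cluster_wsum(\@data, \@mask, \@wts);
print "@{ $cdata->[0] } | @{ $cdata->[1] }\n";    # prints "2 3 | 10 20"
```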


=cut
#line 1421 "Cluster.pm"



#line 951 "/usr/lib/x86_64-linux-gnu/perl5/5.36/PDL/PP.pm"

*getclusterwsum = \&PDL::getclusterwsum;
#line 1428 "Cluster.pm"



#line 949 "/usr/lib/x86_64-linux-gnu/perl5/5.36/PDL/PP.pm"



=head2 attachtonearest

=for sig

  Signature: (
   double data(d,n);
   int    mask(d,n);
   double weight(d);
   int    rowids(nr);
   double cdata(d,k);
   int    cmask(d,k);
   int    [o]clusterids(nr);
   double [o]cdist(nr);
   ; 
   char *distFlag;
   char *methodFlag;
   )


Assigns each specified data row to the nearest cluster centroid.
Data elements are given by $data() and $mask(), and feature weights are
given by $weight(), as usual.  Cluster centroids are defined
by $cdata() and $cmask(), and the indices of rows to be attached
are given in the vector $rowids().  The output vector $clusterids()
contains, for each specified row index, the identifier of the nearest
cluster centroid.  The vector $cdist() contains the distance from each
specified row to its nearest cluster centroid.

See also: clusterdistancematrix(), attachtonearestd().


=for bad

attachtonearest does not process bad values.
It will set the bad-value flag of all output ndarrays if the flag is set for any of the input ndarrays.
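
The following plain-Perl sketch illustrates nearest-centroid assignment. It uses an assumed masked, weighted squared-Euclidean distance as a stand-in for whatever $distFlag and $methodFlag select. Helper names are hypothetical, and rows are stored as $data->[$n][$d].

```perl
use strict;
use warnings;

# Hypothetical helper (assumed distance): weighted squared differences over
# features unmasked in both vectors, divided by the total weight used.
sub masked_dist {
    my ($x, $xm, $y, $ym, $w) = @_;
    my ($sum, $tw) = (0, 0);
    for my $f (0 .. $#$w) {
        next unless $xm->[$f] && $ym->[$f];
        my $diff = $x->[$f] - $y->[$f];
        $sum += $w->[$f] * $diff * $diff;
        $tw  += $w->[$f];
    }
    return $tw > 0 ? $sum / $tw : 0;
}

# Hypothetical helper: for each row index in $rowids, return the index of
# the closest centroid and the distance to it.
sub attach_to_nearest {
    my ($data, $mask, $w, $rowids, $cdata, $cmask) = @_;
    my (@cids, @cdist);
    for my $r (@$rowids) {
        my ($best, $bestd);
        for my $k (0 .. $#$cdata) {
            my $d = masked_dist($data->[$r], $mask->[$r], $cdata->[$k], $cmask->[$k], $w);
            ($best, $bestd) = ($k, $d) if !defined $bestd || $d < $bestd;
        }
        push @cids,  $best;
        push @cdist, $bestd;
    }
    return (\@cids, \@cdist);
}

my @data   = ([0, 0], [9, 9], [1, 1], [8, 10]);
my @mask   = ([1, 1], [1, 1], [1, 1], [1, 1]);
my @weight = (1, 1);
my @cdata  = ([0.5, 0.5], [8.5, 9.5]);
my @cmask  = ([1, 1], [1, 1]);
my ($cids, $cdist) =
    attach_to_nearest(\@data, \@mask, \@weight, [3, 2], \@cdata, \@cmask);
print "@$cids; @$cdist\n";    # prints "1 0; 0.25 0.25"
```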


=cut
#line 1474 "Cluster.pm"



#line 951 "/usr/lib/x86_64-linux-gnu/perl5/5.36/PDL/PP.pm"

*attachtonearest = \&PDL::attachtonearest;
#line 1481 "Cluster.pm"



#line 1659 "Cluster.pd"


=pod

=head2 attachtonearestd

=for sig

  Signature: (
   double cdistmat(k,n);
   int rowids(nr);
   int [o]clusterids(nr);
   double [o]dists(nr);
   )

Assigns each specified data row to the nearest cluster centroid,
as for attachtonearest(), given the datum-to-cluster distance
matrix $cdistmat().  Currently just a wrapper for a few PDL calls.
In scalar context returns $clusterids(), in list context returns
the list ($clusterids(),$dists()).
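
For comparison, the dice/minimum_ind logic of attachtonearestd() can be written in plain Perl. This sketch assumes the distance matrix is stored as $cdm->[$n][$k], mirroring cdistmat(k,n). Like the documented PDL version, it honors calling context. The function name is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: $cdm->[$n][$k] is the distance from datum $n to
# cluster $k. Selecting rows by $rowids plays the role of dice_axis(1,...),
# and the inner argmin plays the role of minimum_ind.
sub attach_to_nearest_d {
    my ($cdm, $rowids) = @_;
    my (@cids, @dists);
    for my $r (@$rowids) {
        my $row  = $cdm->[$r];
        my $best = 0;
        for my $k (1 .. $#$row) {
            $best = $k if $row->[$k] < $row->[$best];
        }
        push @cids,  $best;
        push @dists, $row->[$best];
    }
    # Scalar context: cluster ids only; list context: (ids, distances).
    return wantarray ? (\@cids, \@dists) : \@cids;
}

my @cdm = ([0.2, 3.0], [4.0, 0.5], [1.0, 1.5]);
my ($cids, $dists) = attach_to_nearest_d(\@cdm, [2, 1]);
print "@$cids; @$dists\n";    # prints "0 1; 1 0.5"
my $only = attach_to_nearest_d(\@cdm, [0]);
print "@$only\n";             # prints "0"
```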

=cut

sub attachtonearestd {
  my ($cdm,$rowids,$cids,$dists)=@_;
  $cids = zeroes(long, $rowids->dim(0))    if (!defined($cids));
  $dists = zeroes(double, $rowids->dim(0)) if (!defined($dists));

  ##-- dice matrix
  my $cdmr   = $cdm->dice_axis(1,$rowids);

  ##-- get best
  $cdmr->minimum_ind($cids);


