Algorithm-ClusterPoints
lib/Algorithm/ClusterPoints.pm
inside the arrays.

=back

=head2 Using hypercylindrical distances

By default, distances between points are measured as euclidean
distances. That means that two points A and B form a cluster when B is
inside the hypersphere of radius $radius and center A. We will call
this hypersphere the clustering limit surface for point A.

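
The following plain-Perl sketch illustrates this default criterion. It
does not use Algorithm::ClusterPoints: the helpers C<euclidean_distance>
and C<same_cluster> are hypothetical names, and points lying exactly on
the limit surface are assumed to be outside it.

  use strict;
  use warnings;
  use List::Util qw(sum);

  # Hypothetical helper: euclidean distance between two points given as
  # array references of the same length.
  sub euclidean_distance {
      my ($p, $q) = @_;
      return sqrt(sum(0, map { my $d = $p->[$_] - $q->[$_]; $d * $d } 0 .. $#$p));
  }

  # Hypothetical helper: B belongs to A's cluster when it lies strictly
  # inside the hypersphere of radius $radius centred on A.
  sub same_cluster {
      my ($p, $q, $radius) = @_;
      return euclidean_distance($p, $q) < $radius;
  }

  # The two points are 5 units apart.
  my $inside  = same_cluster([0, 0], [3, 4], 5.5);
  my $outside = same_cluster([0, 0], [3, 4], 4.5);
  print $inside  ? "cluster\n" : "apart\n";   # prints "cluster"
  print $outside ? "cluster\n" : "apart\n";   # prints "apart"
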
Sometimes, especially when the dimensions represent unrelated entities,
it is desirable to use hypercylinders as the clustering limit surfaces.

For instance, suppose we have a set of three-dimensional points ($x,
$y, $t) where the first two dimensions represent coordinates on a
geometric plane and the third represents time.

It makes no sense to mix space and time into a single distance, and
hence to use a spherical clustering limit surface. What we need is to
set independent limits for the geometric and temporal dimensions, for
instance C<$geo_distance < $geo_radius> and
C<$temp_distance < $temp_radius>; this pair of constraints defines
a cylinder in our three-dimensional problem space.

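
As a rough illustration, assuming each point is an array reference
C<[$x, $y, $t]>, the cylindrical test can be sketched in plain Perl.
The helper C<in_cylinder> and its two separate limits are hypothetical
and are not part of the module's interface.

  use strict;
  use warnings;

  # Hypothetical helper: two ($x, $y, $t) points are close when the planar
  # distance and the time difference are each below their own limit.
  sub in_cylinder {
      my ($p, $q, $geo_radius, $temp_radius) = @_;
      my $geo_distance  = sqrt(($p->[0] - $q->[0]) ** 2 + ($p->[1] - $q->[1]) ** 2);
      my $temp_distance = abs($p->[2] - $q->[2]);
      return $geo_distance < $geo_radius && $temp_distance < $temp_radius;
  }

  my $geo_radius  = 1000;               # meters
  my $temp_radius = 2 * 24 * 60 * 60;   # two days, in seconds

  # 500 m and 1 hour apart: inside the cylinder.
  my $near = in_cylinder([0, 0, 0], [300, 400, 3600], $geo_radius, $temp_radius);
  # 500 m but 3 days apart: outside, however close the points are in space.
  my $late = in_cylinder([0, 0, 0], [300, 400, 3 * 86400], $geo_radius, $temp_radius);
  print $near ? "cluster\n" : "apart\n";   # prints "cluster"
  print $late ? "cluster\n" : "apart\n";   # prints "apart"
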
In the general multidimensional case we talk about hypercylinders
instead of cylinders, but the logic is the same: the dimensions are
divided into several groups (d-groups) according to some relation
defined by the problem, and two points form a cluster when all of the
subdistances are smaller than the radius (where a subdistance is the
euclidean distance computed using only the dimensions of one d-group).
Note that every d-group defines a hypercylinder base.

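
The general rule can be sketched in plain Perl as follows. This is a
hypothetical illustration of the criterion described above, not the
module's implementation: C<subdistance> and C<in_hypercylinder> are
made-up helper names, and the d-groups are passed as lists of dimension
indexes, the same form used by C<dimensional_groups>.

  use strict;
  use warnings;

  # Hypothetical helper: the subdistance of a d-group, i.e. the euclidean
  # distance computed over the dimension indexes listed in @$group only.
  sub subdistance {
      my ($p, $q, $group) = @_;
      my $sum = 0;
      for my $i (@$group) {
          my $d = $p->[$i] - $q->[$i];
          $sum += $d * $d;
      }
      return sqrt($sum);
  }

  # Hypothetical helper: two points form a cluster only when every d-group
  # subdistance is smaller than the common radius.
  sub in_hypercylinder {
      my ($p, $q, $radius, @groups) = @_;
      for my $group (@groups) {
          return 0 unless subdistance($p, $q, $group) < $radius;
      }
      return 1;
  }

  my @groups = ([0, 1], [2]);   # d-groups {X, Y} and {T}

  # Subdistances 0.5 and 0.9: accepted, although the full euclidean
  # distance (about 1.03) would fall outside a hypersphere of radius 1.
  my $near = in_hypercylinder([0, 0, 0], [0.3, 0.4, 0.9], 1, @groups);
  # Subdistance 1.2 in the second d-group: rejected.
  my $far  = in_hypercylinder([0, 0, 0], [0.3, 0.4, 1.2], 1, @groups);
  print "$near $far\n";         # prints "1 0"
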
The hypercylindrical shape is defined with the following method:

=over 4

=item $clp->dimensional_groups(\@group0, \@group1, ...)

where @group0, @group1, ... are lists of dimension indexes.

For instance, for a three-dimensional problem with dimensions X, Y and
T (in that order), to form one group with the dimensions X and Y and
another with the dimension T, the following call is used:

  $clp->dimensional_groups([0, 1], [2]);


=back

The dimensional groups can also be set when the constructor is called:

  my $clp = Algorithm::ClusterPoints->new(
      dimensional_groups => [[0, 1], [2]],
      ...);

Usually, when using dimensional groups, you will also want to use the
C<scales> method to set the scale of the dimensions in every group.
Continuing with the previous example, and supposing X and Y are given in
meters and T in seconds, to find clusters whose points are within 1 km
of each other in space and within 2 days in time, the following scales
should be used (they map both limits to one unit, so they assume a
radius of 1):

  my $spc_scl = 1/1000;                 # meters -> kilometers
  my $tmp_scl = 1/(2 * 24 * 60 * 60);   # seconds -> units of 2 days
  $clp = Algorithm::ClusterPoints->new(
      dimensional_groups => [[0, 1], [2]],
      scales => [$spc_scl, $spc_scl, $tmp_scl],
      ...);

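
To check the arithmetic of these scales, here is a plain-Perl sketch
that does not call the module. It assumes that each coordinate is
multiplied by its scale before the subdistances are computed and that
the radius is 1; the helper C<scaled_in_hypercylinder> is hypothetical.

  use strict;
  use warnings;

  # Hypothetical helper: multiply each coordinate difference by the scale
  # of its dimension, then require every d-group subdistance to be smaller
  # than $radius (1 in this example).
  sub scaled_in_hypercylinder {
      my ($p, $q, $scales, $groups, $radius) = @_;
      for my $group (@$groups) {
          my $sum = 0;
          for my $i (@$group) {
              my $d = ($p->[$i] - $q->[$i]) * $scales->[$i];
              $sum += $d * $d;
          }
          return 0 unless sqrt($sum) < $radius;
      }
      return 1;
  }

  my $spc_scl = 1/1000;                 # meters -> kilometers
  my $tmp_scl = 1/(2 * 24 * 60 * 60);   # seconds -> units of 2 days
  my @scales  = ($spc_scl, $spc_scl, $tmp_scl);
  my @groups  = ([0, 1], [2]);

  # 800 m and 1 day apart (0.8 and 0.5 after scaling): same cluster.
  my $near = scaled_in_hypercylinder([0, 0, 0], [800, 0, 86400], \@scales, \@groups, 1);
  # 800 m but 3 days apart (1.5 after scaling): different clusters.
  my $late = scaled_in_hypercylinder([0, 0, 0], [800, 0, 3 * 86400], \@scales, \@groups, 1);
  print "$near $late\n";                # prints "1 0"
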

=head1 SEE ALSO

It all began with this PerlMonks discussion:
L<http://perlmonks.org/?node_id=694892>.

L<Algorithm::Cluster> is a Perl wrapper for the C Clustering Library.

=head1 COPYRIGHT AND LICENSE

Copyright (C) 2008 by Salvador FandiE<ntilde>o (sfandino@yahoo.com)

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself, either Perl version 5.8.8 or,
at your option, any later version of Perl 5 you may have available.

=cut