Algorithm-DBSCAN
view release on metacpan or search on metacpan
lib/Algorithm/DBSCAN.pm view on Meta::CPAN
package Algorithm::DBSCAN;
use strict;
use warnings;
use 5.10.1;
use Data::Dumper;
use Algorithm::DBSCAN::Point;
use Algorithm::DBSCAN::Dataset;
=head1 NAME
Algorithm::DBSCAN - (ALFA code) Perl implementation of the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm
=cut
our $VERSION = '0.07';
=head1 SYNOPSIS
This module can be used to find clusters of points in a multidimensional space.
More information can be found on Wikipedia: L<DBSCAN|https://en.wikipedia.org/wiki/DBSCAN>
The simple usage:
use Algorithm::DBSCAN;
my $points_data_file =
'point_1 56.514307478581514 37.146118456702034
point_2 34.02049221667614 46.024651786417536
point_3 23.473087508078684 60.62328221968349
point_4 10.418513808840482 24.59808378533684
point_5 10.583414831970764 25.902459835735534
point_6 9.756855426925464 24.062840099892146
point_7 10.567067873860672 22.32511341184489
point_8 11.070046359352189 25.91278382647844
point_9 9.537780590838175 25.000630928726288
point_10 10.507367338512058 27.637356924097915
point_11 11.949089580614444 30.67843911922257
point_12 10.373548645248105 25.699863108892945
point_13 47.061169019689615 12.482585189174058
point_14 47.00269836645959 12.04880276389404
point_15 47.197663384856476 12.899232975457025
point_16 44.3719178488551 15.41709269630616
point_17 46.31921200316786 12.556849509965417
point_18 44.128763621333135 14.657970021594974
point_19 48.89953587475758 15.183892607591467
point_20 52.15333345222132 16.354597634497154
point_21 50.03978361242539 14.85901473647285';
my $dataset = Algorithm::DBSCAN::Dataset->new();
my @lines = split(/\n\s+/, $points_data_file);
foreach my $line (@lines) {
$dataset->AddPoint(new Algorithm::DBSCAN::Point(split(/\s+/, $line)));
}
my $dbscan = Algorithm::DBSCAN->new($dataset, 4 * 4, 2);
$dbscan->FindClusters();
$dbscan->PrintClustersShort();
If you have huge datasets and want to use multiple CPUs in a optimal way you can build
the region index with an external tool (will soon be available). En axample of code that
uses a region index would be as follow.
Given the dataset:
point_1 56 37
point_2 34 46
point_3 23 60
point_4 10 24
point_5 10 25
point_6 9 24
point_7 10 22
point_8 11 25
point_9 9 25
point_10 10 27
point_11 11 30
point_12 10 25
point_13 47 12
point_14 47 12
point_15 47 12
point_16 44 15
point_17 46 12
point_18 44 14
point_19 48 15
point_20 52 16
point_21 50 14
The region index with $eps = 4 x 4 and $min_distance = 2 would look like this:
0 0
1 1
2 2
3 3 4 5 6 7 8 9 11
4 3 4 5 6 7 8 9 11
5 3 4 5 6 7 8 9 11
10 9 10
12 12 13 14 16 17 18 20
11 3 4 5 6 7 8 9 11
13 12 13 14 16 17 18 20
14 12 13 14 16 17 18 20
( run in 0.978 second using v1.01-cache-2.11-cpan-39bf76dae61 )