Algorithm-DBSCAN


lib/Algorithm/DBSCAN.pm

package Algorithm::DBSCAN;

use strict;
use warnings;
use 5.10.1;

use Data::Dumper;

use Algorithm::DBSCAN::Point;
use Algorithm::DBSCAN::Dataset;

=head1 NAME

Algorithm::DBSCAN - (alpha code) Perl implementation of the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm

=cut

our $VERSION = '0.07';

=head1 SYNOPSIS

This module can be used to find clusters of points in a multidimensional space. 
More information can be found on Wikipedia: L<DBSCAN|https://en.wikipedia.org/wiki/DBSCAN>

The simple usage:

    use Algorithm::DBSCAN;
    
    my $points_data_file =     
        'point_1 56.514307478581514 37.146118456702034
        point_2 34.02049221667614 46.024651786417536
        point_3 23.473087508078684 60.62328221968349
        point_4 10.418513808840482 24.59808378533684
        point_5 10.583414831970764 25.902459835735534
        point_6 9.756855426925464 24.062840099892146
        point_7 10.567067873860672 22.32511341184489
        point_8 11.070046359352189 25.91278382647844
        point_9 9.537780590838175 25.000630928726288
        point_10 10.507367338512058 27.637356924097915
        point_11 11.949089580614444 30.67843911922257
        point_12 10.373548645248105 25.699863108892945
        point_13 47.061169019689615 12.482585189174058
        point_14 47.00269836645959 12.04880276389404
        point_15 47.197663384856476 12.899232975457025
        point_16 44.3719178488551 15.41709269630616
        point_17 46.31921200316786 12.556849509965417
        point_18 44.128763621333135 14.657970021594974
        point_19 48.89953587475758 15.183892607591467
        point_20 52.15333345222132 16.354597634497154
        point_21 50.03978361242539 14.85901473647285';

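    # Build the dataset: one point per line, formatted as "id x y".
    my $dataset = Algorithm::DBSCAN::Dataset->new();
    my @lines = split(/\n\s+/, $points_data_file);
    foreach my $line (@lines) {
        $dataset->AddPoint(Algorithm::DBSCAN::Point->new(split(/\s+/, $line)));
    }

    # Parameters: the dataset, $eps (4 * 4) and $min_distance (2).
    my $dbscan = Algorithm::DBSCAN->new($dataset, 4 * 4, 2);

    # Run the clustering, then print the clusters in short form.
    $dbscan->FindClusters();
    $dbscan->PrintClustersShort();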
    
If you have huge datasets and want to use multiple CPUs in an optimal way, you can build 
the region index with an external tool (which will soon be available). An example of code that 
uses a region index would be as follows.

Given the dataset:

    point_1 56 37
    point_2 34 46
    point_3 23 60
    point_4 10 24
    point_5 10 25
    point_6 9 24
    point_7 10 22
    point_8 11 25
    point_9 9 25
    point_10 10 27
    point_11 11 30
    point_12 10 25
    point_13 47 12
    point_14 47 12
    point_15 47 12
    point_16 44 15
    point_17 46 12
    point_18 44 14
    point_19 48 15
    point_20 52 16
    point_21 50 14

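With $eps = 4 * 4 and $min_distance = 2, the region index would look like this. Each line 
holds a zero-based point index followed by the indices of all points in its neighborhood 
(the point itself included):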

    0 0
    1 1
    2 2
    3 3 4 5 6 7 8 9 11
    4 3 4 5 6 7 8 9 11
    5 3 4 5 6 7 8 9 11
    10 9 10
    12 12 13 14 16 17 18 20
    11 3 4 5 6 7 8 9 11
    13 12 13 14 16 17 18 20
    14 12 13 14 16 17 18 20
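
The rows shown above are consistent with comparing the squared Euclidean distance between two points against $eps. The following is a minimal sketch of how such an index could be computed under that assumption. It is not the external tool mentioned earlier and not part of the module's API: the helper name C<build_region_index> is hypothetical, and treating a distance exactly equal to $eps as "inside" (C<< <= >>) is an assumption as well.

```perl
use strict;
use warnings;

# Integer dataset from the example above, in file order
# (index 0 is point_1, index 20 is point_21).
my @points = (
    [56, 37], [34, 46], [23, 60], [10, 24], [10, 25], [9, 24], [10, 22],
    [11, 25], [9, 25], [10, 27], [11, 30], [10, 25], [47, 12], [47, 12],
    [47, 12], [44, 15], [46, 12], [44, 14], [48, 15], [52, 16], [50, 14],
);

# Hypothetical helper, not provided by Algorithm::DBSCAN.
# For every point index it collects, in ascending order, the indices of
# all points whose squared Euclidean distance to it is at most $eps.
# A point is always part of its own neighborhood (distance 0).
sub build_region_index {
    my ($points, $eps) = @_;
    my @index;
    for my $i (0 .. $#$points) {
        for my $j (0 .. $#$points) {
            my ($p, $q) = ($points->[$i], $points->[$j]);
            my $squared_distance = 0;
            $squared_distance += ($p->[$_] - $q->[$_]) ** 2 for 0 .. $#$p;
            # Assumption: the boundary itself counts as inside.
            push @{ $index[$i] }, $j if $squared_distance <= $eps;
        }
    }
    return \@index;
}

my $region_index = build_region_index(\@points, 4 * 4);

# One line per point: its index, then the indices in its neighborhood.
print join(' ', $_, @{ $region_index->[$_] }), "\n" for 0 .. $#$region_index;
```

For the integer dataset above, this sketch reproduces rows such as C<3 3 4 5 6 7 8 9 11> and C<12 12 13 14 16 17 18 20>. Each row depends only on the point list, so the outer loop could be split across worker processes, which is presumably what makes a prebuilt index attractive for large datasets.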


