Algorithm-PageRank-XS

lib/Algorithm/PageRank/XS.pm

package Algorithm::PageRank::XS;

use 5.008005;
use strict;
use warnings;
use Carp;

require Exporter;
use AutoLoader;

our $VERSION = '0.04';

require XSLoader;
XSLoader::load('Algorithm::PageRank::XS', $VERSION);

1;

__END__;

=head1 NAME

Algorithm::PageRank::XS - A Fast PageRank implementation

=head1 DESCRIPTION

This module implements a simple PageRank algorithm in C. The goal is
to quickly get a vector that is close to the eigenvector of the
stochastic matrix of a graph.

L<Algorithm::PageRank> also computes PageRank, but it is
slow and memory intensive. This module was developed to compute PageRank
on graphs with millions of arcs. It will not, however, scale
up to quadrillions of arcs (see the TODO).
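
To make "a vector close to the eigenvector of the stochastic matrix" concrete,
here is a minimal pure-Perl sketch of textbook power iteration. It is only an
illustration and is not this module's C implementation: the function name
C<pagerank_sketch>, its default values, and the way it handles nodes without
outgoing arcs are all assumptions, and the numbers it prints need not match
the module's output or its normalization.

```perl
use strict;
use warnings;

# Textbook power iteration (illustrative only; NOT the module's C code).
# $arcs is a reference to a list of [from, to] pairs.
# The option names mirror the constructor parameters; the defaults are assumed.
sub pagerank_sketch {
    my ($arcs, %opt) = @_;
    my $alpha       = defined $opt{alpha}       ? $opt{alpha}       : 0.85;
    my $max_tries   = defined $opt{max_tries}   ? $opt{max_tries}   : 200;
    my $convergence = defined $opt{convergence} ? $opt{convergence} : 1e-6;

    # Collect the node names and count outgoing arcs per node.
    my (%out, %nodes);
    for my $arc (@$arcs) {
        my ($from, $to) = @$arc;
        $out{$from}++;
        $nodes{$_} = 1 for ($from, $to);
    }
    my @names = sort keys %nodes;
    my $n     = scalar @names;
    return {} unless $n;

    # Start from the uniform vector.
    my %rank = map { $_ => 1 / $n } @names;

    for my $try (1 .. $max_tries) {
        # Rank held by nodes without outgoing arcs is spread evenly (assumption).
        my $dangling = 0;
        $dangling += $rank{$_} for grep { !$out{$_} } @names;

        my $base = (1 - $alpha) / $n + $alpha * $dangling / $n;
        my %next = map { $_ => $base } @names;
        for my $arc (@$arcs) {
            my ($from, $to) = @$arc;
            $next{$to} += $alpha * $rank{$from} / $out{$from};
        }

        # Stop once two successive vectors differ by less than $convergence.
        my $diff = 0;
        $diff += abs($next{$_} - $rank{$_}) for @names;
        %rank = %next;
        last if $diff < $convergence;
    }
    return \%rank;
}

my $rank = pagerank_sketch([
    [ 'John',  'Joey'  ],
    [ 'John',  'James' ],
    [ 'Joey',  'John'  ],
    [ 'James', 'Joey'  ],
]);
printf "%-5s %.4f\n", $_, $rank->{$_} for sort keys %$rank;
```

The sketch keeps the ranks summing to 1, whereas the example output in the
SYNOPSIS is scaled so that the largest rank is 1.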

=head1 SYNOPSIS

    use Algorithm::PageRank::XS;

    my $pr = Algorithm::PageRank::XS->new();

    $pr->graph([
              'John'  => 'Joey',
              'John'  => 'James',
              'Joey'  => 'John',
              'James' => 'Joey',
              ]
              );

    $pr->result();
    # {
    #      'James' => '0.569840431213379',
    #      'Joey'  => '1',
    #      'John'  => '0.754877686500549'
    # }

    # The following simple program reads tab-separated arcs and prints the ranks.
    use Algorithm::PageRank::XS;

    my $pr = Algorithm::PageRank::XS->new();

    while (<>) {
        chomp;
        my ($from, $to) = split(/\t/, $_);
        $pr->add_arc($from, $to);
    }

    my $r = $pr->result();
    while (my ($name, $rank) = each(%{$r})) {
        print "$name,$rank\n";
    }

=head1 METHODS

=head2 new %PARAMS

Create a new PageRank object. Possible parameters:

=over 4

=item alpha

One minus the probability that the random surfer jumps from a node to an
arbitrary, unconnected node instead of following an arc. Decreasing this number
makes convergence more likely, but moves the result further from the true eigenvector.

=item max_tries

The maximum number of tries until we give up trying to achieve convergence.

=item convergence

The largest difference allowed between two successive vectors before we say we are
"convergent enough". The convergence rate is the rate at which C<alpha^t> goes to 0. Thus,
if you set C<alpha> to C<0.85> and C<convergence> to C<0.000001>, you will need about
C<85> tries, since C<0.85^85> is roughly C<0.000001>.

=back
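
The relationship between C<alpha>, C<convergence> and the number of tries can be
checked with a few lines of Perl. This is a rough estimate that assumes the
error shrinks like C<alpha^t>, as described above. The helper C<estimated_tries>
is hypothetical and not part of this module.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): solve alpha**t = convergence
# for t, which gives t = log(convergence) / log(alpha).
sub estimated_tries {
    my ($alpha, $convergence) = @_;
    return log($convergence) / log($alpha);
}

printf "alpha=0.85, convergence=0.000001: about %.0f tries\n",
    estimated_tries(0.85, 0.000001);
```

A value of C<max_tries> well below this estimate will probably stop the
computation before the requested C<convergence> is reached.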

=head2 add_arc

Add an arc to the PageRank object before running the computation.
Node names can be arbitrary values, so you can run:

    $pr->add_arc("Apple", "Orange");

to state that C<"Apple"> links to C<"Orange">.


=head2 graph


