Algorithm-PageRank-XS
NAME
Algorithm::PageRank::XS - A Fast PageRank implementation
DESCRIPTION
This module implements a simple PageRank algorithm in C. The goal is to
quickly get a vector that is close to the principal eigenvector of the
stochastic matrix of a graph.
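To make the idea concrete, here is a minimal pure-Perl sketch of power iteration on the small graph used in the SYNOPSIS. It is an illustration under stated assumptions, not the module's C implementation: the helper name power_iteration, the handling of nodes without outgoing arcs, and the choice to normalise ranks so they sum to 1 are all assumptions made for this example. The numbers it prints are therefore on a different scale from the module's output, where the top rank is 1.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration; not part of Algorithm::PageRank::XS.
# $arcs is a reference to an array of [from, to] pairs.
sub power_iteration {
    my ($arcs, $alpha, $max_tries, $convergence) = @_;

    # Build adjacency lists and collect every node name.
    my (%out, %seen);
    for my $arc (@$arcs) {
        my ($from, $to) = @$arc;
        push @{ $out{$from} }, $to;
        $seen{$from} = 1;
        $seen{$to}   = 1;
    }
    my @nodes = sort keys %seen;
    my $n     = scalar(@nodes);

    # Start from the uniform distribution.
    my %rank;
    $rank{$_} = 1 / $n for @nodes;

    for my $try (1 .. $max_tries) {
        my %next;
        $next{$_} = 0 for @nodes;

        # Push each node's rank along its outgoing arcs, scaled by alpha.
        my $dangling = 0;
        for my $from (@nodes) {
            my $targets = $out{$from};
            if (!$targets) {
                # Assumption: rank of nodes with no out-arcs is spread evenly.
                $dangling += $rank{$from};
                next;
            }
            my $share = $alpha * $rank{$from} / scalar(@$targets);
            $next{$_} += $share for @$targets;
        }

        # Random-jump term plus the redistributed dangling rank.
        my $base = ((1 - $alpha) + $alpha * $dangling) / $n;
        my $diff = 0;
        for my $node (@nodes) {
            $next{$node} += $base;
            $diff += abs($next{$node} - $rank{$node});
        }

        %rank = %next;
        last if $diff < $convergence;
    }
    return \%rank;
}

my $rank = power_iteration(
    [ [ John => 'Joey' ], [ John => 'James' ], [ Joey => 'John' ], [ James => 'Joey' ] ],
    0.85, 100, 0.000001,
);
printf "%-5s %.4f\n", $_, $rank->{$_} for sort keys %$rank;
```

In this sketch "Joey" ends up with the highest rank, followed by "John" and then "James", which is the same ordering the module reports for this graph.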
Algorithm::PageRank does some pagerank calculations, but it's slow and
memory intensive. This module was developed to compute pagerank on
graphs with millions of arcs. This module will not, however, scale up to
quadrillions of arcs (see the TODO).
SYNOPSIS
    use Algorithm::PageRank::XS;

    my $pr = Algorithm::PageRank::XS->new();

    $pr->graph([
        'John'  => 'Joey',
        'John'  => 'James',
        'Joey'  => 'John',
        'James' => 'Joey',
    ]);

    $pr->result();
    # {
    #     'James' => '0.569840431213379',
    #     'Joey'  => '1',
    #     'John'  => '0.754877686500549'
    # }

    # The following simple program reads tab-separated arcs and prints the ranks.
    use Algorithm::PageRank::XS;

    my $pr = Algorithm::PageRank::XS->new();

    while (<>) {
        chomp;
        my ($from, $to) = split(/\t/, $_);
        $pr->add_arc($from, $to);
    }

    my $r = $pr->result();
    while (my ($name, $rank) = each(%{$r})) {
        print "$name,$rank\n";
    }
METHODS
new %PARAMS
Create a new PageRank object. Possible parameters:
alpha
The damping factor: one minus the probability that a surfer jumps at
random from one node to another, unconnected one. Decreasing this
number makes convergence more likely, but takes us further from the
true eigenvector.
max_tries
The maximum number of tries until we give up trying to achieve
convergence.
convergence
The largest difference allowed between two successive vectors
before we say we are "convergent enough". The convergence rate is
the rate at which "alpha^t" goes to 0. Thus, if you set "alpha" to
0.85 and "convergence" to 0.000001, you will need about 85 tries.
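The figure of roughly 85 tries follows from solving "alpha^t = convergence" for t, which gives t = log(convergence) / log(alpha). A minimal sketch of that arithmetic, assuming a hypothetical helper named estimated_tries that is not part of the module:

```perl
use strict;
use warnings;

# Hypothetical helper: the real-valued t at which alpha**t equals the
# convergence threshold. Only an estimate of the tries actually needed.
sub estimated_tries {
    my ($alpha, $convergence) = @_;
    return log($convergence) / log($alpha);
}

my $tries = estimated_tries(0.85, 0.000001);
printf "about %.0f tries\n", $tries;    # prints "about 85 tries"
```

This can help in choosing "max_tries": a value comfortably above the estimate leaves room for the computation to converge before giving up.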
add_arc
Add an arc to the pagerank object before running the computation. The
node names can be any values you like. So you can run:
    $pr->add_arc("Apple", "Orange");
to say that "Apple" links to "Orange".
graph
Add a graph, which is just a reference to a flat array of from, to
pairs. This is equivalent to calling "add_arc" once for each pair, but
may be more convenient.
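The relationship between the flat array and individual arcs can be sketched as follows. The helper arcs_from_flat is hypothetical and only illustrates how such a list pairs up; each resulting pair corresponds to one "add_arc" call. How the module itself walks the array is not shown here.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the module: turn a flat
# (from, to, from, to, ...) array reference into a list of [from, to] pairs.
sub arcs_from_flat {
    my ($flat) = @_;
    die "odd number of elements in arc list\n" if @$flat % 2;
    my @arcs;
    for (my $i = 0; $i < @$flat; $i += 2) {
        push @arcs, [ $flat->[$i], $flat->[ $i + 1 ] ];
    }
    return @arcs;
}

my @arcs = arcs_from_flat([
    'John'  => 'Joey',
    'John'  => 'James',
    'Joey'  => 'John',
    'James' => 'Joey',
]);

# Each pair is what you would otherwise pass to add_arc($from, $to).
print "$_->[0] -> $_->[1]\n" for @arcs;
```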
from_file FILE
This will load arcs from a file, whose lines contain: