Algorithm-PageRank-XS
lib/Algorithm/PageRank/XS.pm
=head1 SYNOPSIS
    use Algorithm::PageRank::XS;
    my $pr = Algorithm::PageRank::XS->new();
    $pr->graph([
        'John'  => 'Joey',
        'John'  => 'James',
        'Joey'  => 'John',
        'James' => 'Joey',
    ]);
    $pr->result();
    # {
    #     'James' => '0.569840431213379',
    #     'Joey'  => '1',
    #     'John'  => '0.754877686500549'
    # }
    #
    # The following simple program reads tab-separated arcs and prints the ranks.
    use Algorithm::PageRank::XS;
    my $pr = Algorithm::PageRank::XS->new();
    while (<>) {
        chomp;
        my ($from, $to) = split(/\t/, $_);
        $pr->add_arc($from, $to);
    }
    my $r = $pr->result();
    while (my ($name, $rank) = each(%{$r})) {
        print "$name,$rank\n";
    }
=head1 METHODS
=head2 new %PARAMS
Create a new PageRank object. Possible parameters:
=over 4
=item alpha
The damping factor: one minus the probability that a surfer jumps at random to a node that is
not linked from the current one. Decreasing this number makes convergence more likely, but moves
the result further from the true eigenvector.
=item max_tries
The maximum number of iterations to attempt before giving up on convergence.
=item convergence
The largest difference between two successive vectors at which we consider the result
"convergent enough". The error shrinks at the rate at which C<alpha^t> goes to 0, where C<t> is
the number of tries. Thus, if you set C<alpha> to C<0.85> and C<convergence> to C<0.000001>,
you will need about C<85> tries.
=back
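The relationship between C<alpha>, C<convergence> and the number of tries can be checked with a few lines of plain Perl. This is only a back-of-the-envelope sketch: C<estimated_tries> is a hypothetical helper, not part of this module, and it assumes the error shrinks exactly like C<alpha^t>.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Hypothetical helper (not part of Algorithm::PageRank::XS): smallest t
# with alpha**t <= convergence, assuming the error decays like alpha**t.
sub estimated_tries {
    my ($alpha, $convergence) = @_;
    die "alpha must be strictly between 0 and 1\n"
        unless $alpha > 0 && $alpha < 1;
    die "convergence must be strictly between 0 and 1\n"
        unless $convergence > 0 && $convergence < 1;
    return ceil(log($convergence) / log($alpha));
}

printf "%d\n", estimated_tries(0.85, 0.000001);   # prints 86
```

Since C<0.85^85> is still slightly above C<0.000001>, the estimate rounds up to C<86>. Setting C<max_tries> comfortably above such an estimate leaves some headroom.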
=head2 add_arc
Add an arc to the PageRank object before running the computation.
Node names can be arbitrary strings. For example:
    $pr->add_arc("Apple", "Orange");
means that C<"Apple"> links to C<"Orange">.
=head2 graph
Add a graph, given as a reference to a flat array of alternating C<from>, C<to> node names.
This is equivalent to calling C<add_arc> once per pair, but may
be more convenient.
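To make the equivalence concrete, the sketch below walks a flat C<from>, C<to> list two elements at a time and hands each pair to a callback. C<each_arc> is a hypothetical helper written for illustration, not a method of this module; passing C<< sub { $pr->add_arc(@_) } >> as the callback would mirror what C<graph> is described as doing.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Algorithm::PageRank::XS): call $callback
# once for every (from, to) pair in a flat array reference.
sub each_arc {
    my ($graph, $callback) = @_;
    die "graph must hold an even number of elements\n" if @$graph % 2;
    for (my $i = 0; $i < @$graph; $i += 2) {
        $callback->($graph->[$i], $graph->[$i + 1]);
    }
    return;
}

my @graph = (
    'John'  => 'Joey',
    'John'  => 'James',
    'Joey'  => 'John',
    'James' => 'Joey',
);

# Stand-in callback that only records the arcs; with a real object this
# could be sub { $pr->add_arc(@_) } instead.
my @seen;
each_arc(\@graph, sub { push @seen, join(' -> ', @_) });
print join(', ', @seen), "\n";
```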
=head2 from_file FILE
Load arcs from a file in which each line has the form:
    from,to\n
The loader is designed to be fast: it does not handle quoting, or even commas
in the C<from> string. Compared with calling C<add_arc> yourself, it loads a bit faster and may
save a few megabytes of RAM.
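For files in this format, a pure-Perl reader is easy to sketch, which can be handy for checking input before handing it to C<from_file>. The helper C<read_arcs> below is hypothetical, and splitting each line at its first comma is an assumption drawn from the description above, not a statement about how the XS parser works.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Algorithm::PageRank::XS): read "from,to"
# lines from a filehandle and return a list of [from, to] pairs.
# Assumption: each line is split at its first comma, with no quoting.
sub read_arcs {
    my ($fh) = @_;
    my @arcs;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line eq '';                       # skip blank lines
        my ($from, $to) = split /,/, $line, 2;
        die "malformed line: $line\n" unless defined $to;
        push @arcs, [$from, $to];
    }
    return @arcs;
}

# Demonstrate on an in-memory file so the sketch is self-contained.
my $data = "John,Joey\nJohn,James\nJoey,John\nJames,Joey\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
my @arcs = read_arcs($fh);
close $fh;

print scalar(@arcs), " arcs read\n";   # prints "4 arcs read"
```

Each pair returned this way could then be passed to C<add_arc>.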
=head2 iterate
Does nothing. It is provided so that this module can be used as a drop-in
replacement for L<Algorithm::PageRank>.
=head2 result
Compute the PageRank vector and return it as a hash reference.
The keys are the node names you used when specifying the arcs; the
values are the corresponding entries of the vector.
The result vector is normalized so that its entries sum to C<1> (the L1 norm). You are free to renormalize it any other way.
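As one example of renormalizing, the sketch below rescales a hash of ranks so that its largest value is C<1>. The helper C<scale_to_max> is hypothetical and works on any hash reference of numeric ranks; the input values are made up for illustration and are not output of this module.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper (not part of Algorithm::PageRank::XS): return a new
# hash reference whose values are divided by the largest input value.
sub scale_to_max {
    my ($ranks) = @_;
    my $max = max(values %$ranks);
    die "cannot rescale: no positive rank found\n"
        unless defined $max && $max > 0;
    return { map { ($_ => $ranks->{$_} / $max) } keys %$ranks };
}

# Made-up ranks that sum to 1, standing in for a result hash.
my $ranks  = { A => 0.5, B => 0.25, C => 0.25 };
my $scaled = scale_to_max($ranks);

printf "%s => %.2f\n", $_, $scaled->{$_} for sort keys %$scaled;
# A => 1.00
# B => 0.50
# C => 0.50
```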
=head1 BUGS
None known.
=head1 TODO
=over 4
=item * Support for "Personalized PageRank" (see L<http://ilpubs.stanford.edu:8090/596/>)
=item * We may want to support C<double> values rather than single-precision floats.
=item * Individual arcs cannot currently be weighted; we may or may not add support for this.
=item * At present the indexes are C<unsigned int> rather than C<size_t>, so this will not scale on 64-bit architectures.