Graph-Centrality-Pagerank

 view release on metacpan or  search on metacpan

lib/Graph/Centrality/Pagerank.pm  view on Meta::CPAN

}

=head1 NAME

C<Graph::Centrality::Pagerank> - Computes pagerank of all nodes in a graph.

=head1 SYNOPSIS

  use Graph::Centrality::Pagerank;
  use Data::Dump qw(dump);
  my $ranker = Graph::Centrality::Pagerank->new();
  my $listOfEdges = [[1,2],[3,4]];
  dump $ranker->getPagerankOfNodes (listOfEdges => $listOfEdges);
  # dumps:
  # {
  #   1 => "0.175438596989046",
  #   2 => "0.324561403010954",
  #   3 => "0.175438596989046",
  #   4 => "0.324561403010954",
  # }

=head1 DESCRIPTION

C<Graph::Centrality::Pagerank> computes the pagerank of the all nodes in a graph.
The input can be a list of edges or a L<Graph>. C<Graph::Centrality::Pagerank> is
written entirely in Perl and is not recommended for use in high performance applications.

=head1 CONSTRUCTOR

=head2 C<new>

The method C<new> creates an instance of the C<Graph::Centrality::Pagerank>
class with the following parameters:

=over

=item C<dampeningFactor>

 dampeningFactor => 0.85

C<dampeningFactor> is the dampening factor used when computing pagerank. It
must be a value ranging from zero to one; the default is 0.85. Note the
incidence matrix
generated from the graph is multiplied (scaled) by C<dampeningFactor>, I<not>
by C<1 - dampeningFactor>.

=item C<maxRelError>

 maxRelError => sqrt (machine-epsilon)

C<maxRelError> is the maximum I<average> relative error that is permitted between
successive pagerank vectors before the iterative process to approximate the pagerank vector
should be stopped. The default is the square root of the systems machine epsilon.
Usually, most pagerank values computed will have C<-log10(maxRelError)> digits of accuracy.
C<maxRelError> must be positive and less than or equal to 0.01.

=item C<minIterations>

 minIterations => 0

C<minIterations> is the minimum number of iterations that will be computed
before the pagerank iterations are stopped, even if C<maxRelError> is achieved.
The default is zero.

=item C<maxIterations>

 maxIterations => int (2 * ((maxRelError / ln (dampeningFactor) + 1))

C<maxIterations> is the maximum number of iterations that can be performed
to approximate the pagerank vector even if C<maxRelError> is not achieved.
The default is C<2 * ((maxRelError / ln (dampeningFactor) + 1)>. If
C<dampeningFactor> is zero, then C<maxIterations> is one. If C<dampeningFactor>
is one, then C<maxIterations> is equal to the total nodes in the graph.

=item C<linkSinkNodes>

 linkSinkNodes => 1

In a directed graph sink nodes are the nodes with no edges emanating out
from them. In the pagerank algorithm these nodes are automatically linked
to all other nodes in the graph. To prevent this set C<linkSinkNodes> to zero;
the default is one.

=item C<directed>

 directed => 1

If C<directed> is true, the pagerank computations are done with the graph
edges being directed. If C<directed> is false, the pageranks are computed
treating the graph as undirected; the default value of C<directed> is one.

=item C<useEdgeWeights>

 useEdgeWeights => 0

If C<useEdgeWeights> is true, then any weight associated with an edge is
used in the computation of pagerank. The default weight for any edge without an
assigned weight is one. The default value of C<useEdgeWeights> is zero,
which forces all edge weights to be one.

=back

=cut

sub new
{
  my ($Class, %Parameters) = @_;
  my $Self = bless ({}, ref ($Class) || $Class);

  # set the default parameters.
  my %parameters = $Self->_setDefaultParameters (%Parameters);
  $Self->{defaultParameters} = \%parameters;

  return $Self;
}

sub _setDefaultParameters
{
  my ($Self, %Parameters) = @_;

  # set dampening factor, default is .85;
  $Parameters{dampeningFactor} = 0.85 unless (exists ($Parameters{dampeningFactor}));
  $Parameters{dampeningFactor} = abs ($Parameters{dampeningFactor}) if (exists ($Parameters{dampeningFactor}));
  $Parameters{dampeningFactor} = 1 if ($Parameters{dampeningFactor} > 1);

  # set max relative error, default is sqrt (machine epsilon).
  $Parameters{maxRelError} = sqrt ($Self->_getMachineEpsilon ()) unless (exists ($Parameters{maxRelError}));
  $Parameters{maxRelError} = abs ($Parameters{maxRelError}) if (exists ($Parameters{maxRelError}));
  $Parameters{maxRelError} = 0.01 if ($Parameters{maxRelError} > 0.01);

  # set default min iterations.
  $Parameters{minIterations} = 1 unless (exists ($Parameters{minIterations}));
  $Parameters{minIterations} = abs ($Parameters{minIterations});

  # set default max iterations.
  unless (exists ($Parameters{maxIterations}))
  {
    $Parameters{maxIterations} = 1;
    $Parameters{maxIterations} = 1 if ($Parameters{dampeningFactor} <= 0);
    $Parameters{maxIterationsIsTotalNodes} = 1 if ($Parameters{dampeningFactor} >= 1);
    if ($Parameters{dampeningFactor} < 1)
    {
      $Parameters{maxIterations} = 2 * (int (log ($Parameters{maxRelError}) / log ($Parameters{dampeningFactor})) + 1);
    }
  }
  $Parameters{maxIterations} = abs ($Parameters{maxIterations});
  $Parameters{maxIterations} = 1 if ($Parameters{maxIterations} < 1);
  $Parameters{maxIterations} = $Parameters{minIterations} if ($Parameters{maxIterations} < $Parameters{minIterations});

  # set type of graph, default is directed.
  unless (exists ($Parameters{directed}))
  {
    if (exists ($Parameters{undirected})) { $Parameters{directed} = !$Parameters{undirected}; }
    else { $Parameters{directed} = 1; }
  }

  # set use of edge weights in graph, default is 0.
  $Parameters{useEdgeWeights} = 0 unless (exists ($Parameters{useEdgeWeights}));

  # set linking of sinks nodes in graph, default is 1. if graph is
  # undirected there are no sink nodes.
  $Parameters{linkSinkNodes} = 1 unless (exists ($Parameters{linkSinkNodes}));

  # when forcing dangling nodes to link to all other nodes, this should be
  # with a probability of (1 - dampenFactor) / totalNodes to keep the matrix
  # stocastic; however, some implementation of Pagerank do not do this. the
  # default is 1.
  $Parameters{scaleDampeningFactor} = 1 unless (exists ($Parameters{scaleDampeningFactor}));

  return %Parameters;
}

# gets all the parameters needed to compute the pagerank. uses the
# values set when the object was insantiated to set any missing values.
sub _getAllParameters
{
  my ($Self, %Parameters) = @_;

  # set dampening factor.
  $Parameters{dampeningFactor} = $Self->{defaultParameters}{dampeningFactor} unless (exists ($Parameters{dampeningFactor}));
  $Parameters{dampeningFactor} = abs ($Parameters{dampeningFactor}) if (exists ($Parameters{dampeningFactor}));
  $Parameters{dampeningFactor} = 1 if ($Parameters{dampeningFactor} > 1);

  # set max relative error.
  $Parameters{maxRelError} = $Self->{defaultParameters}{maxRelError} unless (exists ($Parameters{maxRelError}));
  $Parameters{maxRelError} = abs ($Parameters{maxRelError}) if (exists ($Parameters{maxRelError}));
  $Parameters{maxRelError} = 0.01 if ($Parameters{maxRelError} > 0.01);

  # set default min iterations.
  $Parameters{minIterations} = $Self->{defaultParameters}{minIterations} unless (exists ($Parameters{minIterations}));
  $Parameters{minIterations} = abs ($Parameters{minIterations});

  # set default max iterations.
  unless (exists ($Parameters{maxIterations}))
  {
    $Parameters{maxIterations} = $Self->{defaultParameters}{maxIterations};
    if ((0 < $Parameters{dampeningFactor}) && ($Parameters{dampeningFactor} < 1))
    {
      $Parameters{maxIterations} = 2 * (int (log ($Parameters{maxRelError}) / log ($Parameters{dampeningFactor})) + 1);
    }
  }
  $Parameters{maxIterations} = abs ($Parameters{maxIterations});
  $Parameters{maxIterations} = 1 if ($Parameters{maxIterations} < 1);
  $Parameters{maxIterations} = $Parameters{minIterations} if ($Parameters{maxIterations} < $Parameters{minIterations});

  # set the type of graph.
  my $directed;
  $directed = !$Parameters{undirected} if (exists ($Parameters{undirected}));
  $directed = $Parameters{directed} if (exists ($Parameters{directed}));
  $directed = $Parameters{graph}->is_directed () if (!defined ($directed) && exists ($Parameters{graph}));
  $directed = $Self->{defaultParameters}{directed} unless defined $directed;
  $Parameters{directed} = $directed;

  # set use of edge weights in graph, default is 0.
  $Parameters{useEdgeWeights} = $Self->{defaultParameters}{useEdgeWeights} unless (exists ($Parameters{useEdgeWeights}));

  # set linking of sinks nodes in graph, default is 1. if graph is
  # undirected there are no sink nodes.
  $Parameters{linkSinkNodes} = $Self->{defaultParameters}{linkSinkNodes} unless (exists ($Parameters{linkSinkNodes}));

  # when forcing dangling nodes to link to all other nodes, this should be
  # with a probability of (1 - dampenFactor) / totalNodes to keep the matrix
  # stocastic; however, some implementation of Pagerank do not do this. the
  # default is 1.
  $Parameters{scaleDampeningFactor} = $Self->{defaultParameters}{scaleDampeningFactor} unless (exists ($Parameters{scaleDampeningFactor}));

  return %Parameters;
}

=head1 METHOD

=head2 C<getPagerankOfNodes>

The method C<getPagerankOfNodes> computes the pagerank of each node in the graph.
The graph can be defined using the C<graph> parameter or by supplying a list of edges.
All the parameters used by the constructor C<new> can also be set here and they will override
the values used with C<new>. C<getPagerankOfNodes> returns a reference to a hash where the
keys are the graph nodes and the values are the pageranks of the node.

=over

=item C<graph>

 graph => Graph

C<graph> must be a L<Graph> object. If the C<directed> parameter was not set
with the constructor C<new> or with this method, then C<directed>
is set to the value of L<Graph>->L<is_directed|Graph/Accessors>().

=item C<listOfEdges>

 listOfEdges => [['a',10],[10,11]]

lib/Graph/Centrality/Pagerank.pm  view on Meta::CPAN

  {
    if ($sum == 0)
    {
      # if the column sum for a node is zero, it is a sink.
      push @sinkNodes, $node;
    }
    else
    {
      # normalize the columns of the node so it sums to 1.
      foreach my $rowNode (@rowNodes)
      {
        if (exists ($matrixRows{$rowNode}->{$node}))
        {
          $matrixRows{$rowNode}->{$node} /= $sum;
        }
      }
    }
  }

  # get the list of all the nodes.
  my @allNodes = keys %columnSum;

  # get the total number of nodes.
  my $totalNodes = @allNodes;

  # get or make the nodeWeights vector.
  my %nodeWeights;
  %nodeWeights = %{$Parameters{nodeWeights}} if exists $Parameters{nodeWeights};

  # ensure $nodeWeights is nonegative for all nodes.
  my $sum = 0;
  foreach my $node (@allNodes)
  {
    # if a value is not defined for a node, make it zero at first.
    $nodeWeights{$node} = 0 unless exists $nodeWeights{$node};

    # force values to be nonnegative.
    $nodeWeights{$node} = -$nodeWeights{$node} if ($nodeWeights{$node} < 0);

    # sum the positive values.
    $sum += $nodeWeights{$node};
  }

  # ensure $nodeWeights sum to one.
  if ($sum > 0)
  {
    foreach my $node (@allNodes) { $nodeWeights{$node} /= $sum; }
  }
  else
  {
    foreach my $node (@allNodes) { $nodeWeights{$node} = 1 / $totalNodes; }
  }


  # initialize the pagerank vector;
  my $pagerank = {};
  return $pagerank if ($totalNodes == 0);
  foreach my $node (@allNodes) { $pagerank->{$node} = $nodeWeights{$node}; }
  my $newPageRank = {};

  # set the maximum number of iterations.
  my $maxIterations = $Parameters{maxIterations};
  $maxIterations = $totalNodes if exists $Parameters{maxIterationsIsTotalNodes};

  for (my $iteration = 0; $iteration < $maxIterations; $iteration++)
  {
    # first set the new page rank to the average pageranks times (1 - $dampeningFactor).
    if ($Parameters{scaleDampeningFactor})
    {
      #my $sum = 0;
      #foreach my $node (@allNodes) { $sum += $pagerank->{$node}; }
      #$sum *= (1 - $Parameters{dampeningFactor}) / $totalNodes;
      foreach my $node (@allNodes) { $newPageRank->{$node} = (1 - $Parameters{dampeningFactor}) * $nodeWeights{$node}; }
    }
    else
    {
      foreach my $node (@allNodes) { $newPageRank->{$node} = 1 - $Parameters{dampeningFactor}; }
    }

    # add in the values for the sink nodes.
    if ($Parameters{linkSinkNodes})
    {
      my $sinkSum = 0;
      foreach my $sinkNode (@sinkNodes)
      {
        $sinkSum += $pagerank->{$sinkNode};
      }
      $sinkSum *= $Parameters{dampeningFactor} / $totalNodes;
      foreach my $node (@allNodes) { $newPageRank->{$node} += $sinkSum; }
    }

    # add in the rank from the graph links.
    foreach my $rowNode (@rowNodes)
    {
      my $sum = 0;
      while (my ($colNode, $value) = each %{$matrixRows{$rowNode}})
      {
        $sum += $value * $pagerank->{$colNode};
      }
      $newPageRank->{$rowNode} += $Parameters{dampeningFactor} * $sum;
    }

    # normalize (for rounding error stability -- i hope) then swap pagerank vectors.
    _normalizeByNorm1 ($newPageRank, \@allNodes);
    ($pagerank, $newPageRank) = ($newPageRank, $pagerank);

    # compute the error.
    my $error = 0;
    my $totalNonzero = 0;
    foreach my $node (@allNodes)
    {
      if ($pagerank->{$node} != 0)
      {
        $error += abs (($newPageRank->{$node} - $pagerank->{$node}) / $pagerank->{$node});
      }
      else
      {
        $error += abs ($newPageRank->{$node} - $pagerank->{$node});
      }
    }
    $error /= $totalNodes;



( run in 0.647 second using v1.01-cache-2.11-cpan-71847e10f99 )