Algorithm-Tree-NCA

 view release on metacpan or  search on metacpan

NCA.pm  view on Meta::CPAN

node in the tree.

=over 4

=item -
A I<node number> assigned to the node in a pre-order fashion

=item -
A number to identify the I<run> of the node (L<"ALGORITHM">)

=item -
The I<leader> for each run, which should be retrievable through its
node number

=item -
A I<magic> number (L<"ALGORITHM">)

=item -
The I<parent> node for each node

=item -
The I<maximum> number assigned to a any node in the subtree

=back

All data above, with the exception of the node number, is stored in an
array inside the C<Algorithm::Tree::NCA> object.

The node number has to be stored in the actual tree node in some
manner (alternative solutions would be to slow to give constant-time
retrieval), which requires a I<set method> and a I<get method> for the
nodes. Since the most common case is using hashes to represent nodes,
there are default implementations of the set and get methods.

The default set method is:

  sub _set_method {
      my($node,$value) = @_;

      $node->{'_nca_number'} = $value;
  }

and the default get method is:

  sub _get_method {
      my($node) = @_;

      return $node->{'_nca_number'};
  } 

If have chosen another representation of your nodes, you can provide
alternative set and get methods by using the B<-set> and B<-get>
options when creating the C<Algorithm::Tree::NCA> object.

=head1 CLASS AND OBJECT METHODS

=head1 EXAMPLES

=head2 ALGORITHM

This section describes the algorithm used for preprocessing and for
nearest common ancestor retrieval. It does not provide any intuition
to I<why> the algorithm works, just a description how it works. For
the algorithm description, it is assumed that the nodes themself
contain all necessary information. The algorithm is described in a
Pascal-like fashion. For detailed information about the algorithm,
please have a look in [1] or [2].

The I<height> of a non-zero integer is the number of zeros at the
right end of the integer. The I<least significant set bit> (LSSB) of a
non-zero number is the the number with only the least significant bit
set (surprise). For instance, here is the LSSB and the height of some
numbers:

  Number    LSSB       Height
  --------  --------   ------
  01001101  00000001   0
  01001100  00000100   2

Important to note here is that for numbers I<i> and I<j>, I<height(i)
E<lt> height(j)> if and only if I<LSSB(i) E<lt> LSSB(j)>, which means
that we can replace a test of I<height(i) E<lt> height(j)> with
I<LSSB(i) E<lt> LSSB(j)>. Since I<LSSB(i)> is easier to compute, this
will speed up the computation.

=head2 Preprocessing the tree

Preprocessing the tree requires the computation of three numbers: the
I<node number>, the I<run>, and a I<magic> number. It also requires
computation of the I<leader> of each run. These computations are done
in two recursive descents and ascents of the tree.

  Procedure Preprocess(root:Node)
  Var x,y : Integer;   (* Dummy variables *)
  Begin
      (x,y) := Enumerate(root,nil,1);
      ComputeMagic(root,root,0);
  End;

In the first phase, we enumerate the tree, compute the I<run> for each
node, the I<max> number assigned to a node in the subtree, and also
the I<parent> of each node. If the parent is already available through
other means, that part is redundant. The run of a node is the number
of the node in the subtree with the largest height.

  Function Enumerate(node,parent:Node; num:Integer) : (Integer,Integer)
  Var run : Integer;
  Begin
      node.parent := parent;
      node.number := num;
      node.run := num;

      run := num;
      num := num + 1;

      Foreach child in node.children Do
	  (num,run) := Enumerate(child,node,num);
	  If height(run) > height(node.run) Then
	      node.run := run;
      Done
      node.max := num;
      Return (num,node.run)
  End;

In the second phase, we compute the I<leader> for each run (which we
can since we know the run for each node) and the I<magic> number. The
leader I<has> to be stored so that we can access is through a node
number, so we store it in an array.
  
  VAR Leader : Array [1..NODE_COUNT] of NodePtr;

The leader for each run can either be stored for each node (which we
assume here), or only stored in the node where the C<node.run ==
node.number>. We can then compute the leader of any C<node> through
C<Leader(node.run)>, which requires less storage if C<Leader> is
implemented as a spare array.

The magic number of a node is the bitwise or of all run:s of nodes in
the path leading from the root node to the node.

  Procedure ComputeMagic(node, current_leader:Node; magic:Integer)
  Begin
    node.magic = magic | LSSB(node.run);
    If node.run != leader.run Then
      Leader(node.number) = node
    Else
      Leader(node.number) = current_leader

    Foreach child in node.children Do
      ComputeMagic(child, Leader(node.number), node.magic)
    Done
  End;

=head2 Constant-time retrieval of the nearest common ancestor

To find the NCA of two nodes, we map the nodes to a binary tree and
find the NCA I<b> there (which is easy). We then do some bitwise
arithmetics to find the bit I<j> where the magic numbers of the two
nodes have a C<1> in common and that has a greater height than I<b>.

  Function NCA(x,y:Node) : Node;
  Begin
    b  := BinNCA(x,y);          (* b = 10111000 *)
    m  := (NOT b) AND (b - 1);  (* m = 00000111 *)
    c  := x.magic AND y.magic;
    u  := c AND (NOT m);
    j  := LSSB(u);
    x1 := Closest(x,j);
    y1 := Closest(y,j);
    If x1.number < y1.number Then
      Return x1
    Else
      Return y1
  End;

Retrieving the nearest common ancestor in a complete binary tree is
easy assuming you have a special numbering of the nodes. We number
each node with a I<path number> such that the root node is numbered
C<10000000> and for each choice down the tree, we use (for example)
C<0> for left and C<1> for right. Assuming that the nodes are not on
the same path from the root, we can then:

=over 4

=item a.



( run in 1.402 second using v1.01-cache-2.11-cpan-39bf76dae61 )