
HTML-Similarity


  print "Similarity: $score\n";

=head1 DESCRIPTION

This module is a small, handy tool for calculating the structural
similarity between any two HTML documents. The underlying algorithm is
simple and straightforward: it serializes the two HTML trees into two
arrays of node tag names and finds the longest common subsequence
(LCS) between the two serialized arrays.

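The similarity is measured with the formula (2 * LCS length) /
(treeA length + treeB length), where each tree's length is the number
of entries in its serialized array. The score ranges from 0 (no tags
in common) to 1 (identical tag sequences).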
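
The formula can be illustrated with a minimal, dependency-free sketch.
It does not reproduce the module's own code: C<lcs_length> and
C<similarity> are hypothetical helper names, the LCS is computed with a
plain dynamic-programming table rather than L<Algorithm::LCS>, and the
two tag arrays are made-up sample data.

```perl
use strict;
use warnings;

# Length of the longest common subsequence of two array refs,
# computed row by row with the classic dynamic-programming recurrence.
sub lcs_length {
    my ($seq_a, $seq_b) = @_;
    my @prev = (0) x (@$seq_b + 1);
    for my $i (1 .. @$seq_a) {
        my @curr = (0);
        for my $j (1 .. @$seq_b) {
            if ($seq_a->[$i - 1] eq $seq_b->[$j - 1]) {
                $curr[$j] = $prev[$j - 1] + 1;
            }
            else {
                $curr[$j] = $prev[$j] > $curr[$j - 1] ? $prev[$j] : $curr[$j - 1];
            }
        }
        @prev = @curr;
    }
    return $prev[-1];
}

# (2 * LCS length) / (treeA length + treeB length)
sub similarity {
    my ($seq_a, $seq_b) = @_;
    my $total = @$seq_a + @$seq_b;
    return 0 if $total == 0;    # assumption: two empty trees score 0
    return 2 * lcs_length($seq_a, $seq_b) / $total;
}

# Hypothetical serialized trees (tag names only).
my @tree_a = qw(html head title body div p p);
my @tree_b = qw(html head title body div ul li li);

printf "Similarity: %.3f\n", similarity(\@tree_a, \@tree_b);
# prints "Similarity: 0.667"
```

Here the common subsequence is C<html head title body div> (length 5),
so the score is (2 * 5) / (7 + 8).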

Structural similarity can be useful for web page classification and
clustering.

=head1 PREREQUISITE

L<HTML::DOM>, L<Algorithm::LCS>

=head1 COPYRIGHT

  print "Similarity: $score\n";

=head1 DESCRIPTION

This module is a small, handy tool for calculating the structural
similarity between any two XML documents. The underlying algorithm is
simple and straightforward: it serializes the two XML trees into two
arrays of node tag names and finds the longest common subsequence
(LCS) between the two serialized arrays.

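The similarity is measured with the formula (2 * LCS length) /
(treeA length + treeB length), where each tree's length is the number
of entries in its serialized array. The score ranges from 0 (no tags
in common) to 1 (identical tag sequences).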
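
The serialization step might look roughly like the sketch below. It is
an illustration under stated assumptions, not the module's code: the
nested hash is a hypothetical stand-in for a parsed XML tree, C<serialize>
is a made-up helper name, and the pre-order (document order) traversal is
an assumption, since the description above does not say which order the
module uses.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a parsed document: each node is a hash with
# a tag name and an optional list of child nodes.
my $tree = {
    tag      => 'catalog',
    children => [
        { tag => 'book', children => [ { tag => 'title' }, { tag => 'author' } ] },
        { tag => 'book', children => [ { tag => 'title' } ] },
    ],
};

# Pre-order walk: emit the node's own tag, then the tags of its
# descendants, child by child.
sub serialize {
    my ($node) = @_;
    return ($node->{tag}, map { serialize($_) } @{ $node->{children} || [] });
}

my @tags = serialize($tree);
print join(' ', @tags), "\n";
# prints "catalog book title author book title"
```

Two arrays produced this way are what the LCS comparison operates on;
text content and attributes play no part in the score.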

Structural similarity can be useful for XML document classification
and clustering.

=head1 PREREQUISITE

L<XML::DOM>, L<Algorithm::LCS>

=head1 COPYRIGHT
