HTML-Similarity


lib/HTML/Similarity.pm

  print "Similarity: $score\n";

=head1 DESCRIPTION

This module is a small, handy tool for calculating the structural
similarity between any two HTML documents. The underlying algorithm is
simple and straightforward: it serializes the two HTML trees into two
arrays of node tag names and finds the longest common subsequence
(LCS) of the two serialized arrays.

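The similarity is computed as (2 * length of the LCS) / (length of
treeA + length of treeB), where each tree's length is the number of
entries in its serialized array. The score ranges from 0 (no tags in
common) to 1 (identical tag sequences).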
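
To make the formula concrete, here is a minimal sketch of the scoring step on two already-serialized tag arrays. It does not use the module's own code or L<Algorithm::LCS>; C<lcs_length>, C<similarity> and the sample tag lists are hypothetical names chosen for illustration, and the LCS length is computed with a plain dynamic-programming table.

```perl
use strict;
use warnings;

# Length of the longest common subsequence of two array refs,
# computed with the textbook dynamic-programming recurrence
# (only the previous row of the table is kept).
sub lcs_length {
    my ($x, $y) = @_;
    my @prev = (0) x (@$y + 1);
    for my $i (1 .. @$x) {
        my @curr = (0);
        for my $j (1 .. @$y) {
            $curr[$j] = $x->[$i - 1] eq $y->[$j - 1]
                ? $prev[$j - 1] + 1
                : ($prev[$j] > $curr[$j - 1] ? $prev[$j] : $curr[$j - 1]);
        }
        @prev = @curr;
    }
    return $prev[-1];
}

# Score = (2 * LCS length) / (length of A + length of B).
sub similarity {
    my ($x, $y) = @_;
    my $total = @$x + @$y;
    return 0 unless $total;    # assumption: two empty sequences score 0
    return 2 * lcs_length($x, $y) / $total;
}

# Hypothetical serialized trees (tag names only).
my @tags_a = qw(html head title body div p p);
my @tags_b = qw(html head title body div ul li li);

# The LCS is (html head title body div), so the score is 2 * 5 / (7 + 8).
printf "Similarity: %.3f\n", similarity(\@tags_a, \@tags_b);    # prints "Similarity: 0.667"
```

Because the LCS can never be longer than the shorter array, the score stays within 0 and 1, and two identical sequences score exactly 1.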

Structural similarity can be useful for web page classification and
clustering.

=head1 PREREQUISITES

L<HTML::DOM>, L<Algorithm::LCS>

=head1 COPYRIGHT

lib/XML/Similarity.pm

  print "Similarity: $score\n";

=head1 DESCRIPTION

This module is a small, handy tool for calculating the structural
similarity between any two XML documents. The underlying algorithm is
simple and straightforward: it serializes the two XML trees into two
arrays of node tag names and finds the longest common subsequence
(LCS) of the two serialized arrays.

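The similarity is computed as (2 * length of the LCS) / (length of
treeA + length of treeB), where each tree's length is the number of
entries in its serialized array. The score ranges from 0 (no tags in
common) to 1 (identical tag sequences).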
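
The serialization step can be pictured with a small sketch. The text above does not say in which order nodes are visited, so this example assumes a pre-order (document-order) walk. The nested array-ref tree and C<serialize_tags> are hypothetical stand-ins, not the L<XML::DOM> API.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a parsed document: each node is an
# array ref of the form [ tag_name, @child_nodes ].
# Assumption: nodes are visited in pre-order (parent before children).
sub serialize_tags {
    my ($node) = @_;
    my ($tag, @children) = @$node;
    return ($tag, map { serialize_tags($_) } @children);
}

my $tree = [ catalog =>
    [ book => [ 'title' ], [ 'author' ] ],
    [ book => [ 'title' ] ],
];

my @tags = serialize_tags($tree);
print "@tags\n";    # prints "catalog book title author book title"
```

Only tag names survive this flattening: text content and attributes play no part, which is why the resulting score reflects structure rather than content.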

Structural similarity can be useful for XML document classification
and clustering.

=head1 PREREQUISITES

L<XML::DOM>, L<Algorithm::LCS>

=head1 COPYRIGHT


