HTML-Similarity
lib/HTML/Similarity.pm
print "Similarity: $score\n";
=head1 DESCRIPTION
This module is a small tool for calculating the structural
similarity between any two HTML documents. The underlying algorithm is
simple and straightforward: it serializes the two HTML trees into two
arrays of node tag names, then finds the longest common subsequence
(LCS) of the two serialized arrays.
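The similarity is measured with the formula (2 * LCS length) /
(treeA's length + treeB's length), where each length is the number of
tag names in the corresponding array. The score therefore ranges from 0
(no tags in common) to 1 (identical tag sequences).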
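The scoring step can be sketched in plain Perl. This is only an illustration under stated assumptions, not the module's actual implementation: the helper names C<lcs_length> and C<structural_similarity> are hypothetical, the tag arrays are made up rather than produced by a real parser, and the LCS length is computed with a textbook dynamic-programming table instead of L<Algorithm::LCS>.

```perl
use strict;
use warnings;

# Length of the longest common subsequence of two array refs,
# using the classic dynamic-programming recurrence (two rows only).
# Hypothetical helper, not part of the module's API.
sub lcs_length {
    my ($x, $y) = @_;
    my @prev = (0) x (@$y + 1);
    for my $i (1 .. @$x) {
        my @curr = (0);
        for my $j (1 .. @$y) {
            $curr[$j] = $x->[$i - 1] eq $y->[$j - 1]
                ? $prev[$j - 1] + 1
                : ($prev[$j] > $curr[$j - 1] ? $prev[$j] : $curr[$j - 1]);
        }
        @prev = @curr;
    }
    return $prev[-1];
}

# Score = (2 * LCS length) / (length of A + length of B).
# Hypothetical helper, not part of the module's API.
sub structural_similarity {
    my ($x, $y) = @_;
    my $total = @$x + @$y;
    return 0 unless $total;    # assumption: two empty trees score 0
    return 2 * lcs_length($x, $y) / $total;
}

# Made-up tag sequences, as if two HTML trees had been serialized.
my @page_a = qw(html head title body div p p);
my @page_b = qw(html head title body div ul li li);

printf "Similarity: %.3f\n", structural_similarity(\@page_a, \@page_b);
```

Here the LCS is the five shared leading tags (html, head, title, body, div), so the score is 2 * 5 / (7 + 8), or about 0.667.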
Structural similarity can be useful for web page classification and
clustering.
=head1 PREREQUISITE
L<HTML::DOM>, L<Algorithm::LCS>
=head1 COPYRIGHT
lib/XML/Similarity.pm
print "Similarity: $score\n";
=head1 DESCRIPTION
This module is a small tool for calculating the structural
similarity between any two XML documents. The underlying algorithm is
simple and straightforward: it serializes the two XML trees into two
arrays of node tag names, then finds the longest common subsequence
(LCS) of the two serialized arrays.
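The similarity is measured with the formula (2 * LCS length) /
(treeA's length + treeB's length), where each length is the number of
tag names in the corresponding array. The score therefore ranges from 0
(no tags in common) to 1 (identical tag sequences).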
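As a worked illustration of the formula, consider two made-up tag sequences (hypothetical, not taken from the module's tests): A = (catalog, book, title, author, book, title) and B = (catalog, book, title, book, title, price). Their longest common subsequence is (catalog, book, title, book, title), which has length 5, so:

```latex
\[
\text{similarity}(A, B)
  = \frac{2 \cdot |\mathrm{LCS}(A, B)|}{|A| + |B|}
  = \frac{2 \cdot 5}{6 + 6}
  = \frac{10}{12}
  \approx 0.833
\]
```

The two documents differ by only one tag each, and the score reflects that by staying close to 1.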
Structural similarity can be useful for XML document classification
and clustering.
=head1 PREREQUISITE
L<XML::DOM>, L<Algorithm::LCS>
=head1 COPYRIGHT