App-FeedDeduplicator
lib/App/FeedDeduplicator/Deduplicator.pm
deduplicated entries are stored in the $deduplicated attribute.
It is designed to be used in conjunction with the Aggregator and Publisher
classes to provide a complete feed deduplication and publishing solution.
=head2 find_canonical
Finds the canonical link for a given entry. It fetches the page at the
entry's link using LWP::UserAgent and parses the returned HTML using
HTML::TreeBuilder::XPath. It then looks for a <link rel="canonical"> tag
and returns the canonical URL if one is found; otherwise it returns undef.
The deduplication process uses this method to determine the unique
identifier for each entry.
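The lookup described above can be sketched roughly as follows. This is a
minimal illustration, not the module's actual code: the function names
canonical_from_html and find_canonical_sketch, the 10-second timeout and the
XPath expression are all assumptions made for the example.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::TreeBuilder::XPath;

# Hypothetical helper (not part of the module): return the href of the
# first <link rel="canonical"> in an HTML string, or undef if none exists.
sub canonical_from_html {
    my ($html) = @_;

    my $tree = HTML::TreeBuilder::XPath->new;
    $tree->parse_content($html);

    # findvalues returns one string per matching node; keep the first.
    my ($href) = $tree->findvalues('//link[@rel="canonical"]/@href');
    $tree->delete;    # free the parse tree explicitly

    return ( defined $href && length $href ) ? "$href" : undef;
}

# Hypothetical wrapper: fetch a URL, then look for its canonical link.
# Returns undef when the request fails or no canonical link is present.
sub find_canonical_sketch {
    my ($url) = @_;

    my $ua  = LWP::UserAgent->new( timeout => 10 );    # assumed timeout
    my $res = $ua->get($url);
    return undef unless $res->is_success;

    return canonical_from_html( $res->decoded_content );
}
```

Keeping the HTML parsing separate from the HTTP request lets the extraction
step be exercised against a literal string without any network access.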
=cut
package App::FeedDeduplicator::Deduplicator; # For MetaCPAN