HTML-ResolveLink

lib/HTML/ResolveLink.pm


# Append the original text of this parser event to the output buffer unchanged.
sub _default {
    my($self, $tagname, $attr, $text) = @_;
    $self->{resolvelink_html} .= $text;
}

my %escape = (
    '<' => '&lt;',
    '>' => '&gt;',
    '"' => '&quot;',
    '&' => '&amp;',
);
my $esc_re = join '|', keys %escape;

# Replace each HTML-special character in the string with its entity.
sub _escape {
    my $str = shift;
    $str =~ s/($esc_re)/$escape{$1}/g;
    return $str;
}

sub resolve {
    my($self, $html) = @_;

    # init
    $self->{resolvelink_html} = '';
    $self->{resolvelink_count} = 0;

    $self->parse($html);
    $self->eof;

    $self->{resolvelink_html};
}

sub resolved_count {
    my $self = shift;
    $self->{resolvelink_count};
}

1;
__END__

=head1 NAME

HTML::ResolveLink - Resolve relative links in (X)HTML into absolute URIs

=head1 SYNOPSIS

  use HTML::ResolveLink;

  my $resolver = HTML::ResolveLink->new(
      base => 'http://www.example.com/foo/bar.html',
      callback => sub {
         my($uri, $old) = @_;
         # ...
      },
  );
  $html = $resolver->resolve($html);

=head1 DESCRIPTION

HTML::ResolveLink rewrites relative links in XHTML or HTML documents
into absolute URIs.

For example, when you have

  <a href="foo.html">foo</a>
  <img src="/bar.gif" />

and use C<http://www.example.com/foo/bar> as the C<base> URL, you'll get:

  <a href="http://www.example.com/foo/foo.html">foo</a>
  <img src="http://www.example.com/bar.gif" />

If the parser encounters a C<< <base> >> tag in the HTML, it will honor it.
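
The rewriting shown above follows the usual rules for resolving a relative
reference against a base URL. As a minimal sketch of that step on its own,
assuming the CPAN C<URI> module is installed (that HTML::ResolveLink performs
the resolution in exactly this way is an assumption, not something stated
here):

```perl
use strict;
use warnings;
use URI;

# Base URL taken from the example above.
my $base = 'http://www.example.com/foo/bar';

# Resolve each relative link against the base and remember the result.
my %resolved;
for my $link ('foo.html', '/bar.gif') {
    # URI->new_abs($str, $base) builds the absolute URI for $str.
    $resolved{$link} = URI->new_abs($link, $base)->as_string;
    print "$link => $resolved{$link}\n";
}
```

The C<%resolved> hash is only an illustration device for inspecting the
results; it is not part of this module's interface.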

=head1 METHODS

=over 4

=item new

  my $resolver = HTML::ResolveLink->new(
      base => 'http://www.example.com/',
      callback => \&callback,
  );

C<base> is a required parameter. It is used to resolve the relative
URIs found in the document.

C<callback> is an optional parameter: a subroutine reference that is
called with the newly resolved URI and the original path as
arguments.

Here's an example that illustrates how to use the callback:

  my $count;
  my $resolver = HTML::ResolveLink->new(
      base => $base,
      callback => sub {
          my($uri, $old) = @_;
          warn "$old is resolved to $uri";
          $count++;
      },
  );

  $html = $resolver->resolve($html);

  if ($count) {
      warn "HTML::ResolveLink resolved $count links";
  }

=item resolve

  $html = $resolver->resolve($html);

Resolves the relative URIs found in C<$html> into absolute ones and
returns a string containing the rewritten HTML.

=item resolved_count

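  $count = $resolver->resolved_count;

Returns the number of links resolved during the most recent call to
C<resolve>. The counter is reset to zero each time C<resolve> is called.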
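
As a rough sketch of how C<resolved_count> might be used to check whether a
C<resolve> call changed anything, using only the constructor and methods
documented above (the input markup and the message text are made up for
illustration):

```perl
use strict;
use warnings;
use HTML::ResolveLink;

# Only the required base parameter is passed; callback is optional.
my $resolver = HTML::ResolveLink->new(
    base => 'http://www.example.com/foo/bar',
);

# Hypothetical input containing a single relative link.
my $html = $resolver->resolve('<a href="foo.html">foo</a>');

# Report how many links were rewritten, if any.
if (my $count = $resolver->resolved_count) {
    print "Rewrote $count link(s)\n";
}
print "$html\n";
```

This avoids comparing the input and output strings just to find out whether
any link was rewritten.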


