FramesReady
view release on metacpan or search on metacpan
lib/LWP/UserAgent/FramesReady.pm view on Meta::CPAN
my $p = HTML::TokeParser->new(\$response->content);
while (my $frm = $p->get_tag('frame')) {
next unless $self->valid_scheme($frm->[1]{'src'});
my $nurl = URI->new_abs($frm->[1]{'src'}, $base_path);
push @uris, $nurl;
}
$p = HTML::TokeParser->new(\$response->content);
while (my $tag = $p->get_tag('iframe')) {
next unless $self->valid_scheme($tag->[1]{'src'});
my $nurl = URI->new_abs($tag->[1]{'src'}, $base_path);
push @uris, $nurl;
}
return @uris;
}
=back 4
The following method is new:
=over 4
=item $ua->max_depth([$depth])
This gets or sets the maximum depth that the user agent is allowed to go
to fetch framed pages. 0 means it will not fetch any frames, 1 means it
will fetch only the frames for the topmost page and not any sub-frames,
and so on. The default is 3.
=cut
sub max_depth {
my $self = shift;
my $depth = shift;
if (defined($depth)) {
$self->{_luf_max_depth} = int($depth);
}
return $self->{_luf_max_depth};
}
=item $ua->callbk([\&callback])
Get/set the callback subroutine to use to process input as it is
gathered. This causes the input to be chunked and the routine must
either process the data itself or append it to the
B<$response->{_content}> in order for the final content to be
processed all at one time.
=item $ua->size([$size])
Get/Set the size suggested for chunks when the callback routine is used.
=item $ua->nomax([$max])
Get/set the B<$self->{'nomax'}> can be used by the included callback
routine to enforce truncation of received content even if the request
to do so is not honored by the server called for the content.
=item $ua->credent([$credentials])
Get/set the credentials for authentification if called for.
=item $ua->errors()
Get the last error encountered if a return was an undef. This routine
has only been tested in development so there is no guarantee that it
will work within your functions.
=cut
sub callbk { shift->_elem('callback', @_); }
sub size { shift->_elem('size', @_); }
sub nomax { shift->_elem('nomax', @_); }
sub credent { shift->_elem('credent', @_); }
sub errors { shift->{'errstring'}; }
=item callback()
The callback routine is a sample of how to revise the way the
immediate refresh responses are processed by converting them into
redirects. Since the routine is called whenever there is chunked
response data available by use of the alternate
LWP::UserAgent::request() method, and we only change headers for
immediate refreshes. We must also deal with the fact that the
callback was originally designed for processing the content. The
$resp->{_content} field must have the unprocessed data element
appended back in.. appended, as this data is chunked and there may
already be content from a previous chunk that was processed.
=cut
sub callback {
my ($data, $resp, $proto) = @_;
# LWP::UserAgent should be populating the refresh header--process it here
if (exists($resp->headers->{'refresh'})) {
if ($resp->headers->{'refresh'} =~ /^[0-9];.*URL=([^">]+)/is) {
my $url = $1;
unless ($url =~ /^(file|java|vb)/is ) {
delete $resp->headers->{'refresh'};
$resp->headers->{'location'} = $url;
$resp->code(&HTTP::Status::RC_MOVED_TEMPORARILY);
}
}
} elsif ($data =~ /HTTP-EQUIV=\"?REFRESH\"? CONTENT=\"?\s?[0-9];.*URL=([^">]+)/is) {
# if headers->{refresh} was not generated (in content instead of header)
my $loc = $1;
unless ($loc =~ /^(file|java|vb)/is ) {
delete $resp->headers->{'refresh'} if
exists $resp->headers->{'refresh'};
$resp->headers->{'location'} = $loc;
$resp->code(&HTTP::Status::RC_MOVED_TEMPORARILY);
}
}
# Fixup to correct override by server for request for max bytes
# Servers have no compulsion to follow the request but if we made it
# we want it enforced here unless told otherwise
if (defined($resp->request->headers->{'range'}) && ! $self->{nomax}) {
my ($maxs) = $resp->request->headers->{'range'} =~ /bytes=0-(.*)/;
if ($maxs && length($resp->content) > $maxs) {
$data = '';
if ($resp->code ne &HTTP::Status::RC_PARTIAL_CONTENT) {
$resp->headers->{'content-length'} = length($resp->content);
$resp->code(&HTTP::Status::RC_PARTIAL_CONTENT);
$resp->{_msg} = HTTP::Status::status_message($resp->code);
}
}
}
# We must restore the _content since the parent assumes we deal with it
$resp->{_content} .= $data;
return undef;
}
=item $ua->valid_scheme()
The valid_scheme validates the frame src entry to scheme types we can
process.
=cut
sub valid_scheme ($) {
my $self = shift;
my $urlchk = shift;
my $scheme = '';
$self->{'errstring'} = '';
if ($urlchk =~ s/^([^:]*)://) {
$scheme = lc($1);
}
if ($scheme && ! grep {$scheme eq $_} @schema) {
$self->{'errstring'} = "Invalid scheme [$scheme]";
return 0;
}
return 1;
}
=item $ua->get_basic_credentials()
This routine overloads the LWP::UserAgent::get_basic_credentials in
order to supply authorization if it has been pre-loaded an initial/new
or by use of the $ua->credent() routine. Supplies a return in a list
context of a UserID and a Password to LWP::UserAgent::credentials().
=cut
sub get_basic_credentials {
my($self, $realm, $uri) = @_;
$self->{'errstring'} = '';
if ($self->{credent}) {
return split(':', $self->{credent}, 2);
} else {
$self->{'errstring'} = "Not found: Credent $realm";
return (undef, undef);
}
}
=back
=head1 NOTES
Processing other embedded objects in an HTML page is similar to processing
frames. Perhaps someday there will be yet another version of this that
can also handle things like in-line images, layers, etc.
=head1 BUGS
Any known bugs will be noted here and documented in the source with "BUG:"
in the comments.
=head1 COPYRIGHT
Copyright 2002 N2H2, Inc. All rights reserved.
This library is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.
=cut
1;
( run in 0.443 second using v1.01-cache-2.11-cpan-cdf2f3d4e48 )