HTML-PullParser-Nested
lib/HTML/PullParser/Nested.pm
# $Id: Nested.pm 4647 2010-03-09 18:10:10Z chris $
=head1 NAME

HTML::PullParser::Nested - Wrapper around HTML::PullParser with awareness of tag nesting.

=head1 SYNOPSIS

    use HTML::PullParser::Nested;

    my $p = HTML::PullParser::Nested->new(
        doc   => \ "<html>...<ul><li>abcd<li>efgh<li>wxyz</ul>...<ul><li>1<li>2<li>9</ul></html>",
        start => "'S',tagname,attr,attrseq,text",
        end   => "'E',tagname,text",
        text  => "'T',text,is_cdata",
    );

    while (my $token = $p->get_token()) {
        if ($token->[0] eq "S" && $token->[1] eq "ul") {
            $p->push_nest($token);      # restrict reading to this <ul>
            print "List:\n";
            while (my $inner = $p->get_token()) {
                if ($inner->[0] eq "S" && $inner->[1] eq "li") {
                    # the token after <li> is its text
                    print $p->get_token()->[1], "\n";
                }
            }
            print "\n";
            $p->pop_nest();             # back to the enclosing level
        }
    }

=head1 DESCRIPTION

This class is a wrapper around HTML::PullParser that is aware of how tags
are nested.

The parser maintains a cursor that marks the current position within the
document. Think of it as pointing either to the start of the next token
or to 'EOL' (end of level).

Tokens are read sequentially, and each read advances the cursor past the
token just returned. Tokens can also be unread, which reverses any effect
that reading them had.
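
To make the cursor model concrete, here is a minimal, self-contained sketch. It is not this module's implementation: the TokenCursor package is hypothetical, works over an array of already-parsed tokens rather than a live HTML::PullParser, and only models unreading the most recent token (HTML::PullParser's own unget_token accepts a list of tokens to push back).

```perl
use strict;
use warnings;

# Hypothetical sketch of the cursor model; not the module's implementation.
# The cursor is an index into an array of already-parsed tokens and
# always points at the start of the next token (or past the end).
package TokenCursor;

sub new {
    my ($class, @tokens) = @_;
    return bless { tokens => [@tokens], pos => 0 }, $class;
}

# Return the token under the cursor and advance past it.
# Returns nothing (undef in scalar context) at the end.
sub get_token {
    my ($self) = @_;
    return if $self->{pos} >= @{ $self->{tokens} };
    return $self->{tokens}[ $self->{pos}++ ];
}

# Unread the most recently read token by moving the cursor back one step.
sub unget_token {
    my ($self) = @_;
    die "nothing to unread\n" if $self->{pos} == 0;
    $self->{pos}--;
    return;
}

package main;

my $c = TokenCursor->new(['S', 'li'], ['T', 'abcd'], ['E', 'li']);
my $first = $c->get_token;          # reads ['S', 'li']
$c->unget_token;                    # cursor moves back before the start tag
my $again = $c->get_token;          # the same token is read again
print "$again->[0] $again->[1]\n";  # prints "S li"
```

Because unreading only moves the index back, the next read returns exactly the same token object, so any later code sees the document as if the first read had never happened.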

As noted, the class is aware of tag nesting, which gives rise to nesting
levels. Level 1 encompasses the whole document. At any point a new nesting
level can be pushed, specifying a tag type. The parser then behaves as if
it had been opened on a document containing only the content up to the
corresponding closing tag. Popping a nesting level moves the cursor to the
start of that closing tag and switches back to the parent nesting level.

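The following sketch illustrates how nesting levels could behave, under stated assumptions. NestedCursor is a hypothetical package, not this module's code: it works over a pre-tokenized list shaped like the SYNOPSIS argspecs (['S', tagname], ['E', tagname], ['T', text]) and assumes well-formed markup in which every pushed element has a closing tag. Its get_token reports EOL (returns nothing) when the cursor reaches the end tag matching the pushed start tag, counting same-named tags nested inside it, and its pop_nest skips the rest of the level so that the cursor is left on that end tag.

```perl
use strict;
use warnings;

# Hypothetical sketch of nesting levels; not the module's implementation.
# Tokens mimic the SYNOPSIS argspecs: ['S', tagname], ['E', tagname] and
# ['T', text]. Assumes well-formed markup in which every element is closed.
package NestedCursor;

sub new {
    my ($class, @tokens) = @_;
    # Level 1 is the whole document, so it has no closing tag to watch for.
    return bless {
        tokens => [@tokens],
        pos    => 0,
        levels => [ { tag => undef, depth => 0 } ],
    }, $class;
}

# Return the next token of the current level, or nothing at EOL
# (the matching end tag) or at the end of the document.
sub get_token {
    my ($self) = @_;
    my $level = $self->{levels}[-1];
    my $tok   = $self->{tokens}[ $self->{pos} ];
    return unless defined $tok;
    if (defined $level->{tag} && $tok->[0] ne 'T' && $tok->[1] eq $level->{tag}) {
        if    ($tok->[0] eq 'S')     { $level->{depth}++ }  # same tag nested inside
        elsif ($level->{depth} == 0) { return }             # matching close: EOL
        else                         { $level->{depth}-- }
    }
    $self->{pos}++;
    return $tok;
}

# Open a level for the element whose start tag has just been read.
sub push_nest {
    my ($self, $start) = @_;
    push @{ $self->{levels} }, { tag => $start->[1], depth => 0 };
    return;
}

# Skip what is left of the current level, leaving the cursor on its
# closing tag, then switch back to the parent level.
sub pop_nest {
    my ($self) = @_;
    die "cannot pop the document level\n" if @{ $self->{levels} } == 1;
    1 while $self->get_token;
    pop @{ $self->{levels} };
    return;
}

package main;

my $p = NestedCursor->new(
    ['S', 'ul'], ['S', 'li'], ['T', 'abcd'], ['S', 'li'], ['T', 'efgh'],
    ['E', 'ul'], ['T', 'after'],
);
my (@items, @closers);
while (my $tok = $p->get_token) {
    next unless $tok->[0] eq 'S' && $tok->[1] eq 'ul';
    $p->push_nest($tok);
    while (my $inner = $p->get_token) {
        push @items, $p->get_token->[1] if $inner->[0] eq 'S' && $inner->[1] eq 'li';
    }
    $p->pop_nest;
    my $close = $p->get_token;          # the parent level now sees </ul>
    push @closers, "$close->[0] $close->[1]";
}
print join(',', @items), "\n";          # prints "abcd,efgh"
```

Because pop_nest leaves the cursor on the closing tag rather than past it, the parent level reads that end tag next, which is why the example records "E ul" immediately after popping.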
=cut
package HTML::PullParser::Nested;
use strict;
use warnings;
our $VERSION = '0.04';
use Carp;
use HTML::PullParser;
=head1 METHODS

=head2 new(file => $file, %options), new(doc => \$doc, %options)

Constructor. %options is passed through to the encapsulated HTML::PullParser
object and is subject to largely the same restrictions. Because
HTML::PullParser::Nested needs to process the tokens returned by
HTML::PullParser, there are some restrictions on the argspecs for each
token type. Firstly, so that