HTML-PullParser-Nested

lib/HTML/PullParser/Nested.pm

# $Id: Nested.pm 4647 2010-03-09 18:10:10Z chris $

=head1 NAME

HTML::PullParser::Nested - Wrapper around HTML::PullParser with awareness of tag nesting.


=head1 SYNOPSIS

    use HTML::PullParser::Nested;

    my $p = HTML::PullParser::Nested->new(
        doc         => \ "<html>...<ul><li>abcd<li>efgh<li>wvyz</ul>...<ul><li>1<li>2<li>9</ul></html>",
        start       => "'S',tagname,attr,attrseq,text",
        end         => "'E',tagname,text",
        text        => "'T',text,is_cdata",
        );
    
    while (my $token = $p->get_token()) {
        if ($token->[0] eq "S" && $token->[1] eq "ul") {
            $p->push_nest($token);
            print "List:\n";
            while (my $token = $p->get_token()) {
                if ($token->[0] eq "S" && $token->[1] eq "li") {
                    print $p->get_token()->[1], "\n";
                }
            }
            print "\n";
            $p->pop_nest();
        }
    }


=head1 DESCRIPTION

This class is a wrapper around HTML::PullParser with awareness of the nesting
of tags.

There is a cursor, which points to the current position within the
document.  It should be thought of as pointing to the start of the
next token, or to 'EOL' (end of level).

Tokens can be read sequentially, and the cursor will be advanced after 
each read.  They can also be unread, reversing any effects of their having
been read.
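
As a minimal sketch of reading and then unreading a token: the example
below assumes the wrapper exposes an unget_token() method mirroring the
one in HTML::PullParser; that name does not appear in the text above, so
treat it as an assumption.  The input document is hypothetical.

    use strict;
    use warnings;
    use HTML::PullParser::Nested;

    # Hypothetical input, for illustration only.
    my $html = "<p>first</p><p>second</p>";

    my $p = HTML::PullParser::Nested->new(
        doc   => \$html,
        start => "'S',tagname,attr,attrseq,text",
        end   => "'E',tagname,text",
        text  => "'T',text,is_cdata",
    );

    # Reading a token advances the cursor past it.
    my $token = $p->get_token();

    # ASSUMPTION: unget_token() is taken from HTML::PullParser's API.
    # Unreading should move the cursor back to the start of $token.
    $p->unget_token($token);

    # The next read should then yield the same token again.
    my $again = $p->get_token();
    print "re-read: $again->[0] $again->[1]\n" if $again;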

As noted, the class is aware of tag nesting, giving the concept of
nesting levels.  Level 1 encompasses the whole document.  At any point
a new nesting level can be pushed on, specifying a tag type.  In effect,
the parser then behaves as if it had instead been opened on a document
containing only the content up to the closing tag.  It is then possible to
pop a nesting level, which moves the cursor to the start of the
closing tag and switches to the parent nesting level.
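
The push and pop behaviour can be sketched as follows.  This is an
illustration only, using the methods shown in the SYNOPSIS; the input
document and the variable names are hypothetical, and the comments
restate the behaviour described above without guaranteeing exact
output.

    use strict;
    use warnings;
    use HTML::PullParser::Nested;

    # Hypothetical input, for illustration only.
    my $html = "<p>intro</p><div><b>bold</b> text</div><p>outro</p>";

    my $p = HTML::PullParser::Nested->new(
        doc   => \$html,
        start => "'S',tagname,attr,attrseq,text",
        end   => "'E',tagname,text",
        text  => "'T',text,is_cdata",
    );

    while (my $token = $p->get_token()) {
        next unless $token->[0] eq "S" && $token->[1] eq "div";

        # Enter a new nesting level bounded by the closing </div>.
        $p->push_nest($token);

        # Inside the level, get_token() returns false at EOL, so this
        # loop should only see the content of the <div>.
        while (my $inner = $p->get_token()) {
            # Element 1 is the tag name for 'S'/'E' and the text for 'T'.
            print "inside div: $inner->[0] $inner->[1]\n";
        }

        # Return to the parent level.  Per the description, the cursor
        # is then at the start of the closing tag.
        $p->pop_nest();

        my $close = $p->get_token();
        print "after pop_nest: $close->[0] $close->[1]\n" if $close;
        last;
    }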

=cut


package HTML::PullParser::Nested;

use strict;
use warnings;

our $VERSION = '0.04';

use Carp;

use HTML::PullParser;

=head1 METHODS

=head2 new(file => $file, %options), new(doc => \$doc, %options)

Constructor.  %options gets passed to the encapsulated HTML::PullParser 
object and largely has the same restrictions.  As HTML::PullParser::Nested
needs to be able to process tokens returned by HTML::PullParser, there are
some restrictions on the argspecs for each token type.  Firstly, so that


