HTML-TableExtractor
=head1 DESCRIPTION
Parses HTML looking for table-related elements (table, tr, td and th as of
version 0.1).
Three callbacks can be registered for each element. These callbacks,
described below, are executed whenever an element of a particular type is
encountered.
    o start_${tagname}   Called whenever $tagname is opened.
    o ${tagname}         Called on both opening and closing tags: immediately
                         after start_${tagname}, and immediately before
                         end_${tagname}.
    o end_${tagname}     Called whenever a closing $tagname is encountered.
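The firing order described above can be illustrated with a small standalone sketch. It does not use this module or parse any HTML: C<dispatch_events> is a hypothetical helper that replays a hand-written list of open and close events, and it is only meant to show the sequence in which the three callback types would run for a single element, assuming the ordering stated in the list.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of HTML::TableExtractor: replays a list of
# [kind, tagname] events and fires callbacks in the documented order.
sub dispatch_events {
    my ($events, %callbacks) = @_;
    for my $event (@$events) {
        my ($kind, $tag) = @$event;
        # Opening tag: start_$tag first, then $tag.
        # Closing tag: $tag first, then end_$tag.
        my @names = $kind eq 'start' ? ("start_$tag", $tag)
                                     : ($tag, "end_$tag");
        for my $name (@names) {
            my $cb = $callbacks{$name} or next;   # skip unregistered names
            $cb->($tag);
        }
    }
    return;
}

# Record the name of each callback as it fires.
my @fired;
my %callbacks = map {
    my $name = $_;
    ($name => sub { push @fired, $name });
} qw(start_tr tr end_tr);

# Events corresponding to "<tr></tr>".
dispatch_events([ [ start => 'tr' ], [ end => 'tr' ] ], %callbacks);

print join(', ', @fired), "\n";   # start_tr, tr, tr, end_tr
```

Note that the plain C<tr> callback fires twice for one row, once on each side of the element, which is why it cannot tell an opening tag from a closing one without extra bookkeeping.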
=head2 EXAMPLE
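    use HTML::TableExtractor;

    # Sample input; any HTML string will do.
    my $html = '<table border="1"><tr><td>cell</td></tr></table>';

    my $p = HTML::TableExtractor->new();
    $p->parse($html,
        start_table => sub {
            # $attr is a hash reference of the tag's attributes.
            my ($attr, $origtext) = @_;
            print "Table border is $attr->{border}\n";
        },
        # Fires for both <tr> and </tr>.
        tr => sub { print "Row opened or closed.\n" },
    );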
=head1 METHODS
=over 4
=item start($parser, $tag, $attr, $attrseq, $origtext);
Called whenever a particular start tag has been recognised. This module
recognises these tags: <table>, <tr>, <td> & <th>.
This method will be called by the parser and is not intended to be called from
an application.
=item end($parser, $tag, $origtext);
Called whenever a particular end tag is encountered.
This method will be called by the parser and is not intended to be called from
an application.
=item $p->parse($html, tag_type => \&coderef, ...);
This is the only method an application needs to call. Pass it the HTML to
parse, followed by one callback per tag type of interest. The callbacks are
executed as described under DESCRIPTION.
=back
=head2 EXPORTS
=head2 CAVEATS, BUGS, and TODO
    o parse() should accept other data sources, such as streams and file
      handles.
=head2 SEE ALSO
HTML::Parser, HTML::TableContentParser
=head1 AUTHOR
Simon Drabble E<lt>simon@thebigmachine.orgE<gt>
(C) 2002 Simon Drabble
This software is released under the same terms as perl.
=cut