HTML-Bare
sorted and output in the correct order to HTML. Note that
nodes of the same name with the same parent will be
grouped together; the position of the first item to
appear will determine the output position of the group.
=item * Comments are parsed but only one is stored per node.
Each node may contain a comment. That comment is saved with the node
and written back out when dumping to HTML.
=item * Comments override output of immediate value
If a node contains only a comment node and a text value,
only the comment node will be displayed. This is in line
with treating a comment node as a node and only displaying
immediate values when a node contains no subnodes.
=item * PI sections are parsed, but discarded
=item * Unknown C<< <! >> sections are parsed, but discarded
=item * Attributes may use no quotes, single quotes, double quotes, or backticks
=item * Quoted attributes cannot contain escaped quotes
No escape character is recognized within quotes. As a result,
an attribute value containing a double quote cannot be written
back to HTML correctly, because all attributes are always written
using double quotes.
=item * Attributes are always written back to HTML with quotes
=item * Nodes cannot contain subnodes as well as an immediate value
Nodes can in fact contain a value as well as subnodes, but that
value will be discarded if you write back to HTML. The value is the
first continuous string of text in the node that is not part of a subnode.
  <node>text<subnode/>text2</node>
  ( the value of node is text )

  <node><subnode/>text</node>
  ( the value of node is text )

  <node>
    <subnode/>text
  </node>
  ( the value of node is "\n  " )
=item * Entities are not parsed
No entity parsing is done. This is intentional. Future versions of the module
may include a feature to automatically parse entities, but by default any such
feature will be disabled in order to keep from slowing down the parser.
Also, this is done so that round trip ( read and then write back out ) behavior
is consistent.
=item * Nodes named value
Previous iterations of this module had problems with nodes named 'value',
due to the fact that node contents are stored under the 'value' key already.
The current version should parse such files without any problem, although it
may be confusing to see a parsed tree with 'value' pointing to another hash
containing 'value' as well.
In a future version of the module it will be possible to alter the name that
values are stored under.
Note that node values are stored under the key 'content' when the "simple"
parsing mode is used, so as to be consistent with HTML::Simple.
=back
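The attribute quoting limitation above can be checked for before writing a
tree back out. The following is only a minimal sketch: C<attr_is_round_trip_safe>
and C<write_attr> are hypothetical helpers and not part of HTML::Bare, and the
sketch assumes that attributes are written with double quotes and no escaping,
as described in the list above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::Bare): returns 1 when the value
# contains no double quote and can therefore be wrapped in double quotes
# without producing malformed output, 0 otherwise.
sub attr_is_round_trip_safe {
    my ($value) = @_;
    return index( $value, '"' ) == -1 ? 1 : 0;
}

# Hypothetical helper: mimics writing an attribute with double quotes and
# no escaping. This is an assumption about the writer's behavior, not a
# copy of the module's code.
sub write_attr {
    my ( $name, $value ) = @_;
    return qq{$name="$value"};
}

# Show what each value would look like once written, and whether it is safe.
for my $value ( 'plain', "it's fine", 'say "hi"' ) {
    my $status = attr_is_round_trip_safe($value) ? 'ok' : 'unsafe';
    print write_attr( 'title', $value ), "  # $status\n";
}
```

Values flagged as unsafe would need to be changed by the caller before
writing, for example by replacing the double quote with an entity. Since
entities are not parsed, such a replacement would be read back unchanged
rather than converted to a quote.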
=head2 Module Functions
=over 2
=item * C<< $object = HTML::Bare->new( text => "[some html]" ) >>
Create a new HTML object, with the given text as the HTML source.
=item * C<< $object = HTML::Bare->new( file => "[filename]" ) >>
Create a new HTML object, with the given filename/path as the HTML source.
=item * C<< $object = HTML::Bare->new( text => "[some html]", file => "[filename]" ) >>
Create a new HTML object, with the given text as the HTML input, and the given
filename/path as the potential output ( used by save() ).
=item * C<< $object = HTML::Bare->new( file => "data.html", scheme => { file => "scheme.xbs" } ) >>
Create a new HTML object and check that it is valid HTML against the XBS scheme.
=item * C<< $tree = $object->parse() >>
Parse the HTML of the object and return a tree reference.
=item * C<< $tree = $object->simple() >>
Alternative to the parse function which generates a tree similar to that
generated by HTML::Simple. Note that, unlike HTML::Simple, sets of nodes are
always turned into arrays, regardless of whether they have a 'name'
attribute.
Note that the generated tree currently cannot be used with any of the
functions in this module that operate upon trees. The function is provided
purely as a quick and dirty way to read simple HTML files.
=item * C<< $tree = htmlin( $htmlext, keeproot => 1 ) >>
The htmlin function is a shortcut for creating an HTML::Bare object and
parsing it using the simple function. It behaves similarly to the
HTML::Simple function of the same name. The keeproot option may be omitted;
if it is left out, the root node will be discarded, as with the function
in HTML::Simple.
=item * C<< $text = $object->html( [root] ) >>
Take the hash tree in [root] and turn it into cleanly indented ( 2 spaces )