Mojo-DOM58
SYNOPSIS
use Mojo::DOM58;
use feature 'say';    # say needs Perl 5.10+; on Perl 5.8 use print with an explicit "\n"
# Parse
my $dom = Mojo::DOM58->new('<div><p id="a">Test</p><p id="b">123</p></div>');
# Find
say $dom->at('#b')->text;
say $dom->find('p')->map('text')->join("\n");
say $dom->find('[id]')->map(attr => 'id')->join("\n");
# Iterate
$dom->find('p[id]')->reverse->each(sub { say $_->{id} });
# Loop
for my $e ($dom->find('p[id]')->each) {
say $e->{id}, ':', $e->text;
}
# Modify
$dom->find('div p')->last->append('<p id="c">456</p>');
$dom->at('#c')->prepend($dom->new_tag('p', id => 'd', '789'));
$dom->find(':not(p)')->map('strip');
# Render
say "$dom";
DESCRIPTION
Mojo::DOM58 is a minimalistic and relaxed pure-perl HTML/XML DOM parser
based on Mojo::DOM. It supports the HTML Living Standard
<https://html.spec.whatwg.org/> and Extensible Markup Language (XML)
1.0 <https://www.w3.org/TR/xml/>, and matching based on CSS3 selectors
<https://www.w3.org/TR/selectors/>. It will even try to interpret
broken HTML and XML, so you should not use it for validation.
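As a minimal sketch of that relaxed parsing (assuming Mojo::DOM58 is installed from CPAN; the markup string is a made-up example), the fragment below omits every closing tag. The parser still builds two separate p elements, following the HTML rule that a new p start tag implicitly closes the open one.

```perl
use strict;
use warnings;
use Mojo::DOM58;

# Two paragraphs without any closing tags
my $dom = Mojo::DOM58->new('<p>One<p>Two');

# Each <p> implicitly closes the one before it, so there are two siblings
my @texts = map { $_->text } $dom->find('p')->each;
print join(',', @texts), "\n";    # One,Two

# Rendering writes out the end tags the input left out
my $html = $dom->to_string;
print "$html\n";                  # <p>One</p><p>Two</p>
```

Because missing end tags are repaired silently rather than reported, the output says nothing about whether the input was valid.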
FORK INFO
Mojo::DOM58 is a fork of Mojo::DOM and tracks features and fixes to
stay closely compatible with upstream. It differs only in the
standalone format and compatibility with Perl 5.8. Any bugs or patches
not related to these changes should be reported directly to the
Mojolicious issue tracker.
This release of Mojo::DOM58 is up to date with version 9.40 of
Mojolicious.
NODES AND ELEMENTS
When we parse an HTML/XML fragment, it gets turned into a tree of
nodes.
<!DOCTYPE html>
<html>
<head><title>Hello</title></head>
<body>World!</body>
</html>
There are currently eight different kinds of nodes: cdata, comment,
doctype, pi, raw, root, tag and text. Elements are nodes of the type
tag.
root
|- doctype (html)
+- tag (html)
   |- tag (head)
   |  +- tag (title)
   |     +- raw (Hello)
   +- tag (body)
      +- text (World!)
While all node types are represented as Mojo::DOM58 objects, some
methods like "attr" and "namespace" only apply to elements.
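The following sketch walks the document above and prints the type of every node, then keeps only the elements. It assumes Mojo::DOM58 is installed and uses descendant_nodes, a method from the full module documentation that is not part of this excerpt. The markup is written on one line so that no whitespace-only text nodes appear between the tags.

```perl
use strict;
use warnings;
use Mojo::DOM58;

# Same document as above, on one line to avoid whitespace text nodes
my $dom = Mojo::DOM58->new(
  '<!DOCTYPE html><html><head><title>Hello</title></head>'
  . '<body>World!</body></html>');

# descendant_nodes visits every node below the root in document order
my @types = $dom->descendant_nodes->map('type')->each;
print join(' ', @types), "\n";    # doctype tag tag tag raw tag text

# Only nodes of type "tag" are elements, where attr and namespace apply
my @elements = grep { $_->type eq 'tag' } $dom->descendant_nodes->each;
print join(' ', map { $_->tag } @elements), "\n";    # html head title body
```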
HTML AND XML
Mojo::DOM58 defaults to HTML semantics, which means all tag and
attribute names are lowercased and selectors need to be lowercase as
well.
# HTML semantics
my $dom = Mojo::DOM58->new('<P ID="greeting">Hi!</P>');
say $dom->at('p[id]')->text;
If an XML declaration is found, the parser will automatically switch
into XML mode and everything becomes case-sensitive.
# XML semantics
my $dom = Mojo::DOM58->new('<?xml version="1.0"?><P ID="greeting">Hi!</P>');
say $dom->at('P[ID]')->text;
HTML or XML semantics can also be forced with the "xml" method.
# Force HTML semantics
my $dom = Mojo::DOM58->new->xml(0)->parse('<P ID="greeting">Hi!</P>');
say $dom->at('p[id]')->text;
# Force XML semantics
my $dom = Mojo::DOM58->new->xml(1)->parse('<P ID="greeting">Hi!</P>');
say $dom->at('P[ID]')->text;
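To make the difference between the two modes visible, this sketch (assuming Mojo::DOM58 is installed; the markup is the same made-up fragment used above) renders the HTML-mode result and shows that a lowercase selector finds nothing once XML semantics are forced.

```perl
use strict;
use warnings;
use Mojo::DOM58;

my $markup = '<P ID="greeting">Hi!</P>';

# HTML mode lowercases tag and attribute names while parsing
my $html = Mojo::DOM58->new($markup);
print $html->to_string, "\n";    # <p id="greeting">Hi!</p>

# XML mode keeps the original case, so "p" does not match <P>
my $xml = Mojo::DOM58->new->xml(1)->parse($markup);
print defined $xml->at('p') ? "match\n" : "no match\n";    # no match
print $xml->at('P')->text, "\n";                             # Hi!
```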
SELECTORS
Mojo::DOM58 uses a CSS selector engine based on Mojo::DOM::CSS. All CSS
selectors that make sense for a standalone parser are supported.
*
Any element.
my $all = $dom->find('*');
E
An element of type E.
my $title = $dom->at('title');
E[foo]
An E element with a foo attribute.
my $links = $dom->find('a[href]');
strip
my $parent = $dom->strip;
Remove this element while preserving its content and return "parent".
# "<div>Test</div>"
$dom->parse('<div><h1>Test</h1></div>')->at('h1')->strip;
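A common use of strip is unwrapping inline markup while keeping its text. This sketch, assuming Mojo::DOM58 is installed and using a made-up fragment, also checks that strip returns the parent element.

```perl
use strict;
use warnings;
use Mojo::DOM58;

my $dom = Mojo::DOM58->new('<p>Some <b>bold</b> and <b>more</b> text</p>');

# strip returns the parent, here the enclosing <p>
my $parent = $dom->at('b')->strip;
print $parent->tag, "\n";    # p

# Unwrap every remaining <b>, keeping the text inside it
$dom->find('b')->each(sub { $_->strip });
print "$dom\n";              # <p>Some bold and more text</p>
```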
tag
my $tag = $dom->tag;
$dom = $dom->tag('div');
This element's tag name.
# List tag names of child elements
say $dom->children->map('tag')->join("\n");
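Passing an argument to tag renames the element. The sketch below (assuming Mojo::DOM58 is installed; the markup is illustrative) turns every b into strong and then lists the child tag names as in the example above.

```perl
use strict;
use warnings;
use Mojo::DOM58;

my $dom = Mojo::DOM58->new('<div><b>A</b><i>B</i><b>C</b></div>');

# Rename every <b> element; its attributes and content are kept
$dom->find('b')->each(sub { $_->tag('strong') });
my $html = "$dom";
print "$html\n";     # <div><strong>A</strong><i>B</i><strong>C</strong></div>

# Tag names of the div's child elements, in document order
my $names = $dom->at('div')->children->map('tag')->join(',');
print "$names\n";    # strong,i,strong
```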
tap
$dom = $dom->tap(sub {...});
Equivalent to "tap" in Mojo::Base.
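As with Mojo::Base, tap calls the callback with the object in $_ and returns the object itself, so a side effect can be slipped into a method chain. A minimal sketch, assuming Mojo::DOM58 is installed; append_content is a Mojo::DOM58 method documented elsewhere in the module.

```perl
use strict;
use warnings;
use Mojo::DOM58;

# Modify the document inside tap, then keep chaining on the same object
my $html = Mojo::DOM58->new('<ul><li>a</li></ul>')
  ->tap(sub { $_->at('ul')->append_content('<li>b</li>') })
  ->to_string;
print "$html\n";    # <ul><li>a</li><li>b</li></ul>
```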
text
my $text = $dom->text;
Extract text content from this element only (not including child
elements).
# "bar"
$dom->parse("<div>foo<p>bar</p>baz</div>")->at('p')->text;
# "foo\nbaz\n"
$dom->parse("<div>foo\n<p>bar</p>baz\n</div>")->at('div')->text;
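The contrast between text and all_text is easiest to see on the same element. This sketch assumes Mojo::DOM58 is installed and reuses the fragment from the first example above.

```perl
use strict;
use warnings;
use Mojo::DOM58;

my $div = Mojo::DOM58->new('<div>foo<p>bar</p>baz</div>')->at('div');

# text: only the text nodes that are direct children of <div>
my $own = $div->text;
print "$own\n";    # foobaz

# all_text: text from all descendants, including the nested <p>
my $all = $div->all_text;
print "$all\n";    # foobarbaz
```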
To extract text content from all descendant nodes, see "all_text".
to_string
my $str = $dom->to_string;
Render this node and its content to HTML/XML.
# "<b>Test</b>"
$dom->parse('<div><b>Test</b></div>')->at('div b')->to_string;
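To show what to_string includes, the sketch below compares it with content, a Mojo::DOM58 method documented elsewhere in the module that renders only an element's inner markup. It assumes Mojo::DOM58 is installed and also confirms that stringifying a node gives the same result as to_string.

```perl
use strict;
use warnings;
use Mojo::DOM58;

my $dom = Mojo::DOM58->new('<div><b>Test</b></div>');
my $div = $dom->at('div');

# to_string renders the element itself, including its own tags
print $div->to_string, "\n";    # <div><b>Test</b></div>

# content renders only what is inside the element
print $div->content, "\n";      # <b>Test</b>

# Stringifying a node is the same as calling to_string
print "$dom" eq $dom->to_string ? "same\n" : "different\n";    # same
```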
tree
my $tree = $dom->tree;
$dom = $dom->tree(['root']);
Document Object Model. Note that this structure should only be used
very carefully since it is very dynamic.
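Given that warning, the safest uses of tree are small ones. This sketch, assuming Mojo::DOM58 is installed, reads the node type stored as the first entry of the root's array and then empties the document with the ['root'] form shown in the signature above. The internal layout beyond that first entry is deliberately not relied on here.

```perl
use strict;
use warnings;
use Mojo::DOM58;

my $dom = Mojo::DOM58->new('<p>Test</p>');

# A document's tree is an array reference whose first entry is "root"
my $type = $dom->tree->[0];
print "$type\n";    # root

# Replacing the tree with a bare root node leaves an empty document
$dom->tree(['root']);
print length("$dom"), "\n";    # 0
```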
type
my $type = $dom->type;
This node's type, usually cdata, comment, doctype, pi, raw, root, tag
or text.
# "cdata"
$dom->parse('<![CDATA[Test]]>')->child_nodes->first->type;
# "comment"
$dom->parse('<!-- Test -->')->child_nodes->first->type;
# "doctype"
$dom->parse('<!DOCTYPE html>')->child_nodes->first->type;
# "pi"
$dom->parse('<?xml version="1.0"?>')->child_nodes->first->type;
# "raw"
$dom->parse('<title>Test</title>')->at('title')->child_nodes->first->type;
# "root"
$dom->parse('<p>Test</p>')->type;
# "tag"
$dom->parse('<p>Test</p>')->at('p')->type;
# "text"
$dom->parse('<p>Test</p>')->at('p')->child_nodes->first->type;
val
my $value = $dom->val;
Extract the value from a form element (such as button, input, option,
select and textarea), or return undef if this element has no value. In
the case of a select element with the multiple attribute, find option
elements with the selected attribute and return an array reference with
all their values, or undef if none could be found.
# "a"
$dom->parse('<input name=test value=a>')->at('input')->val;
# "b"
$dom->parse('<textarea>b</textarea>')->at('textarea')->val;
# "c"
$dom->parse('<option value="c">Test</option>')->at('option')->val;
# "d"
$dom->parse('<select><option selected>d</option></select>')
->at('select')->val;
# "e"
$dom->parse('<select multiple><option selected>e</option></select>')
->at('select')->val->[0];
# "on"
$dom->parse('<input name=test type=checkbox>')->at('input')->val;
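Combining find with val gives a quick way to read a whole form. This is a minimal sketch, assuming Mojo::DOM58 is installed; the form and its field names are hypothetical. A single select yields the selected option's value, while a multiple select yields an array reference.

```perl
use strict;
use warnings;
use Mojo::DOM58;

my $dom = Mojo::DOM58->new(<<'HTML');
<form>
  <input name="user" value="alice">
  <select name="color">
    <option value="red">Red</option>
    <option value="blue" selected>Blue</option>
  </select>
  <select name="tags" multiple>
    <option selected>perl</option>
    <option>css</option>
    <option selected>html</option>
  </select>
  <textarea name="note">hi</textarea>
</form>
HTML

# Map each named control to its value; scalar keeps the pairs aligned
my %form = map { ($_->attr('name') => scalar $_->val) } $dom->find('[name]')->each;

for my $name (sort keys %form) {
  my $value = ref $form{$name} ? join('|', @{ $form{$name} }) : $form{$name};
  print "$name = $value\n";
}
```

Note that val reports what the markup says, so for a checkbox it returns the value (or "on") whether or not the box is checked.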
with_roles
my $new_class = Mojo::DOM58->with_roles('Mojo::DOM58::Role::One');
my $new_class = Mojo::DOM58->with_roles('+One', '+Two');
$dom = $dom->with_roles('+One', '+Two');
Equivalent to "with_roles" in Mojo::Base. Note that role support