Mojo-DOM58

README


SYNOPSIS

      use Mojo::DOM58;
    
      # Parse
      my $dom = Mojo::DOM58->new('<div><p id="a">Test</p><p id="b">123</p></div>');
    
      # Find
      say $dom->at('#b')->text;
      say $dom->find('p')->map('text')->join("\n");
      say $dom->find('[id]')->map(attr => 'id')->join("\n");
    
      # Iterate
      $dom->find('p[id]')->reverse->each(sub { say $_->{id} });
    
      # Loop
      for my $e ($dom->find('p[id]')->each) {
        say $e->{id}, ':', $e->text;
      }
    
      # Modify
      $dom->find('div p')->last->append('<p id="c">456</p>');
      $dom->at('#c')->prepend($dom->new_tag('p', id => 'd', '789'));
      $dom->find(':not(p)')->map('strip');
    
      # Render
      say "$dom";

DESCRIPTION

    Mojo::DOM58 is a minimalistic and relaxed pure-Perl HTML/XML DOM parser
    based on Mojo::DOM. It supports the HTML Living Standard
    <https://html.spec.whatwg.org/> and Extensible Markup Language (XML)
    1.0 <https://www.w3.org/TR/xml/>, and matching based on CSS3 selectors
    <https://www.w3.org/TR/selectors/>. It will even try to interpret
    broken HTML and XML, so you should not use it for validation.

FORK INFO

    Mojo::DOM58 is a fork of Mojo::DOM and tracks features and fixes to
    stay closely compatible with upstream. It differs only in the
    standalone format and compatibility with Perl 5.8. Any bugs or patches
    not related to these changes should be reported directly to the
    Mojolicious issue tracker.

    This release of Mojo::DOM58 is up to date with version 9.40 of
    Mojolicious.

NODES AND ELEMENTS

    When we parse an HTML/XML fragment, it gets turned into a tree of
    nodes.

      <!DOCTYPE html>
      <html>
        <head><title>Hello</title></head>
        <body>World!</body>
      </html>

    There are currently eight different kinds of nodes: cdata, comment,
    doctype, pi, raw, root, tag and text. Elements are nodes of the type
    tag.

      root
      |- doctype (html)
      +- tag (html)
         |- tag (head)
         |  +- tag (title)
         |     +- raw (Hello)
         +- tag (body)
            +- text (World!)

    While all node types are represented as Mojo::DOM58 objects, some
    methods like "attr" and "namespace" only apply to elements.
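
    As a minimal sketch of the tree described above, the following walks
    every node of the example document in document order and collects
    the node types. The markup is written without whitespace between
    tags (an assumption made here so that no extra text nodes appear).

```perl
use strict;
use warnings;
use Mojo::DOM58;

# Same document as above, without whitespace between the tags
my $dom = Mojo::DOM58->new(
  '<!DOCTYPE html><html><head><title>Hello</title></head>'
    . '<body>World!</body></html>');

# descendant_nodes returns all nodes below the root, not only elements
my @types = $dom->descendant_nodes->map('type')->each;
print join(', ', @types), "\n";

# Only elements (nodes of type "tag") have a tag name
my @tags = $dom->descendant_nodes
  ->grep(sub { $_[0]->type eq 'tag' })->map('tag')->each;
print join(', ', @tags), "\n";
```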

HTML AND XML

    Mojo::DOM58 defaults to HTML semantics, which means all tag and
    attribute names are lowercased and selectors need to be lowercase as
    well.

      # HTML semantics
      my $dom = Mojo::DOM58->new('<P ID="greeting">Hi!</P>');
      say $dom->at('p[id]')->text;

    If an XML declaration is found, the parser will automatically switch
    into XML mode and everything becomes case-sensitive.

      # XML semantics
      my $dom = Mojo::DOM58->new('<?xml version="1.0"?><P ID="greeting">Hi!</P>');
      say $dom->at('P[ID]')->text;

    HTML or XML semantics can also be forced with the "xml" method.

      # Force HTML semantics
      my $dom = Mojo::DOM58->new->xml(0)->parse('<P ID="greeting">Hi!</P>');
      say $dom->at('p[id]')->text;
    
      # Force XML semantics
      my $dom = Mojo::DOM58->new->xml(1)->parse('<P ID="greeting">Hi!</P>');
      say $dom->at('P[ID]')->text;
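
    To make the difference visible, here is a small sketch that parses
    the same markup in both modes and compares the results. The variable
    names are illustrative only.

```perl
use strict;
use warnings;
use Mojo::DOM58;

my $markup = '<P ID="greeting">Hi!</P>';

# HTML semantics (the default): names are lowercased while parsing
my $html = Mojo::DOM58->new($markup);

# XML semantics forced: names keep their case
my $xml = Mojo::DOM58->new->xml(1)->parse($markup);

print "HTML: $html\n";
print "XML:  $xml\n";

# A lowercase selector only matches in HTML mode
my $html_match = $html->at('p[id]');
my $xml_match  = $xml->at('p[id]');
print 'HTML match: ', (defined $html_match ? 'yes' : 'no'), "\n";
print 'XML match:  ', (defined $xml_match  ? 'yes' : 'no'), "\n";
```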

SELECTORS

    Mojo::DOM58 uses a CSS selector engine based on Mojo::DOM::CSS. All CSS
    selectors that make sense for a standalone parser are supported.

    *

      Any element.

        my $all = $dom->find('*');

    E

      An element of type E.

        my $title = $dom->at('title');

    E[foo]

 strip

      my $parent = $dom->strip;

    Remove this element while preserving its content and return "parent".

      # "<div>Test</div>"
      $dom->parse('<div><h1>Test</h1></div>')->at('h1')->strip;
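
    Because the parent is returned, the result of "strip" can be used to
    keep working one level up. A short sketch with made-up markup:

```perl
use strict;
use warnings;
use Mojo::DOM58;

my $dom = Mojo::DOM58->new('<div><span><b>Hi</b> there</span></div>');

# Remove the span but keep its children; the return value is the div
my $parent = $dom->at('span')->strip;

print $parent->tag, "\n";
print "$dom\n";
```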

 tag

      my $tag = $dom->tag;
      $dom    = $dom->tag('div');

    This element's tag name.

      # List tag names of child elements
      say $dom->children->map('tag')->join("\n");
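
    The setter form can be used to rename elements in place. The
    following sketch maps presentational tags to semantic ones; the
    %rename table is purely illustrative.

```perl
use strict;
use warnings;
use Mojo::DOM58;

my $dom = Mojo::DOM58->new('<p><b>bold</b> and <i>italic</i></p>');

# Hypothetical mapping from old tag names to new ones
my %rename = (b => 'strong', i => 'em');

$dom->find('b, i')->each(sub {
  my ($e) = @_;
  $e->tag($rename{$e->tag});
});

print "$dom\n";
```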

 tap

      $dom = $dom->tap(sub {...});

    Equivalent to "tap" in Mojo::Base.
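
    A possible use, sketched here with made-up markup, is to run a side
    effect in the middle of a method chain without breaking the chain.
    The callback receives the object as its first argument.

```perl
use strict;
use warnings;
use Mojo::DOM58;

my $dom = Mojo::DOM58->new('<ul><li>One</li></ul>');

my $html = $dom->at('li')
  ->tap(sub { my ($li) = @_; $li->attr(class => 'first') })
  ->append('<li>Two</li>')    # still operating on the first li
  ->root->to_string;

print "$html\n";
```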

 text

      my $text = $dom->text;

    Extract text content from this element only (not including child
    elements).

      # "bar"
      $dom->parse("<div>foo<p>bar</p>baz</div>")->at('p')->text;
    
      # "foo\nbaz\n"
      $dom->parse("<div>foo\n<p>bar</p>baz\n</div>")->at('div')->text;

    To extract text content from all descendant nodes, see "all_text".

 to_string

      my $str = $dom->to_string;

    Render this node and its content to HTML/XML.

      # "<b>Test</b>"
      $dom->parse('<div><b>Test</b></div>')->at('div b')->to_string;

 tree

      my $tree = $dom->tree;
      $dom     = $dom->tree(['root']);

    Document Object Model. Note that this structure should only be used
    very carefully since it is very dynamic.

 type

      my $type = $dom->type;

    This node's type, usually cdata, comment, doctype, pi, raw, root, tag
    or text.

      # "cdata"
      $dom->parse('<![CDATA[Test]]>')->child_nodes->first->type;
    
      # "comment"
      $dom->parse('<!-- Test -->')->child_nodes->first->type;
    
      # "doctype"
      $dom->parse('<!DOCTYPE html>')->child_nodes->first->type;
    
      # "pi"
      $dom->parse('<?xml version="1.0"?>')->child_nodes->first->type;
    
      # "raw"
      $dom->parse('<title>Test</title>')->at('title')->child_nodes->first->type;
    
      # "root"
      $dom->parse('<p>Test</p>')->type;
    
      # "tag"
      $dom->parse('<p>Test</p>')->at('p')->type;
    
      # "text"
      $dom->parse('<p>Test</p>')->at('p')->child_nodes->first->type;
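
    The type is handy for filtering nodes that selectors cannot reach,
    since selectors only match elements. As a sketch, this collects the
    content of every comment in a made-up fragment:

```perl
use strict;
use warnings;
use Mojo::DOM58;

my $dom = Mojo::DOM58->new(
  '<div><!-- first -->A<p>B<!-- second --></p></div>');

# Keep only comment nodes, then read their content
my @comments = $dom->descendant_nodes
  ->grep(sub { $_[0]->type eq 'comment' })
  ->map('content')->each;

# Trim the whitespace that surrounds the comment text
s/^\s+|\s+$//g for @comments;

print join(', ', @comments), "\n";
```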

 val

      my $value = $dom->val;

    Extract value from form element (such as button, input, option, select
    and textarea), or return undef if this element has no value. For a
    select element with the multiple attribute, find option elements with
    the selected attribute and return an array reference with all values,
    or undef if none could be found.

      # "a"
      $dom->parse('<input name=test value=a>')->at('input')->val;
    
      # "b"
      $dom->parse('<textarea>b</textarea>')->at('textarea')->val;
    
      # "c"
      $dom->parse('<option value="c">Test</option>')->at('option')->val;
    
      # "d"
      $dom->parse('<select><option selected>d</option></select>')
        ->at('select')->val;
    
      # "e"
      $dom->parse('<select multiple><option selected>e</option></select>')
        ->at('select')->val->[0];
    
      # "on"
      $dom->parse('<input name=test type=checkbox>')->at('input')->val;
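
    Putting these cases together, the sketch below gathers the values of
    all named fields in a form into a hash. The form markup and field
    names are hypothetical and serve only as an illustration.

```perl
use strict;
use warnings;
use Mojo::DOM58;

# Hypothetical form markup, used only for illustration
my $dom = Mojo::DOM58->new(<<'HTML');
<form>
  <input name="user" value="alice">
  <textarea name="bio">hi</textarea>
  <select name="color"><option>red</option><option selected>blue</option></select>
  <input name="agree" type="checkbox">
</form>
HTML

# Map each field name to the value reported by val
my %form;
$dom->find('input[name], textarea[name], select[name]')->each(sub {
  my ($field) = @_;
  $form{$field->attr('name')} = $field->val;
});

print "$_ = $form{$_}\n" for sort keys %form;
```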

 with_roles

      my $new_class = Mojo::DOM58->with_roles('Mojo::DOM58::Role::One');
      my $new_class = Mojo::DOM58->with_roles('+One', '+Two');
      $dom          = $dom->with_roles('+One', '+Two');

    Equivalent to "with_roles" in Mojo::Base. Note that role support


