HTML-HTML5-Outline
lib/HTML/HTML5/Outline.pm
my $langtag = "
($language)
($s ( $script ) )?
($s ( $region ) )?
($s ( $variantList ) )?
($s ( $extensionList ) )?
($s ( $privateUse ) )?
";
# Here is the final breakdown, with capturing groups for each of these components
# The variants, extensions, grandfathered, and private-use may have interior '-'
my $r = ($value_to_test =~
/^(
($langtag)
| ($privateUse)
| ($grandfathered)
)$/xi);
return $r;
}
1;
__END__
=head1 NAME
HTML::HTML5::Outline - implementation of the HTML5 Outline algorithm
=head1 SYNOPSIS
use JSON;
use HTML::HTML5::Outline;
my $html = <<'HTML';
<!doctype html>
<h1>Hello</h1>
<h2>World</h2>
<h1>Good Morning</h1>
<h2>Vietnam</h2>
HTML
my $outline = HTML::HTML5::Outline->new($html);
print to_json($outline->to_hashref, {pretty=>1,canonical=>1});
=head1 DESCRIPTION
This is an implementation of the HTML5 Outline algorithm, as per
L<http://www.w3.org/TR/html5/sections.html#outlines>.
The module can output a JSON-friendly hashref, or an RDF model.
=head2 Constructor
=over
=item * C<< HTML::HTML5::Outline->new($html, %options) >>
Construct a new outline. C<< $html >> is the HTML to generate an outline from,
either as an HTML or XHTML string, or as an L<XML::LibXML::Document> object.
Options:
=over
=item * B<default_language> - default language to assume text is in when no
lang/xml:lang attribute is available. e.g. 'en-gb'.
=item * B<element_subjects> - rather advanced feature that doesn't bear
explaining. See USE WITH RDF::RDFA::PARSER for an example.
=item * B<microformats> - support C<< <ul class="xoxo"> >>,
C<< <ol class="xoxo"> >> and C<< <whatever class="figure"> >> as
sectioning elements (like C<< <section> >>, C<< <figure> >>, etc).
Boolean, defaults to false.
=item * B<parser> - 'html' (default) or 'xml' - choose the parser to use for
XHTML/HTML. If the constructor is passed an XML::LibXML::Document, this is
ignored.
=item * B<suppress_collections> - allows rdf:List stuff to be suppressed
from RDF output. RDF output - especially in Turtle format - looks somewhat
nicer without them, but if you care about the order of headings and sections,
then you'll want them. Boolean, defaults to false.
=item * B<uri> - the document URI for resolving relative URI references.
Only really used by the RDF output.
=back
=back
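As a minimal sketch of how the constructor options above might be combined (the XHTML snippet and the example.com URI are illustrative assumptions, not taken from the module's own documentation), the following passes an XHTML string through the 'xml' parser with a default language and microformats support enabled. The module is loaded at run time so that the sketch does nothing, rather than dying, where HTML::HTML5::Outline is not installed.

```perl
use strict;
use warnings;

# Load the module at run time so this sketch degrades gracefully
# where HTML::HTML5::Outline is not installed.
my $have_outline = eval { require HTML::HTML5::Outline; 1 };

# Illustrative input only; any well-formed XHTML string should do.
my $xhtml = <<'XHTML';
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Example</title></head>
<body>
<h1>Hello</h1>
<ul class="xoxo"><li>World</li></ul>
</body>
</html>
XHTML

# Option names are the ones documented above; the values are assumptions.
my %options = (
    default_language => 'en-gb',   # used when no lang/xml:lang is present
    microformats     => 1,         # treat class="xoxo" lists as sectioning elements
    parser           => 'xml',     # input is XHTML, so use the XML parser
    uri              => 'http://example.com/page.html',  # hypothetical document URI
);

if ($have_outline) {
    my $outline = HTML::HTML5::Outline->new($xhtml, %options);
    my $data    = $outline->to_hashref;
    print "Outline built; top-level structure is a ", ref($data), " reference\n";
}
```

Because the options are an ordinary key/value list, they can be collected in a hash first and reused across several documents.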
=head2 Object Methods
=over
=item * C<< to_hashref >>
Returns data as a nested hashref/arrayref structure. Dump it as JSON and
you'll figure out the format pretty easily.
=item * C<< to_rdf >>
Returns data as an L<RDF::Trine::Model>. Requires RDF::Trine to be
installed; otherwise this method won't exist.
=item * C<< primary_outlinee >>
Returns an L<HTML::HTML5::Outline::Outlinee> element representing the
outline for the page.
=back
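One way to explore the C<< to_hashref >> output without assuming any particular key names is to walk it generically. The sketch below uses a hypothetical helper, C<< describe >>, which is not part of the module, to print the shape of whatever nested structure comes back. It also checks the C<< has_rdf >> class method before calling C<< to_rdf >>, since that method is only available when RDF::Trine is installed.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::HTML5::Outline): returns one
# line per node of a nested hashref/arrayref structure, indented by depth.
sub describe {
    my ($node, $depth) = @_;
    $depth //= 0;
    my $indent = '  ' x $depth;
    my @lines;
    if (ref $node eq 'HASH') {
        for my $key (sort keys %$node) {
            push @lines, $indent . $key . ':';
            push @lines, describe($node->{$key}, $depth + 1);
        }
    }
    elsif (ref $node eq 'ARRAY') {
        for my $i (0 .. $#$node) {
            push @lines, $indent . "[$i]:";
            push @lines, describe($node->[$i], $depth + 1);
        }
    }
    else {
        # Plain scalars (and anything else) are printed as-is.
        push @lines, $indent . (defined $node ? $node : '(undef)');
    }
    return @lines;
}

my $html = "<!doctype html>\n<h1>Hello</h1>\n<h2>World</h2>\n";

# Only run the module-dependent part where the module is installed.
if (eval { require HTML::HTML5::Outline; 1 }) {
    my $outline = HTML::HTML5::Outline->new($html);
    print "$_\n" for describe($outline->to_hashref);

    # to_rdf only exists when RDF::Trine is available, so check first.
    if (HTML::HTML5::Outline->has_rdf) {
        my $model = $outline->to_rdf;
        print "RDF model class: ", ref($model), "\n";
    }
}
```

Guarding C<< to_rdf >> behind C<< has_rdf >> keeps the same script usable on systems with and without RDF::Trine.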
=head2 Class Methods
=over
=item * C<< has_rdf >>
Indicates whether the C<< to_rdf >> object method exists.