HTML-HTML5-Outline


lib/HTML/HTML5/Outline.pm

	my $langtag = "
			($language)
			($s ( $script ) )?
			($s ( $region ) )?
			($s ( $variantList ) )?
			($s ( $extensionList ) )?
			($s ( $privateUse ) )?
			";

	# Here is the final breakdown, with capturing groups for each of these components
	# The variants, extensions, grandfathered, and private-use may have interior '-'
	
	my $r = ($value_to_test =~ 
		/^(
			($langtag)
		 | ($privateUse)
		 | ($grandfathered)
		 )$/xi);
	
	return $r;
}

1;

__END__
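The function above builds its language-tag pattern from named pieces (C<$language>, C<$script>, C<$region> and so on, defined earlier in the file and not shown here), wraps each optional piece as C<($s ( ... ) )?>, and anchors the result with C</^(...)$/xi>. Below is a minimal, self-contained sketch of that composition technique. The subtag patterns are simplified assumptions loosely following RFC 5646 (no extlang, extensions or grandfathered tags), not the module's own definitions, and C<is_simple_langtag> is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical, simplified subtag patterns (assumptions, not the module's own).
my $s          = '-';                                   # subtag separator
my $language   = '[a-z]{2,3}';                          # e.g. "en", "yue"
my $script     = '[a-z]{4}';                            # e.g. "Hant"
my $region     = '(?:[a-z]{2}|[0-9]{3})';               # e.g. "GB", "419"
my $variant    = '(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3})';  # e.g. "1901"
my $privateUse = 'x(?:-[a-z0-9]{1,8})+';                # e.g. "x-private"

# Same shape as the module: a mandatory language subtag followed by
# optional "$s piece" groups. The layout whitespace is ignored under /x.
my $langtag = "
    $language
    (?: $s $script )?
    (?: $s $region )?
    (?: $s $variant )*
    (?: $s $privateUse )?
";

sub is_simple_langtag {
    my ($value) = @_;
    # /x ignores whitespace in the interpolated pattern;
    # /i makes matching case-insensitive, as in the module.
    return $value =~ /^(?: $langtag | $privateUse )$/xi ? 1 : 0;
}

for my $tag (qw(en-gb zh-Hant-TW en-GB-x-private x-whatever en_GB e)) {
    printf "%-18s %s\n", $tag, is_simple_langtag($tag) ? 'ok' : 'invalid';
}
```

Building the regex from named strings keeps each subtag rule readable on its own, at the cost of needing C</x> so the layout whitespace does not become part of the match.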

=head1 NAME

HTML::HTML5::Outline - implementation of the HTML5 Outline algorithm

=head1 SYNOPSIS

	use JSON;
	use HTML::HTML5::Outline;
	
	my $html = <<'HTML';
	<!doctype html>
	<h1>Hello</h1>
	<h2>World</h2>
	<h1>Good Morning</h1>
	<h2>Vietnam</h2>
	HTML
	
	my $outline = HTML::HTML5::Outline->new($html);
	print to_json($outline->to_hashref, {pretty=>1,canonical=>1});

=head1 DESCRIPTION

This is an implementation of the HTML5 Outline algorithm, as per
L<http://www.w3.org/TR/html5/sections.html#outlines>.

The module can output a JSON-friendly hashref, or an RDF model.
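A minimal sketch of choosing between the two outputs, using only the methods documented below (C<new>, C<to_hashref>, C<to_rdf> and the class method C<has_rdf>). It loads the module at run time so it still runs where the module is not installed; the C<$mode> variable is purely illustrative, and C<size> is the L<RDF::Trine::Model> statement count.

```perl
use strict;
use warnings;

my $html = '<!doctype html><h1>Hello</h1><h2>World</h2>';
my $mode = 'unavailable';   # records which output path this run took

# Load at run time so the sketch runs even without the module installed.
if (eval { require HTML::HTML5::Outline; 1 }) {
    my $outline = HTML::HTML5::Outline->new($html);

    # has_rdf reports whether the to_rdf object method exists,
    # which per the documentation depends on RDF::Trine being installed.
    if (HTML::HTML5::Outline->has_rdf) {
        my $model = $outline->to_rdf;       # an RDF::Trine::Model
        print 'statements: ', $model->size, "\n";
        $mode = 'rdf';
    }
    else {
        my $data = $outline->to_hashref;    # nested hashrefs/arrayrefs
        print 'top level is a ', ref($data) || 'plain scalar', "\n";
        $mode = 'hashref';
    }
}
else {
    print "HTML::HTML5::Outline is not installed\n";
}
```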

=head2 Constructor

=over

=item * C<< HTML::HTML5::Outline->new($html, %options) >>

Construct a new outline. C<< $html >> is the HTML to generate an outline from,
either as an HTML or XHTML string, or as an L<XML::LibXML::Document> object.

Options:

=over

=item * B<default_language> - the language to assume for text when no
lang or xml:lang attribute applies, e.g. 'en-gb'.

=item * B<element_subjects> - an advanced feature that is easier to show
than to explain. See USE WITH RDF::RDFA::PARSER for an example.

=item * B<microformats> - treat C<< <ul class="xoxo"> >>,
C<< <ol class="xoxo"> >> and C<< <whatever class="figure"> >> as
sectioning elements (like C<< <section> >>, C<< <figure> >>, etc.).
Boolean, defaults to false.

=item * B<parser> - 'html' (default) or 'xml' - choose the parser to use for
XHTML/HTML. If the constructor is passed an XML::LibXML::Document, this is
ignored.

=item * B<suppress_collections> - suppresses rdf:List collections in the
RDF output. RDF output, especially in Turtle format, looks somewhat
nicer without them, but if you care about the order of headings and sections,
you'll want to keep them. Boolean, defaults to false.

=item * B<uri> - the document URI for resolving relative URI references.
Only really used by the RDF output.

=back

=back
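Options are passed as a flat key/value list after the HTML. A minimal sketch using only the options documented above (the values, and the C<%options> hash itself, are illustrative); the module is loaded at run time so the sketch also runs where it is not installed.

```perl
use strict;
use warnings;

# No lang attribute here, so default_language is what applies.
my $html = <<'HTML';
<!doctype html>
<article>
  <h1>Hello</h1>
  <section><h2>World</h2></section>
</article>
HTML

# Only options documented above; values are illustrative.
my %options = (
    default_language     => 'en-gb',
    parser               => 'html',    # or 'xml' for well-formed XHTML
    microformats         => 1,         # treat class="xoxo"/"figure" as sectioning
    suppress_collections => 0,         # keep rdf:List ordering in RDF output
    uri                  => 'http://example.com/page.html',
);

if (eval { require HTML::HTML5::Outline; 1 }) {
    my $outline = HTML::HTML5::Outline->new($html, %options);
    my $data    = $outline->to_hashref;
    print ref($data), "\n";
}
else {
    print "HTML::HTML5::Outline is not installed\n";
}
```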

=head2 Object Methods

=over

=item * C<< to_hashref >>

Returns data as a nested hashref/arrayref structure. The format isn't
spelled out here; dump it as JSON, as in the SYNOPSIS, to see it.

=item * C<< to_rdf >>

Returns data as an L<RDF::Trine::Model>. Requires RDF::Trine to be
installed; otherwise this method does not exist.

=item * C<< primary_outlinee >>

Returns an L<HTML::HTML5::Outline::Outlinee> object representing the
outline for the page.

=back
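Because the C<to_hashref> format is only described as nested hashrefs and arrayrefs, a walker that assumes no key names is a safe way to explore it. A minimal sketch; C<list_paths> is a hypothetical helper, and C<$sample> is made-up data, not the real C<to_hashref> format.

```perl
use strict;
use warnings;

# Recursively collect the path to every leaf in a nested
# hashref/arrayref structure, without assuming any key names.
# Empty hashes and arrays contribute no paths.
sub list_paths {
    my ($node, $path, $out) = @_;
    $path //= '';
    $out  //= [];
    if (ref $node eq 'HASH') {
        for my $key (sort keys %$node) {
            list_paths($node->{$key}, "$path/$key", $out);
        }
    }
    elsif (ref $node eq 'ARRAY') {
        for my $i (0 .. $#$node) {
            list_paths($node->[$i], "$path\[$i]", $out);
        }
    }
    else {
        push @$out, $path;    # a leaf: plain scalar or other reference
    }
    return $out;
}

# Hypothetical sample data, NOT the real to_hashref format.
my $sample = {
    type     => 'outline',
    children => [ { heading => 'Hello' }, { heading => 'World' } ],
};
print "$_\n" for @{ list_paths($sample) };
```

With the module installed you would pass C<< $outline->to_hashref >> instead of C<$sample>.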

=head2 Class Methods

=over

=item * C<< has_rdf >>

Indicates whether the C<< to_rdf >> object method exists.


