App-Chronicle

 view release on metacpan or  search on metacpan

lib/Chronicle/Plugin/Tidy.pm  view on Meta::CPAN


=head1 NAME

Chronicle::Plugin::Tidy - Attempt to fix malformed HTML.

=head1 DESCRIPTION

This plugin is designed to prevent malformed HTML from being generated.

It does that by using the L<HTML::TreeBuilder> module to parse the
HTML that has been inserted into the SQLite database and then rewalks
the tree to try to fix broken entries.

As an example the following bogus HTML will be fixed:

=for example begin

    <p>This is a line of text.</P>

=for example end

Similarly tags that are not closed will be fixed up.

=cut

=head1 METHODS

Now follows documentation on the available methods.

=cut

package Chronicle::Plugin::Tidy;


use strict;
use warnings;


our $VERSION = "5.1.7";


=head2 on_insert

The C<on_insert> method is automatically invoked when a new blog post
must be inserted into the SQLite database, that might be because a post
is new, or because it has been updated.

The method is designed to return an updated blog-post structure,
after performing any massaging required.  If the method returns undef
then the post is not inserted.

Here we walk the HTML entry, which might have been written by hand
or which might have been created via L<Chronicle::Plugin::Markdown>,
or some other plugin, and try to ensure it is well-formed.

=cut

sub on_insert
{
    my ( $self, %args ) = (@_);

    my $conf = $args{ 'config' };
    my $data = $args{ 'data' };
    my $html = $data->{ 'body' };

    #
    #  Load the HTML::TreeBuilder module, if present.
    #
    foreach my $mod (qw! HTML::TreeBuilder !)
    {
        my $test = "use $mod;";
        ## no critic (Eval)
        eval($test);
        ## use critic

        if ($@)
        {
            return ($data);
        }
    }

    my $tree = HTML::TreeBuilder->new();
    $tree->ignore_unknown(0);
    $tree->ignore_ignorable_whitespace(0);
    $tree->no_space_compacting(1);
    $tree->p_strict(1);
    $tree->store_comments(0);
    $tree->store_declarations(0);
    $tree->store_pis(0);
    $tree->parse_content($html);

    my $txt;

    my @nodes = $tree->disembowel();
    foreach my $node (@nodes)
    {
        if ( ref $node )
        {
            $txt .= $node->as_HTML( undef, '', {} );
            chomp $txt;
            $node->delete();
        }
        else
        {
            $txt .= $node;
        }
    }
    $tree->delete();

    #
    #  Update the body and return the updated post.
    #
    $data->{ 'body' } = $txt;
    return ($data);
}


=head2 _order

We want this plugin to be called I<after> the other plugins which
filter new entries - so that we can fix their broken HTML.

This method is present such that L<Module::Pluggable::Ordered> can
order our plugins.

=cut

sub _order
{
    return 999;
}


1;


=head1 LICENSE

This module is free software; you can redistribute it and/or modify it
under the terms of either:

a) the GNU General Public License as published by the Free Software
Foundation; either version 2, or (at your option) any later version,
or

b) the Perl "Artistic License".

=cut

=head1 AUTHOR

Steve Kemp <steve@steve.org.uk>

=cut



( run in 2.841 seconds using v1.01-cache-2.11-cpan-75ffa21a3d4 )