HTML-Differences
Start tags are normalized so that attributes appear in sorted order, all
attribute values are wrapped in double quotes, and exactly one space precedes
each attribute. Self-closing tags (like C<< <hr/> >>) are converted to their
simpler form (C<< <hr> >>).
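Note that because L<HTML::Parser> decodes HTML entities inside attribute
values, this module cannot distinguish between two attribute values that
differ only in whether a character is written as an entity. For example,
C<< title="a &amp; b" >> and C<< title="a & b" >> are treated as identical.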
Missing end tags I<are not> added, and will show up in the diff.
Comments are included by default, but you can pass the C<ignore_comments>
option to ignore them.
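The normalization rules above can be checked with a small sketch. It assumes that C<diffable_html()> (documented below) is called once per document, and the two sample fragments are made up for illustration. They differ only in attribute order, quote style, and the self-closing C<< <hr/> >>, so their normalized forms would be expected to match.

```perl
use strict;
use warnings;

use HTML::Differences qw( diffable_html );

# Hypothetical fragments: same markup, different attribute order,
# quote style, and self-closing syntax.
my $first  = q{<p><a href='/home' class="nav">Home</a></p><hr/>};
my $second = q{<p><a class="nav" href="/home">Home</a></p><hr>};

# Assumption: diffable_html() takes one document per call.
my $first_events  = diffable_html($first);
my $second_events = diffable_html($second);

# Show the normalized pieces of the first fragment, one per line.
print "$_\n" for @{$first_events};

my $same
    = join( "\n", @{$first_events} ) eq join( "\n", @{$second_events} );
print $same ? "normalized forms match\n" : "normalized forms differ\n";
```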
=head1 IMPORTABLE SUBROUTINES
This module offers two optionally importable subroutines. Nothing is exported
by default.
=head2 html_text_diff( $html1, $html2, %options )
This subroutine uses L<Text::Diff>'s C<diff()> subroutine to provide a string
version of the diff between the two pieces of HTML provided.
The HTML can be passed as a plain scalar or as a reference to a scalar.
After the two HTML parameters, you can pass key/value pairs as options:
=over 4
=item * ignore_comments
If this is true, then comments are ignored for the purpose of the diff. This
defaults to false.
=item * style
The style for the diff. This defaults to "Table". See L<Text::Diff> for the
available options.
=item * context
The amount of context to show in the diff. This defaults to C<2**31> to
include all the context. You can set this to some smaller value if you prefer.
=back
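As a minimal sketch of the options above, the following compares two made-up fragments. It passes the second as a scalar reference, ignores comments, and limits the context to one line. The sample HTML and variable names are illustrative only.

```perl
use strict;
use warnings;

use HTML::Differences qw( html_text_diff );

# Hypothetical input: the list item text changes, and only the old
# version carries a comment.
my $old = '<ul><li>One</li><li>Two</li></ul><!-- draft -->';
my $new = '<ul><li>One</li><li>Three</li></ul>';

my $diff = html_text_diff(
    $old, \$new,             # plain scalar or scalar reference
    ignore_comments => 1,    # leave the "draft" comment out of the diff
    style           => 'Table',    # the documented default, shown explicitly
    context         => 1,    # one line of context instead of everything
);

print $diff;
```

Keeping the default C<context> is usually the better choice for short fragments in test output. A small value like the one here is mainly useful for large documents.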
=head2 diffable_html( $html, %options )
This returns an array reference of strings for a single piece of HTML,
suitable for passing to any of L<Algorithm::Diff>'s methods or exported
subroutines. Call it once for each document you want to compare.
The only option currently accepted is C<ignore_comments>.
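The sketch below feeds the returned array references to C<sdiff()> from L<Algorithm::Diff>. It assumes C<diffable_html()> is called once per document, since C<sdiff()> expects two sequences. The sample HTML is made up for illustration.

```perl
use strict;
use warnings;

use Algorithm::Diff qw( sdiff );
use HTML::Differences qw( diffable_html );

# Hypothetical input: the inline tag changes from <b> to <i>.
my $old = '<p>Hello <b>world</b></p>';
my $new = '<p>Hello <i>world</i></p>';

# Assumption: one call per document, each returning an array reference.
my $old_events = diffable_html( $old, ignore_comments => 1 );
my $new_events = diffable_html( $new, ignore_comments => 1 );

# sdiff() returns rows of [ flag, left, right ]; the flag 'u' marks
# unchanged rows, so keep only additions, removals, and changes.
my @changes = grep { $_->[0] ne 'u' } sdiff( $old_events, $new_events );

for my $row (@changes) {
    my ( $flag, $left, $right ) = @{$row};
    printf "%s  %-10s | %s\n", $flag, $left, $right;
}
```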
=head1 WHY THIS MODULE EXISTS
There are a couple of other modules out there that do HTML diffs, so why write
this one?
The L<HTML::Diff> module uses regexes to parse HTML. This is crazy.
The L<Test::HTML::Differences> module attempts to fix up the HTML a little too
much for my purposes. It ends up ignoring missing end tags or breaking on them
in various ways.
=head1 AUTHOR
Dave Rolsky <autarch@urth.org>
=head1 COPYRIGHT AND LICENSE
This software is Copyright (c) 2015 by Dave Rolsky.
This is free software, licensed under:
The Artistic License 2.0 (GPL Compatible)
=cut