HTML-Differences

Start tags are normalized so that attributes appear in sorted order, and all
quotes are converted to double quotes, with one space before each
attribute. Self-closing tags (like C<< <hr/> >>) are converted to their
simpler form (C<< <hr> >>).

Note that because L<HTML::Parser> decodes HTML entities inside attribute
values, this module cannot distinguish between two attributes where one
contains an entity and one does not.
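
As a small illustration of why this happens, the sketch below uses
C<decode_entities()> from L<HTML::Entities>, which ships in the same
distribution as L<HTML::Parser>. The sample strings are made up for this
example, and the sketch only shows the decoding step, not this module's
internals.

    use strict;
    use warnings;

    use HTML::Entities qw( decode_entities );

    # Two attribute values as they might appear in the source HTML.
    my $with_entity    = 'Tom &amp; Jerry';
    my $without_entity = 'Tom & Jerry';

    # Once the entity is decoded, the two values are the same string, so
    # nothing downstream can tell which form the source used.
    my $decoded = decode_entities($with_entity);
    print $decoded eq $without_entity ? "same\n" : "different\n";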

Missing end tags I<are not> added, and will show up in the diff.

Comments are included by default, but you can pass the C<ignore_comments>
option to ignore them.

=head1 IMPORTABLE SUBROUTINES

This module offers two optionally importable subroutines. Nothing is exported
by default.

=head2 html_text_diff( $html1, $html2, %options )

This subroutine uses L<Text::Diff>'s C<diff()> subroutine to provide a string
version of the diff between the two pieces of HTML provided.

The HTML can be passed as a plain scalar or as a reference to a scalar.

After the two HTML parameters, you can pass key/value pairs as options:

=over 4

=item * ignore_comments

If this is true, then comments are ignored for the purpose of the diff. This
defaults to false.

=item * style

The style for the diff. This defaults to "Table". See L<Text::Diff> for the
available options.

=item * context

The amount of context to show in the diff. This defaults to C<2**31>, which in
practice includes all of the context. You can set this to a smaller value if
you prefer.

=back
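
A minimal usage sketch, assuming the module is installed. The HTML snippets
and option values are invented for illustration. C<Unified> is one of the
styles documented by L<Text::Diff>. Given the normalization rules described
earlier, the two start tags are expected to compare as equal even though they
are quoted and ordered differently, so the change in the text should be what
the diff reports.

    use strict;
    use warnings;

    use HTML::Differences qw( html_text_diff );

    # Same attributes, but in a different order and with different quotes.
    my $html1 = q{<p id='greeting' class="intro">Hello, world</p>};
    my $html2 = q{<p class="intro" id="greeting">Hello, World</p>};

    # Either argument could also be passed as a scalar reference, for
    # example \$html1.
    my $diff = html_text_diff(
        $html1, $html2,
        ignore_comments => 1,
        style           => 'Unified',
        context         => 3,
    );

    print $diff;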

=head2 diffable_html( $html1, $html2, %options )

This returns an array reference of strings suitable for passing to any of
L<Algorithm::Diff>'s methods or exported subroutines.

The only option currently accepted is C<ignore_comments>.

=head1 WHY THIS MODULE EXISTS

There are a couple of other modules out there that do HTML diffs, so why write
this one?

The L<HTML::Diff> module uses regexes to parse HTML. This is crazy.

The L<Test::HTML::Differences> module attempts to fix up the HTML a little too
much for my purposes. It ends up ignoring missing end tags or breaking on them
in various ways.

=head1 AUTHOR

Dave Rolsky <autarch@urth.org>

=head1 COPYRIGHT AND LICENSE

This software is Copyright (c) 2015 by Dave Rolsky.

This is free software, licensed under:

  The Artistic License 2.0 (GPL Compatible)

=cut