t/08_spaces.t
t/09_indent.t
t/10_empty.t
t/11_prolog.t
t/12_removeblanks.t
t/13_includes.t
t/14_comments.t
t/15_entities.t
t/16_keep.t
t/17_siblingscomments.t
t/18_siblingscdata.t
t/19_siblingspi.t
t/20_preserveattributes.t
t/21_nestedcontent.t
t/22_dtdprotection.t
t/23_scale.t
t/data/entitynotag.xml
t/data/entitywithtag.xml
t/data/inc.xml
t/data/xinclude.xml
META.yml
- Merge elements when empty
- Remove DTD (configurable).
- Remove processing instructions (configurable)
- Remove comments (configurable).
- Remove CDATA (configurable).
In addition, the minifier will drop all blanks between the first-level children.
What you find between first-level children is not supposed to be meaningful data, so we can safely remove formatting there.
For instance, we can remove a carriage return between the prolog and a processing instruction (or even inside a DTD).
In addition again, the minifier will _smartly_ remove blanks between tags. By _smart_ I mean that it will not remove blanks if we are in a leaf (more chances to be meaningful blanks) or if the node contains something that will persist (a _not removed...
If there is no DTD (very often the case), we are blind and simply use the approach described above (keep blanks in leaves, remove blanks in nodes if all siblings contain only blanks).
Everything listed above is the default and should be perceived as almost lossless minification in terms of semantics (for humans).
It is not completely lossless if you consider these things as data, but in that case you simply can't minify, as you can't touch anything ;)
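
As a rough illustration of the blank handling described above, here is a minimal sketch that calls `minify` the same way the distribution's test `t/18_siblingscdata.t` does. The input string and the results noted in the comments are copied from that test. The variable names `$xml`, `$dropped` and `$kept` are only illustrative.

```perl
use strict;
use warnings;
use XML::Minify qw(minify);

# Input taken from t/18_siblingscdata.t: non-blank text, two CDATA sections
# and a leaf element (<keepblanks>) that contains only a blank.
my $xml = '<root> Not empty <![CDATA[ mytext ]]> <keepblanks> </keepblanks> <![CDATA[ mytext ]]> </root>';

# CDATA sections are removed here (keep_cdata => 0), so the blanks that
# surrounded them can be cleaned too; the blank inside the leaf is kept.
my $dropped = minify($xml, no_prolog => 1, keep_cdata => 0);

# With keep_cdata the CDATA sections persist, so neighbouring blanks stay.
my $kept = minify($xml, no_prolog => 1, keep_cdata => 1);

print "$dropped\n";  # the test expects: <root> Not empty <keepblanks> </keepblanks></root>
print "$kept\n";     # the test expects the input to come back unchanged
```

The second call shows why the removal is described as "smart": as long as something persistent (here, kept CDATA) sits among the siblings, the surrounding blanks are left alone.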
## EXTRA MINIFICATION
In addition, you can enable the **aggressive**, **destructive** or **insane** mode to remove characters inside text nodes (a sort of "cleaning"):
- **keep\_comments**
Keep comments, by default they are removed.
A comment is something like :
```
<!-- comment -->
```
- **keep\_cdata**
Keep cdata, by default they are removed.
A CDATA is something like :
```
<![CDATA[ my cdata ]]>
```
- **keep\_pi**
Keep processing instructions.
A processing instruction is something like :
```
<?xml-stylesheet href="style.css"?>
lib/XML/Minify.pm view on Meta::CPAN
}
# Let me explain, we could have text nodes basically everywhere, and we don't know if whitespaces are ignorable or not.
# As we want to minify the xml, we can't just keep all blanks, because it is generally indentation or spaces that could be ignored.
# Here is the strategy :
# A. If we have <name> </name> we should keep it anyway (unless forced with argument)
# B. If we have </name> </person> we should *maybe* remove (in this case parent node contains more than one child node : text node + element node)
# C. If we have <person> <name> we should *maybe* remove it (in this case parent node contains more than one child node : text node + element node)
# D. If we have </person> <person> we should *maybe* remove it (in this case parent node contains more than one child node : text node + element node)
# B, C, D : remove... unless explicitly declared in DTD as potential #PCDATA container OR unless it contains something...
# *something* is a comment (not removed), some other text not empty, some cdata.
# Imagine </name> <!-- comment --> some text </person> then we don't want to remove spaces in the first text node
# Same with </name> <!-- comment --> </person>
# But if comments are removed then the latter piece of code will become </name></person>
my $empty = 1;
my $childbak = $child;
my @siblings = ();
# We want to inspect siblings to the right until we reach an element
while($child = $child->nextSibling) {
lib/XML/Minify.pm view on Meta::CPAN
if($child->data =~ m/[^ \t\r\n]/) {
# Not empty
$empty = 0;
last;
}
}
if($child->nodeType eq XML_COMMENT_NODE and $opt{keep_comments}) {
$empty = 0;
last;
}
if($child->nodeType eq XML_CDATA_SECTION_NODE and $opt{keep_cdata}) {
$empty = 0;
last;
}
if($child->nodeType eq XML_PI_NODE and $opt{keep_pi}) {
$empty = 0;
last;
}
# Entity refs : we can choose to expand or not... but not to drop them
if($child->nodeType eq XML_ENTITY_REF_NODE) {
$empty = 0;
lib/XML/Minify.pm view on Meta::CPAN
$outnode->appendText($str);
} elsif($child->nodeType eq XML_ENTITY_REF_NODE) {
# Configuration will be done above when creating document
my $er = $doc->createEntityReference($child->getName());
$outnode->addChild($er);
} elsif($child->nodeType eq XML_COMMENT_NODE) {
# Configurable with keep_comments
my $com = $doc->createComment($child->getData());
$opt{keep_comments} and $outnode->addChild($com);
} elsif($child->nodeType eq XML_CDATA_SECTION_NODE) {
# Configurable with keep_cdata
#my $cdata = $child->cloneNode(1);
my $cdata = $doc->createCDATASection($child->getData());
$opt{keep_cdata} and $outnode->addChild($cdata);
} elsif($child->nodeType eq XML_PI_NODE) {
# Configurable with keep_pi
#my $pi = $child->cloneNode(1);
my $pi = $doc->createPI($child->nodeName, $child->getData());
$opt{keep_pi} and $outnode->addChild($pi);
} elsif($child->nodeType eq XML_ELEMENT_NODE) {
$outnode->addChild(traverse($child, $outnode));
}
}
return $outnode;
lib/XML/Minify.pm view on Meta::CPAN
=item Remove comments (configurable).
=item Remove CDATA (configurable).
=back
In addition, the minifier will drop all blanks between the first-level children.
What you find between first-level children is not supposed to be meaningful data, so we can safely remove formatting there.
For instance, we can remove a carriage return between the prolog and a processing instruction (or even inside a DTD).
In addition again, the minifier will I<smartly> remove blanks between tags. By I<smart> I mean that it will not remove blanks if we are in a leaf (more chances to be meaningful blanks) or if the node contains something that will persist (a I<not remo...
If there is no DTD (very often the case), we are blind and simply use the approach described above (keep blanks in leaves, remove blanks in nodes if all siblings contain only blanks).
Everything listed above is the default and should be perceived as almost lossless minification in terms of semantics (for humans).
It is not completely lossless if you consider these things as data, but in that case you simply can't minify, as you can't touch anything ;)
=head2 EXTRA MINIFICATION
lib/XML/Minify.pm view on Meta::CPAN
It is aggressive and therefore lossy compression.
=item B<keep_comments>
Keep comments, by default they are removed.
A comment is something like :
<!-- comment -->
=item B<keep_cdata>
Keep cdata, by default they are removed.
A CDATA is something like :
<![CDATA[ my cdata ]]>
=item B<keep_pi>
Keep processing instructions.
A processing instruction is something like :
<?xml-stylesheet href="style.css"?>
=item B<keep_dtd>
scripts/xml-minifier view on Meta::CPAN
"process-xincludes" => \$opt{process_xincludes},
"remove-blanks-start" => \$opt{remove_blanks_start},
"remove-blanks-end" => \$opt{remove_blanks_end},
"remove-spaces-line-start" => \$opt{remove_spaces_line_start},
"remove-spaces-line-end" => \$opt{remove_spaces_line_end},
"remove-indent" => \$opt{remove_spaces_line_start},
"remove-empty-text" => \$opt{remove_empty_text},
"remove-cr-lf-everywhere" => \$opt{remove_cr_lf_everywhere},
"remove-spaces-everywhere" => \$opt{remove_spaces_everywhere},
"keep-comments" => \$opt{keep_comments},
"keep-cdata" => \$opt{keep_cdata},
"keep-pi" => \$opt{keep_pi},
"keep-dtd" => \$opt{keep_dtd},
"ignore-dtd" => \$opt{ignore_dtd},
"no-prolog" => \$opt{no_prolog},
"version=s" => \$opt{version},
"encoding=s" => \$opt{encoding},
"aggressive" => \$opt{aggressive},
"agressive" => \$opt{aggressive},
"destructive" => \$opt{destructive},
"insane" => \$opt{insane},
scripts/xml-minifier view on Meta::CPAN
--remove-spaces-line-end remove spaces/tabs after text (each line)
--remove-indent remove spaces/tabs before text (each line)
--remove-empty-text remove (pseudo) empty text
--remove-cr-lf-everywhere remove cr and lf everywhere
--keep-comments keep comments
--keep-cdata keep cdata
--keep-pi keep processing instructions
--keep-dtd keep dtd
--ignore-dtd ignore dtd
--no-prolog remove prolog (version and encoding)
--version specify version for the xml
scripts/xml-minifier view on Meta::CPAN
=item B<--remove-cr-lf-everywhere>
Remove carriage returns and line feeds everywhere (inside text!).
For instance <tag>foo\nbar</tag> will become <tag>foobar</tag>
Very aggressive and therefore lossy compression.
=item B<--keep-comments>
Keep comments, by default they are removed. A comment is like <!-- comment -->
=item B<--keep-cdata>
Keep cdata, by default they are removed. A CDATA is like <![CDATA[ my cdata ]]>
=item B<--keep-pi>
Keep processing instructions. A processing instruction is like <?xml-stylesheet href="style.css"?>
=item B<--keep-dtd>
Keep DTD.
=item B<--ignore-dtd>
t/16_keep.t view on Meta::CPAN
<tag>
</tag></catalog>
END
my $keepdtd = << "END";
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.1.2//EN" "http://www.oasis-open.org/docbook/xml/4.0/docbookx.dtd" [<!ELEMENT element-name EMPTY>]><catalog xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xi="http://www.w3.org/2001/XInclude"><bo...
</tag></catalog>
END
my $keepcdata = << "END";
<catalog xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xi="http://www.w3.org/2001/XInclude"><book/>
<![CDATA[ ...]]>
<tag>
</tag></catalog>
END
t/16_keep.t view on Meta::CPAN
# Answer - PI and DTD are first-level children, and we always remove CR/LF in and between first-level children.
# Removing a comment only removes the comment itself, not the text around it, and therefore not the carriage return after it.
#
# Question - Can we have a PI that is not a first-level child?
# Answer - Nothing seems to say the contrary (neither xmllint nor doc... OK I only spent 5 mins to try to find doc about it xD)
chomp $maxi;
chomp $keepcomments;
chomp $keeppi;
chomp $keepdtd;
chomp $keepcdata;
is(minify($maxi, no_prolog => 1, keep_comments => 1, ignore_dtd => 1), $keepcomments, "Keep comments");
is(minify($maxi, no_prolog => 1, keep_pi => 1, ignore_dtd => 1), $keeppi, "Keep pi");
is(minify($maxi, no_prolog => 1, keep_dtd => 1, ignore_dtd => 1), $keepdtd, "Keep dtd");
is(minify($maxi, no_prolog => 1, keep_cdata => 1, ignore_dtd => 1), $keepcdata, "Keep cdata");
done_testing;
t/18_siblingscdata.t view on Meta::CPAN
use warnings;
use Test::More 0.98;
use XML::Minify qw(minify);
my $maxi = << "END";
<root> Not empty <![CDATA[ mytext ]]> <keepblanks> </keepblanks> <![CDATA[ mytext ]]> </root>
END
my $minikeepcdata = << "END";
<root> Not empty <![CDATA[ mytext ]]> <keepblanks> </keepblanks> <![CDATA[ mytext ]]> </root>
END
my $minidropcdata = << "END";
<root> Not empty <keepblanks> </keepblanks></root>
END
chomp $maxi;
chomp $minikeepcdata;
chomp $minidropcdata;
is(minify($maxi, no_prolog => 1, keep_cdata => 1), $minikeepcdata, "Keep cdata, nothing can be done");
is(minify($maxi, no_prolog => 1, keep_cdata => 0), $minidropcdata, "Remove cdata therefore can clean some blanks");
done_testing;
t/19_siblingspi.t view on Meta::CPAN
use warnings;
use Test::More 0.98;
use XML::Minify qw(minify);
my $maxi = << "END";
<root> Not empty <![CDATA[ mytext ]]> <keepblanks> </keepblanks> <![CDATA[ mytext ]]> </root>
END
my $minikeepcdata = << "END";
<root> Not empty <![CDATA[ mytext ]]> <keepblanks> </keepblanks> <![CDATA[ mytext ]]> </root>
END
my $minidropcdata = << "END";
<root> Not empty <keepblanks> </keepblanks></root>
END
chomp $maxi;
chomp $minikeepcdata;
chomp $minidropcdata;
is(minify($maxi, no_prolog => 1, keep_cdata => 1), $minikeepcdata, "Keep cdata, nothing can be done");
is(minify($maxi, no_prolog => 1, keep_cdata => 0), $minidropcdata, "Remove cdata therefore can clean some blanks");
done_testing;
t/20_preserveattributes.t view on Meta::CPAN
<tag key="value">
</tag></catalog>
END
my $keepdtd = << "END";
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.1.2//EN" "http://www.oasis-open.org/docbook/xml/4.0/docbookx.dtd" [<!ELEMENT element-name EMPTY>]><catalog xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xi="http://www.w3.org/2001/XInclude" foo...
</tag></catalog>
END
my $keepcdata = << "END";
<catalog xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xi="http://www.w3.org/2001/XInclude" foo="bar"><book bar="baz"/>
<![CDATA[ ...]]>
<tag key="value">
</tag></catalog>
END
chomp $maxi;
chomp $keepcomments;
chomp $keeppi;
chomp $keepdtd;
chomp $keepcdata;
is(minify($maxi, no_prolog => 1, keep_comments => 1, ignore_dtd => 1), $keepcomments, "Keep comments");
is(minify($maxi, no_prolog => 1, keep_pi => 1, ignore_dtd => 1), $keeppi, "Keep pi");
is(minify($maxi, no_prolog => 1, keep_dtd => 1, ignore_dtd => 1), $keepdtd, "Keep dtd");
is(minify($maxi, no_prolog => 1, keep_cdata => 1, ignore_dtd => 1), $keepcdata, "Keep cdata");
done_testing;
t/21_nestedcontent.t view on Meta::CPAN
use strict;
use warnings;
use Test::More 0.98;
use XML::Minify qw(minify);
my $cdataincomment = << "END";
<catalog xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xi="http://www.w3.org/2001/XInclude">
<book/>
<!-- <![CDATA[ ...]]> -->
</catalog>
END
my $commentincdata = << "END";
<catalog xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xi="http://www.w3.org/2001/XInclude">
<book/>
<![CDATA[ <!-- Comment --> ]]>
</catalog>
END
my $minikeepcdataincomment = << "end";
<catalog xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xi="http://www.w3.org/2001/XInclude"><book/>
<!-- <![CDATA[ ...]]> -->
</catalog>
end
my $minidropcdataincomment = << "end";
<catalog xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xi="http://www.w3.org/2001/XInclude"><book/></catalog>
end
my $minikeepcommentincdata = << "END";
<catalog xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xi="http://www.w3.org/2001/XInclude"><book/>
<![CDATA[ <!-- Comment --> ]]>
</catalog>
END
my $minidropcommentincdata = << "END";
<catalog xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xi="http://www.w3.org/2001/XInclude"><book/></catalog>
END
chomp $cdataincomment;
chomp $commentincdata;
chomp $minikeepcdataincomment;
chomp $minidropcdataincomment;
chomp $minikeepcommentincdata;
chomp $minidropcommentincdata;
is(minify($cdataincomment, no_prolog => 1, keep_comments => 1), $minikeepcdataincomment, "Keep cdata in comment");
is(minify($cdataincomment, no_prolog => 1, keep_comments => 0), $minidropcdataincomment, "Remove cdata with comment (1)");
is(minify($cdataincomment, no_prolog => 1, keep_comments => 1, keep_cdata => 0), $minikeepcdataincomment, "Keep cdata as protected by comment");
is(minify($cdataincomment, no_prolog => 1, keep_comments => 0, keep_cdata => 1), $minidropcdataincomment, "Remove cdata with comment (2)");
is(minify($commentincdata, no_prolog => 1, keep_cdata => 1), $minikeepcommentincdata, "Keep comment in cdata");
is(minify($commentincdata, no_prolog => 1, keep_cdata => 0), $minidropcommentincdata, "Remove comment with cdata (1)");
is(minify($commentincdata, no_prolog => 1, keep_cdata => 1, keep_comments => 0), $minikeepcommentincdata, "Keep comment as protected by cdata");
is(minify($commentincdata, no_prolog => 1, keep_cdata => 0, keep_comments => 1), $minidropcommentincdata, "Remove comment with cdata (2)");
done_testing;