XML-Twig
on filehandles that were already open in >:utf-8 mode
spotted by Zed Pobre
http://rt.cpan.org/Ticket/Display.html?id=39849
- fixed: xml_pp now accepts all formatting options available in XML::Twig
- fixed: RT #31664, element attributes were not preserving their order when
using elt->copy; spotted, and fix provided, by jbubbabrown
- fixed: RT #31832, wrapped link to xmltwig.com in L< > tag in the doc,
spotted by Slaven Srezic
http://rt.cpan.org/Ticket/Display.html?id=31832
- fixed: RT #31833 doc fix, spotted by Slaven Srezic
- fixed: Makefile.PL doesn't nag the poor tester anymore when running with
$AUTOMATED_TESTING set
- fixed: bug calling set_text when using XML::Twig::XPath, spotted by Ted Sung
- fixed: improved speed when parsing big elements, RT#35672, reported by Seth
Viebrock (the fix is to explicitly return null from the character handler,
instead of the text already parsed... a few hundred thousand times)
http://rt.cpan.org/Ticket/Display.html?id=35672
- fixed: RT #47257, minor doc bug, spotted by David Steinbrunner
http://rt.cpan.org/Ticket/Display.html?id=47257
- fixed: bug in navigation conditions of the form elt[text()=~ /text with 'or' or 'and'/]
- improved: speed, somewhat
- improved: put the project on github: http://github.com/mirod/xmltwig
3.32 - 2007-11-13 - minor maintenance release with a bug fix
- fixed: changed the regexp that parses XPath-like conditions so
it can accept leading non-ASCII letters ([^\W\d] does not
work); not used in perl 5.005
- fixed: added use utf8 (except in 5.005), which gets rid of the dreaded
"SWASHNEW" error in 5.6.*, and fixed the things that then broke in 5.6.
3.31 - 2007-11-07 - minor maintenance release, fixing some tests
- fixed: fixes to stop tests from failing in various configurations
3.30 - 2007-11-06
- fixed: a couple of bugs in namespace handling, spotted by
Shlomo Yonas (see https://rt.cpan.org/Ticket/Display.html?id=27617
and http://www.perlmonks.org/?node_id=624830)
- added: the XML::Twig::Elt fields method which returns a list of
fields
- added: the normalize method in XML::Twig and XML::Twig::Elt,
which merges together consecutive pcdata elements. As much as
possible (so far after a cut, delete or erase), the twig is
kept normalized, i.e. there are no consecutive #PCDATA elements
in it. Suggestion of someone whose name (and email) I can't
find at the moment.
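A minimal sketch of what "normalized" means in the entry above. It assumes a hypothetical, simplified tree in which an element's children are plain hashes with a `type` key. This is not how XML::Twig stores or merges nodes; it only shows adjacent #PCDATA children being folded into one.

```perl
use strict;
use warnings;

# Hypothetical model, NOT XML::Twig's data structure: each child is a hash
# with a 'type' ('#PCDATA' or 'elt') and either 'text' or 'tag'.
sub normalize_children {
    my @children = @_;
    my @normalized;
    for my $child (@children) {
        if (   $child->{type} eq '#PCDATA'
            && @normalized
            && $normalized[-1]{type} eq '#PCDATA')
        {
            # merge into the previous text node instead of keeping a new one
            $normalized[-1]{text} .= $child->{text};
        }
        else {
            push @normalized, { %$child };    # shallow copy, input untouched
        }
    }
    return @normalized;
}

# e.g. the children left after cutting <b> out of "foo <b>x</b> bar<i/>"
my @kids = normalize_children(
    { type => '#PCDATA', text => 'foo ' },
    { type => '#PCDATA', text => ' bar' },
    { type => 'elt',     tag  => 'i' },
);
print scalar(@kids), " children, first text: '$kids[0]{text}'\n";
```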
- added: the indented_a / cvs format for pretty_print, that makes the
output friendly to line-oriented version control tools, as described
in http://tinyurl.com/2kwscq (RT #24954). Thanks to Sjur Moshagen
for a patch that I adapted to the current version.
- fixed: bug RT #25113: system entities were not properly resolved
if the XML file was not in the current directory. Thanks to
Dave Charness for the patch.
- added: the XML::Twig method finish_now that terminates parsing
immediately, without checking the rest of the XML. This feature was
half suggested by Nick Clayton
- added: the -s option to xml_split, which splits when the given
size is reached for a file, suggested by Radek Saturka.
- added: the -g option to xml_split, which groups elements to be
split, suggested and tested by Dhirendra Singh Kholia.
- added: the safe_parsefile_html and safe_parseurl_html methods,
and a --html option to xml_grep. Suggested by Bill Ricker.
- improved: by default xml_grep now skips non well-formed files, the
--strict option makes it die when it finds one
- fixed: a bunch of bugs in xml_grep
- fixed: a warning when using optional modules with a version
number that includes an _, spotted and fix suggested by
Bill Ricker.
- fixed: test failure on cygwin, thanks to Erik Rantapaa for the
patch.
- fixed: a bunch of typos in docs, RT #25836, spotted and fixed by David
Steinbrunner
- improved: re-use of XML::Twig objects for repetitive parsing. It
looks like it should be OK now, but I am sure I haven't tested
all cases yet (especially when DTDs and entities are involved).
- improved: HTML parsing; XML::Twig now tries to find the proper
encoding for the document (that's not done by HTML::TreeBuilder
at the moment).
- fixed: XML::Twig::Elt purge and flush methods now only purge/flush up to
the element, not up to the current element in the twig (duh!)
- fixed: bug in handlers of the form elt[string(subelt)="foo"] and
elt[string(subelt)=1] which did not work at all
- fixed: bug in parameter entity output, spotted by BenHopkins on
perlmonks (see http://www.perlmonks.org/?node_id=618360)
- fixed: bug in xml_string: options were not used
can now be called from a regular handler (it
always could but the docs did not say so,
thanks to kudra for noticing this). It can
also be called to ignore a parent of the current
element. There were bugs there, and the tree
was not built properly
- added: error message when an XPath query with
a leading / is used on a node that does not
belong to a whole twig (because it's been cut
or because the twig itself went out of scope)
- improved: when parsing HTML with error_context set, the
HTML is indented, in order to give better error
reports
3.26 - 2006-07-01
- added: argument to -i in the Makefile to prevent
problems on Win32
- added: XML::Twig::Elt former_next_sibling,
former_prev_sibling and former_parent methods
- squashed a memory leak when parsing html
(forgot to call delete on the HTML::Tree object)
- fixed: bug that caused XML::Twig to hang if
there was a syntax error in a predicate
(RT#19499, reported by Dan Dascalescu)
- improved: made start_tag and end_tag more consistent: they
now both return the empty string for comments,
PIs... (reported by Dan Dascalescu)
- added: parsefile_inplace and parsefile_html_inplace
methods (thanks to GrandFather on perlmonks)
- added: support for adding a CSS stylesheet in the
add_stylesheet method (thanks to Georgi Sotirov)
- patched tests to work on Win32
- added: set_inner_xml, inner_xml and set_inner_html
methods
3.25 - 2006-05-10
- patched to work with perl 5.005!
- fixed: a bug in xml_pp when pretty printing a
file in place in a different file system
3.24 - 2006-05-09
- added: loading the text of entities stored in
separate files (using SYSTEM) when the (awfully
named!) expand_external_ents option is used.
Thanks to jhx for spotting this.
- changed: set_cdata, set_pi and set_comment so that
if you call them on an element of the wrong kind,
everything works as expected, instead of silently
swallowing the data. Bug spotted by cmccutcheon
- fixed: a whole bunch of things to make the module
run and the tests pass on VMS, thanks to Peter
(Stig) Edwards who reported bug RT #18655 and
provided a patch.
- fixed: bug on get_xpath( '/root[1]') expressions,
RT #18789 spotted by memfrob.
- added: the add_stylesheet method, that... adds a
stylesheet (xsl type is supported, let me know if
other types are needed) to a document.
- improved: allowed pasting PI/Comment elements before or after
the root of a document (see discussion at
http://perlmonks.org/index.pl?node_id=538550).
Thanks to rogue90 for noticing the problem, and to
Tanktalus for finding the best way to solve it.
- added: aliased unwrap to erase (ie added the unwrap method
to XML::Twig::Elt, identical to the existing erase)
suggested by Chris Burbridge.
- fixed: bug RT #17522: flushing twice at the end of
the parse would output the last fragment twice.
Spotted by Harco de Hilster.
- fixed: bug RT #17500: parsing a pipe when using
the UTF8 perlIO layer (through PERL_UNICODE or -C)
now raises an error, found by Nikolaus Rath.
- improved: made the tests pass when the UTF8 perlIO layer is
used. At this point potential problems when parsing
non-UTF8 XML in this configuration are not trapped.
3.23 - 2006-01-23
- added: autoflush: there is no more need for the
last $twig->flush after the parsing, it is done
automatically at the end of the parsing, with the
same arguments as the first flush on the twig.
This can be turned off by setting $twig->{twig_autoflush}
to 0.
WARNING: if you finished the output with a direct
print instead of a flush, then this change will
cause a bug. Hopefully this should not be the case
and is easily fixable.
- fixed: bug RT #17145 where get_xpath('//root/elt[1]/child')
would produce a fatal error if there were no elt
element under root. Spotted by Dan Dascalescu.
- fixed: bug RT #17064 (comments and PIs after the
root element were not properly processed), spotted
by Dan Dascalescu.
- added: tests for xml_split and xml_merge and fixed
a few bugs in the process
- added: the -i option to xml_split and xml_merge,
that use XInclude instead of PIs (preliminary
support, the XInclude namespace is not declared
for example).
- added: the XML::Twig and XML::Twig::Elt trim method
that trims an element in-place
- added: the XML::Twig last_elt method and the XML::Twig::Elt
last_descendant method
- added: more tests
3.17 - 2005-03-16
- improved: documentation, mostly to point better to
the resources at http://www.xmltwig.com
- fixed: a few tests that would fail under perl 5.6.*
and Solaris (t/test_safe_encode.t and t/test_bug_3.15.t),
see RT bug # 11844, thanks to Sven Neuhaus
- changed: the licensing terms in the README to match the
ones in the main module (same as Perl), see RT bug #11725
- added: a test on XML::SAX::Writer version number to
avoid failing tests with old versions (<0.39)
- improved: xml_split
3.16 - 2005-02-11
- added: the xml_split/xml_merge tools
- fixed: PI handler behaviour when used in twig_roots mode
- fixed: bug that prevented the DTD from being output
when update_DTD mode is on, no DTD is present but
entities have been created
- added: level(<n>) trigger for handlers
- fixed: bug that prevented the output_filter from being
called when printing an element. Spotted thanks to
Louis Strous.
- fixed: bug in the nsgmls pretty printer that output
invalid XML (an extra \n was added in the end tag)
found by Lee Goddard
- fixed: test 284 in test_additional to make it pass
in RedHat's version of perl 5.8.0, thanks to
rdhayes for debugging and fixing that test.
- improved: first shot at getting PIs and comments back in the
proper place, even in 'keep' mode. At the moment
using set_pcdata (or set_cdata) removes all
embedded comments/pis
- fixed: a bug with pi's in keep mode (pi's would not
be copied if they were within an element) found by
Pascal Sternis
- added: a fix to get rid of spurious warnings, sent
by Anthony Persaud
- added: the remove_cdata option to the XML::Twig new
method, that will output CDATA sections as regular
(escaped) PCDATA
- added: the index option to the XML::Twig new method,
and the associated XML::Twig index method, which
generates a list of elements matching a condition
during parsing
- added: the XML::Twig::Elt first_descendant method
- fixed: bug with the keep_encoding option where
attributes were not parsed when the element name was
followed by more than one space (spotted by Gerald
Sedrati-Dinet),
see https://rt.cpan.org/Ticket/Display.html?id=8137
- fixed: a bug where whitespace at the beginning of an
element could be dropped (if followed by an element
before any other character). Now whitespace is
dropped only if it includes a \n
- added: when load_DTD is used, default
attributes are now filled in
- fixed: bug on xmlns in path expression trigger
(would not replace prefixes in path expressions),
spotted by amonroy on perlmonks, see
http://perlmonks.org/index.pl?node_id=386764
- optimized: XML::Twig text, thanks to Nick Lassonde
for the patch
- fixed: bug that generated an empty line before some
comments (pointed out by Tanya Huang)
- fixed: tests to check XML::Filter::BufferText version
(1.00 has a bug in the CDATA handling that makes XML::Twig
tests fail).
- added: new options --nowrap and --exclude (-v) to xml_grep
- fixed: warning in tests under 5.8.0 (spotted by Ed Avis)
- improved: skipped HTML::Entities tests in 5.8.0 (make test for this
module seems to fail on my system, it might be the same
elsewhere)
- fixed: bug RT #6067 (problems with non-standard versions of
Scalar::Util which do not include weaken)
- fixed: bug RT #6092 (error when using safe output filter)
- fixed: bug when using map_xmlns, tags in default namespace
were not output
3.15 - 2004-04-05
- fixed: tests now pass on more systems (thanks to Ed Avis for his testing)
Also when exporting SAX you should consider setting Twig to a
mode where all aspects of the XML are treated as nodes by XML::Twig,
by setting the following options when you create the twig:
comments => 'process', pi => 'process', keep_spaces => 1
- improved: twig_print_outside_roots now supports a file handle ref as argument:
the untouched part of the tree will be output to the filehandle:
- added: the 'indented_c' style that gives a slightly more compact pretty
print than 'indented': the end tags are on the same line as the
preceding text (suggestion of Hugh Myers)
- added: option in get_xpath (aka find_nodes) to apply the query to
a list of elements
- added: processing of conditions on the current node in get_xpath:
my @result= $elt->get_xpath( q{.[@att="val"]});
This is of course mostly useful with the previous option.
The idea stemmed from a post from Liam Quin to the perl-xml list
- added: XML::Twig xml_version, set_xml_version, standalone, set_standalone
methods on the XML declaration
- fixed: bug in change_gi (which simply did not work at all), found
by Ron Hayden.
- fixed: bug in space handling with CDATA (spaces before the CDATA section
were moved to within the section), comments and PI's
- fixed: bug in parse_url (exit was not called at the end of the child),
found by David Kulp
- improved: cleaned up the code that parses xpath expressions a bit (still some work
to be done on this though), fixed a bug with last, found by Roel de Cock
- fixed: the SYNOPSIS (parsefile is used to parse files, spotted by e.sammer)
- fixed: bug in pretty printing (reported by Zhu Zhou)
- fixed: bug in the install: the Makefile now uses the same perl that was
used to run perl Makefile.PL to run speedup and check_optional_modules
(reported by Ralf Santos)
- fixed: bugs in pretty printing when using flush, trying to figure out
as well as possible if an element contains other elements or text
(there is still a gotcha, see the BUGS section in the docs)
- fixed: bug that caused the XML declaration and the DTD not to be reset
between parses
- improved: the conversion functions (errors are now reported when the
function is created and not when it is first used)
- added: the output_encoding option to XML::Twig->new, which allows
specifying an encoding for the output: the conversion filter is
created using Encode (perl 5.8.0), Text::Iconv or Unicode::*. The
XML declaration is also updated.
- added: #CDATA and #ENT can now be used in handler expressions
- added: XML::Twig::Elt remove_cdata method, which turns CDATA sections
into regular PCDATA elements
- improved: set_asis can now be used to output CDATA sections un-escaped (and without
the CDATA section markers)
3.04 - 2002-04-01
- fixed: handlers for XML::Parser 2.27 so the module can pass the tests
3.03 - 2002-03-26
- fixed: bugs in entity handling in twig_roots mode
- added: the ignore_elts option, to completely skip elements
- improved: enhanced the XPath-like syntax in navigation and get_xpath
methods: added operators (>, < ...)
- fixed: [RT 168]: setTwigHandler failed when no handler was already set
(thanks to Jerry)
- improved: turned %valid_option into a package global so AnyData can access it
- fixed: bug in sprint that prevented it from working with filters
- fixed: bug in erase when erasing an empty element that was the
last child of its parent ([RT390], thanks to Julian Arnold)
- fixed: copy now correctly copies the asis status of elements
- fixed: typos in the docs (thanks to Shlomo Yona)
- added: tests (for erase and entities in twig_roots mode)
3.02 - 2002-01-16
- fixed: tweaked speedup to replace constructs that did not work in
perl 5.005003
3.01 - 2002-01-09
- fixed: the directory name in the tar file
3.00 - 2002-01-09
- COMPATIBILITY
WARNING: THIS CHANGE IS NOT BACKWARD COMPATIBLE
But it is The Right Thing To Do
In normal mode (when KeepEncoding is not used) the XML data is
now stored as parsed by XML::Parser, ie the base entities are
expanded. The "print" methods (print, sprint and flush, plus the
new xml_string, pcdata_xml_string and att_xml_string) return the
data in XML-escaped form: & and < are escaped in PCDATA and
&, < and the quote (" by default) are turned to &amp; &lt; and
&quot; (or &apos; if the quote is '). The "text" methods (text,
att and pcdata) return the stored text as is.
So if you want to output XML you should use the "print" methods
and if you want to output text you should use the "text" methods.
Note that this breaks the trick consisting in adding tags to the
content of an element: $elt->prefix( "<b>") no longer adds a <b>
tag before an element. $elt->print will now output "&lt;b>...".
(but you can still use it by marking those elements as 'asis').
It also fixes the annoying &apos; thingie that used to replace '
in the data.
When the KeepEncoding option is used this is not true, the data
is stored asis, base entities are kept un-escaped.
Note that KeepEncoding is a global setting: if you use several twigs,
some with KeepEncoding and some without, then you will have to manually
set the option using the set_keep_encoding method, otherwise the value
set by the last XML::Twig::new call applies to all of them.
In addition when the KeepEncoding option is used the start tag is
parsed using a custom function parse_start_tag, which works only
for 1-byte encodings (it is regexp-based). This method can be
overridden using the ParseStartTag (or parse_start_tag) option
when creating the twig. This function takes the original string as
input and returns the gi and the attributes (in a hash).
If you write a function that works for multi-byte encodings I would
very much appreciate it if you could send it back to me so I can add it
to the module, so other users can benefit from it.
An additional option ExpandExternalEnts will expand external entity
references to their text (in the output, the text stored is &ent;).
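A pure-Perl sketch of the escaping rules described above for the "print" methods. The helper names escape_pcdata and escape_att are hypothetical and are not part of XML::Twig's API. They only reproduce the stated rules: & and < in PCDATA; &, < and the quote in attribute values.

```perl
use strict;
use warnings;

# Hypothetical helpers, NOT XML::Twig code: they mimic the 3.00 escaping rules.

# PCDATA: only & and < are escaped.
sub escape_pcdata {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must run first so new entities are not re-escaped
    $text =~ s/</&lt;/g;
    return $text;
}

# Attribute values: &, < and the quote character in use are escaped.
sub escape_att {
    my ($value, $quote) = @_;
    $quote = '"' unless defined $quote;    # double quote by default
    $value =~ s/&/&amp;/g;
    $value =~ s/</&lt;/g;
    if ($quote eq '"') { $value =~ s/"/&quot;/g; }
    else               { $value =~ s/'/&apos;/g; }
    return $value;
}

print escape_pcdata('Tom & Jerry <b>'), "\n";    # Tom &amp; Jerry &lt;b>
print escape_att(q{say "hi" & 'bye'}), "\n";      # say &quot;hi&quot; &amp; 'bye'
print escape_att(q{it's}, q{'}), "\n";            # it&apos;s
```

The "text" methods, by contrast, would return the unescaped input strings unchanged.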
- added: when handlers (twig_handlers or start_tag_handlers) are called
$_ is set to the element node, so quick hacks look better:
my $t= new XML::Twig( twig_handlers =>
{ elt => sub { print $_->att( 'id'), ": ", $_->text, "\n"; } }
);
- added: XML::Twig dispose method which properly reclaims all the memory
used by the object (useful if you don't have WeakRef installed)
- added: XML::Twig and XML::Twig::Elt ignore methods, which can be called
from a start_tag_handlers handler and cause the element (or the
current element if called on a twig) to be ignored by the
parsing
- added: XML::Twig parse_start_tag option that overrides the default function
used to parse start tags when KeepEncoding is used
- added: XML::Twig::Elt xml_string, pcdata_xml_string and att_xml_string
all return an XML-escaped string for an element (including
sub-elements and their tags but not the enclosing tags for the
element), a #PCDATA element and an attribute
- added: XML::Twig::Elt methods tag and set_tag, equivalent respectively
to gi and set_gi
- added: XML::Twig and XML::Twig::Elt set_keep_encoding methods can be used
to set the keep_encoding value if you use several twigs with
different keep_encoding options
- improved: option names for XML::Twig::new are now checked (a warning is output
if the option is not a valid one).
- improved: when using pretty_print nice or indented keep_spaces_in is now checked
so the elements within an element listed in keep_spaces_in are not
indented
- added: XML::Twig::Elt insert_new_elt method that does a new and a paste
- added: XML::Twig::Elt split_at method splits a #PCDATA element in 2
- added: XML::Twig::Elt split method splits all the text descendants of an
element, on a regexp, wrapping text captured in brackets in the
regexp in a specified element, all elements are returned
- added: XML::Twig::Elt mark method is similar to the split method, except
that only newly created elements (matched by the regexp) are
returned
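A string-level sketch of the split/mark idea above: text captured by the regexp gets wrapped in a given tag. The function wrap_captures is hypothetical and works on plain strings rather than on XML::Twig::Elt objects. It assumes a single capture group and does no XML escaping. As with Perl's own split, matched text outside the group is dropped.

```perl
use strict;
use warnings;

# Hypothetical sketch, NOT XML::Twig's split/mark implementation.
# split() with a capturing regexp also returns the captured text;
# with one group, captures land at the odd indexes of the result.
sub wrap_captures {
    my ($text, $re, $tag) = @_;
    my @parts = split /$re/, $text;
    my @out;
    for my $i (0 .. $#parts) {
        next if $parts[$i] eq '';    # skip empty fields
        push @out, $i % 2 ? "<$tag>$parts[$i]</$tag>" : $parts[$i];
    }
    return @out;
}

my @pieces = wrap_captures('see RT 123 and RT 456', qr/RT (\d+)/, 'bug');
print join('|', @pieces), "\n";

# mark-like variant: keep only the newly wrapped pieces
my @marked = grep { /^<bug>/ } @pieces;
```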
- added: XML::Twig::Elt get_type method returns #ELT for elements and the gi
(#PCDATA, #CDATA...) otherwise
- added: XML::Twig::Elt is_elt returns the gi if the element is a real element
and 0 if it is #PCDATA, #CDATA...
- added: XML::Twig::Elt contains_only_text returns 1 if the element contains no
"real" element (is_field is another name for it)
- added: First implementation of the output_filter option which filters the
text before it is output by the print, sprint, flush and text methods
(only works for print at the moment, and still under test with various
versions of XML::Parser). Standard filters are also available
Example:
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;
# output_filter => 'latin1' converts the output back to latin1
my $t = XML::Twig->new(output_filter => 'latin1');
$t->parse( \*DATA);
$t->print;
__DATA__
<?xml version="1.0" encoding="ISO-8859-1"?>
<docù atté="valuè">Un homme soupçonné d'être impliqué dans
la mort d'un motard de la police, renversé
</docù>