t/README.txt
t/attr1.xml
t/attr2.xml
t/attr3.xml
t/bad_root.xml
t/build_dom.t
t/chk1.xml
t/chk_batch.t Main test case for XML::Checker
t/dom_astress.t dom_*.t are test cases for XML::DOM
t/dom_attr.t
t/dom_cdata.t
t/dom_example.t
t/dom_extent.dtd
t/dom_extent.ent
t/dom_extent.t
t/dom_jp_astress.t
t/dom_jp_attr.t
t/dom_jp_cdata.t
t/dom_jp_example.t
t/dom_jp_minus.t
t/dom_jp_modify.t
t/dom_jp_print.t
t/dom_minus.t
t/dom_modify.t
t/dom_print.t
t/dom_template.t
t/entity.xml
t/fail.xml
html/XML/Checker.html view on Meta::CPAN
</STRONG></A><DD>Used by XML::DOM::Element::check() to initialize (and cleanup) the
context stack when checking a single element.
</DL><H2><A NAME="PerlSAX_interface">PerlSAX interface
</A></H2><P>XML::Checker now also supports the PerlSAX interface, so you can use XML::Checker
wherever you use PerlSAX handlers.
<P>XML::Checker implements the following methods: start_document, end_document,
start_element, end_element, characters, processing_instruction, comment,
start_cdata, end_cdata, entity_reference, notation_decl, unparsed_entity_decl,
entity_decl, element_decl, attlist_decl, doctype_decl, xml_decl
<P>Not implemented: set_document_locator, ignorable_whitespace
<P>See PerlSAX.pod for details. (It is called lib/PerlSAX.pod in the libxml-perl
distribution which can be found at CPAN.)
<P><HR><H1><A NAME="CAVEATS">CAVEATS
</A></H1><P>This is an alpha release. Almost everything is subject to change.
html/XML/DOM/Node.html view on Meta::CPAN
</STRONG></A><DD>Return an integer indicating the node type. See XML::DOM constants.
<DT><A NAME="getNodeName_"><STRONG><P>getNodeName
</STRONG></A><DD>Return a property or a hardcoded string, depending on the node type.
Here are the corresponding functions or values:
<PRE> Attr getName
AttDef getName
AttlistDecl getName
CDATASection "#cdata-section"
Comment "#comment"
Document "#document"
DocumentType getNodeName
DocumentFragment "#document-fragment"
Element getTagName
ElementDecl getName
EntityReference getEntityName
Entity getNotationName
Notation getName
ProcessingInstruction getTarget
html/XML/Filter/DetectWS.html view on Meta::CPAN
be <I>ignorable</I>. The ignorable whitespace is passed to the PerlSAX Handler with
the <B>ignorable_whitespace</B> handler, provided that the Handler implements this
method. (Otherwise it is passed to the characters handler.)
If the <B>SkipIgnorableWS</B> option is set, the ignorable whitespace is simply
discarded.
<P>XML::Filter::DetectWS also takes xml:space attributes into account. See below
for details.
<P>CDATA sections are passed in the standard PerlSAX way (i.e. with surrounding
start_cdata and end_cdata events), unless the Handler does not implement these
methods. In that case, the CDATA section is simply passed to the characters
method.
<P><HR><H1><A NAME="Constructor_Options">Constructor Options
</A></H1><DL><DT><A NAME="_SkipIgnorableWS_Default_0_"><STRONG><P>* SkipIgnorableWS (Default: 0)
</STRONG></A><DD>When set, detected ignorable whitespace is discarded.
<DT><A NAME="_Handler_"><STRONG><P>* Handler
html/XML/Filter/DetectWS.html view on Meta::CPAN
<P>The idea is that the API of DetectWS will be extended, so that you can
specify/override e.g. which elements should behave as if xml:space="preserve"
were set, and/or which elements should behave as if the <!ELEMENT> model was
defined a certain way, etc.
<P>Please send feedback!
<P>The current implementation also detects whitespace after an element-start tag
and whitespace before an element-end tag.
It also detects whitespace before an element-start tag and after an element-end tag,
and before or after comments, processing instructions, CDATA sections etc.,
but this part needs to be reimplemented.
In either case, the detected whitespace is split off into its own PerlSAX
characters event and an extra property 'Loc' is added. It can have 4 possible
values:
<DL><DT><A NAME="_1_WS_START_whitespace_immediately_after_element_start_tag_"><STRONG><P>* 1 (WS_START) - whitespace immediately after element-start tag
</STRONG></A><BR><STRONG>* 2 (WS_END) - whitespace just before element-end tag
</STRONG><BR><STRONG>* 3 (WS_ONLY) - both WS_START and WS_END, i.e. it's the only text found between the start and end tag and it's all whitespace
</STRONG><BR><STRONG>* 0 (WS_INTER) - none of the above, probably before an element-start tag,
after an element-end tag, or before or after a comment, PI, CDATA section etc.
</STRONG><DD>Note that WS_INTER may not be that useful, so this may change.
</DL><P><HR><H1><A NAME="xml_space_attribute">xml:space attribute
</A></H1><P>The XML spec states that: A special attribute
named xml:space may be attached to an element
to signal an intention that in that element,
white space should be preserved by applications.
In valid documents, this attribute, like any other, must be
lib/XML/Checker.pm view on Meta::CPAN
{
$self->CData ($data);
}
else
{
$self->Char ($data);
}
}
# PerlSAX API
sub start_cdata
{
$_[0]->{InCDATA} = 1;
}
# PerlSAX API
sub end_cdata
{
$_[0]->{InCDATA} = 0;
}
sub Char
{
my ($self, $text) = @_;
my $context = $self->{Context};
# NOTE: calls to isWS may set this to 1.
$INSIGNIF_WS = 0;
$context->[0]->Char ($self, $text);
}
# Treat CDATASection same as Char (Text)
sub CData
{
my ($self, $cdata) = @_;
my $context = $self->{Context};
$context->[0]->Char ($self, $cdata);
# CDATASection can never be insignificant whitespace
$INSIGNIF_WS = 0;
#?? I'm not sure if this assumption is correct
}
# PerlSAX API
sub comment
{
my ($self, $hash) = @_;
lib/XML/Checker.pm view on Meta::CPAN
=back
=head2 PerlSAX interface
XML::Checker now also supports the PerlSAX interface, so you can use XML::Checker
wherever you use PerlSAX handlers.
XML::Checker implements the following methods: start_document, end_document,
start_element, end_element, characters, processing_instruction, comment,
start_cdata, end_cdata, entity_reference, notation_decl, unparsed_entity_decl,
entity_decl, element_decl, attlist_decl, doctype_decl, xml_decl
Not implemented: set_document_locator, ignorable_whitespace
See PerlSAX.pod for details. (It is called lib/PerlSAX.pod in the libxml-perl
distribution which can be found at CPAN.)
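The start_cdata/end_cdata pair listed above carries no text itself; the text still arrives through characters. The following is a minimal sketch of that flag technique, not XML::Checker's own code. The package name My::CDataCollector and its Text/CData buckets are hypothetical; only the InCDATA flag and the `{ Data => ... }` event shape are taken from the snippets in this listing.

```perl
use strict;
use warnings;

# Hypothetical handler (not part of XML::Checker) that mirrors the
# InCDATA flag technique: start_cdata/end_cdata toggle a flag, and
# characters() consults it to decide how to treat the text.
package My::CDataCollector;

sub new
{
    my $class = shift;
    return bless { InCDATA => 0, Text => [], CData => [] }, $class;
}

sub start_cdata { $_[0]->{InCDATA} = 1 }
sub end_cdata   { $_[0]->{InCDATA} = 0 }

sub characters
{
    my ($self, $event) = @_;
    # Route the text by the current flag value
    my $bucket = $self->{InCDATA} ? 'CData' : 'Text';
    push @{ $self->{$bucket} }, $event->{Data};
}

package main;

# Drive the handler by hand, in the order a PerlSAX parser would
# deliver the events for: plain <![CDATA[<raw & unescaped>]]>text
my $h = My::CDataCollector->new;
$h->characters ({ Data => "plain " });
$h->start_cdata;
$h->characters ({ Data => "<raw & unescaped>" });
$h->end_cdata;
$h->characters ({ Data => "text" });
```

Keeping the flag on the handler object, rather than in a global, lets several handlers run in the same process without interfering with each other.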
=head1 CAVEATS
This is an alpha release. Almost everything is subject to change.
lib/XML/DOM.pm view on Meta::CPAN
{
import XML::DOM::CharacterData qw( :DEFAULT :Fields );
import XML::DOM::Node qw( :DEFAULT :Fields );
XML::DOM::def_fields ("", "XML::DOM::CharacterData");
}
use XML::DOM::DOMException;
sub getNodeName
{
"#cdata-section";
}
sub getNodeType
{
CDATA_SECTION_NODE;
}
sub cloneNode
{
my $self = shift;
lib/XML/DOM.pm view on Meta::CPAN
sub to_expat
{
my ($self, $iter) = @_;
$iter->CData ($self->getData);
}
sub _to_sax
{
my ($self, $doch, $dtdh, $enth) = @_;
$doch->start_cdata;
$doch->characters ( { Data => $self->getData } );
$doch->end_cdata;
}
######################################################################
package XML::DOM::Comment;
######################################################################
use vars qw{ @ISA @EXPORT_OK %EXPORT_TAGS %HFIELDS };
BEGIN
{
import XML::DOM::CharacterData qw( :DEFAULT :Fields );
lib/XML/DOM/Node.pod view on Meta::CPAN
Return an integer indicating the node type. See XML::DOM constants.
=item getNodeName
Return a property or a hardcoded string, depending on the node type.
Here are the corresponding functions or values:
Attr getName
AttDef getName
AttlistDecl getName
CDATASection "#cdata-section"
Comment "#comment"
Document "#document"
DocumentType getNodeName
DocumentFragment "#document-fragment"
Element getTagName
ElementDecl getName
EntityReference getEntityName
Entity getNotationName
Notation getName
ProcessingInstruction getTarget
lib/XML/Filter/DetectWS.pm view on Meta::CPAN
sub { $handler->ignorable_whitespace (@_) };
}
else
{
$handlers{ignorable_whitespace} = $handlers{characters};
}
$handlers{ws} = $handlers{characters};
#?? where should whitespace go?
# NOTE: 'cdata' is not a valid PerlSAX callback
if (UNIVERSAL::can ($handler, 'start_cdata') &&
UNIVERSAL::can ($handler, 'end_cdata'))
{
$handlers{cdata} = sub {
$handler->start_cdata;
$handler->characters (@_);
$handler->end_cdata;
}
}
else # pass CDATA as regular characters
{
$handlers{cdata} = $handlers{characters};
}
$self->{Callback} = \%handlers;
}
sub start_cdata
{
my ($self, $event) = @_;
$self->{InCDATA} = 1;
}
sub end_cdata
{
my ($self, $event) = @_;
$self->{InCDATA} = 0;
}
sub entity_reference
{
my ($self, $event) = @_;
lib/XML/Filter/DetectWS.pm view on Meta::CPAN
my $elem = pop @{ $self->{ParentStack} };
delete $elem->{State};
}
sub characters
{
my ($self, $event) = @_;
if ($self->{InCDATA})
{
# NOTE: 'cdata' is not a valid PerlSAX callback
$self->push_event ('cdata', $event);
my $parent = $self->{ParentStack}->[-1];
$parent->{State} |= ENDS_IN_NON_WS unless $parent->{State} == PRESERVE_WS;
return;
}
my $text = $event->{Data};
return unless length ($text);
my $state = $self->{ParentStack}->[-1]->{State};
lib/XML/Filter/DetectWS.pm view on Meta::CPAN
be I<ignorable>. The ignorable whitespace is passed to the PerlSAX Handler with
the B<ignorable_whitespace> handler, provided that the Handler implements this
method. (Otherwise it is passed to the characters handler.)
If the B<SkipIgnorableWS> option is set, the ignorable whitespace is simply
discarded.
XML::Filter::DetectWS also takes xml:space attributes into account. See below
for details.
CDATA sections are passed in the standard PerlSAX way (i.e. with surrounding
start_cdata and end_cdata events), unless the Handler does not implement these
methods. In that case, the CDATA section is simply passed to the characters
method.
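The fallback described above can be sketched as a small closure factory. This is an illustration under stated assumptions, not the DetectWS implementation: make_cdata_callback and the two toy handler packages are hypothetical names, and the sketch uses the `can` method where the module's own code calls `UNIVERSAL::can`.

```perl
use strict;
use warnings;

# Hypothetical helper (not a DetectWS method): build a callback that
# delivers one CDATA section to $handler.
sub make_cdata_callback
{
    my ($handler) = @_;
    if ($handler->can ('start_cdata') && $handler->can ('end_cdata'))
    {
        # Standard PerlSAX way: wrap the text in start/end events
        return sub {
            $handler->start_cdata;
            $handler->characters (@_);
            $handler->end_cdata;
        };
    }
    # Fallback: pass the CDATA section as regular characters
    return sub { $handler->characters (@_) };
}

# Two toy handlers that log the events they receive.
package My::FullHandler;
sub new         { my $class = shift; return bless { Log => [] }, $class }
sub start_cdata { push @{ $_[0]->{Log} }, 'start_cdata' }
sub end_cdata   { push @{ $_[0]->{Log} }, 'end_cdata' }
sub characters  { push @{ $_[0]->{Log} }, "characters:$_[1]->{Data}" }

package My::PlainHandler;
sub new        { my $class = shift; return bless { Log => [] }, $class }
sub characters { push @{ $_[0]->{Log} }, "characters:$_[1]->{Data}" }

package main;

my $full  = My::FullHandler->new;
my $plain = My::PlainHandler->new;
make_cdata_callback ($full)->({ Data => 'x < y' });
make_cdata_callback ($plain)->({ Data => 'x < y' });
```

The capability check runs once, when the callback is built, so delivering each CDATA section later costs only a closure call.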
=head1 Constructor Options
=over 4
=item * SkipIgnorableWS (Default: 0)
When set, detected ignorable whitespace is discarded.
lib/XML/Filter/DetectWS.pm view on Meta::CPAN
The idea is that the API of DetectWS will be extended, so that you can
specify/override e.g. which elements should behave as if xml:space="preserve"
were set, and/or which elements should behave as if the <!ELEMENT> model was
defined a certain way, etc.
Please send feedback!
The current implementation also detects whitespace after an element-start tag
and whitespace before an element-end tag.
It also detects whitespace before an element-start tag and after an element-end tag,
and before or after comments, processing instructions, CDATA sections etc.,
but this part needs to be reimplemented.
In either case, the detected whitespace is split off into its own PerlSAX
characters event and an extra property 'Loc' is added. It can have 4 possible
values:
=over 4
=item * 1 (WS_START) - whitespace immediately after element-start tag
=item * 2 (WS_END) - whitespace just before element-end tag
=item * 3 (WS_ONLY) - both WS_START and WS_END, i.e. it's the only text found between the start and end tag and it's all whitespace
=item * 0 (WS_INTER) - none of the above, probably before an element-start tag,
after an element-end tag, or before or after a comment, PI, CDATA section etc.
=back
Note that WS_INTER may not be that useful, so this may change.
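Because WS_ONLY (3) is documented as "both WS_START and WS_END", the four values behave like a two-bit mask. The sketch below shows that reading and how a downstream characters handler might branch on 'Loc'. The constants are redefined locally from the list above, and ws_loc and describe_ws are hypothetical helpers; whether DetectWS computes Loc with a bitwise OR internally is an assumption.

```perl
use strict;
use warnings;

# Values copied from the list above; defined locally so the sketch
# does not depend on what XML::Filter::DetectWS exports.
use constant { WS_INTER => 0, WS_START => 1, WS_END => 2, WS_ONLY => 3 };

# Hypothetical helper: derive Loc from two facts about a whitespace run.
sub ws_loc
{
    my ($after_start_tag, $before_end_tag) = @_;
    my $loc = WS_INTER;
    $loc |= WS_START if $after_start_tag;
    $loc |= WS_END   if $before_end_tag;
    return $loc;
}

# Hypothetical consumer: a characters handler could branch on Loc
# like this. Events without a Loc property are ordinary text.
sub describe_ws
{
    my ($event) = @_;
    my $loc = $event->{Loc};
    return 'not detected whitespace' unless defined $loc;
    return 'only content of element' if $loc == WS_ONLY;
    return 'leading whitespace'      if $loc == WS_START;
    return 'trailing whitespace'     if $loc == WS_END;
    return 'inter-element whitespace';
}
```

For example, `describe_ws ({ Data => "\n", Loc => ws_loc (1, 1) })` takes the WS_ONLY branch, which is the case a pretty-printer would most likely want to drop or replace.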
=head1 xml:space attribute
The XML spec states that: A special attribute
named xml:space may be attached to an element
to signal an intention that in that element,
lib/XML/Filter/Reindent.pm view on Meta::CPAN
$self->push_event ('end_document', $event || {});
$self->flush (0); # send remaining events
}
sub push_event
{
my ($self, $type, $event) = @_;
$event->{EventType} = $type;
if ($type =~ /^(characters|comment|processing_instruction|entity_reference|cdata)$/)
{
my $indent_kids = $self->{ParentStack}->[-1]->{IndentChildren} ? 1 : 0;
$event->{Indent} = $indent_kids ? $self->{Level} : undef;
}
my $q = $self->{EventQ};
push @$q, $event;
$self->flush (4); # keep 4 events on the stack (maybe 3 is enough)
}
lib/XML/Filter/SAXT.pm view on Meta::CPAN
use vars qw( %SAX_HANDLERS );
%SAX_HANDLERS = ( DocumentHandler =>
[ "start_document",
"end_document",
"start_element",
"end_element",
"characters",
"processing_instruction",
"comment",
"start_cdata",
"end_cdata",
"entity_reference",
"set_document_locator" # !! passes {Locator=>$perlsax}
],
DTDHandler =>
[ "notation_decl",
"unparsed_entity_decl",
"entity_decl",
"element_decl",
"attlist_decl",
lib/XML/Handler/BuildDOM.pm view on Meta::CPAN
sub entity_reference # was Default
{
my $self = $_[0];
my $name = $_[1]->{Name};
$self->{Element}->appendChild (
$self->{Document}->createEntityReference ($name));
undef $self->{LastText};
}
sub start_cdata
{
my $self = shift;
$self->{InCDATA} = 1;
}
sub end_cdata
{
my $self = shift;
$self->{InCDATA} = 0;
}
sub comment
{
my $self = $_[0];
local $XML::DOM::IgnoreReadOnly = 1;
lib/XML/Handler/Composer.pm view on Meta::CPAN
else # regular text
{
my $esc = $self->{PreserveWS} ?
$self->{Escape}->{TextPreserveNL} :
$self->{Escape}->{Text};
$self->print (&$esc ($event->{Data}));
}
}
sub start_cdata
{
my $self = shift;
$self->{InCDATA} = 1;
$self->print ("<![CDATA[");
}
sub end_cdata
{
my $self = shift;
$self->{InCDATA} = 0;
$self->print ("]]>");
}
sub comment
{
my ($self, $event) = @_;
samples/REC-xml-19980210.xml view on Meta::CPAN
The target names "<code>XML</code>", "<code>xml</code>", and so on are
reserved for standardization in this or future versions of this
specification.
The
XML <termref def='dt-notation'>Notation</termref> mechanism
may be used for
formal declaration of PI targets.
</p>
</div2>
<div2 id='sec-cdata-sect'>
<head>CDATA Sections</head>
<p><termdef id="dt-cdsection" term="CDATA Section"><term>CDATA sections</term>
may occur
anywhere character data may occur; they are
used to escape blocks of text containing characters which would
otherwise be recognized as markup. CDATA sections begin with the
string "<code><![CDATA[</code>" and end with the string
"<code>]]></code>":
<scrap lang="ebnf">
<undefined/>
</c>
<q req_id="unreferenced_id"/>
</p>
<n nmtoken="1bad nmtoken"/>
<n nmtoken="valid_nmtoken"/>
<bad_id bad_idref="1bad ID"/>
<bad_id/>
<bad_ent bad_ent="1bad ENTITY"/>
<bad_ent/>
<![CDATA[ some cdata ]]>
</root>