libxml-enno


MANIFEST  view on Meta::CPAN

t/README.txt
t/attr1.xml
t/attr2.xml
t/attr3.xml
t/bad_root.xml
t/build_dom.t
t/chk1.xml
t/chk_batch.t			Main test case for XML::Checker
t/dom_astress.t			dom_*.t are test cases for XML::DOM
t/dom_attr.t
t/dom_cdata.t
t/dom_example.t
t/dom_extent.dtd
t/dom_extent.ent
t/dom_extent.t
t/dom_jp_astress.t
t/dom_jp_attr.t
t/dom_jp_cdata.t
t/dom_jp_example.t
t/dom_jp_minus.t
t/dom_jp_modify.t
t/dom_jp_print.t
t/dom_minus.t
t/dom_modify.t
t/dom_print.t
t/dom_template.t
t/entity.xml
t/fail.xml

html/XML/Checker.html  view on Meta::CPAN

</STRONG></A><DD>Used by XML::DOM::Element::check() to initialize (and clean up) the 
context stack when checking a single element.

</DL><H2><A NAME="PerlSAX_interface">PerlSAX interface

</A></H2><P>XML::Checker now also supports the PerlSAX interface, so you can use XML::Checker
wherever you use PerlSAX handlers.

<P>XML::Checker implements the following methods: start_document, end_document,
start_element, end_element, characters, processing_instruction, comment,
start_cdata, end_cdata, entity_reference, notation_decl, unparsed_entity_decl,
entity_decl, element_decl, attlist_decl, doctype_decl, xml_decl

<P>Not implemented: set_document_locator, ignorable_whitespace

<P>See PerlSAX.pod for details. (It is called lib/PerlSAX.pod in the libxml-perl 
distribution, which can be found on CPAN.)

<P><HR><H1><A NAME="CAVEATS">CAVEATS

</A></H1><P>This is an alpha release. Almost everything is subject to change.

html/XML/DOM/Node.html  view on Meta::CPAN

</STRONG></A><DD>Return an integer indicating the node type. See XML::DOM constants.

<DT><A NAME="getNodeName_"><STRONG><P>getNodeName

</STRONG></A><DD>Return a property or a hardcoded string, depending on the node type.
Here are the corresponding functions or values:

<PRE> Attr			getName
 AttDef			getName
 AttlistDecl		getName
 CDATASection		"#cdata-section"
 Comment		"#comment"
 Document		"#document"
 DocumentType		getNodeName
 DocumentFragment	"#document-fragment"
 Element		getTagName
 ElementDecl		getName
 EntityReference	getEntityName
 Entity			getNotationName
 Notation		getName
 ProcessingInstruction	getTarget

html/XML/Filter/DetectWS.html  view on Meta::CPAN

be <I>ignorable</I>. Ignorable whitespace is passed to the PerlSAX Handler through
its <B>ignorable_whitespace</B> method, provided that the Handler implements it.
(Otherwise it is passed to the characters method.)
If the <B>SkipIgnorableWS</B> option is set, the ignorable whitespace is simply
discarded.

<P>XML::Filter::DetectWS also takes xml:space attributes into account. See below
for details.

<P>CDATA sections are passed in the standard PerlSAX way (i.e. with surrounding
start_cdata and end_cdata events), unless the Handler does not implement these
methods. In that case, the CDATA section is simply passed to the characters 
method.

<P><HR><H1><A NAME="Constructor_Options">Constructor Options

</A></H1><DL><DT><A NAME="_SkipIgnorableWS_Default_0_"><STRONG><P>* SkipIgnorableWS (Default: 0)

</STRONG></A><DD>When set, detected ignorable whitespace is discarded.

<DT><A NAME="_Handler_"><STRONG><P>* Handler

html/XML/Filter/DetectWS.html  view on Meta::CPAN

<P>The idea is that the API of DetectWS will be extended, so that you can
specify/override e.g. which elements should behave as if xml:space="preserve" 
were set, and/or which elements should behave as if the &lt;!ELEMENT&gt; model was
defined a certain way, etc.

<P>Please send feedback!

<P>The current implementation also detects whitespace after an element-start tag
and whitespace before an element-end tag. 
It also detects whitespace before an element-start tag, after an element-end tag,
and before or after comments, processing instructions, CDATA sections etc.,
but this part needs to be reimplemented.
In either case, the detected whitespace is split off into its own PerlSAX
characters event and an extra property 'Loc' is added. It can have 4 possible
values:

<DL><DT><A NAME="_1_WS_START_whitespace_immediately_after_element_start_tag_"><STRONG><P>* 1 (WS_START) - whitespace immediately after element-start tag

</STRONG></A><BR><STRONG>* 2 (WS_END) - whitespace just before element-end tag

</STRONG><BR><STRONG>* 3 (WS_ONLY) - both WS_START and WS_END, i.e. it's the only text found between the start and end tag and it's all whitespace

</STRONG><BR><STRONG>* 0 (WS_INTER) - none of the above, probably before an element-start tag,
after an element-end tag, or before or after a comment, PI, cdata section etc.

</STRONG><DD>Note that WS_INTER may not be that useful, so this may change.

</DL><P><HR><H1><A NAME="xml_space_attribute">xml:space attribute

</A></H1><P>The XML spec states that: A special attribute
named xml:space may be attached to an element
to signal an intention that in that element,
white space should be preserved by applications.
In valid documents, this attribute, like any other, must be 

lib/XML/Checker.pm  view on Meta::CPAN

    {
	$self->CData ($data);
    }
    else
    {
	$self->Char ($data);
    }
}

# PerlSAX API
sub start_cdata
{
    $_[0]->{InCDATA} = 1;
}

# PerlSAX API
sub end_cdata
{
    $_[0]->{InCDATA} = 0;
}

sub Char
{
    my ($self, $text) = @_;
    my $context = $self->{Context};

    # NOTE: calls to isWS may set this to 1.
    $INSIGNIF_WS = 0;

    $context->[0]->Char ($self, $text);
}

# Treat CDATASection same as Char (Text)
sub CData
{
    my ($self, $cdata) = @_;
    my $context = $self->{Context};

    $context->[0]->Char ($self, $cdata);

    # CDATASection can never be insignificant whitespace
    $INSIGNIF_WS = 0;
#?? I'm not sure if this assumption is correct
}

# PerlSAX API
sub comment
{
    my ($self, $hash) = @_;

lib/XML/Checker.pm  view on Meta::CPAN


=back

=head2 PerlSAX interface

XML::Checker now also supports the PerlSAX interface, so you can use XML::Checker
wherever you use PerlSAX handlers.

XML::Checker implements the following methods: start_document, end_document,
start_element, end_element, characters, processing_instruction, comment,
start_cdata, end_cdata, entity_reference, notation_decl, unparsed_entity_decl,
entity_decl, element_decl, attlist_decl, doctype_decl, xml_decl

Not implemented: set_document_locator, ignorable_whitespace

See PerlSAX.pod for details. (It is called lib/PerlSAX.pod in the libxml-perl 
distribution, which can be found on CPAN.)
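
The event flow that a PerlSAX handler sees around a CDATA section can be
sketched without a parser. The package name My::CDataTracker is hypothetical
and not part of this distribution; the sketch only assumes that characters
receives a hash with a Data key and that start_cdata and end_cdata bracket
the CDATA content, as the handlers in this distribution do.

```perl
use strict;
use warnings;

# Hypothetical handler (an assumption for illustration): it records whether
# each chunk of character data arrived inside a CDATA section.
package My::CDataTracker;

sub new
{
    my $class = shift;
    return bless { InCDATA => 0, Chunks => [] }, $class;
}

sub start_cdata { $_[0]->{InCDATA} = 1 }
sub end_cdata   { $_[0]->{InCDATA} = 0 }

sub characters
{
    my ($self, $event) = @_;
    my $kind = $self->{InCDATA} ? 'cdata' : 'text';
    push @{ $self->{Chunks} }, [ $kind, $event->{Data} ];
}

package main;

# Drive the handler by hand, in the order a PerlSAX parser would be expected
# to send events for: plain<![CDATA[ some cdata ]]>
my $h = My::CDataTracker->new;
$h->characters ({ Data => 'plain' });
$h->start_cdata;
$h->characters ({ Data => ' some cdata ' });
$h->end_cdata;

print "$_->[0]: '$_->[1]'\n" for @{ $h->{Chunks} };
```

The same InCDATA flag pattern is what lets a handler treat CDATA content
differently from ordinary text even though both arrive as characters events.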

=head1 CAVEATS

This is an alpha release. Almost everything is subject to change.

lib/XML/DOM.pm  view on Meta::CPAN

{
    import XML::DOM::CharacterData qw( :DEFAULT :Fields );
    import XML::DOM::Node qw( :DEFAULT :Fields );
    XML::DOM::def_fields ("", "XML::DOM::CharacterData");
}

use XML::DOM::DOMException;

sub getNodeName
{
    "#cdata-section";
}

sub getNodeType
{
    CDATA_SECTION_NODE;
}

sub cloneNode
{
    my $self = shift;

lib/XML/DOM.pm  view on Meta::CPAN


sub to_expat
{
    my ($self, $iter) = @_;
    $iter->CData ($self->getData);
}

sub _to_sax
{
    my ($self, $doch, $dtdh, $enth) = @_;
    $doch->start_cdata;
    $doch->characters ( { Data => $self->getData } );
    $doch->end_cdata;
}

######################################################################
package XML::DOM::Comment;
######################################################################
use vars qw{ @ISA @EXPORT_OK %EXPORT_TAGS %HFIELDS };

BEGIN
{
    import XML::DOM::CharacterData qw( :DEFAULT :Fields );

lib/XML/DOM/Node.pod  view on Meta::CPAN

Return an integer indicating the node type. See XML::DOM constants.

=item getNodeName

Return a property or a hardcoded string, depending on the node type.
Here are the corresponding functions or values:

 Attr			getName
 AttDef			getName
 AttlistDecl		getName
 CDATASection		"#cdata-section"
 Comment		"#comment"
 Document		"#document"
 DocumentType		getNodeName
 DocumentFragment	"#document-fragment"
 Element		getTagName
 ElementDecl		getName
 EntityReference	getEntityName
 Entity			getNotationName
 Notation		getName
 ProcessingInstruction	getTarget
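
The pattern behind the list above is that each node class supplies its own
getNodeName, returning either a fixed string or one of its properties. A
minimal sketch, using hypothetical Sketch::* classes that stand in for the
real XML::DOM classes:

```perl
use strict;
use warnings;

# Hypothetical stand-in for an element node: the node name is a property.
package Sketch::Element;
sub new         { my ($class, $tag) = @_; return bless { TagName => $tag }, $class }
sub getTagName  { $_[0]->{TagName} }
sub getNodeName { $_[0]->getTagName }

# Hypothetical stand-in for a CDATA section: the node name is hardcoded.
package Sketch::CDATASection;
sub new         { my ($class, $data) = @_; return bless { Data => $data }, $class }
sub getNodeName { "#cdata-section" }

package main;

my @nodes = (Sketch::Element->new ('root'),
             Sketch::CDATASection->new (' some cdata '));
my @names = map { $_->getNodeName } @nodes;

print join (", ", @names), "\n";
```

Callers can therefore use getNodeName uniformly without checking the node
type first.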

lib/XML/Filter/DetectWS.pm  view on Meta::CPAN

	    sub { $handler->ignorable_whitespace (@_) };
    }
    else
    {
	$handlers{ignorable_whitespace} = $handlers{characters};
    }

    $handlers{ws} = $handlers{characters};    
#?? where should whitespace go?

    # NOTE: 'cdata' is not a valid PerlSAX callback
    if (UNIVERSAL::can ($handler, 'start_cdata') &&
	UNIVERSAL::can ($handler, 'end_cdata'))
    {
	$handlers{cdata} = sub {
	    $handler->start_cdata;
	    $handler->characters (@_);
	    $handler->end_cdata;
	}
    }
    else	# pass CDATA as regular characters
    {
	$handlers{cdata} = $handlers{characters};
    }

    $self->{Callback} = \%handlers;
}

sub start_cdata
{
    my ($self, $event) = @_;

    $self->{InCDATA} = 1;
}

sub end_cdata
{
    my ($self, $event) = @_;

    $self->{InCDATA} = 0;
}

sub entity_reference
{
    my ($self, $event) = @_;
    

lib/XML/Filter/DetectWS.pm  view on Meta::CPAN

    my $elem = pop @{ $self->{ParentStack} };
    delete $elem->{State};
}

sub characters
{
    my ($self, $event) = @_;

    if ($self->{InCDATA})
    {
	# NOTE: 'cdata' is not a valid PerlSAX callback
	$self->push_event ('cdata', $event);
	
	my $parent = $self->{ParentStack}->[-1];
	$parent->{State} |= ENDS_IN_NON_WS unless $parent->{State} == PRESERVE_WS;
	return;
    }

    my $text = $event->{Data};
    return unless length ($text);

    my $state = $self->{ParentStack}->[-1]->{State};

lib/XML/Filter/DetectWS.pm  view on Meta::CPAN

be I<ignorable>. Ignorable whitespace is passed to the PerlSAX Handler through
its B<ignorable_whitespace> method, provided that the Handler implements it.
(Otherwise it is passed to the characters method.)
If the B<SkipIgnorableWS> option is set, the ignorable whitespace is simply
discarded.

XML::Filter::DetectWS also takes xml:space attributes into account. See below
for details.

CDATA sections are passed in the standard PerlSAX way (i.e. with surrounding
start_cdata and end_cdata events), unless the Handler does not implement these
methods. In that case, the CDATA section is simply passed to the characters 
method.
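
The CDATA fallback described above can be sketched as a capability check on
the downstream handler. The Sketch::* handler classes and the
make_cdata_callback helper are hypothetical names used only for this
illustration; the check itself relies on Perl's standard can method.

```perl
use strict;
use warnings;

# Hypothetical downstream handler that understands CDATA events.
package Sketch::FullHandler;
sub new         { my $class = shift; return bless { Out => '' }, $class }
sub characters  { $_[0]->{Out} .= $_[1]->{Data} }
sub start_cdata { $_[0]->{Out} .= '<![CDATA[' }
sub end_cdata   { $_[0]->{Out} .= ']]>' }

# Hypothetical downstream handler that only implements characters.
package Sketch::PlainHandler;
sub new        { my $class = shift; return bless { Out => '' }, $class }
sub characters { $_[0]->{Out} .= $_[1]->{Data} }

package main;

# Build a callback for CDATA content: wrap it in start_cdata/end_cdata when
# the handler implements both, otherwise pass it as regular characters.
sub make_cdata_callback
{
    my ($handler) = @_;
    if ($handler->can ('start_cdata') && $handler->can ('end_cdata'))
    {
        return sub {
            $handler->start_cdata;
            $handler->characters (@_);
            $handler->end_cdata;
        };
    }
    return sub { $handler->characters (@_) };
}

my $full  = Sketch::FullHandler->new;
my $plain = Sketch::PlainHandler->new;
for my $h ($full, $plain)
{
    my $cb = make_cdata_callback ($h);
    $cb->({ Data => 'a < b' });
}

print "$full->{Out}\n$plain->{Out}\n";
```

Deciding once, when the callback is built, keeps the per-event path free of
repeated capability checks.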

=head1 Constructor Options

=over 4

=item * SkipIgnorableWS (Default: 0)

When set, detected ignorable whitespace is discarded.

lib/XML/Filter/DetectWS.pm  view on Meta::CPAN

The idea is that the API of DetectWS will be extended, so that you can
specify/override e.g. which elements should behave as if xml:space="preserve" 
were set, and/or which elements should behave as if the <!ELEMENT> model was
defined a certain way, etc.

Please send feedback!

The current implementation also detects whitespace after an element-start tag
and whitespace before an element-end tag. 
It also detects whitespace before an element-start tag, after an element-end tag,
and before or after comments, processing instructions, CDATA sections etc.,
but this part needs to be reimplemented.
In either case, the detected whitespace is split off into its own PerlSAX
characters event and an extra property 'Loc' is added. It can have 4 possible
values:

=over 4

=item * 1 (WS_START) - whitespace immediately after element-start tag

=item * 2 (WS_END) - whitespace just before element-end tag

=item * 3 (WS_ONLY) - both WS_START and WS_END, i.e. it's the only text found between the start and end tag and it's all whitespace

=item * 0 (WS_INTER) - none of the above, probably before an element-start tag,
after an element-end tag, or before or after a comment, PI, cdata section etc.

=back

Note that WS_INTER may not be that useful, so this may change.
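
The four Loc values line up as bit flags: WS_ONLY (3) is WS_START (1)
combined with WS_END (2). A minimal sketch of computing and testing the
value follows. The constant names and values come from the list above, but
the ws_loc helper is hypothetical, and combining the flags with a bitwise OR
is an assumption about how the values relate, not a description of the
module's internals.

```perl
use strict;
use warnings;

# Values as documented above.
use constant { WS_INTER => 0, WS_START => 1, WS_END => 2, WS_ONLY => 3 };

# Hypothetical helper: derive a Loc value from two facts about a run of
# whitespace (assumes the values combine as bit flags).
sub ws_loc
{
    my ($after_start_tag, $before_end_tag) = @_;
    my $loc = WS_INTER;
    $loc |= WS_START if $after_start_tag;
    $loc |= WS_END   if $before_end_tag;
    return $loc;
}

# A handler could then inspect the 'Loc' property of a characters event.
my $event = { Data => "\n  ", Loc => ws_loc (1, 1) };
print "only whitespace between the tags\n" if $event->{Loc} == WS_ONLY;
```

A handler that only cares about leading whitespace could test the WS_START
bit instead of comparing against each value in turn.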

=head1 xml:space attribute

The XML spec states that: A special attribute
named xml:space may be attached to an element
to signal an intention that in that element,

lib/XML/Filter/Reindent.pm  view on Meta::CPAN


    $self->push_event ('end_document', $event || {});
    $self->flush (0);	# send remaining events
}

sub push_event
{
    my ($self, $type, $event) = @_;

    $event->{EventType} = $type;
    if ($type =~ /^(characters|comment|processing_instruction|entity_reference|cdata)$/)
    {
	my $indent_kids = $self->{ParentStack}->[-1]->{IndentChildren} ? 1 : 0;
	$event->{Indent} =  $indent_kids ? $self->{Level} : undef;
    }

    my $q = $self->{EventQ};
    push @$q, $event;

    $self->flush (4);	# keep 4 events on the stack (maybe 3 is enough)
}

lib/XML/Filter/SAXT.pm  view on Meta::CPAN

use vars qw( %SAX_HANDLERS );

%SAX_HANDLERS = ( DocumentHandler => 
		  [ "start_document",
		    "end_document",
		    "start_element",
		    "end_element",
		    "characters",
		    "processing_instruction",
		    "comment",
		    "start_cdata",
		    "end_cdata",
		    "entity_reference",
		    "set_document_locator"	# !! passes {Locator=>$perlsax}
		    ],

		  DTDHandler => 
		  [ "notation_decl",
		    "unparsed_entity_decl",
		    "entity_decl",
		    "element_decl",
		    "attlist_decl",

lib/XML/Handler/BuildDOM.pm  view on Meta::CPAN

sub entity_reference # was Default
{
    my $self = $_[0];
    my $name = $_[1]->{Name};
    
    $self->{Element}->appendChild (
			    $self->{Document}->createEntityReference ($name));
    undef $self->{LastText};
}

sub start_cdata
{
    my $self = shift;
    $self->{InCDATA} = 1;
}

sub end_cdata
{
    my $self = shift;
    $self->{InCDATA} = 0;
}

sub comment
{
    my $self = $_[0];

    local $XML::DOM::IgnoreReadOnly = 1;

lib/XML/Handler/Composer.pm  view on Meta::CPAN

    else # regular text
    {
	my $esc = $self->{PreserveWS} ? 
	    $self->{Escape}->{TextPreserveNL} :
	    $self->{Escape}->{Text};

	$self->print (&$esc ($event->{Data}));
    }
}

sub start_cdata
{
    my $self = shift;
    $self->{InCDATA} = 1;

    $self->print ("<![CDATA[");
}

sub end_cdata
{
    my $self = shift;
    $self->{InCDATA} = 0;

    $self->print ("]]>");
}

sub comment
{
    my ($self, $event) = @_;

samples/REC-xml-19980210.xml  view on Meta::CPAN

The target names "<code>XML</code>", "<code>xml</code>", and so on are
reserved for standardization in this or future versions of this
specification.
The 
XML <termref def='dt-notation'>Notation</termref> mechanism
may be used for
formal declaration of PI targets.
</p>
</div2>
 
<div2 id='sec-cdata-sect'>
<head>CDATA Sections</head>
 
<p><termdef id="dt-cdsection" term="CDATA Section"><term>CDATA sections</term>
may occur 
anywhere character data may occur; they are
used to escape blocks of text containing characters which would
otherwise be recognized as markup.  CDATA sections begin with the
string "<code>&lt;![CDATA[</code>" and end with the string
"<code>]]&gt;</code>":
<scrap lang="ebnf">

t/fail.xml  view on Meta::CPAN

   <undefined/>
  </c>
  <q req_id="unreferenced_id"/>
 </p>
 <n nmtoken="1bad nmtoken"/>
 <n nmtoken="valid_nmtoken"/>
 <bad_id bad_idref="1bad ID"/>
 <bad_id/>
 <bad_ent bad_ent="1bad ENTITY"/>
 <bad_ent/>
 <![CDATA[ some cdata ]]>
</root>


