0.66 Fri Jan 23 16:03:03 2004
- Fixed double hyperlinking issue with MKDoc::XML::Tagger
0.65 Fri Jan 23 13:08:33 2004
- Fixed some backwards compatibility issues
0.64 Fri Jan 23 12:28:27 2004
- Fixed "Use of uninitialized value in join or string" warnings in MKDoc::XML::Decode
0.63
- Added more comprehensive XHTML entities decoding
0.62 Thu Nov 6 11:54:49 2003
- Added current MKDoc backwards compatibility
0.61 Thu Oct 16 13:44:40 2003
- Fixed not dying on <p foo=bar>
- <p foobar>
- <p foo bar> ...
0.60 Thu Oct 9 16:54:44 2003
- MKDoc whitelist: updated test case t/402_MKDoc_XML_Stripper_def.t
- Fixed error reporting in MKDoc::XML::TreeBuilder
- Removed "noframes" element from xhtml 1.0 transitional
0.53 Tue Sep 30 10:49:06 2003
- Fixed a structural bug in MKDoc::XML::Token
- Renamed all is_XXX methods to XXX
0.52 Thu Sep 25 14:19:49 2003
- Fixed a bug in MKDoc::XML::Tagger which would happen with Perl 5.6.1
- Added MKDoc::XML::Stripper XHTML specs contributed by Patrick
- Amended MKDoc::XML::Stripper POD accordingly
- Fixed missing dependencies
0.51 Fri Sep 19 16:48:10 2003
- Changed lib/MKDoc/XML.pm and Makefile.PL so that it can be picked
up by Petal's Makefile.PL
0.5 Fri Sep 19 16:37:12 2003
- Improved MKDoc::XML::Encode and MKDoc::XML::Decode
- Added definition loading to MKDoc::XML::Stripper
lib/MKDoc/XML.pm view on Meta::CPAN
general purpose XML unless you know what you're doing.
=back
=head1 WHAT'S IN THE BOX
=head2 XML tokenizer
L<MKDoc::XML::Tokenizer> splits your XML / XHTML files into a list of
L<MKDoc::XML::Token> objects using a single regex.
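To make the single-regex idea concrete, here is a minimal, self-contained sketch. It does not use or reproduce the actual MKDoc::XML::Tokenizer regex; the helper name C<tokenize_sketch> and the [type, text] token shape are assumptions made purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of MKDoc::XML: split markup into tokens
# with one alternation regex. Each token is a [type, text] pair.
sub tokenize_sketch {
    my ($xml) = @_;
    my @tokens;
    # \G anchors each match where the previous one ended, so the input is
    # consumed strictly left to right; scanning stops at the first
    # position where no alternative matches (e.g. a stray '<').
    while ($xml =~ /\G(?:(<!--.*?-->)|(<[^>]*>)|([^<]+))/gcs) {
        if    (defined $1) { push @tokens, [ comment => $1 ] }
        elsif (defined $2) { push @tokens, [ tag     => $2 ] }
        else               { push @tokens, [ text    => $3 ] }
    }
    return @tokens;
}

my @tokens = tokenize_sketch('<p class="x">Hello, <!-- hi --><br />World</p>');
print join(' | ', map { "$_->[0]: $_->[1]" } @tokens), "\n";
```

A sketch like this never checks that tags are balanced, which is in keeping with the token-level approach described here; the real module may well classify more token types than the three shown.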
=head2 XML tree builder
L<MKDoc::XML::TreeBuilder> sits on top of L<MKDoc::XML::Tokenizer> and builds
parsed trees out of your XML / XHTML data.
=head2 XML stripper
L<MKDoc::XML::Stripper> objects remove unwanted markup from your XML / XHTML
data. Useful for removing all those nasty presentational tags or 'style'
attributes from your XHTML data, for example.
=head2 XML tagger
The L<MKDoc::XML::Tagger> module matches expressions in XML / XHTML documents and
tags them appropriately. For example, you could automatically hyperlink certain
glossary words or add <abbr> tags based on a dictionary of abbreviations and
acronyms.
=head2 XML entity decoder
L<MKDoc::XML::Decode> is a pluggable, configurable entity expander module which
currently supports HTML entities, numeric entities and basic XML entities.
lib/MKDoc/XML/Decode.pm view on Meta::CPAN
# $xml is now "Chris' Baloon"
my $xml = MKDoc::XML::Decode->process ("Chris&apos; Baloon");
=head1 SUMMARY
MKDoc::XML::Decode is a very simple module with a pluggable entity-decoding mechanism.
At the moment there are three modules:
xml - Decodes &apos; &quot; &gt; &lt; and &amp;
xhtml - Decodes XHTML entities such as &eacute;
numeric - Decodes numeric entities such as &#65;
That's it.
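The pluggable mechanism can be pictured roughly as follows. This is a self-contained sketch, not the module's implementation: the C<%DECODERS> table, the C<decode_sketch> helper and the plugin calling convention are all assumptions, and the xhtml plugin is left out to keep the example short.

```perl
use strict;
use warnings;

# Hypothetical plugin table: each name maps to a sub that takes an entity
# name (the text between '&' and ';') and returns its replacement, or
# nothing if it does not know the entity.
my %DECODERS = (
    xml => sub {
        my %map = (apos => "'", quot => '"', gt => '>', lt => '<', amp => '&');
        return $map{ $_[0] };
    },
    numeric => sub {
        return chr hex $1 if $_[0] =~ /^#x([0-9a-fA-F]+)$/;
        return chr $1     if $_[0] =~ /^#([0-9]+)$/;
        return;
    },
);

# Try each enabled plugin in order; leave unknown entities untouched.
sub decode_sketch {
    my ($text, @plugins) = @_;
    $text =~ s{&(#?\w+);}{
        my $entity = $1;
        my ($hit) = grep { defined } map { $DECODERS{$_}->($entity) } @plugins;
        defined $hit ? $hit : "&$entity;";
    }ge;
    return $text;
}

print decode_sketch('Chris&apos; Baloon &#65; &eacute;', qw(xml numeric)), "\n";
# prints: Chris' Baloon A &eacute;
```

Because the substitution is a single pass, already-decoded output is not decoded a second time, so C<&amp;lt;> becomes C<&lt;> rather than C<< < >>.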
This module and its counterpart L<MKDoc::XML::Encode> are used by L<MKDoc::XML::Dumper>
to XML-encode and XML-decode literals.
=head1 API
lib/MKDoc/XML/Stripper.pm view on Meta::CPAN
1;
__END__
=head1 NAME
MKDoc::XML::Stripper - Remove unwanted XML / XHTML tags and attributes
=head1 SYNOPSIS
use MKDoc::XML::Stripper;
my $stripper = new MKDoc::XML::Stripper;
$stripper->allow (qw /p class id/);
my $ugly = '<p class="para" style="color:red">Hello, <strong>World</strong>!</p>';
lib/MKDoc/XML/Stripper.pm view on Meta::CPAN
Should print:
<p class="para">Hello, World!</p>
=head1 SUMMARY
MKDoc::XML::Stripper is a class which lets you specify a set of tags and attributes
which you want to allow, and then cheekily strip any XML of unwanted tags and attributes.
In MKDoc, this is used so that editors use structural XHTML rather than presentational tags,
i.e. strip anything which looks like a <font> tag, a 'style' attribute or other tags
which would break separation of structure from content.
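The whitelist idea can be illustrated without the module itself. The sketch below is an assumption-laden approximation, not MKDoc::XML::Stripper's code: C<%ALLOW> and C<strip_sketch> are hypothetical names, and the regex would be confused by a '>' inside a quoted attribute value.

```perl
use strict;
use warnings;

# Hypothetical whitelist: tag name => { allowed attribute => 1 }.
my %ALLOW = ( p => { class => 1, id => 1 } );

# Drop tags that are not whitelisted (keeping their text content) and
# drop non-whitelisted attributes from the tags that remain.
sub strip_sketch {
    my ($xml) = @_;
    $xml =~ s{<(/?)(\w+)([^>]*?)(/?)>}{
        my ($close, $tag, $attrs, $empty) = ($1, $2, $3, $4);
        my $allowed = $ALLOW{$tag};
        my $out = '';
        if ($allowed) {
            my $kept = '';
            while ($attrs =~ /\s+([\w:-]+)\s*=\s*("[^"]*"|'[^']*')/g) {
                $kept .= " $1=$2" if $allowed->{$1};
            }
            $out = "<$close$tag$kept" . ($empty ? ' />' : '>');
        }
        $out;
    }ge;
    return $xml;
}

print strip_sketch(
    '<p class="para" style="color:red">Hello, <strong>World</strong>!</p>'
), "\n";
# prints: <p class="para">Hello, World!</p>
```

Comments and the DOCTYPE declaration pass through untouched here because the pattern only matches tags whose name starts with a word character.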
=head1 DISCLAIMER
B<This module does low level XML manipulation. It will somehow parse even broken XML
and try to do something with it. Do not use it unless you know what you're doing.>
lib/MKDoc/XML/Stripper.pm view on Meta::CPAN
Available definitions are:
=over
=item xhtml10frameset
=item xhtml10strict
=item xhtml10transitional
=item mkdoc16 - MKDoc 1.6 XHTML structural markup
=back
You can also load your own definition file, for instance:
$stripper->load_def ('my_def.txt');
Definitions are simple text files as follows:
# allow p with 'class' and id
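Judging only from the definition files shipped with the distribution (lines such as C<a accesskey>, with '#' starting a comment), a reader for this format might look like the sketch below. The helper name C<parse_def_sketch>, the returned data structure and the handling of a line holding a tag name alone are assumptions; this is not the code behind load_def().

```perl
use strict;
use warnings;

# Hypothetical reader for "tag attribute" lines; '#' starts a comment.
sub parse_def_sketch {
    my ($text) = @_;
    my %allow;
    for my $line (split /\n/, $text) {
        $line =~ s/#.*//;                     # drop comments
        my ($tag, $attr) = split ' ', $line;  # whitespace-separated fields
        next unless defined $tag;             # blank or comment-only line
        $allow{$tag} ||= {};
        $allow{$tag}{$attr} = 1 if defined $attr;
    }
    return \%allow;
}

my $def = parse_def_sketch(<<'EOF');
# a
# anchor
a accesskey
a charset
a class
EOF
print join(', ', sort keys %{ $def->{a} }), "\n";
# prints: accesskey, charset, class
```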
lib/MKDoc/XML/Stripper.pm view on Meta::CPAN
Strips '/an/xml/file.xml' according to the rules that were given with the
allow() and disallow() methods and returns the result. Does not
modify '/an/xml/file.xml' in place.
=head1 NOTES
L<MKDoc::XML::Stripper> does not really parse the XML file you give it,
nor does it care whether the XML is well-formed. It uses L<MKDoc::XML::Tokenizer>
to turn the XML / XHTML file into a series of L<MKDoc::XML::Token> objects
and strictly operates on a list of tokens.
For this same reason MKDoc::XML::Stripper does not support namespaces.
=head1 AUTHOR
Copyright 2003 - MKDoc Holdings Ltd.
Author: Jean-Michel Hiver
lib/MKDoc/XML/Stripper/xhtml10frameset.txt view on Meta::CPAN
# xhtml 1.0 frameset
# http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd
#
# <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
#
# lovingly expanded and sorted from the dtd by paul arzul (patricka@mkdoc.com) 2003/09/24
# a
# anchor
a accesskey
a charset
a class
lib/MKDoc/XML/Stripper/xhtml10strict.txt view on Meta::CPAN
# xhtml 1.0 strict
# http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd
#
# <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
#
# lovingly expanded and sorted from the dtd by paul arzul (patricka@mkdoc.com) 2003/09/24
# a
# anchor
a accesskey
a charset
a class
lib/MKDoc/XML/Stripper/xhtml10transitional.txt view on Meta::CPAN
# xhtml 1.0 transitional
# http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
#
# <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
#
# lovingly expanded and sorted from the dtd by paul arzul (patricka@mkdoc.com) 2003/09/24
# a
# anchor
a accesskey
a charset
a class
a coords
lib/MKDoc/XML/Tagger.pm view on Meta::CPAN
1;
__END__
=head1 NAME
MKDoc::XML::Tagger - Adds XML markup to XML / XHTML content.
=head1 SYNOPSIS
use MKDoc::XML::Tagger;
print MKDoc::XML::Tagger->process_data (
"<p>Hello, World!</p>",
{ _expr => 'World', _tag => 'strong', class => 'superFort' }
);
lib/MKDoc/XML/Tagger.pm view on Meta::CPAN
=head2 my $result = MKDoc::XML::Tagger->process_file ('some/file.xml', @expressions);
Same as process_data(), except it takes its data from 'some/file.xml'.
=head1 NOTES
L<MKDoc::XML::Tagger> does not really parse the XML file you give it,
nor does it care whether the XML is well-formed. It uses L<MKDoc::XML::Tokenizer>
to turn the XML / XHTML file into a series of L<MKDoc::XML::Token> objects
and strictly operates on a list of tokens.
For this same reason MKDoc::XML::Tagger does not support namespaces.
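Operating on tokens rather than a parsed tree means an expression is only ever matched inside text, never inside a tag. A rough, self-contained illustration follows; C<tag_sketch> is a hypothetical helper, not the module's API, and it handles a single expression only.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap every occurrence of $expr found in text
# (never inside a tag) in <$tag> with the given attributes.
sub tag_sketch {
    my ($xml, $expr, $tag, %attrs) = @_;
    my $attr_str = join '', map { qq{ $_="$attrs{$_}"} } sort keys %attrs;
    my @out;
    # The capturing group makes split return the tags as well as the text.
    for my $token (split /(<[^>]*>)/, $xml) {
        my $piece = $token;
        $piece =~ s{(\Q$expr\E)}{<$tag$attr_str>$1</$tag>}g
            unless $piece =~ /^</;
        push @out, $piece;
    }
    return join '', @out;
}

print tag_sketch('<p>Hello, World!</p>', 'World', 'strong', class => 'superFort'), "\n";
# prints: <p>Hello, <strong class="superFort">World</strong>!</p>
```

Note that a sketch this simple would happily tag text that is already inside a matching element, which is the kind of double tagging a token-level approach has to guard against separately.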
=head1 AUTHOR
Copyright 2003 - MKDoc Holdings Ltd.
Author: Jean-Michel Hiver
lib/MKDoc/XML/Tagger/Preserve.pm view on Meta::CPAN
# -------------------------------------------------------------------------------------
# MKDoc::XML::Tagger::Preserve
# -------------------------------------------------------------------------------------
# Author : Jean-Michel Hiver.
# Copyright : (c) MKDoc Holdings Ltd, 2003
#
# This module uses MKDoc::XML::Tagger, except it preserves specific tags to prevent
# them from being tagged twice. At the moment the module uses regexes to do that so it
# might not be very generic but it should at least work for XHTML <a> tags.
# -------------------------------------------------------------------------------------
package MKDoc::XML::Tagger::Preserve;
use MKDoc::XML::Tagger;
use strict;
use warnings;
use utf8;
our @Preserve = ();
t/700_MKDoc_XML_TreePrinter.t view on Meta::CPAN
use Test::More 'no_plan';
use strict;
use warnings;
use MKDoc::XML::TreeBuilder;
use MKDoc::XML::TreePrinter;
{
my $xml = <<'EOF';
<!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<foo>This is a <br /> quite good <span class="important">test</span>.
We should see if the method which grabs descendant nodes is:
<?some_pi?>
<ul>
<li>OK</li>
<li>Kind of OK</li>
<li>Completely Fubar</li>
</ul>
<!-- some comment -->
t/data/prova.xhtml view on Meta::CPAN
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:tal="http://purl.org/petal/1.0/">
<head>
<meta name="generator" content="HTML Tidy, see www.w3.org" />
<title>Petal test</title>
<base href="baseh" tal:attributes="href baseh" />
<link rel="stylesheet" href="css/admin.css" type="text/css" />
</head>
t/data/sample.xml view on Meta::CPAN
<!-- warning, this XML is entirely for well-formedness tests.
it's not supposed to be valid in any way.
-->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:dcterms="http://purl.org/dc/terms/"
xmlns:dctype="http://purl.org/dc/dcmitype/">
<!-- let's have a comment -->
<dctype:Text rdf:about="http://mkdoc.com/">