App-DocKnot
lib/App/DocKnot/Spin/Text.pm
Text marked as C<*bold*> using the standard asterisk notation will be
surrounded by C<< <strong> >> tags if the asterisks appear to mark bold text
rather than serving as wildcards or some other purpose.
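The module's actual rules for telling emphasis from wildcards are not spelled
out here. Purely as an illustration, the following Perl sketch shows one
plausible heuristic. C<embolden> and C<looks_like_emphasis> are hypothetical
names, and the rules are assumptions rather than the module's own: the opening
asterisk must start the line or follow whitespace or an opening bracket or
quote, the enclosed text may not begin or end with whitespace, the closing
asterisk must be followed by whitespace, punctuation, or the end of the line,
and the enclosed text must contain a letter and must not look like a path or
file extension.

```perl
use strict;
use warnings;

# Hypothetical check (an assumption, not DocKnot's rule): the text between
# the asterisks must contain a letter and must not look like a path or
# a file extension such as ".txt".
sub looks_like_emphasis {
    my ($inner) = @_;
    return 0 unless $inner =~ /[[:alpha:]]/;
    return 0 if $inner =~ m{[/.]\w};
    return 1;
}

# Hypothetical helper: wrap *text* in <strong> tags only when the asterisks
# look like emphasis markers rather than wildcards or multiplication.
sub embolden {
    my ($line) = @_;
    $line =~ s{
        (?<! [^\s(\["'] )                 # opening * at start or after space/open punct
        \* ( [^\s*] (?: [^*]*? [^\s*] )? ) \*
        (?= [\s.,;:!?)\]"'] | \z )        # closing * before space, punct, or end
    }{
        my $inner = $1;
        looks_like_emphasis($inner) ? "<strong>$inner</strong>" : "*$inner*";
    }gex;
    return $line;
}
```

With these rules, C<This is *very* important.> becomes
C<< This is <strong>very</strong> important. >>, while C<rm *.bak> and
C<2*3*4> are left unchanged.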
App::DocKnot::Spin::Text produces output (at least in the absence of any
lurking bugs) which complies with the XHTML 1.0 Transitional standard. The
input and output character set is assumed to be UTF-8.
=head1 CLASS METHODS
=over 4
=item new(ARGS)
Create a new App::DocKnot::Spin::Text object. A single converter object can
be reused to convert multiple files, provided that they should all be
converted with the same options.
ARGS should be a hash reference with one or more of the following keys, all of
which are optional:
=over 4
=item output
The path to the root of the output tree when converting a tree of files. This
will be used to calculate relative path names for generating inter-page links
using the provided C<sitemap> argument. If C<sitemap> is given, this option
should also always be given.
=item modified
Add a last modified subheader to the document. This will always be done if an
RCS C<Id> string is present in the input. Otherwise, a last modified
subheader based on the last modification date of the input file will be added
if the input is a file and this option is set to a true value. The default is
false.
=item sitemap
An App::DocKnot::Spin::Sitemap object. This will be used to create inter-page
links. For inter-page links, the C<output> argument must also be provided.
=item style
The URL to the style sheet to use. The appropriate HTML will be added to the
C<< <head> >> section of the resulting document.
=item title
The HTML page title to use. This will also be used as the C<< <h1> >> heading
if the document doesn't contain one. If the document does contain a heading,
that heading is kept and this option sets only the HTML C<< <title> >>
element.
=back
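The C<output> option is needed because links between pages must be relative
to the page being written. A minimal sketch of that calculation follows,
assuming (this is not described above) that target pages are identified by
paths relative to the root of the output tree. C<relative_link> is a
hypothetical name; it uses the core File::Spec::Unix and File::Basename
modules so that the result uses forward slashes, as URLs require.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Spec::Unix;

# Hypothetical helper, not part of the module's interface: given the root of
# the output tree, the path of the page being written, and a target page
# relative to that root, return a link to the target relative to the
# directory containing the current page.
sub relative_link {
    my ($output_root, $current_file, $target) = @_;
    my $target_file = File::Spec::Unix->catfile($output_root, $target);
    return File::Spec::Unix->abs2rel($target_file, dirname($current_file));
}
```

For example, with an output root of F</srv/www>, a link from
F</srv/www/software/docknot/index.html> to F<software/index.html> comes out
as F<../index.html>.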
=back
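As a rough picture of where the C<style> and C<title> options end up, here is
a hypothetical Perl sketch of an XHTML 1.0 Transitional C<< <head> >> section
that declares UTF-8 and English, matching the output properties stated in the
description and under CAVEATS. C<page_head> and C<escape_html> are
illustrative names, and the exact markup App::DocKnot::Spin::Text emits may
differ.

```perl
use strict;
use warnings;

# Minimal escaping for text placed in element content or attribute values.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Hypothetical helper: build the document prologue and <head> section.
# Both the title and the style sheet URL are optional.
sub page_head {
    my ($title, $style) = @_;
    my $head = <<'EOF';
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
EOF
    $head .= '  <title>' . escape_html($title) . "</title>\n"
      if defined $title;
    $head .= '  <link rel="stylesheet" href="' . escape_html($style)
      . "\" type=\"text/css\" />\n"
      if defined $style;
    $head .= "</head>\n";
    return $head;
}
```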
=head1 INSTANCE METHODS
=over 4
=item spin_text_file([INPUT[, OUTPUT]])
Convert a single text file to HTML. INPUT is the path of the input file and
OUTPUT is the path of the output file. OUTPUT may be omitted, in which case
the HTML is written to standard output. If both INPUT and OUTPUT are omitted,
the input is also read from standard input.
If OUTPUT is omitted, App::DocKnot::Spin::Text will not be able to obtain
sitemap information even if a sitemap was provided, and therefore will not add
inter-page links.
=back
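The defaulting rules for spin_text_file() can be sketched as follows.
C<open_streams> is a hypothetical helper, not part of the module's interface.
It opens named files with a UTF-8 layer, in line with the stated UTF-8
assumption, and otherwise returns the standard streams unchanged, so a caller
would still need to set an encoding layer on those (for example with
C<binmode>).

```perl
use strict;
use warnings;

# Hypothetical helper: with no OUTPUT, write to standard output; with neither
# argument, also read from standard input. Named files get a UTF-8 layer.
sub open_streams {
    my ($input, $output) = @_;
    my ($in_fh, $out_fh);
    if (defined $input) {
        open($in_fh, '<:encoding(UTF-8)', $input)
          or die "cannot open $input: $!\n";
    } else {
        $in_fh = \*STDIN;
    }
    if (defined $output) {
        open($out_fh, '>:encoding(UTF-8)', $output)
          or die "cannot create $output: $!\n";
    } else {
        $out_fh = \*STDOUT;
    }
    return ($in_fh, $out_fh);
}
```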
=head1 NOTES
I wrote this program because every other text to HTML converter that I've seen
made specific assumptions about the document format and wanted you to write
like it wanted you to write rather than the way you wanted to write.
This program instead wants you to write like I write, which from my
perspective is an improvement.
I don't claim that this is the be-all and end-all of text to HTML converters,
as I don't believe such a beast exists. I do believe it's pretty close to
being the be-all and end-all of text to HTML converters for text that I
personally have written, since I've written into it a lot of knowledge of the
sorts of text formatting conventions that I use. If you happen to use the
same ones, you may be delighted with this module. If you don't, you'll
probably be very frustrated with it.
In any case, I took to this project the perspective that whenever there was
something this program couldn't handle, I wanted to make it smarter rather
than change the input. I've mostly been successful at that, so far.
=head1 CAVEATS
This program attempts to intuit structure from an unstructured markup format.
It therefore relies on a whole bunch of fussy heuristics, poorly-understood
assumptions, and sheer blind luck. To fully document the boundary cases of
this program would take more time and patience than I care to invest; see the
source code if you're curious. This is not a predictable or easily
documentable program. Instead, it attempts to do what I mean without bugging
me about it.
There is therefore, at least currently, no way to control or adjust parameters
in this program without editing it. I may someday add that, but I'm leery of
it, since the code complexity would start increasing exponentially if I tried
to let people tweak everything. I've given up on more than one text to HTML
converter because it had more options than B<ls> and expected you to try to
figure out which ones should be used for a document yourself.
English month names are used for the last modification dates, and the
resulting HTML always declares that the document is in English. This could be
made configurable if anyone wishes.
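One way to guarantee English month names regardless of the user's locale is a
fixed table of names rather than C<strftime>'s C<%B>, which can follow the
C<LC_TIME> locale. The sketch below also prefers a date taken from an RCS
C<$Id$> keyword over the file's modification time, following the C<modified>
option described under new(). The function name C<last_modified>, the
C<Last modified ...> wording, and the use of UTC via C<gmtime> are all
assumptions made for illustration.

```perl
use strict;
use warnings;

# Fixed English month names, independent of the current locale.
my @MONTHS = qw(January February March April May June July August
  September October November December);

# Hypothetical helper: prefer the date in an RCS Id keyword such as
#   $Id: notes.txt,v 1.5 2004/03/10 12:00:00 rra Exp $
# and otherwise fall back to a file modification time (formatted in UTC).
# Returns undef if neither is available.
sub last_modified {
    my ($text, $mtime) = @_;
    if ($text =~ m{\$Id: \S+ [\d.]+ (\d{4})[/-](\d\d)[/-](\d\d) }) {
        return sprintf('Last modified %d %s %d', $3 + 0, $MONTHS[$2 - 1], $1);
    }
    return unless defined $mtime;
    my ($day, $month, $year) = (gmtime($mtime))[3, 4, 5];
    return sprintf('Last modified %d %s %d', $day, $MONTHS[$month], $year + 1900);
}
```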
=head1 AUTHOR
Russ Allbery <rra@cpan.org>
=head1 COPYRIGHT AND LICENSE
Copyright 1999-2002, 2004-2005, 2008, 2010, 2013, 2021-2024 Russ Allbery
<rra@cpan.org>
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is