Alvis-Convert

 view release on metacpan or  search on metacpan

bin/html2alvis  view on Meta::CPAN


__END__

=head1 NAME
    
    html2alvis - HTML to Alvis XML converter
    
=head1 SYNOPSIS
    
    html2alvis [options] [source directory ...]

  Options:

    --html-ext                 HTML file identifying filename extension
    --meta-ext                 meta file identifying filename extension
    --out-dir                  output directory
    --N-per-out-dir            # of records per output directory
    --meta-encoding            the encoding of the meta files
    --html-encoding            the encoding of all HTML files
    --html-encoding-from-meta  take the encoding of the HTML files from
                               the meta files (attribute 'detected-charset')
    --[no]original             include original document?
    --help                     brief help message
    --man                      full documentation
    --[no]warnings             warnings output flag
    
=head1 OPTIONS
    
=over 8

=item B<--html-ext>

    Sets the HTML file identifying filename extension. 
    Default value: 'html'.

=item B<--meta-ext>

    Sets the  meta file identifying filename extension.
    The meta file syntax is

          <feature name>\t<feature value>\n

    Special features are url,title,date,detectedCharSet.
    Default value: 'meta'.

=item B<--out-dir>

    Sets the output directory. Default value: '.'.

=item B<--N-per-out-dir>

    Sets the # of records per output directory. Default value: 1000.

=item B<--meta-encoding>

    Specifies the encoding of all meta files. Default value 'iso-8859-1'.

=item B<--html-encoding>

    Specifies the encoding of all HTML files. Default value 'iso-8859-1'.
    Default: undef (meaning 'guess').

=item B<--html-encoding-from-meta>

    Specifies whether the encoding of an HTML file should be read from
    the corresponding meta file. If no information is given there,
    --html-encoding is used, if that is not given, the encoding is guessed.
    Default: no.

=item B<--[no]original>

    Shall the original document be included in the output? Default
    value: yes.

=item B<--help>

    Prints a brief help message and exits.

=item B<--man>

    Prints the manual page and exits.

=item B<--[no]warnings>

    Output (or suppress) warnings. Default value: yes.

=back

=head1 DESCRIPTION

    Goes recursively through the files under the source directory
    and converts them to Alvis XML files. Meta information (such
    as the URL or the detected character set, title of the document
    etc.) can be given in a separate meta file, one per each document,
    recognized by the shared basename. E.g. the HTML document is
    called foo.original and the meta information is in foo.meta.
    In this case html2alvis should be called like this:
   
          html2.alvis --html-ext original --meta-ext meta  
    
=cut




( run in 0.891 second using v1.01-cache-2.11-cpan-39bf76dae61 )