EBook-Ishmael


Changes

1.04  Mar 28 2025
  - Added the -e|--encoding option: Specifies the encoding used for output
    text.
    - Also added the ISHMAEL_ENCODING environment variable.
  - Improve UTF-8 handling for plain text ebook formats (text, zTXT, PalmDoc).
  - Added support for the chawan web browser as a potential HTML formatter.
  - When specifying cover image output, '.-' should be used now instead of '.*'
    for suffix substitution. '.*' is deprecated: it is still supported for now,
    but will be removed in a future release.
  - Die when stdout ('-') is given as output argument to --image|-g.
  - '<body>' tags are no longer included in the HTML dump of HTML/XHTML ebooks.
  - Added additional test environment variables to force enable/disable tests
    for optional ebook formats.
    - TEST_CBR, TEST_CB7, TEST_CHM
  - Fix typos in documentation.

1.03  Mar 21 2025
  - Add support for the KF8/AZW3 ebook format.
  - Fix raw text dumping in Mobis.
  - Fix Mobi HTML cleanup.
  - ishmael verifies Mobi image records actually contain image data.

Changes

      single quotes instead of double quotes. This should prevent arguments
      with shell metacharacters from causing unwanted behavior.
    - ishmael no longer relies on an EPUB's metadata file to specify the 'dc'
      namespace, which should fix reading some unconventionally formatted
      EPUBs.
    - ishmael now converts CP1252-encoded Mobis to UTF-8.
    - Unix time handling has been fixed for PDB-based formats (Mobi, AZW,
      PalmDoc, zTXT).
    - ishmael no longer recognizes unset creation/modification dates in
      PDB-based formats.
    - Fixed HTML/XHTML identification heuristics.
    - Fix documentation typos.
    - Fix test typos.
  Improvements:
    - Format identification heuristics have been optimized.
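
The CP1252-to-UTF-8 conversion listed among the fixes above can be illustrated
with Perl's core Encode module. This is only a sketch of the general technique
on a made-up byte string; it is not taken from ishmael's source, and the
variable names are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical sample bytes as they might appear in a CP1252-encoded record:
# 0x93 and 0x94 are curly double quotes, 0xE9 is e-acute.
my $cp1252_bytes = "\x93caf\xE9\x94";

# Decode the bytes into Perl characters, then encode them as UTF-8 bytes.
my $text       = decode('cp1252', $cp1252_bytes);
my $utf8_bytes = encode('UTF-8', $text);

# Prints "6 characters, 11 UTF-8 bytes".
printf "%d characters, %d UTF-8 bytes\n", length $text, length $utf8_bytes;
```

Decoding first and encoding second keeps the two steps explicit, so a different
source encoding only changes the first argument to decode().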

0.07  Feb 25 2025
  - Added -r/--raw option: Dumps the raw, unformatted text contents of a given
    ebook.
  - Added -c/--cover option: Dump the cover image of a given ebook if one is
    present.
    - As a result, pdftopng is an additional dependency if one wishes to dump
      PDF covers. pdftopng should be included with most versions of
      poppler-utils.
    - MIME::Base64 was also added as a dependency, although it should be
      included with Perl core.
  - XHTML is now considered a separate format from HTML (although its class is
    derived from the HTML class, so it should act mostly the same except for
    being called XHTML rather than HTML).
  - Recognize some more FictionBook2 metadata.
  - Improve some format heuristics.
    - FictionBook2, HTML, XHTML
  - When reading EPUBs, try not to dump items that are not under the
    "application/xhtml+xml" media type.
  - Moved PDB modules out of EBook namespace.
  - Removed EBook::Ishmael::EBook::Skeleton.

0.06  Feb 22 2025
  - When run with no arguments, queequeg reads input from stdin.

lib/EBook/Ishmael/EBook.pm

use EBook::Ishmael::EBook::HTML;
use EBook::Ishmael::EBook::KF8;
use EBook::Ishmael::EBook::Mobi;
use EBook::Ishmael::EBook::PalmDoc;
use EBook::Ishmael::EBook::PDF;
use EBook::Ishmael::EBook::Text;
use EBook::Ishmael::EBook::XHTML;
use EBook::Ishmael::EBook::zTXT;

our %EBOOK_FORMATS = map { lc $_ => "EBook::Ishmael::EBook::$_" } qw(
	CB7 CBR CBZ CHM Epub FictionBook2 HTML KF8 Mobi PalmDoc PDF Text XHTML zTXT
);

sub ebook_id {

	my $file = shift;

	open my $fh, '<', $file
		or die "Failed to open $file for reading: $!\n";
	binmode $fh;
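
The %EBOOK_FORMATS table in the excerpt above maps lowercased format names to
class names. The standalone sketch below reproduces that idiom with three of
the formats and adds a lookup helper. The helper class_for() and the %formats
variable are hypothetical names introduced for illustration and are not part of
the module.

```perl
use strict;
use warnings;

# Same construction as %EBOOK_FORMATS in the excerpt, shortened to three
# formats: each key is the lowercased name, each value the full class name.
my %formats = map { lc $_ => "EBook::Ishmael::EBook::$_" } qw(
	Epub PDF zTXT
);

# Hypothetical helper (not part of the module): resolve a user-supplied
# format name to its class name, ignoring case. Returns undef if unknown.
sub class_for {
	my $name = shift;
	return $formats{ lc $name };
}

print class_for('EPUB'), "\n";   # EBook::Ishmael::EBook::Epub
print class_for('ztxt'), "\n";   # EBook::Ishmael::EBook::zTXT
```

Lowercasing both the keys and the lookup argument makes the match
case-insensitive while the class name keeps its original capitalization.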

lib/EBook/Ishmael/TextBrowserDump.pm

=over 4

=item browser

The specific browser you would like to use for the dumping. See above for a list
of valid browsers. If not specified, defaults to the first browser
C<browser_dump()> finds installed on your system.

=item xhtml

Bool specifying whether the input file is XHTML or not. Defaults to C<0>.

=item width

Specify the width of the formatted text. Defaults to C<80>.

=back

=head1 GLOBAL VARIABLES

=head2 $EBook::Ishmael::TextBrowserDump::CAN_DUMP

t/data/gpl3.html

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN">
<html lang="en">
<head>
<meta name="generator" content=
"HTML Tidy for HTML5 for Linux version 5.8.0">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>GNU General Public License v3.0 - GNU Project - Free
Software Foundation (FSF)</title>
<link rel="alternate" type="application/rdf+xml" href=
"http://www.gnu.org/licenses/gpl-3.0.rdf">
</head>

t/data/gpl3.xhtml

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
 <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
 <title>GNU General Public License v3.0 - GNU Project - Free Software Foundation (FSF)</title>
 <link rel="alternate" type="application/rdf+xml"
       href="http://www.gnu.org/licenses/gpl-3.0.rdf" /> 
</head>
<body>


