EBook-Ishmael
1.04 Mar 28 2025
- Added the -e|--encoding option: Specify the encoding of the text that
ishmael outputs.
- Also added the ISHMAEL_ENCODING environment variable.
- Improve UTF-8 handling for plain text ebook formats (text, zTXT, PalmDoc).
- Added support for the chawan web browser as a potential HTML formatter.
- When specifying cover image output, '.-' should now be used instead of '.*'
for suffix substitution. '.*' is deprecated; it is still supported for now,
but will be removed in a future release.
- Die when stdout ('-') is given as output argument to --image|-g.
- '<body>' tags are no longer included in the HTML dump of HTML/XHTML files.
- Added additional test environment variables to force enable/disable tests
for optional ebook formats.
- TEST_CBR, TEST_CB7, TEST_CHM
- Fix typos in documentation.
1.03 Mar 21 2025
- Add support for the KF8/AZW3 ebook format.
- Fix raw text dumping in Mobis.
- Fix Mobi HTML cleanup.
- ishmael verifies Mobi image records actually contain image data.
single quotes instead of double quotes. This should mean that arguments
with shell metacharacters should not cause unwanted behavior.
- ishmael no longer relies on an EPUB's metadata file to specify the 'dc'
namespace, which should fix reading some unconventionally formatted
EPUBs.
- ishmael now converts CP1252-encoded Mobis to UTF-8.
- Unix time handling has been fixed for PDB-based formats (Mobi, AZW,
PalmDoc, zTXT).
- ishmael no longer recognizes unset creation/modification dates in
PDB-based formats.
- Fixed HTML/XHTML identification heuristics.
- Fix documentation typos.
- Fix test typos.
Improvements:
- Format identification heuristics have been optimized.
0.07 Feb 25 2025
- Added -r/--raw option: Dumps the raw, unformatted text contents of a given
ebook.
- Added -c/--cover option: Dump the cover image of a given ebook if one is
present.
- As a result, pdftopng is an additional dependency if one wishes to dump
PDF covers. pdftopng should be included with most versions of
poppler-utils.
- MIME::Base64 was also added as a dependency, although it should be
included with Perl core.
- XHTML is now considered a separate format from HTML (although its class is
derived from the HTML class, so it should act mostly the same except for
being called XHTML rather than HTML).
- Recognize some more FictionBook2 metadata.
- Improve some format heuristics.
- FictionBook2, HTML, XHTML
- When reading EPUBs, try not to dump items that are not under the
"application/xhtml+xml" media type.
- Moved PDB modules out of EBook namespace.
- Removed EBook::Ishmael::EBook::Skeleton.
0.06 Feb 22 2025
- When run with no arguments, queequeg reads input from stdin.
lib/EBook/Ishmael/EBook.pm
use EBook::Ishmael::EBook::HTML;
use EBook::Ishmael::EBook::KF8;
use EBook::Ishmael::EBook::Mobi;
use EBook::Ishmael::EBook::PalmDoc;
use EBook::Ishmael::EBook::PDF;
use EBook::Ishmael::EBook::Text;
use EBook::Ishmael::EBook::XHTML;
use EBook::Ishmael::EBook::zTXT;
our %EBOOK_FORMATS = map { lc $_ => "EBook::Ishmael::EBook::$_" } qw(
	CB7 CBR CBZ CHM Epub FictionBook2 HTML KF8 Mobi PalmDoc PDF Text XHTML zTXT
);

sub ebook_id {

	my $file = shift;

	open my $fh, '<', $file
		or die "Failed to open $file for reading: $!\n";
	binmode $fh;
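
The excerpt above stops before the identification logic, so the following is only an illustrative sketch, not the module's actual heuristics. It rebuilds the format table with the same `map` expression to show what the keys and values look like, and adds a hypothetical helper, `looks_like_pdf` (an invented name, not part of EBook::Ishmael), that opens a file in binary mode the same way and checks for the standard `%PDF-` signature.

```perl
use strict;
use warnings;

# Same construction as in the excerpt: lowercased format name => class name.
my %formats = map { lc $_ => "EBook::Ishmael::EBook::$_" } qw(
    CB7 CBR CBZ CHM Epub FictionBook2 HTML KF8 Mobi PalmDoc PDF Text XHTML zTXT
);

# Hypothetical helper, NOT part of EBook::Ishmael: read the first five bytes
# of a file in binary mode and report whether they are the PDF signature.
sub looks_like_pdf {
    my $file = shift;

    open my $fh, '<', $file
        or die "Failed to open $file for reading: $!\n";
    binmode $fh;

    # read() returns the number of bytes read, 0 at EOF, undef on error.
    my $n = read $fh, my $head, 5;
    close $fh;

    return defined $n && $n == 5 && $head eq '%PDF-';
}

# List every lowercased format key with the class it maps to.
print "$_ => $formats{$_}\n" for sort keys %formats;
```

Because the keys are lowercased with `lc`, a format such as `zTXT` is looked up as `ztxt` while the class name keeps its original capitalization.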
lib/EBook/Ishmael/TextBrowserDump.pm
=over 4

=item browser

The specific browser you would like to use for the dumping. See above for a list
of valid browsers. If not specified, defaults to the first browser
C<browser_dump()> finds installed on your system.

=item xhtml

Bool specifying whether the input file is XHTML or not. Defaults to C<0>.

=item width

Specify the width of the formatted text. Defaults to C<80>.

=back

=head1 GLOBAL VARIABLES

=head2 $EBook::Ishmael::TextBrowserDump::CAN_DUMP
t/data/gpl3.html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN">
<html lang="en">
<head>
<meta name="generator" content=
"HTML Tidy for HTML5 for Linux version 5.8.0">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>GNU General Public License v3.0 - GNU Project - Free
Software Foundation (FSF)</title>
<link rel="alternate" type="application/rdf+xml" href=
"http://www.gnu.org/licenses/gpl-3.0.rdf">
</head>
t/data/gpl3.xhtml
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>GNU General Public License v3.0 - GNU Project - Free Software Foundation (FSF)</title>
<link rel="alternate" type="application/rdf+xml"
href="http://www.gnu.org/licenses/gpl-3.0.rdf" />
</head>
<body>