EBook-Ishmael
1.04 Mar 28 2025
  - Added the -e|--encoding option: Specify the output encoding for outputted
    text.
    - Also added the ISHMAEL_ENCODING environment variable.
  - Improve UTF8-handling for plain text ebook formats (text, zTXT, PalmDoc).
  - Added support for the chawan web browser as a potential HTML formatter.
  - When specifying cover image output, '.-' should be used now instead of '.*'
    for suffix substitution. '.*' is now deprecated, but will still be
    supported. However, it will be removed at some point in a future release.
  - Die when stdout ('-') is given as output argument to --image|-g.
  - '<body>' tags are no longer included in HTML/XHTML html dump.
  - Added additional test environment variables to force enable/disable tests
    for optional ebook formats.
    - TEST_CBR, TEST_CB7, TEST_CHM
  - Fix typos in documentation.
1.03 Mar 21 2025
  - Add support for the KF8/AZW3 ebook format.
  - Fix raw text dumping in Mobis.
  - Fix Mobi HTML cleanup.
  - ishmael verifies Mobi image records actually contain image data.
    single quotes instead of double quotes. This should mean that arguments
    with shell metacharacters should not cause unwanted behavior.
  - ishmael no longer relies on an EPUB's metadata file to specify the 'dc'
    namespace, which should fix reading some unconventionally formatted
    EPUBs.
  - ishmael now converts CP1252-encoded Mobis to UTF-8.
  - Unix time handling has been fixed for PDB-based formats (Mobi, AZW,
    PalmDoc, zTXT).
  - ishmael no longer recognizes unset creation/modification dates in
    PDB-based formats.
  - Fixed HTML/XHTML identification heuristics.
  - Fix documentation typos.
  - Fix test typos.
  Improvements:
    - Format identification heuristics have been optimized.
0.07 Feb 25 2025
  - Added -r/--raw option: Dumps the raw, unformatted text contents of a
    given ebook.
  - Added -c/--cover option: Dump the cover image of a given ebook if one is
    present.
    - As a result, pdftopng is an additional dependency if one wishes to dump
      PDF covers. pdftopng should be included with most versions of
      poppler-utils.
    - MIME::Base64 was also added as a dependency, although it should be
      included with Perl core.
  - XHTML is now considered a separate format from HTML (although its class is
    derived from the HTML class, so it should act mostly the same except for
    being called XHTML rather than HTML).
  - Recognize some more FictionBook2 metadata.
  - Improve some format heuristics.
    - FictionBook2, HTML, XHTML
  - When reading EPUBs, try not to dump items that are not under the
    "application/xhtml+xml" media type.
  - Moved PDB modules out of EBook namespace.
  - Removed EBook::Ishmael::EBook::Skeleton.
0.06 Feb 22 2025
  - When run with no arguments, queequeg reads input from stdin.

lib/EBook/Ishmael/EBook.pm

our %EBOOK_FORMATS = map { lc $_ => "EBook::Ishmael::EBook::$_" } qw(
    CB7 CBR CBZ CHM Epub FictionBook2 HTML KF8 Mobi PalmDoc PDF Text XHTML Zip
    zTXT
);

sub ebook_id {

    my $file = shift;

    open my $fh, '<', $file
        or die "Failed to open $file for reading: $!\n";
    binmode $fh;
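
As a minimal sketch of how the `%EBOOK_FORMATS` table above can be used, the following standalone Perl builds the same map from lowercase format name to class name and looks up a class name case-insensitively. The helper `class_for_format` and the variable `%formats` are hypothetical names introduced for illustration; they are not part of EBook::Ishmael, and the real module may resolve formats differently.

```perl
use strict;
use warnings;

# Illustrative copy of the format table. The class names are only strings
# here, so no EBook::Ishmael modules need to be installed to run this.
my %formats = map { ( lc($_) => "EBook::Ishmael::EBook::$_" ) } qw(
    CB7 CBR CBZ CHM Epub FictionBook2 HTML KF8 Mobi PalmDoc PDF Text XHTML Zip
    zTXT
);

# Hypothetical helper (an assumption, not part of EBook::Ishmael): resolve a
# user-supplied format name to its class name, ignoring case.
# Returns undef for an unknown format.
sub class_for_format {
    my $name = shift;
    return $formats{ lc $name };
}

my $epub = class_for_format('Epub');
my $ztxt = class_for_format('ZTXT');
print "$epub\n";    # EBook::Ishmael::EBook::Epub
print "$ztxt\n";    # EBook::Ishmael::EBook::zTXT
print defined class_for_format('docx') ? "known\n" : "unknown\n";    # unknown
```

Because the keys are lowercased while the values keep the original spelling, a lookup such as `ztxt` still yields the mixed-case class name `EBook::Ishmael::EBook::zTXT`.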

lib/EBook/Ishmael/TextBrowserDump.pm

=over 4

=item browser

The specific browser you would like to use for the dumping. See above for a list
of valid browsers. If not specified, defaults to the first browser
C<browser_dump()> finds installed on your system.

=item xhtml

Bool specifying whether the input file is XHTML or not. Defaults to C<0>.

=item width

Specify the width of the formatted text. Defaults to C<80>.

=back

=head1 GLOBAL VARIABLES

=head2 $EBook::Ishmael::TextBrowserDump::CAN_DUMP

t/data/gpl3.html

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN">
<html lang="en">
<head>
<meta name="generator" content="HTML Tidy for HTML5 for Linux version 5.8.0">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>GNU General Public License v3.0 - GNU Project - Free
Software Foundation (FSF)</title>
<link rel="alternate" type="application/rdf+xml" href=
</head>

t/data/gpl3.xhtml

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>GNU General Public License v3.0 - GNU Project - Free Software Foundation (FSF)</title>
<link rel="alternate" type="application/rdf+xml"
</head>
<body>