EBook-Ishmael

Changes

1.04  Mar 28 2025
  - Added the -e|--encoding option: Specifies the encoding used for output
    text.
    - Also added the ISHMAEL_ENCODING environment variable.
  - Improve UTF-8 handling for plain text ebook formats (text, zTXT, PalmDoc).
  - Added support for the chawan web browser as a potential HTML formatter.
  - When specifying cover image output, use '.-' instead of '.*' for suffix
    substitution. '.*' is now deprecated; it is still supported for now but
    will be removed in a future release.
  - Die when stdout ('-') is given as output argument to --image|-g.
  - '<body>' tags are no longer included in HTML/XHTML HTML dumps.
  - Added additional test environment variables to force enable/disable tests
    for optional ebook formats.
    - TEST_CBR, TEST_CB7, TEST_CHM
  - Fix typos in documentation.
 
1.03  Mar 21 2025
  - Add support for the KF8/AZW3 ebook format.
  - Fix raw text dumping in Mobis.
  - Fix Mobi HTML cleanup.
  - ishmael verifies Mobi image records actually contain image data.

Changes

      single quotes instead of double quotes. As a result, arguments with
      shell metacharacters should no longer cause unwanted behavior.
    - ishmael no longer relies on an EPUB's metadata file to specify the 'dc'
      namespace, which should fix reading some unconventionally formatted
      EPUBs.
    - ishmael now converts CP1252-encoded Mobis to UTF-8.
    - Unix time handling has been fixed for PDB-based formats (Mobi, AZW,
      PalmDoc, zTXT).
    - ishmael no longer recognizes unset creation/modification dates in
      PDB-based formats.
    - Fixed HTML/XHTML identification heuristics.
    - Fix documentation typos.
    - Fix test typos.
  Improvements:
    - Format identification heuristics have been optimized.
 
0.07  Feb 25 2025
  - Added -r/--raw option: Dumps the raw, unformatted text contents of a given
    ebook.
  - Added -c/--cover option: Dump the cover image of a given ebook if one is
    present.
    - As a result, pdftopng is an additional dependency if one wishes to dump
      PDF covers. pdftopng should be included with most versions of
      poppler-utils.
    - MIME::Base64 was also added as a dependency, although it should be
      included with Perl core.
  - XHTML is now considered a separate format from HTML (although its class is
    derived from the HTML class, so it should act mostly the same except for
    being called XHTML rather than HTML).
  - Recognize some more FictionBook2 metadata.
  - Improve some format heuristics.
    - FictionBook2, HTML, XHTML
  - When reading EPUBs, try not to dump items that are not under the
    "application/xhtml+xml" media type.
  - Moved PDB modules out of EBook namespace.
  - Removed EBook::Ishmael::EBook::Skeleton.
 
0.06  Feb 22 2025
  - When run with no arguments, queequeg reads input from stdin.

lib/EBook/Ishmael/EBook.pm

 
our %EBOOK_FORMATS = map { lc $_ => "EBook::Ishmael::EBook::$_" } qw(
        CB7 CBR CBZ CHM Epub FictionBook2 HTML KF8 Mobi PalmDoc PDF Text XHTML Zip
        zTXT
);
 
sub ebook_id {
 
        my $file = shift;
 
        open my $fh, '<', $file
                or die "Failed to open $file for reading: $!\n";
        binmode $fh;

lib/EBook/Ishmael/TextBrowserDump.pm

=over 4
 
=item browser
 
The browser to use for dumping. See above for a list of valid browsers. If not
specified, defaults to the first browser C<browser_dump()> finds installed on
your system.
 
=item xhtml
 
Boolean specifying whether the input file is XHTML. Defaults to C<0>.
 
=item width
 
Specify the width of the formatted text. Defaults to C<80>.
 
=back
 
=head1 GLOBAL VARIABLES
 
=head2 $EBook::Ishmael::TextBrowserDump::CAN_DUMP

t/data/gpl3.html

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN">
<html lang="en">
<head>
<meta name="generator" content=
"HTML Tidy for HTML5 for Linux version 5.8.0">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>GNU General Public License v3.0 - GNU Project - Free
Software Foundation (FSF)</title>
<link rel="alternate" type="application/rdf+xml" href=
</head>

t/data/gpl3.xhtml

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
 
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
 <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
 <title>GNU General Public License v3.0 - GNU Project - Free Software Foundation (FSF)</title>
 <link rel="alternate" type="application/rdf+xml"
</head>
<body>


