EBook-Tools
lib/EBook/Tools/Mobipocket.pm
} while($bitpos < length($data) * 8);
# debug(2,"DEBUG: returning from depth ",$depth," with:");
# debug(2," '",excerpt_line($text),"'");
return $text;
}
=head2 C<unpack_mobi_language($data)>
Takes as an argument 4 bytes of data. If fewer than 4 bytes are
provided, the sub croaks. If more are provided, a debug message is
emitted, but the sub continues, using only the first 4 bytes.
In scalar context returns a language string mostly (but not entirely)
conformant to the IANA language subtag registry codes.
In list context, returns the language string, an unknown code integer,
a region code integer, and a language code integer, with the last
three being directly unpacked values.
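
The two return conventions described above can be sketched as
follows. This is a minimal illustration only: C<demo_language> is a
hypothetical stand-in for C<unpack_mobi_language>, and the values it
returns are made up rather than taken from C<%mobilangcodes>.

```perl
use strict;
use warnings;

# Hypothetical stand-in that follows the same return convention as
# unpack_mobi_language(); the values are made up for illustration.
sub demo_language
{
    my @returnlist = ('en-gb', 0, 8, 9);
    return wantarray ? @returnlist : $returnlist[0];
}

# Scalar context: only the language string comes back.
my $language = demo_language();

# List context: language string, unknown code, region code, language code.
my ($lang,$unknowncode,$regioncode,$languagecode) = demo_language();

print "scalar: $language\n";
print "list:   ",join(',',$lang,$unknowncode,$regioncode,$languagecode),"\n";
```

Callers that only need the language string can use a plain scalar
assignment; the three integers are mainly useful for debugging.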
See C<%mobilangcodes> for an exact map of values. Note that the
bottom two bits of the region code appear to be unused (i.e. the
values are all multiples of 4). The unknown code integer appears to
be unused, and is generally zero.
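
As a rough sketch of the byte layout, the snippet below packs a
hypothetical 4-byte sample and unpacks it with the same C<nCC>
template the sub uses. The sample values are assumptions chosen for
illustration, and no claim is made here about which language string
they map to.

```perl
use strict;
use warnings;

# Hypothetical sample: unknown code 0, region code 8, language code 9.
my $data = pack('nCC',0,8,9);

# 'n' reads a 16-bit big-endian unsigned integer (2 bytes);
# each 'C' reads one unsigned byte.
my ($unknowncode,$regioncode,$languagecode) = unpack('nCC',$data);

printf("unknown=%d region=%d language=%d\n",
       $unknowncode,$regioncode,$languagecode);

# A region code that is a multiple of 4 has its bottom two bits clear.
my $lowbits = $regioncode & 0x03;
print "bottom two bits of region code: $lowbits\n";
```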
The original implementation by Mobipocket may have been via
Microsoft's .NET CultureInfo class. See:
L<http://msdn.microsoft.com/en-us/library/system.globalization.cultureinfo(VS.71).aspx>
=cut
sub unpack_mobi_language
{
my $data = shift;
my $subname = ( caller(0) )[3];
debug(2,"DEBUG[",$subname,"]");
croak($subname,"(): no language data provided")
unless($data);
croak($subname,"(): language data is too short (only ",length($data),
" bytes, need 4)")
if(length($data) < 4);
debug(1,$subname,"(): expected 4 bytes of data, but received ",
length($data))
if(length($data) > 4);
my ($unknowncode,$regioncode,$languagecode) = unpack('nCC',$data);
my $language = parse_mobi_language($languagecode,$regioncode);
my @returnlist = ($language,$unknowncode,$regioncode,$languagecode);
if(wantarray) { return @returnlist; }
else { return $returnlist[0]; }
}
########## END CODE ##########
=head1 BUGS AND LIMITATIONS
=over
=item * Unpacking DRM-protected text isn't supported. Although
infrastructure may be added later to make use of external helpers and
plugins, direct DRM support will never be added to the main code for
legal reasons.
=item * Repacking a .prc without fully extracting to OPF and
completely converting back isn't supported. This will have to be
implemented before an interface to perform minor metadata alterations
can be implemented.
=item * Mobipocket HUFF/CDIC decoding (used mostly on dictionaries)
isn't well documented.
=item * Not all Mobipocket data is understood, so a conversion from
OPF to Mobipocket .prc back to OPF will not result in all data being
retained. Patches welcome.
=item * Mobipocket INDX, DATP, FCIS, and FLIS records are not
understood and are completely ignored.
=item * Mobipocket EXTH subjectcode records may not end up attached to
the correct subject element if the number of subject records differs
from the number of subjectcode records. This is because the
Mobipocket format leaves the EXTH subjectcode records completely
unlinked from the subject records, and there is no way to detect if a
subject with no associated subjectcode comes before a subject with an
associated subjectcode.
Fortunately, this should rarely be a problem with real data, as
Mobipocket Creator only allows a single subject to be set, and the
only other way to have a subjectcode attached to a subject is to
manually edit the OPF file and insert an additional dc:Subject element
with a BASICCode attribute.
Mobipocket has indicated that they may move data currently in their
custom elements and attributes to the standard <meta> elements in a
future release, so this problem may become moot then.
=back
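
The subjectcode limitation above can be illustrated with a naive
positional pairing. This is a sketch under stated assumptions, not
necessarily how this module attaches codes: the subjects, the code
value, and the pairing loop are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical metadata: two subject records but only one subjectcode
# record. The code value is illustrative only.
my @subjects     = ('Fiction','Science Fiction');
my @subjectcodes = ('FIC028000');

# Pair by position, since the records carry no link to each other.
my @pairs;
for my $i (0 .. $#subjects)
{
    push(@pairs,{ subject => $subjects[$i], code => $subjectcodes[$i] });
}

for my $pair (@pairs)
{
    print $pair->{subject},' => ',$pair->{code} // '(none)',"\n";
}
```

The single code lands on the first subject regardless of which
subject it was originally set on, which is the ambiguity described
above.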
=head1 AUTHOR
Zed Pobre <zed@debian.org>
=head1 LICENSE AND COPYRIGHT
Copyright 2008 Zed Pobre
Licensed to the public under the terms of the GNU GPL, version 2.
=cut
1;