PDF-OCR2
tesseract
Installing tesseract can be tricky. I don't know of an rpm or debian
package for this one. You'll very likely have to install it from
source. Make sure you have gcc-c++ and automake installed on your
system. If you don't know whether you do, you can proceed anyway; if
you hit any errors, simply go back, install gcc-c++ and automake, and
try again.
You may be able to simply install the SVN version of tesseract this way:
$ cd /tmp
$ svn checkout http://tesseract-ocr.googlecode.com/svn/trunk/ tesseract-ocr
$ cd tesseract-ocr
$ ./runautoconf
$ mkdir build-directory
$ cd build-directory
$ ../configure
$ make
$ make install
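Before starting the build, it can save time to check that the tools
are on your PATH. The following is only a sketch: missing_tools is a
made-up helper name, and g++ is assumed to be the command that the
gcc-c++ package provides on your system.

```shell
#!/bin/sh
# Print each named command that cannot be found on PATH, one per line.
# Returns non-zero if at least one command is missing.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return $rc
}

# Check the tools used in the steps above (g++ is an assumed name).
if missing=$(missing_tools g++ automake svn make); then
    echo "all build tools found"
else
    echo "missing build tools:"
    echo "$missing"
fi
```

Anything listed as missing should be installed before running
./runautoconf.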
For more info, see the Google project on OCR, which uses tesseract.
INSTALL PERL MODULES
Ideally, you could simply say:
cpan PDF::OCR2
And voila, done. This may well work. If not, I suggest installing the
perl modules in the following order.
perl modules install order
$ cpan PDF::API2
$ cpan CAM::PDF
$ cpan PDF::Burst
$ cpan PDF::GetImages
$ cpan Image::OCR::Tesseract
$ cpan PDF::OCR2
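The order above can also be run as a single step that stops at the
first module that fails. This is a minimal sketch, not part of the
distribution: install_in_order is a hypothetical helper, and it assumes
your cpan client exits non-zero when an install fails, which older
versions may not do.

```shell
#!/bin/sh
# Run an installer command on each module in turn and stop at the
# first failure. $1 is the installer; the rest are module names.
install_in_order() {
    installer=$1
    shift
    for module in "$@"; do
        echo "installing $module"
        if ! "$installer" "$module"; then
            echo "failed on $module" >&2
            return 1
        fi
    done
}

# Usage (left commented out so nothing is installed by accident):
# install_in_order cpan PDF::API2 CAM::PDF PDF::Burst PDF::GetImages \
#     Image::OCR::Tesseract PDF::OCR2
```

Stopping early makes it clear which module needs attention before the
later ones are attempted.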
Image::OCR::Tesseract
If the command 'cpan Image::OCR::Tesseract' fails, you will need to
download the package and install it manually from the distribution.
$ cd /tmp
$ wget http://search.cpan.org/src/LEOCHARRE/Image-OCR-Tesseract-1.22/
$ tar -xvf Image-OCR-T(tab completion)
$ cd Image-OCR-T(tab completion)
$ perl Makefile.PL # or you can do perl t/00(tab completion)
This will check for the image libraries and the OCR engine. You will
need to have already installed imagemagick and tesseract, as mentioned
in this document.
Make sure you are getting the latest version of Image::OCR::Tesseract;
the above example is for version 1.22. I update frequently, so check.
You can find the latest version by going to http://search.cpan.org and
searching for 'Image::OCR::Tesseract'.
There are INSTALL.* readme files in the Image::OCR::Tesseract package
that you may want to look through.
Image::Magick
Should already be available ( via previously installing imagemagick ).
BUGS
I am very open to corrections, suggestions, hints, tips, and criticism.
I am not a know-it-all; I have been able to do and share some useful
things because of what I learn every day from my peers. Please contact
the AUTHOR.
AUTHOR
Leo Charre leocharre at cpan dot org