Today I took on the challenge of setting up an Amazon Micro (Free Tier) machine to run a simple web service for OCR using Tesseract (http://code.google.com/p/tesseract-ocr/).
The default web service accepts these parameters:
img: uri to the jpg image that you want to transform
callback: used for crossdomain services
format: defaults to json, but you can also supply txt to get just the raw text
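As a rough illustration of how these parameters fit together, here is a small shell sketch that assembles a request URL. The ocr_url function name and the example.com host are made up for illustration; only img and format come from the service described above, and callback is left out for brevity.

```shell
#!/bin/sh
# ocr_url is a hypothetical helper, not part of the service itself.
# Usage: ocr_url HOST IMAGE_URL [FORMAT]
# FORMAT falls back to json, matching the service default.
ocr_url() {
    host=$1
    img=$2
    format=${3:-json}
    printf 'http://%s/?img=%s&format=%s\n' "$host" "$img" "$format"
}

# Print the URL for a plain-text request (example.com is a placeholder host).
ocr_url example.com http://www.biosurvey.ou.edu/bebb/imgs/BigLabel.jpg txt
```

To actually call the service you could hand the result to a client such as curl, for example curl "$(ocr_url your-machine-address.amazonaws.com http://www.biosurvey.ou.edu/bebb/imgs/BigLabel.jpg txt)". The image URL is passed through as-is here; an image URL that carries its own ? or & characters would presumably need URL-encoding first.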
To build the machine yourself, follow the steps below. If you would rather have the machine image, let us know and I will contact you with more information.
sudo yum -y update
sudo yum -y install libtiff libtiff-devel libjpeg-devel libpng-devel gcc gcc-c++ libtool
cd /tmp/
sudo wget http://www.leptonica.org/source/leptonlib-1.67.tar.gz
sudo gunzip leptonlib-1.67.tar.gz
sudo tar -xvf leptonlib-1.67.tar
cd leptonlib-1.67/
sudo ./configure
sudo make
# === WAIT about 10 min for make to finish ===
sudo make install
cd /tmp/
sudo wget http://tesseract-ocr.googlecode.com/files/tesseract-3.00.tar.gz
sudo gunzip tesseract-3.00.tar.gz
sudo tar -xvf tesseract-3.00.tar
cd tesseract-3.00/
sudo ./runautoconf
sudo ./configure
sudo make
# === WAIT about 20 min for make to finish ===
sudo make install
#===================
#Install English Data
#===================
cd /usr/local/share/tessdata
sudo wget http://tesseract-ocr.googlecode.com/files/eng.traineddata.gz
sudo gzip -d eng.traineddata.gz
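Before moving on to the web server, it can be worth confirming that the build and the language data ended up where expected. The sketch below is only a sanity check under the assumptions of this post (binary on the PATH, data in /usr/local/share/tessdata); has_lang_data is a hypothetical helper name, and TESSDATA_DIR is just a variable used by this script.

```shell
#!/bin/sh
# Sketch: sanity-check a Tesseract install before wiring it to the web service.

# has_lang_data is a hypothetical helper.
# $1 = tessdata directory, $2 = language code (e.g. eng)
has_lang_data() {
    [ -f "$1/$2.traineddata" ]
}

# Defaults to the directory used in the steps above.
TESSDATA_DIR=${TESSDATA_DIR:-/usr/local/share/tessdata}

if command -v tesseract >/dev/null 2>&1; then
    echo "tesseract found at $(command -v tesseract)"
else
    echo "tesseract is not on PATH"
fi

if has_lang_data "$TESSDATA_DIR" eng; then
    echo "English data present in $TESSDATA_DIR"
else
    echo "eng.traineddata missing from $TESSDATA_DIR"
fi
```

For a real end-to-end check, run tesseract some-image.tif out -l eng on an image of your own and read out.txt. If the binary complains about a missing shared library, running sudo ldconfig may help, though I have not confirmed that this is needed on this particular setup.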
#===================
#Additional Tools
#===================
sudo yum -y install httpd php
sudo chkconfig httpd on
sudo service httpd start
cd /var/www/html/
sudo wget http://tools.silverbiology.com/tesseract/index.txt
sudo mv index.txt index.php
sudo mkdir tmp
sudo chown apache:apache tmp
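TEST (open these in a browser, replacing the host with your machine's address):
your-machine-address.amazonaws.com/?img=http://www.biosurvey.ou.edu/bebb/imgs/BigLabel.jpg
your-machine-address.amazonaws.com/?img=http://www.anbg.gov.au/bryophyte/illustrations/flecker-macro.jpg&format=txt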
Sample results from Image:
{
success: true
, img: http://www.biosurvey.ou.edu/bebb/imgs/BigLabel.jpg
, value: "PLANTS OF OKLAHOMA ROBERT BEBB HERBARIUM The Umversuly ol oklahoma Oklahoma County Scrophulariaoeae Penslemon oklahomensis Penn. SE Comer ol Tinker AFB. T11N RZW Sec. 26. Topography: rolling upland. Habitat: Mixed-Grass Prairie. Herbaceous perennial, 2-3 dm (all. Flowers while. F L Johnson TNK017 4 May |994 r»~n.~»¢y¢v»~~r~un»»,a|».¢»qus.-M, "
}
Wednesday, December 7, 2011