OCR + Fedora Core and a big book

Gregory Machin gregory.machin at gmail.com
Mon Jan 16 08:03:20 UTC 2006


I agree with you, but the boss wants OCR. I think I will leave him to
figure it out; I have too much coding to do. lol...

Thanks for the input. Have a great day.

On 1/13/06, Bill Rugolsky Jr. <brugolsky at telemetry-investments.com> wrote:
>
> On Fri, Jan 13, 2006 at 10:47:02AM +0000, Paul F. Johnson wrote:
> > Grab a copy of gocr, compile and install it (it's not in Fedora Extras,
> > which is odd). When you scan, use as high a resolution as possible (in
> > my experience, a minimum of 300 dpi) and greyscale.
> >
> > Use either GIMP or xsane to grab the scan and tell gocr to do its
> > business.
> >
> > OCR is not an exact science and you will really need to sit down and go
> > through the scanned text to ensure that the numbers scanned are correct
> > (they are easy to spot: you may get @ instead of 0, l for 1 and so on).
> > Save the generated file. You may then need to either write a script to
> > split the fields on " " or load the file into emacs and search and
> > replace " " with ",", then save.
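For the scripted route mentioned above, something along these lines might do. It is only a minimal sketch, assuming the gocr output is plain text with one table row per line and fields separated by whitespace. The digit substitutions (@ to 0, l to 1) are just the two examples Paul gives; the helper names are hypothetical, and you would extend the table after proofreading your own scans.

```python
import csv
import io

# Hypothetical substitution table built from the two OCR confusions
# mentioned above (@ read instead of 0, l read instead of 1).
OCR_DIGIT_FIXES = str.maketrans({"@": "0", "l": "1"})


def fix_numeric_token(token: str) -> str:
    """Apply the digit fixes only if the result looks like a number."""
    fixed = token.translate(OCR_DIGIT_FIXES)
    # Ignore decimal points and minus signs when checking for digits.
    if fixed.replace(".", "").replace("-", "").isdigit():
        return fixed
    return token  # leave ordinary words such as "hello" untouched


def spaces_to_csv(text: str) -> str:
    """Turn whitespace-delimited OCR lines into comma-separated rows."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in text.splitlines():
        fields = line.split()  # splits on runs of spaces/tabs
        if fields:  # skip blank lines
            writer.writerow(fix_numeric_token(f) for f in fields)
    return out.getvalue()


if __name__ == "__main__":
    print(spaces_to_csv("Item 1@ 3.l4\nfoo  bar"), end="")
```

Using the csv module rather than a plain search-and-replace means any field that happens to contain a comma gets quoted instead of silently splitting into two columns. You still need to proofread the numbers by eye afterwards.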
>
> Sadly, in my (limited) experience, none of the free software solutions
> such as Gocr or Clara OCR is really up to the task.  The leading
> proprietary packages are vastly superior.  Some of them have free 30-day
> evaluations.
>
> With a proper setup for lots of automated training, Clara might be able
> to do the job.  Especially if you do some image morphology (using, e.g.,
> GIMP) to clean up the scans.  But you'll have to do some serious work.
>
> A tried and true technique that avoids using proprietary software
> is to simply pay multiple people to type the whole thing, and then
> reconcile the differences (or use majority voting). :-)
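The majority-voting step is simple to script once the typed copies exist. This is a minimal sketch, assuming each typist produced a plain-text file with exactly the same number of lines (real copies tend to drift and would first need aligning, e.g. with Python's difflib). The function name majority_vote is hypothetical.

```python
from collections import Counter


def majority_vote(transcripts: list[list[str]]) -> tuple[list[str], list[int]]:
    """Merge line-aligned transcripts by majority vote.

    Returns the merged lines plus the indices of lines where no version
    won a strict majority; those still need a human to reconcile them.
    """
    if not transcripts:
        return [], []
    n_lines = len(transcripts[0])
    if any(len(t) != n_lines for t in transcripts):
        raise ValueError("transcripts must have the same number of lines")

    merged: list[str] = []
    disputed: list[int] = []
    for i in range(n_lines):
        counts = Counter(t[i].strip() for t in transcripts)
        line, votes = counts.most_common(1)[0]
        if votes * 2 <= len(transcripts):  # no strict majority
            disputed.append(i)
        merged.append(line)
    return merged, disputed


if __name__ == "__main__":
    a = ["12 34", "56 78"]
    b = ["12 34", "56 79"]
    c = ["12 84", "56 79"]
    print(majority_vote([a, b, c]))
```

With only two typists this reduces to "reconcile the differences": every line where they disagree is reported as disputed. Three or more typists let most lines resolve automatically.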
>
> Regards,
>
>         Bill Rugolsky
>



--
Gregory Machin
greg at linuxpro.co.za
gregory.machin at gmail.com
www.linuxpro.co.za
www.exponent.co.za
Web Hosting Solutions
Scalable Linux Solutions
www.iberry.info (support and admin)

+27 72 524 8096


More information about the fedora-list mailing list