
Re: pdf documents



Yes, but unless I'm badly mistaken, it is very old and can't extract images directly from PDF files. You would still need to install the xpdf package to get the pdfimages utility, which saves the images in a PDF as separate files that you can then process one at a time. I read about the OCR package you describe, but I'm fairly sure it's old and unmaintained. Someone may have been planning to take over development; I'm not sure. I've also noticed that most PDF files contain real text rather than page images, and when they do contain images, those are pictures, so OCR would be useless on them anyway. Also, what is the accuracy rate for this OCR package? What about accessibility?
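
For what it's worth, here is a rough Python sketch of the two-step workflow I mean: pull the images out with pdfimages, then hand each one to tesseract. It assumes both programs are installed and on the PATH, and that your tesseract build can read the image format pdfimages writes (older builds may need a conversion to TIFF first). The function names, the "page" file prefix, and the working directory are all made up for illustration.

```python
import subprocess
from pathlib import Path


def pdfimages_command(pdf_path, image_root):
    """Build the command: pdfimages PDF-file image-root."""
    return ["pdfimages", str(pdf_path), str(image_root)]


def tesseract_command(image_path, output_base):
    """Build the command: tesseract imagename outputbase (writes outputbase.txt)."""
    return ["tesseract", str(image_path), str(output_base)]


def ocr_pdf(pdf_path, work_dir):
    """Extract every image from a PDF and OCR each one as a single file.

    Returns the list of text files that tesseract is expected to write.
    Hypothetical helper; assumes pdfimages and tesseract are on the PATH.
    """
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)

    # pdfimages names its output files image-root-NNN.<ext>
    image_root = work_dir / "page"
    subprocess.run(pdfimages_command(pdf_path, image_root), check=True)

    text_files = []
    for image in sorted(work_dir.glob("page-*")):
        if image.suffix == ".txt":
            continue  # skip OCR output left over from an earlier run
        output_base = image.with_suffix("")
        subprocess.run(tesseract_command(image, output_base), check=True)
        text_files.append(output_base.with_suffix(".txt"))
    return text_files
```

Calling ocr_pdf("scan.pdf", "ocr_out") would leave one .txt file per extracted image in ocr_out. For a PDF that already contains real text, this step is unnecessary, and picture-only images will mostly produce garbage.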

Matt Barnes wrote:
Tesseract is an OCR engine that can convert PDFs and images to text. I haven't gotten around to installing it and trying it out, but it seems to be the OCR of choice. It is located here:
http://sourceforge.net/project/showfiles.php?group_id=158586

