Convert PDF to Text?

Doncho N. Gunchev gunchev at gmail.com
Sun Apr 22 13:20:28 UTC 2007


On Sunday 2007-04-22 00:31:51 Keith G. Robertson-Turner wrote:
> I have some PDF documents that are photocopied text documents (embedded
> image, rather than text glyphs). When I open these with Evince, I am
> able to copy and paste the actual text. At first I thought this was some
> kind of OCR process, but then I realised it's actually the document
> itself, which has the original text embedded in it (OCRed and embedded
> during the original scan).
>
> Is there any command I can use to extract the text from these PDF
> documents in a batch? I have a couple of thousand documents that need
> converting.
>
> Just curious, since if Evince can obviously do it (manually) then the
> necessary library components (at least) must be installed (FC6).
>

KWord from KOffice can open and edit .pdf files (quite well), so you should
be able to save them as plain text. I guess you could use DCOP to script
this for multiple files.
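An alternative sketch, not what the reply above proposes: Evince renders PDFs through the poppler library, and poppler ships a command-line tool, `pdftotext`, that dumps a PDF's text layer to a .txt file. It does no OCR of its own, so it only helps because these scans already carry embedded OCR text. The script below assumes `pdftotext` is on the PATH (on Fedora it usually comes from the poppler-utils package). The function names `build_command` and `convert_all` are hypothetical, chosen for this example.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def build_command(pdf: Path, out_dir: Path) -> list[str]:
    """Return a pdftotext invocation that writes <name>.txt into out_dir."""
    txt = out_dir / (pdf.stem + ".txt")
    # -layout keeps the physical page layout; drop it to get plain reading order.
    return ["pdftotext", "-layout", str(pdf), str(txt)]


def convert_all(src_dir: Path, out_dir: Path) -> list[Path]:
    """Convert every .pdf/.PDF file in src_dir and return the ones that failed."""
    if shutil.which("pdftotext") is None:
        raise FileNotFoundError("pdftotext not found on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    # Collect the file list up front so newly written .txt files are never picked up.
    pdfs = sorted(
        p for p in src_dir.iterdir() if p.is_file() and p.suffix.lower() == ".pdf"
    )
    failed = []
    for pdf in pdfs:
        result = subprocess.run(build_command(pdf, out_dir), capture_output=True, text=True)
        if result.returncode != 0:
            failed.append(pdf)
            print(f"failed: {pdf}: {result.stderr.strip()}", file=sys.stderr)
    return failed
```

For a directory of a couple of thousand scans, something like `convert_all(Path("scans"), Path("scans/txt"))` would write one text file per PDF and return the list of files that pdftotext rejected, so they can be checked by hand.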

-- 
Regards,
  Doncho




More information about the fedora-list mailing list