Extracting ASCII text from a PDF Document

Kirk Reiser kirk at braille.uwo.ca
Thu Aug 12 12:37:43 UTC 2010


What happens when you run pdftotext on the file?
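
If it helps, here is a minimal sketch of scripting that check in Python. It assumes pdftotext is installed (on Debian it comes with the poppler-utils package). The helper names and the file names are made up for illustration; -layout is pdftotext's option for keeping the original page layout.

```python
import shutil
import subprocess


def pdftotext_command(pdf_path, txt_path, keep_layout=False):
    """Build the pdftotext argument list (hypothetical helper)."""
    cmd = ["pdftotext"]
    if keep_layout:
        # -layout asks pdftotext to keep the physical layout of the page
        cmd.append("-layout")
    cmd += [pdf_path, txt_path]
    return cmd


def extract_text(pdf_path, txt_path, keep_layout=False):
    """Run pdftotext and return the extracted text (hypothetical helper)."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found; try installing poppler-utils")
    # check=True raises CalledProcessError if pdftotext exits with an error
    subprocess.run(pdftotext_command(pdf_path, txt_path, keep_layout), check=True)
    with open(txt_path, encoding="utf-8", errors="replace") as handle:
        return handle.read()


if __name__ == "__main__":
    # "document.pdf" is a placeholder name, not the file from this thread
    text = extract_text("document.pdf", "document.txt")
    print(text if text.strip() else "No text came out; the PDF may hold only page images.")
```

If the output is empty or only form feeds, the PDF probably holds scanned page images rather than text, and OCR would be the next thing to try.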

On Thu, 12 Aug 2010, Martin McCormick wrote:

> I have a PDF document that does have embedded ASCII text in it.
> It plays fine on a Macintosh that has no OCR software on it but
> uses VoiceOver. VoiceOver just runs on ASCII, so the ASCII is
> there.
>
> 	I need to use the file on a Debian system so I hope I am
> just using a2ps and pstotext wrong.
>
> 	If one uses pstotext on this document, it immediately
> errors out. If I use a2ps and give it -o outfilename.ps, a2ps
> runs, but I may be producing an image file, as there is no text
> from the document. Talk about sound and fury signifying nothing.
>
> 	If one runs pstotext on that output file, one gets a
> single form feed for each page and nothing else.
>
> 	The PDF document is not protected.
>
> 	Any suggestions as to how to extract the text are
> welcome. Thanks.
>
> Martin McCormick
>
> _______________________________________________
> Blinux-list mailing list
> Blinux-list at redhat.com
> https://www.redhat.com/mailman/listinfo/blinux-list
>

--
Kirk Reiser				The Computer Braille Facility
e-mail: kirk at braille.uwo.ca		University of Western Ontario
phone: (519) 661-3061



