Scan document -> OCR -> Searchable PDF

anon
Posts: 38
Joined: Tue Jun 03, 2008 8:42 am

Post by anon »

I have scanned a letter and want to create a PDF that I can post on a web site, but I also want the PDF to be 'searchable'.
I used Microsoft Office Document Imaging, which created a TIFF file and performed the OCR. But when I use doPDF, it creates the PDF as an image rather than a searchable PDF. So if I search for the word "Dear", it does not find anything, because the page is treated as an image.
Any solutions?


Claudiu (Softland)
Posts: 1565
Joined: Thu May 23, 2013 7:19 am

Post by Claudiu (Softland) »

I don't know if MODI allows you to save in .rtf (or .doc) format after performing OCR, but if it doesn't, you could copy/paste the OCR-ed document into Word and convert it with doPDF from there; this should make it searchable. When you convert a TIFF to PDF it is converted as an image, which is why you have to use the RTF solution.

anon
Posts: 38
Joined: Tue Jun 03, 2008 8:42 am

Post by anon »

Microsoft Office Document Imaging (2003 edition) only has options to save as a TIFF or .mdi file, so there is no option to save as RTF. It does, however, have the option to save as a Word file. The problem with that approach is that I lose the look of the original document, e.g. the letterhead; all it does is copy the text. As an example, if you got a letter from the President, you would want to retain the 'original' document, e.g. preserve the letterhead etc.


Claudiu (Softland)
Posts: 1565
Joined: Thu May 23, 2013 7:19 am

Post by Claudiu (Softland) »

anon wrote: "As an example, if you got a letter from the President, you would want to retain the 'original' document, e.g. preserve the letterhead etc."
I would say it depends which president you got that from :).
Anyway, I can't think of another solution for having the scanned document converted. It would be tedious, but if you have the text saved in a Word document, you could include the letterhead as an image in that document (cropping it from your scanned image).

kryzstoff
Posts: 5
Joined: Fri Oct 31, 2008 5:31 am

Post by kryzstoff »

anon, you should look at OpenOffice.org for a free and simple solution. Whilst Sun's flexible suite isn't much to look at, it's every bit as powerful as Microsoft Office, and with its PDF editing features, even more so.
P.S. Of course, doPDF is what you'll need for all your non-office documents, e.g. printing from the internet, CAD programs, etc. :-)


ochin
Posts: 1
Joined: Fri Feb 06, 2009 12:12 pm

Post by ochin »

I think you have your answer in the following tutorial:

http://www.wac.ohio-state.edu/pdf/scan/pdffromscan.html
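
For context on what 'searchable' means for a scanned page: one common arrangement is a PDF page that shows the scan as an image (so the letterhead is preserved) and carries the OCR text on top of it in an invisible text layer that search can find. The sketch below is only an illustration of that idea using the Python standard library; it is not how doPDF or MODI work. The function names, the `words` list of (text, x, y, font size) tuples, and the raw grayscale pixel input are all hypothetical stand-ins for real scan and OCR output.

```python
def pdf_escape(text: str) -> bytes:
    """Escape backslashes and parentheses for a PDF literal string."""
    for ch in ("\\", "(", ")"):
        text = text.replace(ch, "\\" + ch)
    return text.encode("latin-1", "replace")


def build_searchable_pdf(gray_pixels: bytes, img_w: int, img_h: int,
                         words, page_w: int = 612, page_h: int = 792) -> bytes:
    """Return a one-page PDF: the scan as an image plus invisible OCR text.

    gray_pixels: raw 8-bit grayscale data, img_w * img_h bytes (assumed input).
    words: iterable of (text, x, y, font_size) in PDF points (assumed OCR output).
    """
    # Page content: paint the image scaled to the full page, then write the
    # words with text rendering mode 3 (neither fill nor stroke = invisible).
    ops = [b"q %d 0 0 %d 0 0 cm /Im0 Do Q" % (page_w, page_h), b"BT 3 Tr"]
    for text, x, y, size in words:
        ops.append(b"/F1 %d Tf 1 0 0 1 %d %d Tm (%s) Tj"
                   % (size, x, y, pdf_escape(text)))
    ops.append(b"ET")
    content = b"\n".join(ops)

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 %d %d] "
        b"/Resources << /Font << /F1 4 0 R >> /XObject << /Im0 5 0 R >> >> "
        b"/Contents 6 0 R >>" % (page_w, page_h),
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Type /XObject /Subtype /Image /Width %d /Height %d "
        b"/ColorSpace /DeviceGray /BitsPerComponent 8 /Length %d >>\n"
        b"stream\n%s\nendstream"
        % (img_w, img_h, len(gray_pixels), gray_pixels),
        b"<< /Length %d >>\nstream\n%s\nendstream" % (len(content), content),
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of each object for the xref table
        out += b"%d 0 obj\n%s\nendobj\n" % (num, body)

    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # each entry is exactly 20 bytes
    out += (b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n"
            % (len(objects) + 1, xref_pos))
    return bytes(out)
```

Calling `build_searchable_pdf(pixels, width, height, [("Dear", 72, 700, 12)])` and writing the returned bytes to a file should give a page that looks like the scan while a search for "Dear" has text to match. In practice an OCR tool would supply the word positions; Tesseract, for example, can write this kind of PDF directly with `tesseract scan.tif out pdf`.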

