About OCR international issues

Microsoft Office Document Imaging

When optical character recognition (OCR) is performed on a scanned document, text is recognized using sophisticated pattern-recognition software that compares scanned text characters with a built-in dictionary of character shapes and sequences. The dictionary supplies all uppercase and lowercase letters, punctuation, and accent marks used in the selected language.

By default, Microsoft Office Document Imaging uses the dictionary for the same language that your computer's operating system uses. You can easily change the dictionary to scan documents in other languages.

Note: Office Document Imaging does not provide OCR dictionaries for Hebrew or Arabic.
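The selection rule described above can be illustrated with a minimal Python sketch. It is not the Office Document Imaging API: the function name, the language labels, and the decision to raise an error for a missing dictionary are all assumptions made for illustration only.

```python
# Hypothetical names throughout; this does not call Office Document Imaging.
# Languages for which, per the note above, no OCR dictionary is provided.
UNSUPPORTED_OCR_LANGUAGES = {"Hebrew", "Arabic"}


def choose_ocr_language(system_language, selected_language=None):
    """Return the language whose dictionary OCR should use.

    system_language: the language of the operating system (the default).
    selected_language: an optional language chosen by the user.
    """
    # A language chosen by the user takes priority over the system default.
    language = selected_language or system_language
    if language in UNSUPPORTED_OCR_LANGUAGES:
        # Assumption: this sketch reports a missing dictionary as an error.
        raise ValueError(f"No OCR dictionary is available for {language}")
    return language


# Default case: the dictionary follows the operating system language.
print(choose_ocr_language("English"))
# Override case: the user picks another language for a scanned document.
print(choose_ocr_language("English", "French"))
```

The sketch only models the choice of dictionary; the recognition itself is performed by the product's pattern-recognition software.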

The OCR language dictionary is used in the following scenarios:

Scanning new documents using Office Document Imaging

Running OCR on previously scanned documents

Adding foreign-language text to the index for fast file searches