About OCR international issues

Microsoft Office Document Imaging

When optical character recognition (OCR) is performed on a scanned document, text is recognized using pattern-recognition software that compares scanned text characters with a built-in dictionary of character shapes and sequences. The dictionary supplies all uppercase and lowercase letters, punctuation, and accent marks used in the selected language.
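The idea of comparing scanned characters against a dictionary of shapes can be illustrated with a small sketch. This is not how Microsoft Office Document Imaging is implemented; it is a toy example that assumes each character arrives as a 3x3 black-and-white bitmap and picks the dictionary entry with the fewest differing pixels. The names GLYPHS, hamming, recognize, and recognize_line, and the glyph shapes themselves, are invented for illustration.

```python
# Toy "dictionary of character shapes": each character is a 3x3 bitmap
# flattened row by row into a 9-character string (1 = ink, 0 = blank).
# The shapes are made up for this example and belong to no real OCR engine.
GLYPHS = {
    "I": "010" "010" "010",
    "L": "100" "100" "111",
    "T": "111" "010" "010",
}


def hamming(a, b):
    """Count the pixel positions at which two equal-length bitmaps differ."""
    return sum(x != y for x, y in zip(a, b))


def recognize(bitmap, glyphs=GLYPHS):
    """Return the dictionary character whose shape is closest to the bitmap."""
    return min(glyphs, key=lambda ch: hamming(bitmap, glyphs[ch]))


def recognize_line(bitmaps, glyphs=GLYPHS):
    """Recognize a sequence of character bitmaps and join the results."""
    return "".join(recognize(b, glyphs) for b in bitmaps)
```

A character that is missing from the dictionary can only be matched to the nearest shape that is present, which is one way to see why choosing the dictionary for the document's language matters: an accented letter can only be recognized if the selected dictionary contains it.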

By default, Microsoft Office Document Imaging uses the dictionary for the same language that your other Microsoft Office applications use. You can easily change the dictionary to scan documents in other languages.

The OCR language dictionary is used in the following scenarios.

Scanning new documents using Office Document Imaging

Running OCR on previously scanned documents

Adding foreign language text to the index for fast file searches