About OCR international issues

Microsoft Office Document Imaging



When optical character recognition (OCR) is performed on a scanned document, text is recognized using sophisticated pattern-recognition software that compares scanned text characters with a built-in dictionary of character shapes and sequences. The dictionary supplies all uppercase and lowercase letters, punctuation, and accent marks used in the selected language.
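To illustrate why the selected language matters, here is a toy sketch of shape matching. It is not how Office Document Imaging works internally. The tiny 3×3 bitmaps and the `ENGLISH`, `FRENCH`, and `recognize` names are hypothetical. The idea is that a scanned glyph is matched to the closest shape in the active dictionary, so a character that is missing from that dictionary, such as an accented letter, cannot be recognized correctly.

```python
# Toy illustration only; real OCR engines are far more sophisticated.
from typing import Dict, Tuple

Bitmap = Tuple[int, ...]  # 9 pixels, row-major, 1 = ink

# Hypothetical shape dictionaries; the French one also contains an accented letter.
ENGLISH: Dict[str, Bitmap] = {
    "l": (0, 1, 0, 0, 1, 0, 0, 1, 0),
    "o": (1, 1, 1, 1, 0, 1, 1, 1, 1),
    "-": (0, 0, 0, 1, 1, 1, 0, 0, 0),
}
FRENCH: Dict[str, Bitmap] = {**ENGLISH, "é": (0, 1, 0, 1, 1, 1, 1, 1, 1)}


def distance(a: Bitmap, b: Bitmap) -> int:
    """Count the pixels that differ between two bitmaps (Hamming distance)."""
    return sum(x != y for x, y in zip(a, b))


def recognize(glyph: Bitmap, dictionary: Dict[str, Bitmap]) -> str:
    """Return the dictionary character whose shape is closest to the scanned glyph."""
    return min(dictionary, key=lambda ch: distance(glyph, dictionary[ch]))


scanned = (0, 1, 0, 1, 1, 1, 1, 1, 1)  # a hypothetical scanned "é"
print(recognize(scanned, ENGLISH))  # o  (closest English shape, wrong)
print(recognize(scanned, FRENCH))   # é  (exact match)
```

With the English dictionary, the accented glyph is misread as the nearest available shape. With the French dictionary, it is matched exactly.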

By default, Microsoft Office Document Imaging uses the dictionary for the same language as your computer's operating system. To recognize text in documents written in other languages, you can select a different dictionary.

Note: Office Document Imaging does not provide OCR dictionaries for Hebrew or Arabic.

The OCR language dictionary is used in the following scenarios.

Scanning new documents using Office Document Imaging

By default, OCR is performed automatically after scanning. When you click Scan New Document on the File menu, the Microsoft Office Document Scanning dialog box lists several scanning presets, each designed to maximize scanning efficiency for a different purpose. You can assign a different OCR language to each preset, and that language is used whenever the preset is selected.

In the Microsoft Office Document Scanning dialog box, click Preset options and then click either Create new preset or Edit selected preset. On the Processing tab of the Preset Options dialog box, select the language you want from the OCR Language list.

Running OCR on previously scanned documents

When you run OCR manually on a document that has already been scanned, you can specify which OCR language dictionary to use. Click Options on the View menu and select the dictionary you want from the OCR Language list. Then click Recognize Text Using OCR on the File menu.

Adding foreign-language text to the index for fast file searches

Indexing is a special service that enables fast file searches on your computer. Text found in files on your computer is added to the index, which also stores a reference to the file where the text was found. Text in any Tagged Image File Format (TIFF) files on your computer is added to the index by default.
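The index described above behaves like an inverted index: each word points back to the files that contain it. The minimal sketch below illustrates that idea only and is not the Indexing Service's actual implementation. The file names, their text (standing in for OCR output), and the `build_index` and `search` helpers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set


def build_index(files: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lowercase word to the set of file names in which it appears."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for name, text in files.items():
        for word in text.lower().split():
            index[word].add(name)
    return dict(index)


def search(index: Dict[str, Set[str]], word: str) -> Set[str]:
    """Return the files containing the word, or an empty set if none do."""
    return index.get(word.lower(), set())


# Hypothetical OCR text recognized from two TIFF files in different languages.
files = {
    "invoice.tif": "Invoice total due",
    "lettre.tif": "Facture total à payer",
}
index = build_index(files)
print(sorted(search(index, "total")))    # ['invoice.tif', 'lettre.tif']
print(sorted(search(index, "Facture")))  # ['lettre.tif']
```

If the French text had been recognized with the wrong dictionary, words such as "à" could be misread, and searches for them would not find the file.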

To change the language dictionary used to index TIFF files on which OCR has not already been performed, click Options on the View menu and then click Indexing Service. In the Indexing Service dialog box, select the dictionary you want from the OCR Language list.

Tip

You can create special presets for scanning foreign-language documents. On the File menu, click Scan New Document, and then select a preset from the list to use as the basis for your new preset. Click Preset options, and then click Create new preset.

Type a name for the new preset and click OK. On the Processing tab of the Preset Options dialog box, select the language you want from the OCR Language list.

You can create a shortcut for your new preset. On the General tab, click Create Shortcut after selecting the preset options you want.
