OCR Training Interface Tutorial

This tutorial walks you through training characters with OCR using a specific set of images, and illustrates the steps required to train varying representations of the same characters.

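This tutorial includes the following sections:

  • Training Characters with OCR
  • Reviewing Character Specifications
  • Saving the Character Set File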

Training Characters with OCR

Complete the following steps to train characters with OCR:

  1. Launch the OCR Training Interface.
  2. Select File»Open Images.
  3. Complete one of the following sets of steps to open the tutorial images:
    • NI Vision and Vision Assistant
      1. Navigate to <Vision>\Images\OCR Tutorial, where <Vision> is the location to which you installed the NI Vision Development Module.
      2. Select the following images:
        • NIOCRExample1.tif
        • NIOCRExample2.tif
        • NIOCRExample3.tif
        • NIOCRExample4.tif
        • NIOCRExample5.tif
        Tip  You can select multiple image files by holding the <Ctrl> key and clicking each file. Enable the Select all files checkbox to open all images in the directory you specified.
      3. Click Open.
    • Vision Builder AI
      1. Navigate to <Vision Builder AI>\DemoImg\OCR, where <Vision Builder AI> is the location to which you installed Vision Builder AI.
      2. Select the following images:
        • NIOCRExample1.tif
        • NIOCRExample2.tif
        • NIOCRExample3.tif
        • NIOCRExample4.tif
        • NIOCRExample5.tif
        Tip  You can select multiple image files by holding the <Ctrl> key and clicking each file. Enable the Select all files checkbox to open all images in the directory you specified.
      3. Click Open.
  4. Use the navigation buttons to locate NIOCRExample1.tif.
  5. Verify that the Rotated Rectangle Tool is selected on the OCR Training Interface toolbar. On the image, draw a region of interest (ROI) around the characters.
    Tip  When you draw the ROI, exclude the barcode, but ensure that the region you draw is large enough to encompass possible locations of characters in the other images you analyze.

  6. Review the segmentation results. Based on the settings on the tabs at the bottom of the dialog box, OCR segments the objects in the ROI, displays them in blue, and draws a character bounding rectangle around each one.

    On the Train/Read tab, Text Read displays each character that OCR recognizes using the current character set file, and the substitution character for each segmented object it cannot recognize. If you have not opened a character set file or trained any characters, Text Read displays the substitution character for every segmented object in the ROI. For example, for NIOCRExample1.tif, Text Read contains six substitution characters, one for each segmented object. Any object surrounded by a character bounding rectangle is a segmented object.

    You can specify the substitution character on the Read Options tab.

  7. Enter BACE30 in Correct String, then click Train.
  8. Use the navigation buttons to locate the NIOCRExample2.tif file.

    When analyzing this image, OCR uses the ROI you drew on the NIOCRExample1.tif image. Because the barcode in the NIOCRExample2.tif file is positioned differently, the ROI encompasses part of it, and OCR segments portions of the barcode.

  9. Enable the Reject Particles Touching ROI checkbox on the Threshold tab to configure OCR to segment the characters correctly without segmenting any portion of the barcode.
  10. Set the Remove Particles (Erosions) control to 1.

    Setting this control to 1 performs one erosion on the image. An erosion shrinks each object in the image by removing a layer of pixels along its boundary. Performing an erosion helps separate the characters from the background of the image.

  11. Enter B8F1E9 in Correct String, then click Train.
  12. Navigate to the NIOCRExample3.tif file.

    When analyzing this image, OCR uses the ROI you drew on the NIOCRExample1.tif file and the settings you modified to correctly segment the characters in the NIOCRExample2.tif file.

  13. Verify that the objects in the ROI are segmented correctly.

  14. Enter B70DF4 in Correct String, then click Train.
  15. Navigate to the NIOCRExample4.tif file.

    When analyzing this image, OCR uses the ROI you drew on the NIOCRExample1.tif file and the settings you modified to correctly segment the characters in the NIOCRExample3.tif file. Because the character size and spacing in the NIOCRExample4.tif file differ from those in the other three files you used to train characters, OCR incorrectly segments the two instances of the letter A.

  16. Use the Results tab to review character specifications for the two instances of the letter A.
  17. Select the Size & Spacing tab. Based on the character size statistics you viewed on the Results tab, change Max for Bounding Rect Width. For example, set Max to 50 for Bounding Rect Width.

  18. Enter B85AA6 in Correct String, then click Train.
  19. Navigate to the NIOCRExample5.tif file.

    When analyzing this image, OCR uses the ROI you drew on the NIOCRExample1.tif file and the settings you modified to correctly segment the characters in the NIOCRExample4.tif file.

  20. Verify that the objects are segmented correctly.

  21. Enter B8CE72 in Correct String, then click Train.
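The segmentation settings used in the steps above (Reject Particles Touching ROI, Remove Particles (Erosions), and Max for Bounding Rect Width) can be illustrated with a small pure-Python sketch. It does not use or reproduce the NI Vision API or its algorithms: the function names are hypothetical, the input is assumed to be an already thresholded ROI stored as a grid of 0s and 1s, particles are assumed to be 8-connected, and the Max width rule is an assumption (an over-wide object is cut into equal slices, which is one way a wide character can end up segmented incorrectly; the actual OCR behavior may differ).

```python
# Hypothetical sketch of the OCR segmentation settings; not the NI Vision API.
from collections import deque
from typing import List, Optional, Tuple

Image = List[List[int]]           # thresholded ROI: 1 = object pixel, 0 = background
Rect = Tuple[int, int, int, int]  # (left, top, width, height)


def erode(img: Image) -> Image:
    """One 3x3 binary erosion: a pixel survives only if it and its 8
    neighbours are all set, which peels one layer off every particle.
    Pixels outside the ROI count as background."""
    h, w = len(img), len(img[0])
    return [
        [
            int(all(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1)))
            for x in range(w)
        ]
        for y in range(h)
    ]


def particles(img: Image) -> List[List[Tuple[int, int]]]:
    """Group object pixels into 8-connected particles of (x, y) pairs."""
    h, w = len(img), len(img[0])
    seen = set()
    found = []
    for y in range(h):
        for x in range(w):
            if not img[y][x] or (x, y) in seen:
                continue
            seen.add((x, y))
            queue, pixels = deque([(x, y)]), []
            while queue:  # breadth-first flood fill of one particle
                px, py = queue.popleft()
                pixels.append((px, py))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = px + dx, py + dy
                        if (0 <= nx < w and 0 <= ny < h and img[ny][nx]
                                and (nx, ny) not in seen):
                            seen.add((nx, ny))
                            queue.append((nx, ny))
            found.append(pixels)
    return found


def segment(img: Image, erosions: int = 0, reject_touching_roi: bool = False,
            max_width: Optional[int] = None) -> List[Rect]:
    """Return character bounding rectangles, sorted left to right."""
    for _ in range(erosions):  # "Remove Particles (Erosions)"
        img = erode(img)
    h, w = len(img), len(img[0])
    rects: List[Rect] = []
    for pixels in particles(img):
        xs = [x for x, _ in pixels]
        ys = [y for _, y in pixels]
        left, top = min(xs), min(ys)
        width, height = max(xs) - left + 1, max(ys) - top + 1
        # "Reject Particles Touching ROI": skip particles on the ROI border.
        if reject_touching_roi and (left == 0 or top == 0 or
                                    left + width == w or top + height == h):
            continue
        # Assumed Max width rule: cut an over-wide object into equal slices.
        n = 1 if max_width is None else -(-width // max_width)  # ceil division
        for i in range(n):
            x0, x1 = left + i * width // n, left + (i + 1) * width // n
            rects.append((x0, top, x1 - x0, height))
    return sorted(rects)
```

For example, with a vertical bar touching the left edge of the ROI and a 3-by-3 block in the middle, `segment(img)` returns both rectangles, `segment(img, reject_touching_roi=True)` keeps only the block, and `segment(img, erosions=1)` shrinks the block to a single pixel and removes the one-pixel-wide bar entirely.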

Reviewing Character Specifications

Use the Results tab to view statistics about the characters OCR segmented in the ROI.

The following list describes each of the columns on the Results tab.

  • Character—Lists the index of each of the characters in the current ROI.
  • Class—Lists, by index, the character value for each of the characters you trained in the current ROI.
  • Left and Top—List, by index, the x and y coordinates of the top left corner of each of the character bounding rectangles in the current ROI. The status bar in the image viewer displays the Left and Top coordinates of the character bounding rectangles as you move the mouse over the image.
  • Width and Height—List, by index, the width and height of each of the characters you trained in the current ROI.
  • Size—Lists, by index, the size of each character, in pixels.
  • Class. Score (classification score)—Lists, by index, a value that indicates the degree to which the assigned character class represents the object value better than other character classes in the character set. The range of values is 0 to 1000.
  • Verif. Score (verification score)—Lists, by index, a value that indicates how closely a character matched its reference character in the character set. The range of values is 0 to 1000. A value of 1000 indicates a perfect match between an object and the reference character.
    Note  You can set a reference character for each class. When you set a reference character, OCR returns a verification score that indicates how closely the character matched the reference character for its class. Use the verification score to perform optical character verification. OCR returns a value of 0 for the verification score if you do not set a reference for a class.
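The geometric columns of the Results tab (Character, Left, Top, Width, Height, Size) can be reproduced from a segmented object's pixel coordinates, as in the following Python sketch. The names are hypothetical, Size is assumed to be the object's pixel count, and `verification_score` is an invented stand-in (the fraction of matching pixels between two equally sized bitmaps, scaled to 0 to 1000) rather than the formula OCR actually uses; only the 0 to 1000 range and the 0 result when no reference character is set come from the descriptions above.

```python
# Hypothetical sketch of the Results tab columns; not the NI Vision API.
from dataclasses import dataclass
from typing import List, Optional, Tuple

Pixel = Tuple[int, int]  # (x, y) image coordinates


@dataclass
class ResultRow:
    character: int  # index of the object in the current ROI
    left: int       # x of the bounding rectangle's top left corner
    top: int        # y of the bounding rectangle's top left corner
    width: int
    height: int
    size: int       # assumed: number of object pixels


def results_table(objects: List[List[Pixel]]) -> List[ResultRow]:
    """Build one row per segmented object from its pixel coordinates."""
    rows = []
    for index, pixels in enumerate(objects):
        xs = [x for x, _ in pixels]
        ys = [y for _, y in pixels]
        rows.append(ResultRow(character=index, left=min(xs), top=min(ys),
                              width=max(xs) - min(xs) + 1,
                              height=max(ys) - min(ys) + 1,
                              size=len(pixels)))
    return rows


def verification_score(bitmap: List[List[int]],
                       reference: Optional[List[List[int]]]) -> int:
    """Invented 0-1000 score: the share of matching pixels between two
    equally sized bitmaps. Returns 0 when no reference character is set."""
    if reference is None:
        return 0
    cells = [a == b for row_a, row_b in zip(bitmap, reference)
             for a, b in zip(row_a, row_b)]
    if not cells:
        raise ValueError("bitmaps must not be empty")
    return round(1000 * sum(cells) / len(cells))
```

In this sketch a bitmap identical to its reference scores 1000, and one that differs in a single pixel out of four scores 750.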

Saving the Character Set File

Now that you have trained several characters, you are ready to save the character set file. Each character set file contains the current state of the OCR parameters as well as the characters you have trained.
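To make the idea of a character set file concrete, the following Python sketch models one as OCR parameters plus trained samples. Everything in it is hypothetical: the class and parameter names, the default substitution character "?", the JSON serialization (it is not the .abc file format), and the classifier, which simply picks the trained sample with the highest pixel agreement among equally sized bitmaps, above an assumed threshold of 0.9. In this sketch, training pairs each segmented object, left to right, with one character of Correct String, and reading returns the substitution character for any object it cannot classify, mirroring what Text Read shows before any characters are trained.

```python
# Hypothetical model of a character set; not the .abc format or NI Vision API.
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Tuple

Bitmap = Tuple[Tuple[int, ...], ...]  # object pixels cropped to its bounding rectangle


@dataclass
class CharacterSet:
    parameters: Dict[str, object] = field(default_factory=dict)      # OCR settings
    substitution_character: str = "?"                                # assumed default
    samples: List[Tuple[str, Bitmap]] = field(default_factory=list)  # (class, bitmap)

    def train(self, objects: List[Bitmap], correct_string: str) -> None:
        """Pair segmented objects, left to right, with Correct String."""
        if len(objects) != len(correct_string):
            raise ValueError("Correct String needs one character per object")
        self.samples.extend(zip(correct_string, objects))

    def read(self, objects: List[Bitmap], min_agreement: float = 0.9) -> str:
        """Return Text Read: the class of the best-matching equally sized
        sample, or the substitution character if none reaches min_agreement."""
        text = []
        for obj in objects:
            best_class, best = self.substitution_character, min_agreement
            for cls, sample in self.samples:
                if len(sample) != len(obj) or len(sample[0]) != len(obj[0]):
                    continue  # this toy compares equally sized bitmaps only
                cells = [a == b for ra, rb in zip(obj, sample) for a, b in zip(ra, rb)]
                agreement = sum(cells) / len(cells)
                if agreement >= best:
                    best_class, best = cls, agreement
            text.append(best_class)
        return "".join(text)

    def to_json(self) -> str:
        """Serialize parameters and trained samples together."""
        return json.dumps(asdict(self))
```

Keeping the parameters and the samples in one object reflects the point above: a saved character set restores both how OCR segments the image and which characters it knows.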

  1. Use the navigation buttons to scroll through each of the images you used in this tutorial. Review each image and the information in Text Read to ensure the characters are trained correctly.

    Perform this step to confirm that the parameters you set in OCR segment and train the characters correctly in every image. Because you will use this character set file in a reading procedure, it must contain a broad range of trained characters, along with parameters that configure OCR to segment characters correctly in a wide variety of images.

  2. Use the Edit Character Set File tab to review and modify the character set you trained.
  3. Select File»Save Character Set File, enter NIOCRTutorial.abc in File Name, and click OK to save the character set.
    Note  You can use the NIOCRTutorial.abc character set file in the examples that are available with OCR.