
NUANCE OMNIPAGE PRO 6 - REFERENCE MANUAL FOR WINDOWS Reference Manual


OmniPage Pro
Version 6 for Windows
Reference Manual

Summary of Contents for NUANCE OMNIPAGE PRO 6 - REFERENCE MANUAL FOR WINDOWS

  • Page 1 OmniPage Pro Version 6 for Windows Reference Manual...
  • Page 2 How to Use the Documentation Use this online Reference manual to find specific information about any OmniPage feature. It describes all the commands and settings, how to use True Page, how to improve performance, and how to troubleshoot common problems. This information is also available in OmniPage’s online Help system.
  • Page 3 CAERE CORPORATION 100 Cooper Court Los Gatos, California 95030-3321 European Offices: CAERE GmbH Innere Wiener Strasse 5 81667 Munich, Germany OmniPage Pro Version 6 (PC Windows version) Copyright© 1995, 1997 Caere Corporation All rights reserved. CAERE®, OmniPage®, OmniPage Professional, Image Assistant®, AnyPage, AnyFax, 3D OCR, and True Page are trademarks of Caere Corporation.
  • Page 4 Chapter 1 Installation Please read this section carefully! It includes: • What’s in the Package • System Requirements • Saving a User Dictionary Before Installation • Setting up a Windows Swap File • Installing the Software • Starting OmniPage Pro What’s in the Package Your OmniPage Pro 6.0 package includes: •...
  • Page 5 Saving a User Dictionary Before Installation To view all 24 bits of color (millions of colors) in 24-bit color images, you need a 24-bit video card. • A compatible scanner if you plan to scan documents. See the list of supported scanners in the Release Notes. Install your scanner and test it according to the manufacturer’s instructions before using it with OmniPage.
  • Page 6 Setting up a Windows Swap File The Edit User Dictionary dialog box appears. Click Import... The Import Text File dialog box appears. Select the user dictionary you saved as a text file and click OK. The information in the old user dictionary is added to the new user dictionary.
  • Page 7 Installing the Software This dialog displays the location, size, and type of swap file. The swap file should be at least 8192KB. Click the Change button to expand the dialog box. Select a new drive in the Drive list if you want to locate the swap file some place other than the default drive.
  • Page 8 Installing the Software Click Continue to start installation or type your desired location and then click Continue. A dialog box warns that all executable files will be deleted from your current omnipro directory if you have one. • Click Continue to continue. •...
  • Page 9 Installing the Software You can install a scanner driver anytime after Scan Manager installation if you click No. See the next section for instructions. Click Yes to install a scanner driver now. The Scan Manager Installation dialog box appears. Locate and select your scanner in the List of Scanners list box. Click Install.
  • Page 10 Installing the Software Click Close when you are done. Restart Windows. You cannot use the Direct Input feature until you restart Windows. See Chapter 5, Direct Input, for information. An OmniPage Pro and a Scan Manager icon are added to the Caere Applications program group.
  • Page 11 Starting OmniPage Pro Insert the Scan Manager disk when prompted. The dialog box expands to show a list of available scanner drivers. Select a scanner in the List of Scanners list box. Click Install. The scanner appears in the Installed Scanners list box. Select the scanner in the Installed Scanners list box that you want to be the default scanner.
  • Page 12 Starting OmniPage Pro The Product Registration dialog appears the first time you launch OmniPage. See the next section for instructions on how to register OmniPage. A scanner message may appear when you close this dialog. See “Scanner Message on Launch” on page 227. Registering OmniPage You can use OmniPage for 25 sessions without registering it.
  • Page 13 Starting OmniPage Pro The OmniPage Window The OmniPage window and AutoOCR™ toolbar appear after the Registration dialog box closes. [Figure: The AutoOCR Toolbar, with its process buttons, shortcut command buttons, and status text labeled.] Refer to Chapter 2, Tutorials, for an overview of OmniPage tools and recognition techniques.
  • Page 14 Chapter 2 Tutorials This chapter contains eight tutorials. The tutorials take you through practical exercises for everyday documents such as multi-column pages and spreadsheets. They also cover more advanced concepts such as how to use manual zoning and using deferred page recognition to maximize efficiency.
  • Page 15 Launch OmniPage Tutorial 1 — Introduction to OmniPage This tutorial gives you a brief introduction to OmniPage. It contains the following sections: • Launch OmniPage • What is Optical Character Recognition (OCR)? • The OCR Process • Scan the Quick Scan Page Sample •...
  • Page 16 What is Optical Character Recognition (OCR)? What is Optical Character Recognition (OCR)? Optical character recognition (OCR) is the process of converting an image file to editable text or graphics. An image is an electronic picture of text and/or graphics. You acquire an image in two ways: •...
  • Page 17 Scan the Quick Scan Page Sample Setting recognition zones • Select Auto Zones in the Zone button’s drop-down list to have OmniPage define the page areas to be recognized. • Select Manual Zones to draw the zones yourself. Performing OCR on the zoned page areas •...
  • Page 18 Scan the Quick Scan Page Sample • OmniPage makes three recognition passes over the page: cyan, light blue, and dark blue. Each of these three stages is discussed in more detail in later tutorials. OmniPage opens the recognized page in a maximized text window.
  • Page 19 Settings Panel Overview You can retype the highlighted word if necessary while the Verification window is still open. This is a quick way to edit text without using the spell checker. Click anywhere in the text window to close the Verification window.
  • Page 20 Settings Panel Overview You can also click with the right mouse button on the Zone and OCR process buttons when they are active to open the Settings Panel to the corresponding settings. These two buttons are active after a document has been loaded or scanned.
  • Page 21 Scanning With the Default Settings Tutorial 2 — Basic Text Recognition This tutorial takes you through basic scanning, zoning, and OCR exercises with OmniPage. It contains the following exercises: • Scanning With the Default Settings • Change a Document’s Fonts During OCR •...
  • Page 22 Scanning With the Default Settings Auto Brightness with AnyPage/HP AccuPage 2 is the default. (HP stands for Hewlett-Packard.) This setting works well with most types of pages. The default is Manual Brightness if you have a black-and-white scanner. Click the Zones icon. The default setting is Multiple Columns.
  • Page 23 Scanning With the Default Settings OmniPage scans the page and opens the image in the zone window. Click the Zone button. OmniPage determines column flow and automatically draws zones. This shows how text and graphics will be ordered during OCR. Numbered zones indicate recognition order.
  • Page 24 Scanning With the Default Settings OmniPage performs three OCR passes over the document: a cyan pass for initial recognition, a blue pass as text is analyzed and corrected, and a dark blue pass for final recognition. The Character window displays characters as OCR takes place. [Figure: The Character Window.] The recognized document opens in a new maximized text window.
  • Page 25 Scanning With the Default Settings If the text is not ordered correctly, you may have misaligned the page in your scanner. Realign the page and try scanning again. Choose Tile Vertical or Tile Horizontal in the Windows menu. The text and zone windows tile for easy viewing. Compare the recognized document in the text window to the scanned image in the zone window.
  • Page 26 Scanning With the Default Settings The Check Recognition window appears. It displays the image and text of any questionable or unrecognizable word. Correct any errors in the text. If the word is misspelled: • Correct the spelling in the Change To edit box and click Change. OmniPage may list one or more suggestions in the Change To drop-down list.
  • Page 27 Scanning With the Default Settings Save the Document You will save the document as a Caere Document (a special OmniPage format), reopen it, and save it as a word-processing file. Save as a Caere Document Click the Save As... button or choose Save As... in the File menu. The Save As dialog box opens.
  • Page 28 Scanning With the Default Settings Reopen the Document Choose Open Document... in the File menu. The Open dialog box appears. Select Caere files[*.MET] in the List Files of Type drop-down list if it is not selected already. Locate and open the file multi.met. The text window opens maximized.
  • Page 29 Change a Document’s Fonts During OCR Type a new name for the file in the File Name text box if you like. Click OK. Leave the document open for the next exercise. Change a Document’s Fonts During OCR In the previous exercise, OmniPage retained font formatting but mapped the fonts to ones preselected in the Fonts settings panel.
  • Page 30 Change a Document’s Fonts During OCR Click the Fonts icon and observe the settings. • The default Serif Proportional setting is Times New Roman. (A seriffed font has short lines, or serifs, on the ends of the strokes of a letter.) The body text in the True Page sample is already Times New Roman and so would not change during OCR.
  • Page 31 Change a Document’s Fonts During OCR Font and paragraph formatting are retained but page layout is not. Text is displayed in one column with the graphic at the end. The fonts match the selections in the Fonts settings panel. Arial becomes Helvetica Times New Roman becomes Century Schoolbook...
  • Page 32 Ignore All Formatting Ignore All Formatting You may decide you do not need any formatting at all, just the recognized text itself. You will use the Ignore All Formatting OCR option to strip away font and paragraph formatting during recognition and assign one font and point size to the recognized text.
  • Page 33 True Page Recognition Click the OCR button. Click Yes in the dialog box that asks if you want to replace the text. OmniPage re-recognizes the page and displays the recognized text in the text window. Formatting has been discarded and all text is now 10-point Arial (or whichever font and point size you chose).
  • Page 34 True Page Recognition Click with your right mouse button on the OCR button to open the Settings Panel to the OCR settings. Select True Page - Retain All Page Formatting. Use this option when you want to duplicate page layout as closely as possible.
  • Page 35 True Page Recognition Working With Frames Because Multiple Columns was the default zoning method, True Page automatically creates frames around recognized text and graphic zones to preserve a side-by-side column structure. Frames You can resize frames and move them around to modify your document’s page layout.
  • Page 36 Deselect Retain Graphics Place your cursor inside a text zone so that it turns into a four-way arrow. Moving the frame Hold down the mouse button and drag the zone to any location on the page. Choose Select Recognized Zones in the Edit menu again. All frames are deselected.
  • Page 37 Save a Settings File Click with your right mouse button on the OCR button to open the Settings Panel to the OCR settings. Deselect Retain Graphics. Select True Page - Retain All Page Formatting if it is not selected already. Click Close.
  • Page 38 Save a Settings File Save the Settings Click the Settings Panel button or choose Settings Panel... in the Settings menu. Select the following options in each settings panel: • Scanner: Manual Brightness • Zones: One Zone • OCR: Ignore All Formatting Note that none of these settings is a default setting.
  • Page 39 Load an Image File The Load Settings dialog box appears. Locate and select the file test.set. Click OK. The Settings Panel settings change to match the settings file you just loaded. Click the Scanner, Zone, and OCR icons to verify that their respective settings have changed.
  • Page 40 Load an Image File Click the AUTO button. The Load Image dialog box appears. Select TIFF files[*.TIF] in the List Files of Type drop-down list. Locate and select the test.tif file. The file was placed in the c:\omnipro\data directory during installation.
  • Page 41 Load an Image File Click the AUTO button. The Load Image dialog box appears. Select a file format in the List Files of Type drop-down list. Select a file to load and click Add. The file appears in the Selected Files list box. Repeat for each file you want to load.
  • Page 42 Export a Graphic Tutorial 3 — Working With Graphics OmniPage can export a scanned page or pages as one or more graphic-format files. It can also find individual graphic zones on each page and export them as graphic-format files. This tutorial shows how to export a graphic. You will use the True Page sample in this tutorial.
  • Page 43 Export a Graphic Click Close. Scan the Page Place the True Page sample in your scanner making sure it is aligned correctly. Click the AUTO button. OmniPage scans, zones, and recognizes the document. Export the Graphic Zone Choose Export Image... in the File menu. The Export Image dialog box opens.
  • Page 44 Overcoming Recognition Difficulties Click OK. The recognized graphic zone on the page is exported in the file format you chose. You can open it in most image-editing programs. Tutorial 4 — Evaluating a Page A complex page may require more attention on your part for accurate OCR to take place.
  • Page 45 Overcoming Recognition Difficulties Click the Settings Panel button in the toolbar. Select the following settings: • Scanner: Auto Brightness with AnyPage/HP AccuPage2 This is a good setting for shaded backgrounds. • If you have a black-and-white scanner, set Manual Brightness to the center of the slider.
  • Page 46 Overcoming Recognition Difficulties OmniPage scans, zones, and recognizes the page. The recognized page opens in the text window. [Figure callouts: Unwanted graphic element; Caere logo recognized as text.] Your results may differ from those pictured above depending on your scanner. The line above the newsletter title may not be recognized at all, for example.
  • Page 47 Overcoming Recognition Difficulties Scroll down the page to the A Little Background article. Dark shading recognized as a graphic zone Unwanted text element OmniPage had trouble with this section because the extremely dark background could be interpreted as part of a graphic. The lack of distinct contrast also interfered with the program’s ability to distinguish characters.
  • Page 48 When to Use Manual Zoning These fixes require you to use manual zoning. You will recognize portions of the page, specify zone contents for the logo and text, and learn other manual zoning techniques in the course of this tutorial. When to Use Manual Zoning Use manual zoning in the following circumstances: •...
  • Page 49 Manual Zones — Recognize Portions of a Page The zones disappear and the automatic zone tools change to manual zone tools. Use the arrow buttons to rotate the image. Select zone contents. Zoom tool: zoom your view of the page in and out. Draw Zones tool: draw zones for recognition.
  • Page 50 Manual Zones — Specify Zone Contents Draw zones around the three side-by-side columns, avoiding the lines, as illustrated in the picture. Zooming in on the page is especially helpful here. Do not draw a zone around the A Little Background article. You will zone this separately in another exercise.
  • Page 51 Manual Zoning — Reorder Text OmniPage re-recognizes the document according to the zones you drew. The logo now appears in the text window as a graphic. Caere logo recognized as graphic Leave the document open for the next exercise. Manual Zoning — Reorder Text After you scan a document, you may decide to reorder the text before or after recognition to save yourself time editing the document.
  • Page 52 Manual Zoning — Reorder Text Click the left column. It is now labeled 3 and will be recognized third. Click with your right mouse button on the OCR button to open the Settings Panel to the OCR options. Select Retain Font and Paragraph Formatting. This setting allows you to see the reordered text in the text window.
  • Page 53 Scanning and the Brightness Setting The text window opens to display the newly reordered text. Scanning and the Brightness Setting The scanner brightness setting you choose in the Scanner settings panel can strongly affect page recognition. 3D OCR with HP AccuPage 2/AnyPage and Auto Brightness with AnyPage/HP AccuPage 2 are both good scanner settings to choose for shaded areas.
  • Page 54 Scanning and the Brightness Setting Click with your right mouse button on the Image button to open the Settings Panel to the Scanner options. Select Manual Brightness. The number range that appears in the text box on the right depends on what kind of scanner you have. Drag the slider box to the left on the slider (toward Lighten).
  • Page 55 Scanning and the Brightness Setting Look at the image to see how the brightness setting affected scanning. [Figure: three sample images, from left to right: brightness setting too dark, brightness setting too light, brightness setting just right.] • Set the brightness to a lighter setting if your image still has shading behind the article as does the left image, above, and rescan.
  • Page 56 Scanning and the Brightness Setting Zone and Recognize the Article Click the Draw Zones tool. Draw a zone around just the A Little Background article. Click the OCR button. Observe the Character window during OCR. Shaded background dots would hinder recognition Brightness Brightness setting Brightness setting...
  • Page 57 Scanning and the Brightness Setting OmniPage displays the text in the text window after OCR. Scroll down the page to locate the article in the text window. You should find few, if any, recognition errors once you have scanned with the proper brightness setting. Continue to adjust the scanner brightness setting in the Settings Panel and rescan the page if there are numerous errors.
  • Page 58 Recognize a Memo With a Table Tutorial 5 — Scanning a Single Column or Table So far in these tutorials, you have scanned two different multiple-column documents with various settings. You may need to scan spreadsheets, tables, or memos. Although these also have multiple columns, these documents usually rely on tabs to maintain formatting.
  • Page 59 Recognize a Memo With a Table This option is best for preserving tabbing or columns of characters such as are on the table on the sample page. Click the OCR icon. Select Retain Font and Paragraph Formatting. This option preserves the formatting of the page but not its layout as True Page would.
  • Page 60 Create a Zone Contents File Note that OmniPage preserves the table and other even spacing with tabs. The red tildes on the page mean OmniPage did not recognize some of the specialized characters in the document. You can double-click each tilde in the text window to open the Verification window and see the original image.
  • Page 61 Create a Zone Contents File You need to enter all the numbers in the table. You must also enter any other characters that appear in it. If you entered only numbers, OmniPage would not be able to recognize the letters with this zone contents file. Type the characters 0123456789ABCDTL- (hyphen).
  • Page 62 Create a Zone Contents File Draw zones around the sections of the page as shown in this picture: [Figure: zones labeled Alphanumeric, Alphanumeric, Alphanumeric, Finance, Alphanumeric.] Click in the zone around the table to make it active. Select Finance in the Zone Contents drop-down list. Click the OCR button.
  • Page 63 Create a Zone Template Create a Zone Template The Single Column or Table Page sample is a fictional example of a weekly report — one that always has similar information in the same place on the page. This is known as a standardized form. You can create a zone template to use on standardized forms instead of drawing the same zones each time.
  • Page 64 Create a Zone Template Click Yes in the dialog box that asks if you want to replace the current zones. Click the Zone button. OmniPage draws zones on the page image according to the zone template you just saved. Click each zone and observe the setting in the Zone Contents drop-down list to verify that your zone template is correct.
  • Page 65 Scan a Document With Special Characters Tutorial 6 — Train OCR OmniPage automatically recognizes characters commonly found in most documents. Other documents may contain characters OmniPage has not yet learned to recognize, such as copyright and trademark symbols, and mathematical symbols such as pi (π). You can train OmniPage to recognize special characters and create a training file to use on similar documents.
  • Page 66 Scan a Document With Special Characters Click AUTO. OmniPage scans, zones, and recognizes the document, and then displays the recognized text in the text window. View the Recognized Text Compare the text in the text window to the page you scanned. OmniPage replaced unrecognizable characters with red tildes.
  • Page 67 Train OCR to Recognize Special Characters Train OCR to Recognize Special Characters You will train OmniPage to recognize several characters in this exercise. See “Scan a Document With Special Characters” on page 65 if you did not leave the document open. Re-recognize the Document Select Train OCR in the OCR button drop-down list.
  • Page 68 Train OCR to Recognize Special Characters The Specify Character dialog box appears. It displays the symbol as it appeared in the scanned document. Locate the registered trademark symbol in the Extended ANSI list box on the left. Double-click the symbol. It appears in the Character edit box.
  • Page 69 Train OCR to Recognize Special Characters Specify other characters in the same way such as the lowercase ü and é, and the copyright (©) symbol. Characters OmniPage believes it identified correctly are listed alphabetically below the suspect characters. Check for common errors, such as a 5 being recognized as the letter S.
  • Page 70 Scan Multiple Pages and Defer OCR Tutorial 7 — Deferring OCR Compared to the time it takes to scan and zone a page, OCR can be time-consuming. You might find it more efficient to scan a stack of pages (especially if you have an ADF) or load multiple images all at once, zone them, and then defer recognition to a later time.
  • Page 71 Finish Current Document If you do not have an ADF, place the Quick Scan Page sample in your scanner. Click the AUTO button. The first page in the stack is scanned and zoned, and then the next page. If you do not have an ADF, place the True Page sample in your scanner now and click AUTO.
  • Page 72 Finish Current Document The Save As dialog box appears. Select a file type in the Save Files as Type drop-down list. Microsoft Word for Windows is selected in the example above. Select a location for your saved file. Select Create one file per page. This save option creates two separate files after OCR.
  • Page 73 Finish Deferred Documents Finish Deferred Documents You may decide to defer OCR, close the open documents or OmniPage, and finish processing later. You must save the open documents in order to reopen and recognize them. Scan and Save the Pages Follow the steps in the section “Scan Multiple Pages and Defer OCR”...
  • Page 74 Finish Deferred Documents The Finish Deferred Documents Dialog Box Choose Finish Deferred Documents... in the Process menu. The Finish Deferred Documents dialog box appears. The file you saved to the input directory appears in the Files to Finish list box. This is where OmniPage looks by default for deferred files.
  • Page 75 Finish Deferred Documents (A Network button also appears in this dialog box if you use Windows for Workgroups with networking enabled.) • Locate and select the output directory as the location to save the file if it is not already selected by default. •...
  • Page 76 Register an Application You will use the Quick Scan Page sample in this tutorial. Register an Application You must register an application before using it to initiate Direct Input. Once an application is registered with Direct Input, the Direct Input... command appears in its File menu above the Exit command.
  • Page 77 Launch Direct Input The application moves into the Registered Applications list box. Select and move as many applications as you like. Click OK when you are done. OmniPage immediately places the Direct Input... command in the File menu of the registered application(s). Choose Exit in the File menu.
  • Page 78 Direct Input Mode Some applications, such as Word and Notepad, allow you to launch multiple copies of the application. The Direct Input... command only appears in the first copy of the application launched. OmniPage launches in Direct Input mode and the Direct Input window appears.
  • Page 79 Direct Input Mode Click the Direct Input icon to observe the settings. Click Use Defaults. Click Yes if a dialog box appears to confirm your choice. The default output formatting option is Retain Font and Paragraph Formatting. This setting retains font types and styles, and paragraph order and formatting in recognized text.
  • Page 81 Chapter 3 Commands and Settings This chapter explains how to use the OmniPage commands and settings, all of which are located within eight menus and a toolbar. This chapter contains the following sections: • The Toolbar • The File Menu •...
  • Page 82 The Toolbar The Toolbar The toolbar has four process buttons and several shortcut command buttons. [Figure: the toolbar, with the process buttons (AUTO button, Image button, Zone button, OCR button) and the shortcut command buttons labeled.] Use the toolbar to access the three basic steps of the optical character recognition (OCR) process: Acquiring a page image to recognize.
  • Page 83 The Toolbar The toolbar is different in Direct Input mode. See Chapter 5, Direct Input, for detailed information. Each button is described next in the order it appears on the toolbar. AUTO Button The AUTO button is the first button in the toolbar. It performs the same operations as the Auto command in the Process menu.
  • Page 84 The Toolbar Scan Image Select Scan Image to scan a page in your scanner. This command only appears in the drop-down list if you have installed the Scan Manager. Select your default scanner in the Scan Manager before scanning (see “Scan Manager Installation”...
  • Page 85 The Toolbar Auto Zones Select Auto Zones in the drop-down list to have OmniPage automatically draw and order zones for text recognition on the current page image. OmniPage uses the selected Zones option in the Settings Panel: Multiple Columns, Single Column or Table, or One Zone. For more information about each of these options, see “Zones Options”...
  • Page 86 The Toolbar OCR Button The OCR button is the fourth button in the toolbar. This button contains the same commands, Perform OCR, Defer OCR, and Train OCR, that are in the cascading menu under the Process Settings command in the Process menu.
  • Page 87 The Toolbar Train OCR Select Train OCR to create a character training file (*.trn) that assists OmniPage during text recognition and allows better recognition of special characters. A character training file is a set of pre-recognized text characters that OmniPage compares with the characters in the page image during recognition.
  • Page 88 The File Menu The File Menu The File menu lets you manage OmniPage file operations. File menu commands include: • Open Document • Close Document • Mail (MAPI mail systems only) • Save • Save As • Export Image • Revert to Saved •...
  • Page 89 The File Menu Opening a Caere Document or Image File Choose Open Document... in the File menu. The Open Document dialog box appears. Select the type of file to open in the List Files of Type drop-down list. Files of that type appear in the File Name list box. Double-click a file or select it and click OK.
  • Page 90 The File Menu Close Document Choose Close Document to stop working on a document but leave OmniPage running. If the current document has not been saved or has changed since the last save, a prompt appears asking if you want to save the document before closing.
  • Page 91 The File Menu Saving a File Choose Save As... in the File menu. The Save As dialog box appears. Select a file type in the Save Files as Type drop-down list. See “Supported Output File Formats” on page 238 for a list of supported file formats.
  • Page 92 The File Menu Caere Document (*.met) OmniPage creates a Caere Document the first time you scan a document or open an image. A Caere Document can have up to 256 pages. Each page includes the original image and can vary to include zones and recognized text.
  • Page 93 The File Menu For example, if you want to scan several stacks of pages at once, insert blank pages to separate each batch. OmniPage saves the first stack as one file, detects a blank page, saves the next stack as one file, detects a blank page, and so on.
  • Page 94 The File Menu Exporting an Image File You can export an image file after a document has been scanned or loaded. Choose Export Image... in the File menu. The Export Image dialog box appears. Select a file type in the Save Files as Type drop-down list. See “Supported Output File Formats”...
  • Page 95 The File Menu Save Options You can select one of two Save Options. • Select Save Current Page Only if you want OmniPage to save only the current page image as a file. • Select Save All Pages if you want OmniPage to create a separate file for each page in your document and automatically increment file names starting with 001.
  • Page 96 The File Menu Graphic File Name The way you match the Save Options and Image Options affects the length of the file name. The file name form is used as an example of how a file would be named in the following combinations of save and image options. •...
  • Page 97 The File Menu Get Accuracy Info Choose Get Accuracy Info... for a statistical report on recognition accuracy. Accuracy information is valuable for comparing the effect of different settings on recognition accuracy. For example, if you are not sure about which Scanner settings panel options to choose, you can compare the recognition accuracy percentages of different options.
  • Page 98 The File Menu Number of Spelling Replacements This is the number of words that were corrected automatically by the Language Analyst. These words are blue in the recognized document. Recognition Time This is the time it took to break the page down into text and graphics and perform recognition.
  • Page 99 The File Menu Choose Save Settings... in the File menu. The Save Settings dialog box appears. Caere Settings (*.set) is the only selection in the Save Files of Type drop-down list. Type a name for your file in the File Name edit box. Select a location for your file.
  • Page 100 The File Menu Caere Settings (*.set) is the only selection in the Save Files of Type drop-down list. Locate and select the settings file to open. Click OK. The settings are loaded immediately into the Settings Panel. To save a settings file, choose Save Settings... in the File menu. See “Deleting *.set, *.trn, *.ud, *.zcn, and *.zon Files”...
  • Page 101 The File Menu Select a location for your file. Omnipro\data is the default directory. OmniPage only looks here for zone template files so you must save the file here. Click OK. Saved templates appear in the drop-down list under the Zone button. You can also choose Use Template...
  • Page 102 The File Menu Saving Recognized Text as a Novell Envoy Runtime File Choose Publish to Envoy... in the File menu. The Print dialog box appears. The Envoy driver is selected automatically. Click OK. The Save Envoy Runtime As dialog box appears. Envoy Runtime Files (*.exe) is the selection in the Save Files as Type drop-down list.
  • Page 103 The File Menu Opening a Novell Envoy Runtime File An Envoy file (*.exe) is self-opening: it includes a scaled-down version of the Novell Envoy application. You can open the file on the same kind of computer it was created on (Macintosh or PC) even without Envoy or OmniPage installed.
  • Page 104 The Edit Menu The Edit Menu The Edit menu lets you revise text in the text window and work with images in the zone window. Edit menu commands include: • Cut • Copy • Paste • Clear/Clear All Zones • Select All in Page •...
  • Page 105 The Edit Menu Text copied to the Clipboard may be pasted anywhere (except into a graphic) in a document. The text remains on the Clipboard until new text is cut or copied. Paste Choose Paste to place cut or copied text from the Clipboard into the recognized document.
  • Page 106 The Edit Menu Check Recognition Choose Check Recognition... to check for errors in a recognized document. This command is also available as a button in the toolbar. OmniPage uses the currently selected main and user dictionaries and language(s) to check recognition. The Check Recognition operation stops •...
  • Page 107 The Edit Menu Click this to add a word to the current User Dictionary and go on to find the next error. Other occurrences of the word in the current document will be checked if the word was a “suspect” (green). OmniPage accepts future occurrences of the word when you use the same user dictionary for future documents.
  • Page 108 The Edit Menu Saving the original page images slows down processing slightly and uses more disk space. Verifying an Image Double-click any word in the text window or select a word and choose Verify Image in the Edit menu. The Verification window appears showing the original image of the selected area of text.
  • Page 109 The Edit Menu • Select Match Whole Word Only to find only the words that exactly match the length of the search word. Compound words that contain the search word within them will not be found. For example, if you select this option and search for light, OmniPage will not flag lightbulb.
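The whole-word behavior described above can be approximated with a regular expression. This is only an illustration of the idea, not OmniPage's search code; the function name is hypothetical and the search is assumed to be case-sensitive.

```python
import re


def find_whole_word(word: str, text: str) -> list[int]:
    """Return start offsets where `word` appears as a complete word."""
    # \b anchors the match at word boundaries, so "light" inside
    # "lightbulb" is not reported.
    pattern = re.compile(rf"\b{re.escape(word)}\b")
    return [match.start() for match in pattern.finditer(text)]


sample = "Replace the light before the lightbulb burns out."
print(find_whole_word("light", sample))  # [12]
```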
  • Page 110 The Edit Menu Delete Recognized Zone Choose Delete Recognized Zone to delete a selected text or graphic zone in a recognized page. Each zone is surrounded by a frame. This command is available in the Edit menu when the text window is active. Delete a Text Zone Place your cursor in the zone to delete.
  • Page 111 The Edit Menu Resize a Zone Hold your cursor over a zone handle until it turns into a two-way arrow. Hold down the mouse button and drag to resize the zone. Move a Zone Place your cursor inside the zone until it turns into a four-way arrow.
  • Page 112 The Edit Menu Go to Page Choose Go To Page... to open the Go To Page dialog box and switch to another page in the current document. Both the text window and zone window display the selected page. In the Go To Page dialog box, you can select either First Page or Last Page, or type a page number in the Page Number edit box.
  • Page 113 The Format Menu The Format Menu The Format menu lets you format character and paragraph attributes while you edit a recognized document in the text window. Format menu commands include: • Font • Paragraph Files saved in ASCII or ANSI format do not retain any formatting other than spaces and carriage returns.
  • Page 114 The Format Menu Font Select a font in the Font list box. You can type a letter in the Font edit box to skip to the fonts beginning with that letter. Font Style Select Regular to return selected characters to an unformatted state. Bold, italic, and underlined characteristics disappear.
  • Page 115 The Format Menu Choose Paragraph... in the Format menu. The Paragraph dialog box appears. Select an option in the Spacing drop-down list. • Select Single for single-spaced lines. • Select Double for double-spaced lines. • Select Triple for triple-spaced lines. Select an option in the Alignment drop-down list.
  • Page 116 The Format Menu Click the appropriate Tab-setting button (left, center, right, or decimal). Left-aligned tab button Decimal-aligned tab button Center-aligned tab button Right-aligned tab button Click the area in the upper half of the ruler where you want to place the tab stop. Repeat steps 1 through 3 to continue setting tabs.
  • Page 117 The Process Menu The Process Menu The Process menu lets you perform each step of the OCR process as well as other OCR-related functions. Process menu commands include: • Auto/Stop • Scan Image/Load Image • Auto Zones/Manual Zones/Use Template • Perform OCR/Defer OCR/Train OCR •...
  • Page 118 The Process Menu How OmniPage scans, zones, and recognizes a page depends on the options chosen in the Settings Panel. When a document is already open to an unfinished page image, you can choose Auto to finish processing that page according to the selected processing commands.
  • Page 119 The Process Menu Scanning Multiple Pages You can scan pages one at a time to create a multiple-page Caere Document in OmniPage or scan a stack of pages if you have an automatic document feeder (ADF). A Caere Document can have 256 pages. Click the Image button or Auto after a document is scanned to scan or load another page and add it to the current open document.
  • Page 120 The Process Menu Loading an Image File Choose Load Image in the Process Settings cascading menu. The Load Image dialog box appears. Select a file type in the List Files of Type drop-down list. Files of that type appear under the File Name list box. Locate and select the file to load.
  • Page 121 The Process Menu Choose Auto in the Process Settings cascading menu. The Load Image dialog box appears. See the previous section for information on the dialog box buttons. Select a file type in the List Files of Type drop-down list. Files of that type appear under the File Name list box.
  • Page 122 The Process Menu • The Load Image dialog box appears if the open document has multiple pages and is not open to the last page. Choose an option and click OK. Auto Zones Choose Auto Zones to have OmniPage automatically draw and order zones on a scanned or loaded page image in the zone window.
  • Page 123 The Process Menu OmniPage uses the selected Zones option in the Settings Panel when zoning: Multiple Columns, Single Column or Table, or One Zone. For more information about each of these options, see “Zones Options” on page 163. You are prompted to delete the current zones before auto zoning if a page already has zones.
  • Page 124 The Process Menu Manual Zones Choose Manual Zones to draw, order, and specify zones manually on a scanned or loaded page image in the zone window. Select zone contents. Use the arrow buttons to rotate the image. Zoom tool: zoom your view of the page in and out.
  • Page 125 The Process Menu When you have enclosed the desired area, release the mouse button. Continue using the mouse to draw zones in the page image until you have finished. You can draw up to 64 separate zones of which 26 can be graphic zones.
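The zone limits stated above (64 zones per page, of which at most 26 may be graphic zones) can be expressed as a small check. The function and the `"text"`/`"graphic"` labels are hypothetical names used for illustration only.

```python
MAX_ZONES = 64          # total zones allowed on one page image
MAX_GRAPHIC_ZONES = 26  # how many of those may be graphic zones


def can_add_zone(zone_kinds: list[str], new_kind: str) -> bool:
    """Return True if one more zone of `new_kind` fits within the limits."""
    if len(zone_kinds) >= MAX_ZONES:
        return False
    if new_kind == "graphic" and zone_kinds.count("graphic") >= MAX_GRAPHIC_ZONES:
        return False
    return True


print(can_add_zone(["graphic"] * 26, "graphic"))  # False
print(can_add_zone(["graphic"] * 26, "text"))     # True
```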
  • Page 126 The Process Menu Moving Zones Click the Draw Zones tool. Click a zone to select it. Handles appear on the zone. Place your cursor inside a zone so that it changes to a four-way arrow. Hold down the mouse button, and drag the zone wherever you want it.
  • Page 127 The Process Menu Use Template Choose Use Template... to create zones by applying a previously created zone template file (*.zon). Use the Save Zone Template... command in the File menu to create a zone template. See “Save Zone Template” on page 100 for information.
  • Page 128 The Process Menu Use your right mouse button to click the OCR button when it is available and automatically open the Settings Panel to OCR options. If there are no zones on the page when you select Perform OCR, OmniPage automatically creates zones according to the selected Zone command.
  • Page 129 The Process Menu Train OCR Choose Train OCR to create a character training file (*.trn) that assists OmniPage during text recognition and allows better recognition of special characters. This command performs the same function as the Train OCR command under the OCR button. A character training file is a set of pre-recognized text characters that you create.
  • Page 130 The Process Menu Characters OmniPage believes it identified correctly are listed alphabetically below the suspect characters. You do not have to train OmniPage to recognize every character in the dialog box. OmniPage’s word-comparison ability enables it to identify most words correctly, even if a character within the word is unrecognizable.
  • Page 131 The Process Menu The specified character now appears below the suspect character. Gray suspect character Specified character The symbol turns gray to indicate that a character has been specified for it. OmniPage will identify this character correctly when it recognizes your document. Specify as many characters as necessary.
  • Page 132 The Process Menu The character set you just trained is appended to the selected file. A dialog box asks if you want to make this the current training file and recognize your document. • Click Yes to recognize your page image and apply the training file you just created.
  • Page 133 The Process Menu The currently selected Process Settings commands determine what image, zone, and OCR operations can be performed. OmniPage also uses the selected commands for automatic processing. Selected image command Selected zone command Selected OCR command For more information on each Process Settings command, refer to its entry in this section starting on page 117.
  • Page 134 The Process Menu You can perform recognition and save the document later, or perform recognition and save the document automatically. Perform OCR and Save Later Deselect Save Automatically if it is selected. Click OK. OmniPage recognizes your document and displays the text in the text window.
  • Page 135 The Process Menu Finish Deferred Documents Choose Finish Deferred Documents... to finish OCR of your deferred documents. This opens the Finish Deferred Documents dialog box, described in the next section of this topic. Choose Defer OCR in the Process menu or the OCR button drop-down list to defer page recognition during automatic processing.
  • Page 136 The Process Menu Finishing Deferred Documents Choose Finish Deferred Documents... in the Process menu. The Finish Deferred Documents dialog box appears. Change save options for a selected file. The deferred file is saved with this name after OCR. The deferred file is saved in this file format after OCR.
  • Page 137 The Process Menu which files are saved automatically after OCR. You can select another location if you wish. Click OK to return to the Finish Deferred Documents dialog box. • Click Add... to add additional files to the Files to Finish list box. (You cannot add a file if there are already 100 files in the Files to Finish list box.) See “Supported Input File Formats”...
  • Page 138 The Process Menu Click Add to select deferred files for recognition. This could be a file outside your input directory, or a file added to the input directory when the Auto Add New Files in Input Directory option is deselected. Click Add..
  • Page 139 The Process Menu can either click Add... to add one or more new files or Set Input Directory... to add all files in the input directory instead. Delete Deferred File After OCR Select this checkbox to delete all the original deferred files permanently from your hard drive after recognition.
  • Page 140 The Process Menu Save As Select a file in the Files To Finish list box and click Save As... to open the Save As dialog box. You can determine how the selected file will be saved. Select a file type and change the file name if you wish. The new file name appears in the Files to Finish list box under the Save As header.
  • Page 141 The Process Menu Set Output Directory You can choose the default location for finished files. Click Set Output Directory... The Set Output Directory dialog box appears. The dialog box opens to the omnipro\output directory by default. You can select another directory to which OmniPage will save your finished documents after OCR if desired.
  • Page 142 The Settings Menu The Settings Menu The Settings menu lets you modify system-wide settings. Settings menu commands include: • Settings Panel • Select Languages • Register Applications • Edit Training File • Edit Zone Contents File • Edit User Dictionary Settings Panel Choose Settings Panel...
  • Page 143 The Settings Menu Select Languages Choose Select Languages... to select one or more language character sets for text recognition. OmniPage can recognize additional characters (such as circumflexes, umlauts, etc.) unique to a particular language; thirteen languages are available. You may select more than one language at a time, but for faster recognition, use only the minimum number of languages that are necessary.
  • Page 144 The Settings Menu To register an application: Choose Register Applications in the Settings menu. The Register Applications dialog box appears. Applications that support Direct Input appear in the Unregistered Applications list box. Applications appear in the left list box whether or not they are installed on your system. An application must be installed, however, to use it to initiate Direct Input.
  • Page 145 The Settings Menu Edit Training File Choose Edit Training File... to edit an existing character training file. A character training file (*.trn) is a set of pre-recognized text characters that OmniPage compares to the characters in the page image during recognition.
  • Page 146 The Settings Menu The Train Character dialog box shows the existing characters in the training file, including the original images and the associated characters. You can edit characters, delete them, or append this file to another one. • Select a character in the dialog box and click Specify to change how it will be recognized by OmniPage.
  • Page 147 The Settings Menu Creating a New Zone Contents File Choose Edit Zone Contents File... in the Settings menu. The Select File dialog box appears. Click New. The Edit Zone Content File dialog box appears. The Zone Contents edit box contains the 94-character (typical keyboard) ASCII character set.
  • Page 148 The Settings Menu Editing a Zone Contents File Choose Edit Zone Contents File... in the Settings menu. The Select File dialog box appears. Select a file to edit and click OK. The Edit Zone Content File dialog box appears. The Zone Contents edit box contains characters you specified previously.
  • Page 149 The Settings Menu Edit User Dictionary Choose Edit User Dictionary... to create a new user dictionary (*.ud) or edit an existing one. The user.ud is the default user dictionary supplied with OmniPage. Creating or Editing a User Dictionary Choose Edit User Dictionary... in the Settings menu. The Select File dialog box appears.
  • Page 150 The Settings Menu • Click Delete All to delete all words from the dictionary. • Click Import... to add words from another application to your user dictionary. For example, you may want to add technical terms from a particular file. The Import Text File dialog box prompts you to enter the file name and directory of the file you want to import.
  • Page 151 The Register Menu The Register Menu There are no commands in this menu. It only appears in the menu bar if you have not registered your copy of OmniPage. Do not confuse this with the Register Applications... command in the Settings menu. Click the Register menu to open the OmniPage Product Registration dialog box.
  • Page 152 The Window Menu The Window Menu The Window menu provides options for viewing the OmniPage screen and your document. Window menu commands include: • Tile Horizontal • Tile Vertical • Cascade • Arrange Icons • Hide/Show Toolbar • Hide/Show Status Bar •...
  • Page 153 The Window Menu Hide/Show Status Bar Choose Hide Status Bar to hide the status bar located at the bottom of the window. Choose Show Status Bar to view the status bar again. Hide/Show Ruler Choose Hide Ruler to hide the text window ruler. Choose Show Ruler to view the ruler again.
  • Page 154 The Help Menu The Help Menu The Help menu provides access to the OmniPage online Help program and information about OmniPage. Help menu commands include: • Contents • Procedures • Using Help • About Contents Choose Contents for a list of the topics available in the OmniPage Help program.
  • Page 155 Chapter 4 The Settings Panel This chapter explains the way the settings you select in the Settings Panel affect OmniPage document processing. The following topics are included: • Settings Panel Overview • Scanner Options • Zones Options • OCR Options •...
  • Page 156 Settings Panel Overview Settings Panel Overview To open the Settings Panel, choose Settings Panel... in the Settings menu or click the Settings Panel button in the toolbar. Click each icon to access its settings. Each one of the seven icons in the scroll box on the left side of the Settings Panel represents a different set of options.
  • Page 157 Settings Panel Overview The Settings Panel changes to reflect the options of the icon that you click. You can select options and then click Close or leave the Settings panel open as a floating window. OmniPage retains the last-selected options until you select new ones.
  • Page 158 Scanner Options Scanner Options Click the Scanner icon in the Settings Panel to select options that control the way your scanner scans a page. Use your right mouse button to click the Image button in the toolbar and automatically open the Settings Panel to Scanner options. Page Select Page options to describe the page size and orientation of the document you want to recognize.
  • Page 159 Scanner Options Select Landscape for a horizontally oriented page. Select Flipped to automatically rotate a portrait page 180 degrees during the scan. Select Flipscape to automatically rotate a landscape page 180 degrees during the scan. Flipped and Flipscape options are useful if you are scanning pages in a book and need to turn the book upside down or sideways for certain pages.
  • Page 160 Scanner Options It then processes pages 6, 4, and 2. The resulting file consists of pages 1, 2, 3, 4, 5, and 6 in the correct order. You can divide a large batch of pages into several stacks for processing. OmniPage processes one side of the first stack and then the reverse side.
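The page ordering described above can be sketched as a merge of two scanning passes. The sketch assumes the first pass captured the front sides as pages 1, 3, 5 and the second pass captured the reverse sides in the order 6, 4, 2; the function name is hypothetical.

```python
def merge_duplex(front_pass: list[int], back_pass: list[int]) -> list[int]:
    """Interleave front-side pages with back-side pages scanned in reverse."""
    if len(front_pass) != len(back_pass):
        raise ValueError("both passes must contain the same number of pages")
    merged: list[int] = []
    # The back sides arrive last-page-first, so reverse them before pairing.
    for front, back in zip(front_pass, reversed(back_pass)):
        merged.extend([front, back])
    return merged


print(merge_duplex([1, 3, 5], [6, 4, 2]))  # [1, 2, 3, 4, 5, 6]
```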
  • Page 161 Scanner Options When to Use Use 3D OCR and AnyPage/HP AccuPage 2 whenever you want the best possible recognition results. This setting is especially useful when you scan: • poor-quality pages • pages with very small type • pages with text on colored or shaded backgrounds When to Use Another Setting This setting is slower than the other two settings.
  • Page 162 Scanner Options right arrow on the slide. The number of settings available depends on the scanner you use. The number shows the brightness setting. Numbers vary depending on your scanner. Use the slider box or slider arrows to adjust brightness. •...
  • Page 163 Zones Options Zones Options Click the Zones icon in the Settings Panel to select the zoning method that determines the flow of text during recognition. Use your right mouse button to click the Zone button in the toolbar when it is active and automatically open the Settings Panel to Zones options. You can create zones manually, automatically, or with a template, but OmniPage still uses the selected zoning method in the Settings Panel to determine text flow within each zone.
  • Page 164 Zones Options Select Retain Graphics in the OCR settings panel when you select Multiple Columns to preserve your graphics. Otherwise, graphics are discarded. Multiple Columns and OCR Settings Select Multiple Columns along with True Page in the OCR settings panel if you want to reproduce page layout as closely as possible and retain font and paragraph formatting.
  • Page 165 Zones Options Single Column or Table and OCR Settings Select Single Column or Table along with True Page in the OCR settings panel to retain font and paragraph formatting, and separate distinct vertical spaces of five or more with tabs and distinct horizontal spaces with line returns.
  • Page 166 OCR Options distinct vertical spaces of five or more with tabs but ignore extra horizontal spacing. Select One Zone along with Ignore All Formatting in the OCR settings panel to ignore font and paragraph formatting but separate distinct vertical spaces of five or more with tabs. OCR Options Click the OCR icon in the Settings Panel to select input and output options that assist OmniPage during recognition and determine the format of the...
  • Page 167 OCR Options • Select Normal if the image you are recognizing has conventionally printed text characters. • Select Dot Matrix if the image you are recognizing has characters printed in draft mode by a 9-pin dot-matrix printer. Do not select Dot Matrix for pages printed in near-letter-quality mode or printed by a 24-pin dot-matrix printer.
  • Page 168 OCR Options Be sure to select the appropriate main and user dictionaries in the Spelling settings panel when you use the Language Analyst. Select the appropriate language(s) in the Select Languages dialog box (choose Select Languages... in the Settings menu). The Language Analyst shuts itself off automatically when it detects that the dictionary information is not improving recognition results.
  • Page 169 OCR Options Retain Graphics Select Retain Graphics if you want OmniPage to retain original graphics such as photographs or drawings in the recognized document. Select this to retain graphics How to Retain Graphics In addition to selecting Retain Graphics in the OCR settings panel, you must: •...
  • Page 170 OCR Options Files saved in ASCII or ANSI format do not retain any formatting other than spaces and carriage returns. See “Zones Options” on page 163 for information on how Zones and OCR settings panel options work together during OCR. True Page –...
  • Page 171 OCR Options What True Page Reproduces True Page attempts to reproduce the following during page recognition: • Relative text column positioning • Relative graphic positioning (select Retain Graphics in the OCR settings panel) • Margins • Tabs • Line Spacing •...
  • Page 172 OCR Options Retain Font and Paragraph Formatting Select this to retain font characteristics and paragraph formatting in the recognized document. This option does not retain the original page layout. For example, a multiple-column newspaper article would be formatted as one column in order of recognition instead of as a page with side-by-side columns.
  • Page 173 OCR Options Ignore All Formatting Select this to use one universal font and font size in the recognized document. Choose a font and font size for recognized text in the Ignored Font Formats section of the Fonts settings panel. This option does not retain the original page layout. If you selected Retain Graphics, any graphics in your document would appear at the bottom of the page.
  • Page 174 Fonts Options Fonts Options Click the Fonts icon in the Settings Panel to select options for retaining or ignoring the original font styles. The drop-down menus display all the TrueType fonts installed on your system. If you select True Page or Retain Font and Paragraph Formatting, select fonts to map to the...
  • Page 175 Spelling Options • Sans-Serif and Monospaced Character spacing is the same for each character; letter strokes do not have finishing lines. OCR-A is an example of this font type. OmniPage detects font styles during recognition and formats recognized characters according to the font selected for that style. For example, if you selected Times New Roman in the Serif Proportional drop-down list, characters printed in Palatino (a serif proportional font used in this manual) would be reformatted as Times New Roman during recognition.
  • Page 176 Spelling Options Dictionaries You can select one main dictionary and one user (personal) dictionary. The Language Analyst uses the selected dictionaries during recognition to check questionable words against dictionary entries. The dictionaries are also used when checking recognition. Always select the appropriate dictionaries for your document.
  • Page 177 Direct Input Options Ignore Abbreviations OmniPage will ignore a capitalized letter followed by three or fewer lowercase letters and a period (for example, Mrs., Dr., etc.). Deselect Ignore Abbreviations if you want the abbreviations in your user dictionary to be checked or if you want to add abbreviations to your user dictionary.
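The Ignore Abbreviations rule stated above (a capitalized letter, then three or fewer lowercase letters, then a period) maps naturally onto a pattern match. This is a sketch of the stated rule only; it reads "three or fewer" literally as zero to three letters, which is an assumption, and the function name is hypothetical.

```python
import re

# One capital letter, zero to three lowercase letters, then a period.
ABBREVIATION = re.compile(r"[A-Z][a-z]{0,3}\.")


def is_ignored_abbreviation(token: str) -> bool:
    """Return True if the whole token fits the abbreviation shape."""
    return ABBREVIATION.fullmatch(token) is not None


for token in ["Mrs.", "Dr.", "Assoc.", "mrs."]:
    print(token, is_ignored_abbreviation(token))
```

Under this reading, `Mrs.` and `Dr.` are skipped, while `Assoc.` (five lowercase letters) and `mrs.` (no leading capital) would still be checked.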
  • Page 178 Direct Input Options Enable Direct Mode Select this option to enable Direct Input in registered applications. The Direct Input... command is placed in the File menu of an open, registered application when Direct Input mode is enabled. Deselect this option to remove the Direct Input...
  • Page 179 Preferences Options Ignore All Formatting Select this to use one universal font and font size in the recognized document. Choose a font and font size for recognized text in the Ignored Font Formats section of the Fonts settings panel. This option does not retain the original page layout. If you selected Retain Graphics, any graphics in your document appear at the bottom of the page.
  • Page 180 Preferences Options To save the page image, make sure Save Page Images in Caere Document is selected before you scan or load a page image. There are advantages to saving an image along with its recognized text as a Caere Document. You can always reopen a Caere Document and: •...
  • Page 181 Chapter 5 Direct Input This chapter explains how to initiate OCR processing from an open application and paste recognized text directly from OmniPage into that application. OmniPage has a special Direct Input mode that can be initiated from any compatible application. Most commands and settings in Direct Input mode are the same as those found in the regular OmniPage mode.
  • Page 182 Registering Applications Registering Applications An application must be registered before it can be used to access Direct Input mode. A variety of applications are compatible with Direct Input. Registering an Application Launch OmniPage if it is not already open. Choose Register Applications... in the Settings menu. This command is enabled only when Enable Direct Input is selected in the Direct Input settings panel.
  • Page 183 Using Direct Input from Another Application Select and move as many applications as you like. Click OK when you are done. OmniPage places the Direct Input... command in the File menu of the registered application(s) above the Exit command. The dialog box displays a list of compatible applications only. The list of available applications is static;...
  • Page 184 Using Direct Input from Another Application Accessing OmniPage in Direct Input Mode Align your document(s) in your scanner or ADF if you plan to scan. Open or switch to any registered application. Microsoft Word is used in this example. Choose Direct Input... in the program’s File menu. •...
  • Page 185 Using Direct Input from Another Application Select the appropriate process button settings and Settings Panel options for your document if you did not select automatic processing to begin immediately. Click AUTO or the Image button to begin processing. OmniPage processes the page according to your settings. After processing, the Direct Input window closes and OmniPage returns to the state it was in before Direct Input mode began: open, iconized, or closed.
  • Page 186 Using Direct Input from Another Application • If you scan or load an image, recognize it, and then click the Image button, OmniPage automatically appends the newly loaded or scanned image to the currently open document. The page number button is disabled in Direct Input mode so OmniPage appends the new image to the last processed page.
  • Page 187 Selecting Settings for Direct Input Active Process Buttons You can click any button that is active (raised instead of flat) to start processing at any point not yet done. For example, if you chose Load Image, Auto Zones, Perform OCR, and Auto Paste under the process buttons and then clicked the Image button.
  • Page 188 Selecting Settings for Direct Input You can continue processing if you are already in Direct Input mode at the time. • If you select Click the AUTO button on launch, Direct Input processing begins as soon as the Direct Input window opens, the same as if you had clicked the AUTO button.
  • Page 189 Selecting Settings for Direct Input Preferences Options • Save Page Images in Caere Document Direct Input mode does not save page images. Use the regular OmniPage mode for this. • Prompt Before Deleting Pages You cannot delete pages in Direct Input mode. See Chapter 4, The Settings Panel, for information on other Settings Panel options.
  • Page 190 Selecting Settings for Direct Input loaded or scanned. OmniPage stops processing for you to draw zones if you select Manual Zones. OCR Button There is no drop-down list under this button. The only setting is Perform OCR. Click this button to perform OCR on a zoned page. OCR begins automatically if you click the AUTO button and Auto Zones is also selected in the Zone button’s drop-down list.
  • Page 191 Selecting Settings for Direct Input This section describes commands found only in Direct Input mode. See Chapter 3, Commands and Settings, for a description of all other menu commands. Process Menu See “The Process Menu” on page 117 for a description of other commands. Auto Paste Choose Auto Paste to paste recognized text directly into your open registered application after OCR.
  • Page 192 Chapter 6 Using True Page Ordinarily, you use OCR to turn a printed page into editable electronic text. Regardless of the original page layout, the resulting text appears in a single column. You then save this text in a file format suitable for your target application (such as a word processor) where you apply your own page layout and formatting.
  • Page 193 How True Page Works If you select this as your zoning method, True Page uses frames to preserve the positioning of text and graphic elements on the page. Frames are formatting boxes that are used for designing page layout; they can contain text or graphics. For example, every paragraph in a document may be contained within a separate frame.
  • Page 194 How True Page Works True Page With Frames The best way for True Page to preserve side-by-side columns is to create frames around areas of text and graphics. Therefore, frames are automatically used when you select the Multiple Columns zoning method with True Page.
  • Page 195 How True Page Works Factors That Influence Frame Formatting Consider these factors before you choose True Page formatting for multiple-column pages: • Pages with rectangularly shaped text and graphic areas and non- stylized fonts are the best candidates for True Page formatting. Highly stylized formatting is difficult to replicate especially when you save recognized documents to other file formats.
  • Page 196 How True Page Works True Page Without Frames Frames are unnecessary to preserve single-column formats. Tabs (or spaces) are used to preserve side-by-side formatting in tables. Therefore, frames are not used when you select the Single Column or Table zoning method with True Page.
  • Page 197 Page Types Page Types The quality and layout of your original page influence the way True Page reproduces its appearance. These factors influence True Page output: • Accurate text recognition is essential to duplicating the original page’s appearance successfully. For accurate text recognition, the print on the page should be reasonably clean, crisp, and free of notes, lines, or doodles.
  • Page 198 Page Types Multiple-column pages can be challenging to replicate because they often use stylized fonts, a tight column structure, and non-rectangular paragraph and graphic layouts. To get the best results, you may want to experiment with different settings. Example — Multiple-Column Page Select Multiple Columns as the If you draw zones...
  • Page 199 Page Types Example — Single-Column Page Select Single Column or Table as the zoning method. Select TrueType fonts in the Fonts settings panel that are the closest match to the fonts on your page. To preserve a graphic, draw zones manually and identify the graphic with the Graphic zone...
  • Page 200 Page Types The Single Column or Table zoning method retains graphics on a page only if you select Retain Graphics in the OCR settings panel, draw zones manually, and identify any graphics with the Graphic zone contents file. Example — Table To preserve a graphic, draw zones manually...
  • Page 201 Page Types Since combination pages can be complex, experiment with different settings to get the best results. For example, try drawing zones manually if auto zoning does not work well to retain all elements on a page. Example — Combination Page If auto zoning misses a graphic, draw zones...
  • Page 202 Settings Panel Options for True Page The Settings Panel options you select influence the way True Page reproduces a document’s appearance. Since each page varies, experiment with different settings to get the best True Page results. Accurate text recognition is the first step in successfully duplicating your page’s appearance.
  • Page 203 Settings Panel Options for True Page The two other formatting options are easier to edit because they add fewer formatting attributes than True Page. The Retain Font and Paragraph Formatting option tries to replicate a document’s font attributes and paragraph structure but does not reproduce the other formatting elements such as positioning of graphics and columns.
  • Page 204 Settings Panel Options for True Page Fonts Options The font choices that you make in the Settings Panel play a big role in achieving the appearance of your original page. For True Page output, OmniPage determines the style of fonts on a page and maps them to the TrueType fonts you select in the Fonts settings panel.
  • Page 205 Target Applications for True Page For example, if your document has text formatted with Times Roman and Helvetica fonts, select Times Roman as the Serif Proportional font and Helvetica as the Sans Serif Proportional font. TrueType fonts that are not an exact match for fonts in your document may not take up the same amount of space, especially if the original fonts are stylized.
  • Page 206 Target Applications for True Page The following table provides a brief overview of working with True Page documents in some recommended applications. Table columns: Application; How to View a True Page Document; How an Application Refers to Frames; Editing Frames — Getting Started. Microsoft Choose Page Frame...
  • Page 207 Target Applications for True Page Example — Word for Windows 6.0a The following document was recognized with True Page output and saved in a Word for Windows 6.0a file format. Frames are normally invisible. Click a frame to see its border. This page is displayed in Page Layout view so that...
  • Page 208 Chapter 7 Improving Performance This chapter gives specific and general advice on how to speed up recognition, make recognition more accurate, and streamline your OCR workflow. The following topics are included: • Improving Speed • Improving Accuracy • Legal Documents •...
  • Page 209 Improving Speed Although OmniPage is designed to run automatically, the automatic features can take longer to work. You may find you need to trade off better recognition for faster recognition or vice versa. These factors most affect processing speed in OmniPage: •...
  • Page 210 Improving Speed Scanner Settings for Faster Recognition • Page: Choose the proper page orientation. • Options: Select Manual Brightness. Use the Manual Brightness control in the Settings Panel Scanner options if you are scanning high-quality printed documents with crisp, black text printed on a white background. Although faster, Manual Brightness may not provide the best accuracy for all documents.
  • Page 211 Improving Accuracy Typeset, high-quality printed pages return the best recognition accuracy. With lesser-quality pages, text-recognition accuracy will be poorer. These factors most affect text-recognition accuracy: • Settings Panel Options • Line Art • Document Quality • Scanning Angle •...
  • Page 212 Improving Accuracy • Manual Brightness: Select this setting if you are scanning high-quality printed documents with crisp, black text printed on a white background. This is the only option available if you have a black-and-white scanner. If text characters on your document appear to be thick and overlapping, adjust the brightness slide towards Lighten.
  • Page 213 Improving Accuracy This figure shows how well-formed characters appear in the Character window. No special brightness adjustment is needed. This figure shows how thin, broken characters appear in the Character window. Drag the brightness control toward Darken and rescan. This figure shows how thick, run-together characters appear in the Character window.
  • Page 214 Improving Accuracy Document Quality OmniPage recognizes characters in almost any font from 6 to 72 points in size. However, keep the following in mind when using OmniPage: • The print should be reasonably clean and crisp. Characters must be distinct: separated from each other and not blotched together or overlapping.
  • Page 215 Legal Documents This section lists some tips to keep in mind when scanning legal documents. General Tips Keep these general tips in mind when scanning legal documents: • Select Legal size in the Scanner settings panel if the document is printed on legal-size (14 inches in length) paper.
  • Page 216 Spreadsheets and Tables Keep these general tips in mind when scanning spreadsheets, charts, tables, single-column pages, or memos with page-wide text and tabs: • Select Landscape as the orientation in the Scanner options section of the Settings Panel if the document is presented in landscape view. •...
  • Page 217 Foreign-Language and Multilingual Documents OmniPage can still perform OCR on a page if the right dictionary is not available because it recognizes the characters of the language(s) selected in the Select Languages dialog box. Deselect the option Use Language Analyst in the OCR settings panel when the right dictionary is not available.
  • Page 218 Foreign-Language and Multilingual Documents Select your French dictionary in the Main Dictionary drop-down list in the Spelling settings panel. Click the OCR icon. Deselect Use Language Analyst if it is selected. Set other options as appropriate and click AUTO. 10 Click the Check Recognition button to open the Check Recognition dialog box and correct any mistakes when recognition is done.
  • Page 219 Scanning Large Jobs Save the document as a word-processing file. Use your word-processing program commands to cut and paste the text as necessary. Scanning Large Jobs You can place a large stack of documents in your scanner if you have an automatic document feeder (ADF).
  • Page 220 Scanning Large Jobs Preparing Documents for the ADF Suppose you want to scan 25 pages. Decide how you will save the scanned documents before you fill the ADF. This affects how you group them in the document feeder. If you choose a word-processing format, you have three options for saving the scanned pages: as a single file, as one file per page, or as one file for every blank separator that OmniPage locates.
  • Page 221 Scanning Large Jobs • Create one file per page: enter a name with up to five characters. OmniPage adds three numbers to each file name to make it unique. For example, if you typed file in the File Name box, the first page is saved as file001, the second page as file002, and so on.
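The one-file-per-page naming rule described above can be sketched in a few lines of Python. This is only an illustration of the documented pattern (a base name of up to five characters followed by a three-digit counter); the function name `page_file_names` and the 999-page cap are assumptions made for the example, not part of OmniPage.

```python
def page_file_names(base, page_count):
    """Return names such as file001, file002 for a one-file-per-page save.

    Hypothetical helper: it mirrors the rule in the manual (base name of
    up to five characters plus a three-digit page counter).
    """
    if not 1 <= len(base) <= 5:
        raise ValueError("base name must be 1 to 5 characters long")
    if not 1 <= page_count <= 999:
        # Assumption: three digits can only number pages 001 to 999.
        raise ValueError("page count must be between 1 and 999")
    return [f"{base}{number:03d}" for number in range(1, page_count + 1)]


print(page_file_names("file", 3))  # ['file001', 'file002', 'file003']
```

A five-character base plus three digits yields at most eight characters, which fits the eight-character file-name limit of DOS.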
  • Page 222 Chapter 8 Technical Information Use this chapter if you have trouble getting OmniPage or your scanner to run properly, if many errors appear in your recognized text, or if you want to speed up performance. See also Chapter 7, Improving Performance. The following topics are covered in this chapter: •...
  • Page 223 Installation • Use the software that came with your scanner to verify that it works properly before using it with OmniPage. • Fix problems that occur with Windows before using OmniPage again. • Run virus-checking software regularly. • Defragment your hard disk occasionally (see your DOS documentation).
  • Page 224 Installation Install OmniPage according to the instructions in Chapter 1, Installation. Reopen the Windows system.ini file and change the line Shell=progman.exe back to Shell=ndw.exe. This will make the Norton Desktop appear when you start Windows. Save the file. Restart Windows. Speeding Up Installation OmniPage installation is faster if you use a disk-caching program.
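The Shell= change described above is a one-line edit that you would normally make in a text editor. As a rough sketch of what the edit does, the Python function below rewrites the Shell entry in the text of a system.ini file. The function name `set_shell` and the sample file contents are assumptions for illustration; back up the real file before changing it.

```python
def set_shell(ini_text, new_shell):
    """Return ini_text with the value of its Shell= line replaced.

    Hypothetical helper: only lines whose key is exactly "shell"
    (case-insensitive) are changed; everything else is kept as is.
    """
    updated = []
    for line in ini_text.splitlines():
        key, separator, _old_value = line.partition("=")
        if separator and key.strip().lower() == "shell":
            updated.append(f"{key}={new_shell}")
        else:
            updated.append(line)
    return "\n".join(updated)


# Sample contents (assumed for the example, not a complete system.ini).
sample = "[boot]\nShell=progman.exe\nnetwork.drv="
print(set_shell(sample, "ndw.exe"))
```

Calling `set_shell(sample, "progman.exe")` before installation and `set_shell(sample, "ndw.exe")` afterwards corresponds to the two edits the procedure asks for.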
  • Page 225 Scanning An error message from DOS means that the floppy disk is damaged. Try to copy a file from the OmniPage disk to your hard disk if you do not receive an error message and DOS displays the disk directory. DOS may be unable to copy files from the disk even if it can read the directory.
  • Page 226 Scanning Memory-related Problem Low memory can cause scanning problems. Try closing open windows and applications to free up memory. Try modifying your system.ini or your config.sys file if freeing up memory does not help. Modifying the system.ini File Locate and open your system.ini file. Its default location is c:\windows.
  • Page 227 Scanning Add a line below it that reads device=drive:\path\filename in which drive, path, and filename represent the path to your scanner driver. This is a typical config.sys entry for an HP ScanJet scanner driver: DEVICE=C:\DESKSCAN\SJII.SYS Choose Save and then Exit in the File menu. Scanner Message on Launch You may receive the following message in a dialog box the first time you launch OmniPage after installing or changing your default scanner:...
  • Page 228 Scanning This is a typical config.sys entry for an HP ScanJet IIp scanner driver: DEVICE=C:\DESKSCAN\SJII.SYS Check your scanner documentation to find the device driver name and version for your scanner. Compare this with the information in the config.sys or autoexec.bat file. Make sure the two are the same. For information on how to edit your config.sys or autoexec.bat file, see your Windows or DOS documentation.
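When comparing the driver named in your scanner documentation with config.sys, it can help to list every DEVICE entry the file contains. The Python sketch below does that for the text of a config.sys file. The function name `device_entries` and the sample lines are assumptions for illustration; the HP ScanJet entry is the one quoted in this manual.

```python
def device_entries(config_text):
    """Return the values of all DEVICE= and DEVICEHIGH= lines.

    Hypothetical helper for checking which drivers config.sys loads.
    """
    entries = []
    for line in config_text.splitlines():
        key, separator, value = line.partition("=")
        if separator and key.strip().lower() in ("device", "devicehigh"):
            entries.append(value.strip())
    return entries


# Sample contents (assumed for the example).
sample = "FILES=30\nDEVICE=C:\\DESKSCAN\\SJII.SYS\nBUFFERS=20"
print(device_entries(sample))
```

If the driver path from your scanner documentation is missing from the returned list, the entry probably needs to be added or corrected as described above.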
  • Page 229 Scanning TWAIN Scanners To select a supported TWAIN scanner: Exit from OmniPage if it is open. Open the Scan Manager in the Caere Applications program group if you have installed the scanner application already. Otherwise, install the Scan Manager according to the instructions in Chapter 1.
  • Page 230 • 256-level grayscale • 300 dpi resolution Using Microtek Scanners with TWAIN The Manual Brightness setting in the Scanner settings panel is not adjustable when using the TWAIN interface with Microtek scanners. Deinstalling the Scan Manager You may want to deinstall the Scan Manager application if you are not using a scanner with OmniPage.
  • Page 231 Re-recognize the document. Document Image Problem in Zone Window You may see vertical lines running through the document image or no image in the zone window and then receive garbage or nothing after OCR. The memory address for your scanner interface card is probably interfering with the memory address for your video display adaptor.
  • Page 232 True Page and WordPerfect for Windows 6.0 You can save a maximum of 12 pages of True Page output to WordPerfect for Windows 6.0. If your document exceeds 12 pages, you can do one of the following: • Choose Delete Current Page in the Edit menu to delete excess pages. •...
  • Page 233 Save the recognized text in a separate file. Combine the resulting spreadsheet files in your spreadsheet application using cut and paste commands. Start with the file that has the cell format that you want. Cut and pasted material should conform to this format. Incorrect Font Size Output The TrueType font size output in your recognized document may be incorrect if your screen resolution is set to large-size fonts.
  • Page 234 Operation Do not remove a device driver unless you are aware of its function and know it may be removed safely. Hard disks often require special device drivers that should not be removed. Video displays that require special device drivers may need to be reconfigured instead of removed.
  • Page 235 Operation Low Memory OmniPage runs poorly in a low memory situation. Optimize your system to reduce the possibility. Here are a few tips: • Increase the amount of physical RAM. More memory optimizes OCR performance: 8MB is the minimum required, but 12MB is recommended. •...
  • Page 236 Operation you have when you start up your computer. Type mem at the prompt. DOS displays your total memory and other information. For information on optimizing your system and application performance, see your Windows User’s Guide. Low Disk Space Check your temp directory for unnecessary files if you seem to be running out of disk space too quickly.
  • Page 237 Operation • Zone contents files (*.zcn) • Zone template files (*.zon) The default location for all these files is the omnipro\data directory. The training files, user dictionary files, and zone contents files must be located here, but a settings file or a zone template file can be saved anywhere on your hard drive.
  • Page 238 Supported Output File Formats OmniPage can save files in the following file formats. Save a file in Caere Document format (*.met) if you want to be able to reopen and work with it in OmniPage. Application Conversion Filter File Caere (*.met) CAERE01...
  • Page 239 Supported Input File Formats Ventura Publisher (MS Word): W4W44T.DLL; Windows Write 3.0: W4W43T.DLL; WordPerfect 5.0: W4W07T.DLL; WordPerfect 5.1: W4W07T.DLL; WordPerfect 6.0: W4W48T.DLL; WordStar for Windows 1.x, 2.0: W4W04T.DLL; Xywrite III Plus, IV: W4W17T.DLL. This ASCII or ANSI format inserts hard returns to preserve the original line and paragraph breaks and attempts to reproduce white space, including margins, indents, and blank lines.
  • Page 240 Error Messages • Fax File Formats OmniPage supports fax files saved in DCX, PCX, or TIFF format. Many fax boards can receive or convert files in these three formats. Refer to your fax documentation for more information. TIFF files must be 200, 300, or 400 dpi. 300 dpi is recommended. TIFFs are stored and displayed as 300 dpi line art.
  • Page 241 Error Messages OmniPage to a different location on your hard disk or renamed directories in the path where OmniPage is located. If this is not the case, reinstall OmniPage. Error adding word to the user dictionary. Try freeing up hard disk space.
  • Page 242 Error Messages down list under the Zone process button and try drawing fewer, smaller areas on the page. Sometimes restarting Windows and relaunching OmniPage can clear up problems. If the problem persists, try reinstalling OmniPage. Error finding zones on the page. The page may be too complex. Try using Manual Zones to recognize smaller areas of the page.
  • Page 243 Error Messages Error loading the training module. The file may be corrupt or moved. Try reinstalling OmniPage. An internal program file may have been damaged or is no longer in the OmniPage directory. Either restore the file (rtrain.dll) or reinstall OmniPage.
  • Page 244 Error Messages Error saving file to disk. Try freeing up hard disk space. Close open windows and applications to free up memory. You may be out of disk space or low on memory. Delete unnecessary files to free up disk space. See “Low Memory” on page 235. If freeing up disk space and memory does not work, the file you are saving may be corrupt.
  • Page 245 Error Messages Problem with scanner. Make sure the Scan Manager was installed. Check your scanner connections, make sure the scanner is not in use by another application, and verify that you selected the correct default scanner. Try the solutions listed in the message first. Make sure the scanner is turned on and connected securely.
  • Page 246 Error Messages In the second case, either select Save Entire Page to a File, or create manual zones on the page image and specify them as graphic zones and select the Save Each Graphic Zone to a File option in the Save As dialog box. There is no recognized text to save in this document.
  • Page 247 Error Messages This is not a recognized image file format. Check that the file type appears in the List Files of Type drop-down list. You tried to load an unsupported image file. The file type should appear in the List Files of Type drop-down list in the Load Image and Open Document dialog boxes when you load or open an image.
  • Page 248 Caere Product Support Product support is available if you need help. However, please check the index or table of contents first to find the information you need in this manual — you may be able to save yourself a phone call. Dial-up Services Product support and information are available through the following services...
  • Page 249 Chapter 9 Understanding OCR Caere’s OmniPage products represent the leading edge in page- and text-recognition technology. OmniPage Professional and OmniPage can recognize virtually any scanned page, separate text from graphics, and convert almost any printed material to text files for your favorite word-processor, spreadsheet, or database applications.
  • Page 250 Basic OmniPage OCR Technologies Prior to 1988, text recognition consisted of a method called matrix matching. The program compared a bitmap’s shape to a library of character shapes. A letter was identified when the shapes matched. Matrix matching, however, only worked for a small number of fonts and sizes.
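To make the idea of matrix matching concrete, here is a minimal Python sketch. It is a toy illustration of the general technique only and does not represent Caere's code: the 3x3 bitmaps, the `TEMPLATES` library, and the function names are all invented for the example.

```python
# Toy library of character shapes: each shape is a 3x3 bitmap,
# written as rows of "1" (ink) and "0" (background).
TEMPLATES = {
    "I": ("010", "010", "010"),
    "L": ("100", "100", "111"),
    "T": ("111", "010", "010"),
}


def mismatch(bitmap, template):
    """Count the pixels that differ between two same-size bitmaps."""
    return sum(
        pixel != expected
        for row, template_row in zip(bitmap, template)
        for pixel, expected in zip(row, template_row)
    )


def match_character(bitmap, templates=TEMPLATES):
    """Return (character, differing_pixels) for the closest template."""
    best = min(templates, key=lambda ch: mismatch(bitmap, templates[ch]))
    return best, mismatch(bitmap, templates[best])


print(match_character(("111", "010", "010")))  # ('T', 0)
print(match_character(("111", "010", "011")))  # ('T', 1)
```

Because every font and size needs its own set of templates, a library like this grows quickly, which is consistent with the limitation noted above.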
  • Page 251 Basic OmniPage OCR Technologies challenged his development team to create an OCR system that could read anything that may be in a typical office: even a magazine page with its mixture of fonts, text, and graphics. Page Analysis An 8 1/2" by 11" document at 300 dots per inch creates a 1.2 MB file. AnyFont first puts that entire file into RAM and looks at the complete page.
  • Page 252 Basic OmniPage OCR Technologies each character can be infinitely tuned and re-tuned as new fonts or new problems come up. If there is a problem with “c”s and “e”s, additional tuning of those two experts is done until that one problem is resolved. To recognize a foreign language that has an “ä”...
  • Page 253 Basic OmniPage OCR Technologies AnyPage AnyPage technology is Caere’s proprietary dynamic auto-thresholding technology. It is in many ways similar to the HP AccuPage 2™ technology available with Hewlett-Packard scanners. AnyPage works with grayscale images. This dynamic thresholding technology requires a great deal of interaction between the scanner and the OCR software to vary the definition of “background”...
  • Page 254 Basic OmniPage OCR Technologies compute the probability of a specific character identification. The compound neural system compiles all the available evidence before making a character determination. AnyFax The AnyFax™ portion of the neural system is specially designed to work with low resolution fax images. It uses an image enhancement process that disconnects joined characters and reduces the jagged character edges created during the faxing process.
  • Page 255 Basic OmniPage OCR Technologies 3D OCR 3D OCR is a revolutionary new technology that is available to OmniPage Professional users. 3D OCR is designed to provide precision accuracy for the most difficult-to-read documents such as multi-generational photocopies and pages with very small type. It does this by recognizing gray parts of images as well as bi-level, black and white parts.