Setting up your scanner with OmniPage Pro How to start the program Registering your software New features in OmniPage Pro 11 INTRODUCTION What is optical character recognition OmniPage Pro’s OCR capabilities...
Processing from other applications How to set up Direct OCR How to use Direct OCR How to use OmniPage Pro 11 with your PaperPort software Processing documents with Schedule OCR Defining the source of page images...
Input from image files Input from scanner Scanning with an ADF Scanning long documents without an ADF Describing the layout of the document Manual zoning Working with zones Zone properties Table grids in the image Using zone templates PROOFING AND EDITING Proofreading OCR results Checking recognized text against original User dictionaries...
TECHNICAL INFORMATION Troubleshooting Solutions to try first Testing OmniPage Pro Low memory problems Low disk space problems Supported file types File types for opening and saving images File types for saving recognition results Saving to PDF OCR problems...
Welcome Welcome to OmniPage Pro 11, and thank you for using our software! The following documentation has been provided to help you get started and give you an overview of the program. This User’s Manual This manual introduces you to using OmniPage Pro 11. It includes installation and setup instructions, a description of the program’s...
We also assume you are familiar with your scanner and its supporting software, and that the scanner is installed and working correctly before it is set up with OmniPage Pro 11. Please refer to the scanner’s own documentation as necessary. The following conventions are used in this manual: Bold Introduces new terms and presents sub-headings.
GETTING ONLINE HELP In addition to using this manual, you can use OmniPage Pro’s online Help to learn about features, settings, and procedures. Online Help is available after you install OmniPage Pro. Online HTML Help Open OmniPage Pro’s online Help at its top level by choosing OmniPage Pro Help Topics at the top of the Help menu.
Tech Notes ScanSoft’s web site at www.scansoft.com contains Tech Notes on commonly reported issues using OmniPage Pro 11. Web pages may also offer assistance on the installation process and troubleshooting. Glossary This manual does not include a glossary. The online Help has a comprehensive glossary, with its own alphabetical index and a table of contents.
Pro 11. It presents the following topics: System requirements Installing OmniPage Pro Setting up your scanner with OmniPage Pro How to start the program Registering your software New features in OmniPage Pro 11
SYSTEM REQUIREMENTS You need the following minimum system requirements to install and run OmniPage Pro 11: A computer with a Pentium or higher processor Microsoft Windows 95, Windows 98, Windows ME, Windows 2000, or Windows NT 4.0 32MB of memory (RAM), 64MB recommended...
The program interface language is used for displays such as menu items, dialog boxes, warning messages and so on. You can change the interface language later from within OmniPage Pro 11, but your choice at installation time determines which Text-to-Speech system will be installed with the program.
OmniPage Pro. See also the section Reading text aloud in chapter 5. SETTING UP YOUR SCANNER WITH OMNIPAGE PRO All files needed for scanner setup and support are copied automatically during the program’s installation. Before using OmniPage Pro 11 for scanning, your scanner should be correctly installed and tested for correct functionality.
Next. You have successfully configured your scanner to work with OmniPage Pro 11! Click on Finish.
‘Test and configure current scanning source’ at the start of the process. HOW TO START THE PROGRAM To start OmniPage Pro 11 do one of the following: Click Start in the Windows taskbar and choose Programs > ScanSoft OmniPage Pro 11.0 > OmniPage Pro 11.0.
ScanSoft’s registration Wizard runs at the end of installation. We provide an easy electronic form that can be completed in less than five minutes. You are asked to enter OmniPage Pro 11’s serial number, which appears on a sticker on the CD sleeve.
OmniPage Pro 11 the most accurate OmniPage ever. Improved page layout - OmniPage Pro 11 will allow you to retain formatting that is true to the original, even on pages with non-gridded tables, headers and footers and dropped capitals.
This chapter introduces you to the solution: optical character recognition (OCR). It describes how OmniPage Pro 11 uses OCR technology to transform text from scanned pages or image files into editable text for use in your favorite computer applications.
(pixels) that together form character shapes. These present a picture of the text on a page. During OCR, OmniPage Pro 11 analyzes the character shapes in an image and defines solutions to produce editable text. After OCR, you can save the resulting text to a variety of word-processing, desktop publishing or spreadsheet applications.
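The idea of turning pixel shapes into characters can be illustrated with a toy example. The sketch below is not OmniPage Pro's recognition method, which this manual does not describe; it only assumes that a character image is a small grid of pixels and picks the stored template that differs from it in the fewest pixels. The template shapes and the function names distance and recognize are invented for illustration.

```python
# Toy illustration only: nearest-template matching on tiny 3x5 bitmaps.
# '#' is a dark pixel, '.' is a light pixel. All shapes are hypothetical.
TEMPLATES = {
    "c": [".##", "#..", "#..", "#..", ".##"],
    "e": [".##", "#.#", "###", "#..", ".##"],
    "o": [".#.", "#.#", "#.#", "#.#", ".#."],
}


def distance(glyph, template):
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(
        pixel_a != pixel_b
        for row_a, row_b in zip(glyph, template)
        for pixel_a, pixel_b in zip(row_a, row_b)
    )


def recognize(glyph, templates=TEMPLATES):
    """Return the character whose template is closest to the glyph."""
    return min(templates, key=lambda ch: distance(glyph, templates[ch]))


# An 'e' whose horizontal bar has lost one pixel during scanning.
scanned = [".##", "#.#", "##.", "#..", ".##"]
print(recognize(scanned))
```

With these made-up shapes the damaged glyph is still closer to the e template than to c, but only by one pixel, which hints at why thin strokes can lead to recognition errors.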
Documents in OmniPage Pro OmniPage Pro 11 handles documents one at a time. When you acquire your first image (from scanner or from file) a new document is started. Further acquired images are added to the same document, until you save and close it.
THE OMNIPAGE PRO DESKTOP OmniPage Pro’s desktop has a title bar and a menu bar along the top and a status bar along the bottom. It has three main working areas, separated by splitters: the Document Manager, the Original Image area and the Text Editor.
The OmniPage Toolbox lets you control processing. It can have three states, depending on which of the three tab buttons on the left is clicked. In the picture, we display its appearance for Manual OCR. We show the program with a three-page document. Page one is the current page, which has been recognized and proofed.
The Image toolbar The Image toolbar contains buttons that allow you to zoom in or out on the current image or to rotate it. They also allow you to work with zones and table dividers on the page. This is described in detail in chapter 3, Tutorial: Processing documents.
The OmniPage Toolbox This Toolbox lets you drive the processing. By default it is located along the top of the OmniPage Pro desktop, just above the working areas. It can be floated and also be docked along the bottom of the desktop. It has three tabs on the left: AutoOCR™, Manual OCR and OCR Wizard.
MANAGING DOCUMENTS The Document Manager is situated on the left of the OmniPage Pro desktop. It has two tabbed panels: Thumbnail view and Detail view. Click a tab to see its view. Both views summarize the pages in the document and are synchronized: the current and selected pages remain the same when you switch views.
Detail view This facility is new to OmniPage Pro 11. It provides an overview of your document with a table. Each row represents one page. Columns present statistical or status information for each page, and (where appropriate) document totals. The picture below shows the default columns on the left and four columns which a user has specified.
Customizing columns in Detail view You can specify which columns of information you want to see in Detail view. Click Customize Details... in the View menu for the following dialog box:
This item is highlighted.
Click a checkbox to select the item.
Highlight an item and use these arrows to...
Closing a document Choose Close in the File menu to close a document. You are prompted to save your document if you have not saved it or you have modified it since the last save. See the next section on saving the document as an OmniPage Pro Document (*.opd).
Why save to OPD You do not have to save your documents to the OPD file type. You would typically do this for the following reasons: You cannot finish working with the document in the current session. You want to pass the document to other users who have OmniPage Pro.
SETTINGS The Options dialog box is the central location for OmniPage Pro settings. It has seven panels. Context-sensitive help provides information on each setting. In overview, the settings panels are: Use this to specify recognition language(s), a user dictionary, a reject character, an OCR method (optimize for speed or accuracy) and font matching.
Process Use this to define where new images should be placed in the document and set other preferences governing the behavior of the processing. You can change the interface language here. Proofing Use this to define whether proofreading should begin automatically after recognition.
Tutorial: Processing documents This chapter describes different ways you can process a document and also provides information on key parts of this processing. Quick Start Guide Using the OCR Wizard Automatic processing Manual processing Automatic processing with manual finishing From other applications (Direct OCR, PaperPort) At a later time (Schedule OCR) The detailed topics are: Defining the source of page images...
You will process the document automatically and save the recognition results to a file. You will proof the document but will not edit it inside OmniPage Pro 11’s Text Editor.
Place the document correctly in your scanner. Check the three tab buttons to the left of the OmniPage Toolbox. The AutoOCR button should...
AutoOCR button: Specifies that you want OmniPage Pro 11 to process the document automatically according to the given settings.
Here is an overview of the processing methods you can use. You will find step-by-step guidance for each of them in the following pages. Using the OCR Wizard The OCR Wizard guides you through the selection of settings and commands by asking you questions. It then launches automatic processing. This is a good way to get started if you are new to OmniPage Pro.
PROCESSING DOCUMENTS USING THE OCR WIZARD The OCR Wizard takes you through six settings panels, guiding you to make settings for your document and then launching automatic processing. Context-sensitive help is available for all Wizard panels. The OCR Wizard can run only when there is no document open in OmniPage Pro.
3. The third panel (shown below) lets you define recognition languages and decide OCR method. Languages with dictionary support have the icon 4. The fourth panel lets you define the formatting level to be applied to your document for display and export. See chapter 4 for more information.
7. If you requested proofing and the text contains suspect words, the OCR Proofreader™ dialog box will appear. When proofing is finished or closed, recognition results either go directly to the Clipboard, or the Save As dialog box appears so you can specify file export settings.
PROCESSING DOCUMENTS AUTOMATICALLY Automatic processing provides an efficient way of handling documents, especially larger ones. First you select all settings needed, then you can use the AutoOCR™ toolbar in the OmniPage Toolbox to process a new document from start to finish or to restart and finish processing on an open document.
6. Click Start or choose Start in the Process menu. Each page of the document is processed and finished one after the other. The program may perform tasks simultaneously, for instance it may start loading and recognizing a new page as you proofread the previous page. Command buttons Start: This lets you begin automatic processing on a new document.
PROCESSING DOCUMENTS MANUALLY Manual processing gives you more precise control over the way your pages are handled. You can process the document page-by-page with different settings for each page. The program also stops between each step: acquiring images, performing recognition, exporting. This lets you, for instance, draw zones manually on each page.
6. Select a value for the Perform OCR button. You describe the layout of the incoming pages. This value has an influence if auto-zoning runs on any pages. You can also select a template to have its zones placed on the current page. For more detail see the sections Describing the layout of the document and Using zone templates.
PROCESSING A DOCUMENT AUTOMATICALLY AND FINISHING IT MANUALLY When you have a large document with only a few pages needing special attention, you do not have to manually process the whole document. You can process it automatically and view results in the Text Editor. You can determine which pages are in order, and which need different settings or some manual zoning.
PROCESSING FROM OTHER APPLICATIONS You can use the Direct OCR feature to call on the recognition services of OmniPage Pro while you work in your usual word-processor or other application. First you must establish the direct connection with the application. Then, two items in its File menu open the door to OCR facilities.
Preferences and then selecting OmniPage Pro 11 as the OCR package. OCR settings can be specified, as with Direct OCR. Here OmniPage Pro 11 has been selected as the OCR package for MS Word 2000. Then you can drag page images from the PaperPort desktop onto the MS Word link on the PaperPort.
ADF. Here is how to set up a job: 1. Click Schedule OCR in the Process menu or in the Windows Start menu: select Programs > ScanSoft OmniPage Pro 11.0 > Schedule OCR. 2. The Schedule OCR dialog box appears. Click Add Job... to get the Add Job Wizard.
DEFINING THE SOURCE OF PAGE IMAGES There are two possible image sources: from image files and from a scanner. There are two main types of scanners: flatbed or sheetfed. A scanner may have a built-in or added Automatic Document Feeder (ADF), which makes it easier to scan multi-page documents.
Normally the Add button places each file at the bottom of the file list. To place a file at a different location, highlight a file in the list. The new file will be added immediately below the lowest highlighted file. Input from scanner You must have a functioning, supported scanner correctly installed with OmniPage Pro.
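The placement rule for the Add button in the file list, described above, can be sketched in a few lines. This is only an illustration of the stated rule, not the program's code; the function name add_file, the use of list indexes to represent highlighted entries, and the file names are assumptions.

```python
def add_file(file_list, new_file, highlighted=()):
    """Return a new list with new_file placed according to the Add rule.

    highlighted holds the indexes of highlighted entries (a hypothetical
    representation). With nothing highlighted, the file goes to the bottom
    of the list; otherwise it goes immediately below the lowest
    highlighted entry, that is, the one with the largest index.
    """
    result = list(file_list)
    if not highlighted:
        result.append(new_file)
    else:
        result.insert(max(highlighted) + 1, new_file)
    return result


files = ["page1.tif", "page2.tif", "page4.tif"]
print(add_file(files, "page5.tif"))
print(add_file(files, "page3.tif", highlighted=[0, 1]))
```

In the second call the first two files are highlighted, so the new file lands directly below the second one.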
Brightness and contrast Good brightness and contrast settings play an important role in OCR accuracy. Set these in the Scanner panel of the Options dialog box. The diagram illustrates an optimum brightness setting. After loading an image, check its appearance. If characters are thick and touching, lighten the brightness.
You can scan double-sided documents with an ADF. A duplex scanner will manage this automatically. For non-duplex scanners, select ‘Scan double-sided pages’ in the Scanner panel of the Options dialog box. Then you can scan the document in just a few passes, with even pages grouped together and odd pages also grouped.
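To see why grouped odd and even passes can still yield a correctly ordered document, consider the following sketch. It is not the program's implementation. It assumes that the first pass delivers the odd pages in order and that, because the stack is turned over, the second pass delivers the even pages in reverse order; the parameter even_reversed lets you drop that assumption. All names are illustrative.

```python
def interleave_passes(odd_pass, even_pass, even_reversed=True):
    """Merge two single-sided scanning passes into reading order.

    odd_pass:  images of pages 1, 3, 5, ... in scanning order.
    even_pass: images of the even pages; assumed to arrive last page
               first when even_reversed is True (hypothetical default).
    """
    evens = list(reversed(even_pass)) if even_reversed else list(even_pass)
    merged = []
    for index, odd_page in enumerate(odd_pass):
        merged.append(odd_page)
        if index < len(evens):
            merged.append(evens[index])
    return merged


odd = ["p1", "p3", "p5"]
even = ["p6", "p4", "p2"]  # stack flipped, so page 6 is scanned first
print(interleave_passes(odd, even))
```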
Single column, no table Choose this setting if your pages contain only one column of text and no table. Business letters or pages from a book are normally like this. Choose it also for a page with words or numbers arranged in columns if you do not want these placed in a table or decolumnized or treated as separate columns.
MANUAL ZONING Zones define areas on the page to be processed. Zones are rectangular or irregular (with sides formed by vertical and horizontal lines). Zones cannot overlap. They have a zone number in the top left corner and a zone type icon top right. Click in a zone to select it. Use Shift+clicks for a multiple selection.
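The rule that zones cannot overlap can be made concrete with a small check. The sketch below covers rectangular zones only (irregular zones are not handled) and says nothing about how OmniPage Pro stores zones; the Zone type, the pixel coordinates and the 1-based zone numbering are assumptions made for the example.

```python
from typing import NamedTuple


class Zone(NamedTuple):
    """A rectangular zone in image coordinates (hypothetical layout)."""
    left: int
    top: int
    right: int
    bottom: int


def overlaps(a, b):
    """True if two rectangular zones share any interior area.

    Zones that merely touch along an edge do not count as overlapping.
    """
    return (a.left < b.right and b.left < a.right
            and a.top < b.bottom and b.top < a.bottom)


def first_overlap(zones):
    """Return the numbers (counted from 1) of the first overlapping pair."""
    for i in range(len(zones)):
        for j in range(i + 1, len(zones)):
            if overlaps(zones[i], zones[j]):
                return (i + 1, j + 1)
    return None


page_zones = [
    Zone(0, 0, 100, 50),       # zone 1
    Zone(0, 50, 100, 120),     # zone 2, touches zone 1 along one edge
    Zone(80, 100, 200, 200),   # zone 3, cuts into zone 2
]
print(first_overlap(page_zones))
```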
Subtract from zone Click this to subtract irregular parts from an existing zone or split a zone into smaller ones. You cannot move or resize existing zones when this tool is active. You cannot use this with a table type zone. Reorder zones Click this for the zone reordering tool.
Table zone Use this to have the zone contents treated as a table. Table grids can be automatically detected, or placed manually as described in the next section. Table zones must be rectangular. The Text Editor displays the table in an editable grid. You can choose whether to export tables in grids or in columns separated by tabs.
TABLE GRIDS IN THE IMAGE After automatic processing you may see table zones placed on a page. They are denoted with a table zone icon in the top right corner of the zone. To change a zone to or from a table zone, use its shortcut menu. You can also draw a table type zone.
Remove/replace all dividers Click this tool and click inside a table zone. Its dividers will all disappear. Click again to have dividers automatically (re)detected. Divider placement usually occurs during recognition; clicking twice with this tool lets you see and edit the dividers before recognition.
USING ZONE TEMPLATES A template is a set of zones, their properties and reading order, stored in a file.
How to unload a template Select a non-template setting for layout description in the Perform OCR drop-down list. The template zones are not removed from the current or existing pages, but template zones will no longer be used for future processing.
Proofing and editing Recognition results are placed in the Text Editor. This newly developed WYSIWYG (What You See Is What You Get) editor offers the following features, detailed in this chapter: Proofing OCR results Checking recognized text against original (Verifying text) User dictionaries IntelliTrain Text Editor display and views...
PROOFREADING OCR RESULTS After a page is recognized, the recognition results appear in the Text Editor. Proofreading starts automatically if that was requested in the Proofing panel of the Options dialog box or in the OCR Wizard. You can start proofing manually any time the program is not busy. Work as follows: 1.
5. Color markers are removed from words in the Text Editor as they are proofread. You can switch to the Text Editor during proofing to make corrections there. Use the Resume button to restart proofing. Click Close to stop proofreading before the end of the document is reached.
4. Click the Close button to close the verifier window. You should proofread and verify texts before doing large-scale editing. If you cut and paste large blocks of text, the links between text and image may be disturbed. You can use OmniPage Pro’s Text-to-Speech facility to have the recognized text read aloud as another way of verifying text.
INTELLITRAIN IntelliTrain is a newly developed and automated form of training. It takes input from the corrections you make during proofing. When you make a change, it remembers the character shape involved, and your proofing change. It searches other similar character shapes in the document, especially in suspect words.
The following shows how IntelliTrain works, using the original image. Our example involves the letters c and e. With some typefaces and scanning settings, the horizontal line in e can become very thin, leading to OCR errors that IntelliTrain can repair. OmniPage Pro read this as bcnefit.
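The bcnefit example can be mimicked at the text level to show the general idea of reusing a proofing correction. IntelliTrain itself works with character shapes in the image, and the manual does not describe its internals, so treat the following only as an analogy: it assumes a learned substitution (c read where e was meant), a list of suspect words and a small dictionary, all of which are invented for illustration.

```python
def apply_training(suspect_words, wrong, right, dictionary):
    """Propose corrections for suspect words using one learned substitution.

    For each suspect word, each occurrence of the misread character is
    tried in turn; a proposal is made only if the replacement produces
    a word found in the dictionary.
    """
    proposals = {}
    for word in suspect_words:
        for position, character in enumerate(word):
            if character != wrong:
                continue
            candidate = word[:position] + right + word[position + 1:]
            if candidate in dictionary:
                proposals[word] = candidate
                break
    return proposals


dictionary = {"benefit", "cost", "scene"}
suspects = ["bcnefit", "cost", "sccne"]
print(apply_training(suspects, wrong="c", right="e", dictionary=dictionary))
```

The word cost is left alone because replacing its c does not produce a dictionary word, which is the kind of safeguard that keeps a learned correction from being applied blindly.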
Select this, click Save and type in a name to save a new training file.
Click this to edit the selected training file (see below).
Select this to unload a training file.
Use this also to save new training into a loaded training file.
THE EDITOR DISPLAY AND VIEWS The editor displays recognized texts and can mark words that were suspected during recognition. Marking is done with a wavy underline; red underlines for words not found in a dictionary (this applies only to languages with dictionary support) and blue underlines for words containing suspect or reject characters.
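The two marking rules can be summarized in a short sketch. It is a reading of the rules as stated above, not the editor's code. The function name markings, the flag has_suspect_chars and the choice of ~ as the reject character are assumptions; the reject character is a setting, and its actual value is not given here.

```python
def markings(word, dictionary, has_suspect_chars=False, reject_char="~"):
    """Return the underline colors that apply to one recognized word.

    red:  the word is not found in the dictionary (only meaningful for
          languages with dictionary support).
    blue: the word contains suspect characters or the reject character.
    reject_char="~" is a placeholder value, not a documented default.
    """
    marks = []
    if word.lower() not in dictionary:
        marks.append("red")
    if has_suspect_chars or reject_char in word:
        marks.append("blue")
    return marks


words = {"benefit"}
print(markings("benefit", words))
print(markings("bcnefit", words))
print(markings("ben~fit", words))
```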
TEXT AND IMAGE EDITING This is a WYSIWYG Text Editor, providing many editing facilities. These work very similarly to those in leading word processors. Editing character attributes In all views except No Formatting view, you can change the font type, size and attributes (bold, italic, underlined) for selected text.
Graphics You can edit the contents of a selected graphic zone if you have an image editor in your computer. Click Edit Picture in the Tools menu. This will activate the image editor associated with BMP files in your Windows system, and load the graphic.
To hear text:                              Use these keys:
One character at a time, forward or back   Right or left arrow. Letter, number or punctuation names are spoken.
Current word                               Ctrl + Numpad 1
One word to the right                      Ctrl + right arrow *
One word to the left                       Ctrl + left arrow *
A single line...
You also have the following keyboard controls:
To do this:        Use this:
Pause/Resume       Ctrl + Numpad 5
Set speed higher   Ctrl + Numpad +
Set speed lower    Ctrl + Numpad -
Restore speed      Ctrl + Numpad *
It is planned to provide speech programs for the following languages: English, French, German, Italian, Portuguese and Spanish.
5 Saving and exporting Once you have acquired at least one image for a document, you can export the image(s) to file. Once you have recognized at least one page, you can export recognition results to a target application by: 1.
PREPARING RECOGNITION RESULTS FOR EXPORT Text is exported to file, Clipboard or mail with the formatting level defined by the view set in the Text Editor at export time, if that is possible. However, some export file types and target applications cannot support all formatting elements.
SAVING TO FILE You can save recognized pages and original images to disk in a wide variety of file types. See chapter 6 for a complete list of supported file types for saving images and recognition results. Saving original images 1.
Saving recognition results 1. Choose Save As... in the File menu, or click the Export Results button in the Manual OCR toolbar with Save as File selected in the drop-down list. 2. The Save As dialog box appears, as shown in its expanded form. Click Advanced to open the lower panel and Basic...
Note Graphics and formatting are saved in the document only if the selected file type supports them. The formatting level for export is the Editor view set at saving time. You will be warned if the formatting level is not supported by the export file type. Note If more than one export file is created, OmniPage Pro will append a numerical suffix to your file name to create unique file names.
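The effect of the numerical suffix can be pictured with a short sketch. The manual does not state the exact suffix format, so the scheme below (a plain counter starting at 1, placed before the extension) is an assumption, as are the function name export_names and the sample file name.

```python
from pathlib import Path


def export_names(file_name, count):
    """Return unique file names for count export files.

    A single export keeps the name as given. For several exports a
    counter is appended to the base name; the real suffix format used
    by the program may differ.
    """
    path = Path(file_name)
    if count <= 1:
        return [path.name]
    return [f"{path.stem}{number}{path.suffix}" for number in range(1, count + 1)]


print(export_names("report.txt", 1))
print(export_names("report.txt", 3))
```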
If you first save the document as an OmniPage Document (for instance as memo.opd), then modify it and later save it to a text file (for instance as memo.txt), then modify it again and click Save, the recent changes are saved to the text file, not to the OPD.
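The memo.opd and memo.txt scenario amounts to the rule that Save always writes to the file named in the most recent save. A minimal model of that rule, with a hypothetical Document class that only records where each save goes, might look like this:

```python
class Document:
    """Tracks where Save writes, following the rule described above.

    This is an illustrative model, not part of OmniPage Pro.
    """

    def __init__(self):
        self.last_saved_path = None
        self.history = []  # every file written, in order

    def save_as(self, path):
        """Save under a new name; later plain saves go to this file."""
        self.last_saved_path = path
        self.history.append(path)

    def save(self):
        """Save to the file used by the most recent save."""
        if self.last_saved_path is None:
            raise ValueError("document has not been saved yet; use save_as")
        self.history.append(self.last_saved_path)
        return self.last_saved_path


doc = Document()
doc.save_as("memo.opd")
doc.save_as("memo.txt")
print(doc.save())  # the text file receives the latest changes
```

To keep the OPD up to date in this situation, the document would have to be saved as an OPD again.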
SENDING A DOCUMENT AS A MAIL ATTACHMENT You can send recognition results as one or more files attached to a mail message if you have installed a MAPI-compliant mail application, such as Microsoft Outlook. To send a document by e-mail...
3. Your mail application appears with the attachment(s) in a new empty message. Attachments take the name used for the last save of the document in OmniPage Pro, or ‘Untitled from OmniPage’. The suitable file extension is added, and numerical suffixes for multiple attachments.
6 Technical information This chapter provides troubleshooting and other technical information about using OmniPage Pro 11. Please also read the online Readme file and other help topics, or visit the ScanSoft web pages. The Scanner Information web page contains detailed and regularly updated information about scanner setup and support.
Visit the support section of ScanSoft’s web site at www.scansoft.com. It contains Tech Notes on commonly reported issues using OmniPage Pro 11. Our web pages may also offer assistance on the installation process and troubleshooting. Turn off your computer and your scanner, turn your scanner back on, and then restart your computer.
Testing OmniPage Pro Restarting Windows 95, 98, 2000 or ME in safe mode or Windows NT in VGA mode allows you to test OmniPage Pro on a simplified system. This is recommended when you cannot resolve crashing problems or if OmniPage Pro has stopped running altogether.
5. Launch OmniPage Pro and try performing OCR on an image. Use a known image file such as one of the supplied sample files. Note You can also run OmniPage Pro 11 from a command line in its own safe mode. Choose Start > Run, browse for the file OmniPage.exe...
Clear the cache for your web browser and limit its size. SUPPORTED FILE TYPES The program supports a wide range of file types. Several important types have been added in OmniPage Pro 11. File types for opening and saving images Multi-...
Note Saving to PDF format is supported, with four options. One of these is to export image only. But this exports the recognition results as images, not the original images. This is done in the Save As dialog box. See the section Saving to PDF. File types for saving recognition results Exten- Format levels...
OPD files created by OmniPage Pro 10 and the similar MET files from OmniPage Pro 9. These files remain in their old format and a copy is converted to OmniPage Pro 11. Saving to PDF You have four choices when saving recognition results to Portable Document Format (PDF) files.
OCR PROBLEMS This section contains information and solutions for possible OCR problems. First we provide suggestions for improving recognition accuracy, second on getting good results from fax input and finally on system or performance problems arising during OCR. Text does not get recognized properly Try these solutions if any part of the original document is not converted to text properly during OCR: Look at the original page image and ensure that all text areas are...
If you use True Page as the Text Editor view or for export, recognized text is put into frames (formatting boxes). Some text may be hidden if a frame is too small. To view the text, place the cursor in the text frame and use the arrow keys on your keyboard to scroll to the top, bottom, left, or right of the frame.
Break complex page images (lots of text and graphics or elaborate formatting) into smaller jobs. Draw zones manually or modify automatically created zones and perform OCR on one page area at a time. See the section in chapter 3 on creating and modifying zones.
INDEX Bold text, 24 Customizing columns in Detail Brightness, 50, 86 view, 28 Accuracy, 49 Cutting and pasting text, 23 Accuracy improvement, 31, 63 Acquire Text menu item, 45 Acquire Text Settings, 45 Changing paragraph order, 70 Acquired page, 26 Changing text flow between Deferred processing, 29...
Interrupting automatic processing, 41 Earlier versions of OmniPage Generating table dividers, 57 Irregular zones, 24, 53 Pro, 13 Get Page button, 40, 42 Italic text, 24 Editing a training file, 65 Getting online Help, ix Editing a user dictionary, 62 Graphic zone, 55 Editing character attributes, 67 Graphics editing, 68...
Minimum system requirements, Online HTML Help, ix Proofed page, 26 Online registration, 17 Proofing documents in later Modifying a zone template, 57 OPD files, 29 sessions, 29 Moving between pages, 26 Opening image files, 83 Proofing options, 32, 38, 60 Moving table dividers, 56 Optical character recognition, 20 Proofreader dialog box, 39, 60...
Rows in tables, 56 Single column pages with tables, TIFF images files, 83 Training, 63 Single-column zone, 54 Training files, 65 Slow recognition, 87 Troubleshooting, 79, 80 Safe mode, 81 Solutions for poor performance, True Page view, 59, 66, 72 Sample images files, 81 TWAIN, 14 Save and Launch, 74...