Acquire Manage Protect!!


Document Scanning

We offer scanning solutions for everything: drawings, maps, newspapers, photographs, colour slides, office documentation, books, reports, theses, magazines, advertising brochures and paintings. Visit our Business Partners page for inexpensive scanners!

If you can create it, we can scan it!

OCR

OCR stands for Optical Character Recognition. The term is typically used for character recognition in general, which covers transforming anything human-readable into a representation that a machine can manipulate.

In this context, character recognition involves understanding both machine-printed and handwritten characters. There have been significant advances in the former, and the bulk of the software available for automation is geared towards the recognition of machine-printed characters.

The bare minimum for a scan intended for OCR is a bi-level (black-and-white) image at 200 dpi. This assumes that the letters are large enough to be extracted reliably; small fonts and fine print will probably not be recognized well at this resolution. It is more widely accepted that a 300 dpi scan is readable by humans, and 300 dpi should also be used for OCR scanning. Since documents that humans cannot read are rarely presented for OCR, a 300 dpi scanner is sufficient.

Although documents are mostly bi-level images, illumination plays a big role in obtaining a good scan, and sometimes adaptive thresholding (choosing the black/white cut-off separately for each region of the page rather than once for the whole page) is needed to arrive at a clean bi-level document image. To do this, the scanner must be able to produce greyscale scans. A scanner that captures 256 grey levels is sufficient, mostly because that greyscale resolution is beyond the capabilities of the human eye. So for a robust system, a scanner with 300 dpi resolution and 256 grey levels is preferred.
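To see why small print suffers at 200 dpi, it helps to convert a font's point size into scanned pixels. The sketch below assumes the usual desktop-publishing point of 1/72 inch; `em_height_px` is a hypothetical helper, and the em height it returns is the nominal body size of the font, so individual lowercase letters occupy only part of it.

```python
POINTS_PER_INCH = 72  # assumption: desktop-publishing point of 1/72 inch


def em_height_px(point_size: float, dpi: float) -> float:
    """Nominal em height of a font, in pixels, when scanned at `dpi`."""
    return point_size / POINTS_PER_INCH * dpi


# Compare the two resolutions discussed above for a few common text sizes.
for dpi in (200, 300):
    for size in (6, 10, 12):
        print(f"{size:>2} pt at {dpi} dpi: {em_height_px(size, dpi):.1f} px")
```

At 200 dpi a 6-point font has an em height of under 17 pixels, while at 300 dpi the same font gets 25 pixels, which leaves noticeably more detail for the recognizer to work with.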

Higher resolution is always nice but does not improve OCR performance by a large amount, and colour does not improve performance unless the text to be scanned is printed in multiple colours. If the print is in one colour, not necessarily black, there should be sufficient contrast for a greyscale scanner to extract the character components; but if multi-colour fonts and multi-colour backgrounds are used extensively, a colour scanner is the proper choice. Note that adaptive thresholding still has to be performed to convert the scan to a bi-level image for input to the OCR software.
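The path from a colour or greyscale scan to a bi-level image can be sketched in two steps: reduce colour to 256 grey levels, then compare each pixel with the average brightness of its own neighbourhood instead of a single global cut-off. The luma weights are those of ITU-R BT.601; the local-mean rule, the integral-image speed-up, the `window`/`offset` defaults and the function names are assumptions chosen for illustration, not the method of any particular OCR product.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0-255


def to_grey(rgb: List[List[Pixel]]) -> List[List[int]]:
    """Reduce a colour image to 256 grey levels with ITU-R BT.601 luma weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb]


def adaptive_threshold(grey: List[List[int]], window: int = 15,
                       offset: int = 10) -> List[List[int]]:
    """Return a bi-level image (1 = ink, 0 = paper).

    A pixel counts as ink when it is darker than the mean of the
    window x window neighbourhood around it, minus `offset`. An integral
    image (summed-area table) makes each neighbourhood mean O(1).
    """
    h, w = len(grey), len(grey[0])
    # integral[y][x] = sum of grey[0..y-1][0..x-1]
    integral = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += grey[y][x]
            integral[y + 1][x + 1] = integral[y][x + 1] + row_sum

    half = window // 2
    result = []
    for y in range(h):
        y0, y1 = max(0, y - half), min(h, y + half + 1)
        row = []
        for x in range(w):
            x0, x1 = max(0, x - half), min(w, x + half + 1)
            total = (integral[y1][x1] - integral[y0][x1]
                     - integral[y1][x0] + integral[y0][x0])
            mean = total / ((y1 - y0) * (x1 - x0))
            row.append(1 if grey[y][x] < mean - offset else 0)
        result.append(row)
    return result


# A 5 x 40 "page" lit unevenly: the paper brightens from 100 to 217 left to
# right, with one ink dot near each edge, 60 grey levels below the paper.
page = [[100 + 3 * x for x in range(40)] for _ in range(5)]
page[2][5] -= 60   # ink in the shadowed area (value 55)
page[2][35] -= 60  # ink in the bright area (value 145)

bilevel = adaptive_threshold(page, window=7, offset=10)
print(sum(map(sum, bilevel)))                     # 2: exactly the two ink dots
print(sum(v < 128 for row in page for v in row))  # 50 with a global cut-off of 128
```

The global cut-off marks the whole shadowed strip as ink and still misses the dot on the bright side, whereas the local comparison finds only the two ink dots; this is the illumination problem that greyscale capture makes it possible to correct.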

Optical Character Recognition (OCR) is a technology that functions much like a printer in reverse. An OCR system reads printed text and converts it to an electronic format for use in document processing applications. There is a wide variety of OCR systems in use today, from the massive document-handling computers used by post offices to desktop systems that use scanners to read text into word processing and spreadsheet applications.

While they often differ in the combination of technologies employed, all OCR systems have several things in common. They take some form of bitmapped image as input, whether drawn from a printed document, magnetic tape or an image file. They employ one or more algorithms (rules or procedures used to solve problems) to translate combinations of dots in a bitmap into recognized characters. Finally, all OCR systems output the recognized characters in some computer-usable medium, including but not limited to punch cards, electronic data (e.g., the codes read by point-of-sale scanners in grocery stores) and formatted text.
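The step that turns combinations of dots into a character can be illustrated with template matching, one of the simplest techniques of this kind: compare the glyph's bitmap with stored reference bitmaps and pick the closest. This is only a sketch; the 5 x 5 templates, the `#`/`.` encoding and the function names are hypothetical, and practical systems generally need more robust methods to cope with varying fonts, sizes and noise.

```python
from typing import Dict, List

Bitmap = List[str]  # rows of '#' (ink) and '.' (paper), all the same size

# Hypothetical 5 x 5 reference glyphs; a real system would hold many more.
TEMPLATES: Dict[str, Bitmap] = {
    "I": ["..#..", "..#..", "..#..", "..#..", "..#.."],
    "L": ["#....", "#....", "#....", "#....", "#####"],
    "T": ["#####", "..#..", "..#..", "..#..", "..#.."],
}


def mismatches(a: Bitmap, b: Bitmap) -> int:
    """Count the dots that differ between two equally sized bitmaps."""
    return sum(ca != cb for ra, rb in zip(a, b) for ca, cb in zip(ra, rb))


def recognize(glyph: Bitmap) -> str:
    """Return the template character whose dots best match the glyph."""
    return min(TEMPLATES, key=lambda ch: mismatches(glyph, TEMPLATES[ch]))


# A noisy scan of "T": one dot missing from the bar, one stray dot on the stem.
noisy_t = ["####.",
           "..#..",
           "..#..",
           ".##..",
           "..#.."]
print(recognize(noisy_t))  # T (2 mismatches, versus 4 for I and 14 for L)
```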

While recognition accuracy is an important part of an OCR product, it is not the only concern. Recognition products are productivity tools: their objective is to make people more productive by reducing the time it takes to turn printed text or image files into editable text. Recognition accuracy is only one part of a complete productivity solution. The measure of a truly useful OCR product is not just its recognition ability but whether, and to what extent, it improves your productivity.
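One common way to put a number on recognition accuracy is character accuracy: one minus the number of character insertions, deletions and substitutions needed to turn the OCR output into the correct text (the Levenshtein edit distance), divided by the length of the correct text. The sketch below assumes that definition; other accuracy measures exist, and the sample strings are invented.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute, free if equal
        prev = cur
    return prev[-1]


def character_accuracy(ocr_text: str, ground_truth: str) -> float:
    """1 - (character errors / length of the correct text), floored at 0."""
    if not ground_truth:
        raise ValueError("ground_truth must not be empty")
    errors = edit_distance(ocr_text, ground_truth)
    return max(0.0, 1.0 - errors / len(ground_truth))


truth = "Optical Character Recognition"
ocr = "0ptical Charactcr Recogniti0n"  # three misread characters
print(edit_distance(ocr, truth))                 # 3
print(round(character_accuracy(ocr, truth), 3))  # 0.897
```

Even at roughly 90% character accuracy, each of the remaining errors still has to be found and corrected by a person during proofing, which is why accuracy alone does not determine productivity.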

The essential tasks for an OCR product are those that let you work most efficiently, i.e., maximize your throughput. Users of current OCR products know that a considerable amount of time can be wasted before an electronic document is ready to use. The most common time sinks include manually defining the page layout, assigning text, graphic and/or numeric zones, proofing recognition errors and reformatting documents after export. The product that minimizes or eliminates the time spent on these tasks is the product that maximizes throughput.
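The throughput argument is simple arithmetic: pages per hour is 3600 divided by the total seconds spent on each page, so removing manual steps can matter more than faster recognition. The sketch below models this with a hypothetical `PageTimes` record whose fields mirror the time sinks just listed; the timings are invented for illustration, not measurements of any product.

```python
from dataclasses import dataclass


@dataclass
class PageTimes:
    """Seconds spent on one page at each step (hypothetical inputs)."""
    scan: float
    recognize: float
    zoning: float        # manually defining layout and text/graphic/numeric zones
    proofing: float      # finding and correcting recognition errors
    reformatting: float  # repairing the document after export

    def total(self) -> float:
        return (self.scan + self.recognize + self.zoning
                + self.proofing + self.reformatting)


def pages_per_hour(t: PageTimes) -> float:
    """Throughput in pages per hour for the given per-page timings."""
    return 3600 / t.total()


# Invented timings, for illustration only.
manual = PageTimes(scan=20, recognize=10, zoning=30, proofing=60, reformatting=60)
automatic = PageTimes(scan=20, recognize=10, zoning=0, proofing=30, reformatting=0)
print(pages_per_hour(manual))     # 20.0
print(pages_per_hour(automatic))  # 60.0
```

In this made-up case recognition takes the same 10 seconds in both scenarios; the threefold gain comes entirely from the steps around it.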

ProVec

A professional stand-alone software product designed to convert raster-scanned files into vector formats for Computer Aided Design (CAD) and Geographic Information Systems (GIS).

ProVec is a raster-to-vector conversion package designed especially for CAD users. It can vectorise very large files, limited only by the available virtual memory, and it incorporates a powerful raster editor for making modifications before vectorisation. The real strength of ProVec lies in its extensive range of user-definable vectorisation parameters and the ability to preview the vectorisation before processing the entire file.
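ProVec's own algorithms are not described here. As a generic illustration of what a user-definable vectorisation parameter can control, the sketch below applies the Ramer-Douglas-Peucker algorithm, a widely used polyline-simplification method, to a chain of pixel coordinates traced along a scanned line; `tolerance` (in pixels) sets how far the resulting vector may deviate from the raster. The function names and sample points are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def distance_to_line(p: Point, a: Point, b: Point) -> float:
    """Perpendicular distance from p to the infinite line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / length


def simplify(points: List[Point], tolerance: float) -> List[Point]:
    """Ramer-Douglas-Peucker: keep only the vertices needed to stay within `tolerance`."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    # Find the interior point farthest from the chord a-b.
    index, max_dist = 0, -1.0
    for i in range(1, len(points) - 1):
        d = distance_to_line(points[i], a, b)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        return [a, b]  # the whole run is close enough to one straight segment
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right  # drop the split point duplicated in both halves


# Slightly jittery pixel centres traced along an L-shaped corner.
edge = [(0, 0), (1, 0.2), (2, -0.1), (3, 0.1), (4, 0),
        (4.1, 1), (3.9, 2), (4, 3)]
print(simplify(edge, 0.5))  # [(0, 0), (4, 0), (4, 3)]
```

A larger tolerance produces fewer, longer vectors at the cost of fidelity, which is the kind of trade-off a preview before processing the entire file helps to judge.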

© Copyright 2001. All rights reserved. Contact: West-Net Systems