Document Scanning
We offer scanning solutions for everything, whether it be drawings, maps,
newspapers, photographs, colour slides, office documentation, books,
reports, theses, magazines, advertising brochures or paintings.
If you can create it, we can scan it!
OCR
OCR stands for Optical Character Recognition. The term is typically used
for character recognition in general, which covers the transformation of
anything humanly readable into a machine-manipulable representation.
In this context, character recognition involves understanding both
machine-printed and handwritten characters. There have been significant
advances in the former, and the bulk of software available for automation
is geared towards the recognition of machine-printed characters.
The bare minimum requirement for scanning images to be used for OCR is
bi-level 200 dpi. This assumes that the letters are large enough to be
extracted effectively; small fonts and small print will probably not work
very well at this resolution. It is more widely accepted that a 300 dpi scan
is readable by humans and should therefore be used for scanning. Since
documents that humans cannot read are not usually presented for OCR, a
300 dpi scanner suffices. Although documents are for the most part bi-level
images, illumination plays a big role in obtaining a good scan, and
sometimes adaptive thresholding is needed to arrive at a clean bi-level
document image. To do this, the scanner must be able to capture greyscale
scans. A scanner that captures 256 grey levels is sufficient, mostly because
that greyscale resolution is already beyond the capabilities of the human
eye. So for a robust system, a scanner with 300 dpi and 256 grey levels is
preferred.
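As a rough illustration of why small print suffers at lower resolutions, the
sketch below estimates how many pixels tall a font is at a given scan
resolution. It relies only on the definition of a typographic point as 1/72
inch; the function name and the sample font sizes are illustrative choices,
not taken from any particular scanner or OCR product. The figure is the
height of the font's nominal size, so individual lower-case letters will be
smaller still.

```python
POINTS_PER_INCH = 72  # one typographic point is 1/72 of an inch


def glyph_height_px(font_size_pt: float, dpi: int) -> float:
    """Approximate height in pixels of a font's nominal size at a scan resolution."""
    return font_size_pt / POINTS_PER_INCH * dpi


# Compare the two resolutions discussed above for a few sample font sizes.
for dpi in (200, 300):
    for size_pt in (6, 8, 12):
        height = glyph_height_px(size_pt, dpi)
        print(f"{size_pt} pt at {dpi} dpi: about {height:.0f} px tall")
```

Moving from 200 dpi to 300 dpi gives every character half as many pixels
again in each direction, which matters most for the smallest print.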
Higher resolution is always nice but does not improve OCR performance by
a large amount, and the use of colour does not improve performance unless
the text to be scanned is printed in multiple colours. If the print is in one
colour, not necessarily black, there should be sufficient contrast for a
greyscale scanner to extract the character components. If multi-colour fonts
and multi-colour backgrounds are used extensively, however, the proper
choice is a colour scanner. Note that adaptive thresholding still needs to
be done in order to convert the scan to bi-level for input to an appropriate
OCR engine.
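The adaptive thresholding step mentioned above can be sketched as follows.
This is a minimal local-mean variant, assuming the greyscale scan is already
loaded as a NumPy array of 0 to 255 values; the window size and offset are
illustrative defaults rather than recommended settings, and production OCR
packages may use different and more refined methods. Each pixel is compared
with the average brightness of its neighbourhood, so uneven illumination
across the page does not shift the ink/background decision.

```python
import numpy as np


def adaptive_threshold(gray: np.ndarray, window: int = 15, offset: float = 10.0) -> np.ndarray:
    """Return a bi-level image: True where a pixel is darker than its local mean minus offset."""
    gray = gray.astype(np.float64)
    pad = window // 2
    k = 2 * pad + 1  # effective (odd) window size
    padded = np.pad(gray, pad, mode="edge")
    # Integral image with a leading row and column of zeros, so that each
    # window sum needs only four lookups.
    integral = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    integral[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    h, w = gray.shape
    window_sum = (integral[k:k + h, k:k + w] - integral[:h, k:k + w]
                  - integral[k:k + h, :w] + integral[:h, :w])
    local_mean = window_sum / (k * k)
    return gray < local_mean - offset


# Synthetic page: background brightness falls from 220 to 100 left to right,
# with three ink dots that are 80 grey levels darker than their surroundings.
page = np.tile(np.linspace(220.0, 100.0, 60), (40, 1))
ink = np.zeros(page.shape, dtype=bool)
ink[10, 5] = ink[20, 55] = ink[30, 30] = True
page[ink] -= 80.0

result = adaptive_threshold(page)
print("ink pixels found:", int(result.sum()))
```

On this synthetic page a single global cut-off would struggle, because the
ink on the bright side is lighter than the bare paper on the dark side; the
local comparison avoids that.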
Optical Character Recognition (OCR) is a technology that functions much
like a printer in reverse. An OCR system reads printed text and converts it
to an electronic format for use in document processing applications. There
are a wide variety of OCR systems in use today, from the massive document
handling computers used by post offices, to the desktop systems that employ
scanners for reading text into word processing and spreadsheet applications.
While they often differ in the combination of technologies employed, all
OCR systems have several things in common. They use some form of bitmapped
image as an input, whether drawn from a printed document, magnetic tape, or
image file. They also employ one or more algorithms (rules or procedures
used to solve problems) to translate combinations of dots in a bitmap into a
recognized character. Finally, all OCR systems output recognized characters
in some kind of computer-usable medium, including but not limited to punch
cards, electronic data (e.g., point-of-sale scanners in grocery stores) and
formatted text.
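To make the idea of translating dots in a bitmap into a character concrete,
here is a deliberately simple sketch based on template matching: the scanned
glyph is compared dot by dot with stored patterns, and the closest pattern
wins. The 5x3 templates and the function names are hypothetical and exist
only for this illustration; real OCR systems rely on far more robust
techniques.

```python
# Hypothetical 5x3 glyph templates: '#' is an ink dot, '.' is background.
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}


def to_bits(rows):
    """Flatten rows of '#'/'.' characters into a list of booleans."""
    return [ch == "#" for row in rows for ch in row]


def recognise(glyph_rows):
    """Return the template character that disagrees with the glyph in the fewest dots."""
    glyph = to_bits(glyph_rows)

    def mismatches(char):
        return sum(a != b for a, b in zip(glyph, to_bits(TEMPLATES[char])))

    return min(TEMPLATES, key=mismatches)


# A 'T' that lost one dot during scanning is still closest to the 'T' template.
noisy_t = ["###", ".#.", ".#.", "...", ".#."]
print(recognise(noisy_t))
```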
While recognition accuracy is an important part of an OCR product, it is
not the only concern. Recognition products are productivity tools - their
objective is to make people more productive by reducing the time it takes to
translate printed text or image files into editable text. Recognition
accuracy is only a part of a total productivity solution. The measure of a
truly useful OCR product is not just its recognition ability, but whether,
and to what extent, it improves your productivity.
The essential tasks for an OCR product are those that allow you to work
most efficiently, i.e., to maximize your throughput. Users of current OCR
products know that you can waste a considerable amount of time getting to
the point where your electronic document is ready to use. Among the most
common time sinks are: manually defining page layout, assigning text,
graphic and/or numeric zones, proofing recognition errors and reformatting
documents after export. A product that minimizes or eliminates the
additional time it takes to perform these tasks is the product that
maximizes throughput.
ProVec is a professional stand-alone software product designed to convert
raster-scanned files into vector formats for Computer Aided Design (CAD) and
Geographic Information Systems (GIS).
It is a raster-to-vector conversion package especially designed for CAD
users. ProVec can vectorise very large files, limited only by available
virtual memory, and it incorporates a powerful raster editor for making
pre-vectorisation modifications. The power of ProVec really becomes obvious
through the extensive range of user-definable vectorisation parameters and
the ability to preview the vectorisation before processing the entire file.