Module:OCR
The GEODI OCR module works not only on scanned documents but also on images and even videos. It makes the text, barcodes, and QR codes in these data sources searchable.
You need a GEODI OCR Module license to use the OCR module. GEODI OCR offers two engines: the first uses the GEODI OCR infrastructure, the second uses the ABBYY engine. The ABBYY option may require an additional license and per-use fees. The GEODI OCR engine performs better and does not incur any usage fees beyond the license.
How do I set up OCR for sources?
You need to complete the following settings for each source you want to OCR.
You can convert scanned documents to SPDF (Searchable PDF). SPDF creation requires additional space and time. When the result is a PDF, the words you search for are highlighted on the PDF.
OCR of very large documents (such as scanned projects) is optional.
Barcode and QR code recognition can be enabled.
Generating SPDF can increase the total time by around 50%.
General OCR settings are made on the last page of the project wizard. These settings affect all sources.
You can specify which engine will be used for OCR (GEODI or ABBYY).
You can add additional languages according to the languages in your documents.
With Fast OCR you can save 50%-70% of the processing time, at the cost of a slight decrease in recognition quality.
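The percentages quoted in this section can be turned into a rough planning estimate. The sketch below is not part of GEODI: the function name `estimate_ocr_minutes` is hypothetical, and it assumes that the SPDF overhead (about 50%) and the Fast OCR saving (50%-70%) simply multiply, which this page does not confirm. Treat the result as a ballpark figure only.

```python
def estimate_ocr_minutes(base_minutes, spdf=False, fast_ocr=False):
    """Return a (low, high) estimate of OCR time in minutes.

    base_minutes: measured time for normal OCR without SPDF.
    Assumption: the SPDF overhead and the Fast OCR saving multiply.
    """
    low = high = float(base_minutes)
    if fast_ocr:
        # Fast OCR saves 50%-70%, so 30%-50% of the time remains.
        low, high = low * 0.3, high * 0.5
    if spdf:
        # SPDF generation adds roughly 50%.
        low, high = low * 1.5, high * 1.5
    return low, high


if __name__ == "__main__":
    for spdf in (False, True):
        for fast in (False, True):
            low, high = estimate_ocr_minutes(100, spdf=spdf, fast_ocr=fast)
            print(f"SPDF={spdf!s:5} FastOCR={fast!s:5} -> {low:.0f}-{high:.0f} min")
```

Under these assumptions, a job that takes 100 minutes with normal OCR would take roughly 45 to 75 minutes with both SPDF and Fast OCR enabled.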
OCR Settings for Source
OCR setting for the whole project
OCR those that have not been OCRed
This command lets you activate OCR settings for an already scanned project at a later time. It starts the Rescan service for OCR. Because OCR can significantly increase the scanning time of a project in use, you can first scan the project without OCR, then activate the OCR settings and save time by letting the unprocessed items be OCRed in the background. The command does not OCR the same content repeatedly, which avoids creating unnecessary file versions. You can run this command from the user interface or via DCC.
In the user interface, you can narrow down the content to be OCR-ed by specifying query criteria in the "OCR Unprocessed" field.
If OCR settings are not enabled in the project, the command does not OCR the files that have not been OCRed. It also does not allow a new OCR process to start until a running OCR repair process has finished.
When you run the command, a message window opens. "... content will be examined for OCR. The process will be done in the background... Do you want to continue? Yes/No"
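The rules described for this command can be summarized as a small model. The sketch below is not GEODI code and does not call any GEODI API: the `Item` class, the `select_unprocessed` function, and the `matches` predicate (standing in for the query criteria) are hypothetical names used only to illustrate the behavior described above.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    ocr_done: bool  # True if the item has already been OCRed


def select_unprocessed(items, ocr_enabled, repair_running, matches=None):
    """Return the items a catch-up OCR run would pick up.

    matches: optional predicate standing in for the query criteria.
    """
    if not ocr_enabled:
        # The command does nothing useful without OCR settings.
        raise RuntimeError("OCR settings are not enabled in the project")
    if repair_running:
        # A new run may not start while a repair process is active.
        raise RuntimeError("an OCR repair process is still running")
    # Already OCRed items are skipped, so nothing is processed twice.
    return [
        item for item in items
        if not item.ocr_done and (matches is None or matches(item))
    ]


if __name__ == "__main__":
    docs = [Item("a.tif", False), Item("b.pdf", True), Item("c.jpg", False)]
    picked = select_unprocessed(
        docs, ocr_enabled=True, repair_running=False,
        matches=lambda item: item.name.endswith(".tif"),
    )
    print([item.name for item in picked])  # ['a.tif']
```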
A few things to be aware of
OCR is a time-consuming process and can tie up computers/servers. You should take this into account if you have a large number of documents that require OCR.
The fastest option is SPDF generation disabled and Fast OCR enabled, since generating SPDF adds time and Fast OCR saves time.
The performance of the OCR process depends on the scan resolution and quality of the documents. The success rate decreases with very old documents such as blueprints, skewed pictures taken with a cell phone, distorted images such as pages of an incompletely opened book, etc.
The Microsoft Visual C++ 2015-2019 Redistributable package must be installed in the environment where the GEODI OCR module will run.
Detailed OCR Settings
The OCR module also includes technical settings. These are set in the OCRSettings.json file in the GEODI/Settings/Geodi.OCR directory. If no settings have been made yet, the file has a .sample extension; remove that extension so that the file is saved with the .json extension. The content must follow JSON syntax rules.
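Because the file must be valid JSON, it can help to generate or check its content before restarting GEODI. The sketch below is an illustration, not a GEODI tool: the function names are hypothetical, the example values are placeholders, and the flat top-level layout of the keys is an assumption based on how the keys are listed in this section. "Quality" is left out because this page does not state its valid range.

```python
import json

# Hypothetical example values; only the key names come from this page.
EXAMPLE_SETTINGS = {
    "NoSpellCheck": False,
    "NoConnectedWordCheck": False,
    "NoEnhance": False,
    "MaxPageCount": 64,  # the documented default
}


def render_settings(values):
    """Serialize a settings dict to JSON text (Python False becomes false)."""
    return json.dumps(values, indent=2)


def check_settings_text(text):
    """Return (ok, message) telling whether text is a valid JSON object."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as err:
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"
    if not isinstance(data, dict):
        # Assumption: the settings file is a single JSON object.
        return False, "top level must be a JSON object"
    return True, "valid JSON object"


if __name__ == "__main__":
    text = render_settings(EXAMPLE_SETTINGS)
    print(text)
    print(check_settings_text(text))
    print(check_settings_text('{"NoEnhance": True}'))    # capitalized True is not JSON
    print(check_settings_text('{"MaxPageCount": 64,}'))  # trailing comma is not JSON
```

To check an existing file, read its text first and pass it to `check_settings_text`. Capitalized `True`/`False` and trailing commas are the mistakes this kind of check typically catches.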
(GEODI must be restarted for these changes to take effect.)
"NoSpellCheck": true/false. Controls the spell check, which corrects misspelled words using the dictionaries in GEODI.
"NoConnectedWordCheck": true/false. Controls the handling of words split across lines: when a "-" sign is placed at the end of a line because the word is incomplete, the check rejoins the parts so that the word keeps its meaning.
"NoEnhance": true/false. Controls the enhancement step, which corrects defects in the documents to be OCRed. It performs the OCR process and places the characters it finds on the original page.
"Quality": Relates to the quality of the images in the document. It does not change the OCR quality. The higher the value, the longer the OCR time and the larger the SPDF.
"MaxPageCount": Default 64. Limits the number of pages to be OCRed. With the default, the first 64 pages are OCRed and the remaining pages are added as they are, without processing. The higher the value, the longer the processing time.
Scope of Training
For GEODI User
What OCR is, how OCR performance affects search, expectations
Going over document samples to be OCRed
Clean document, dirty document, document photographed with a cell phone, pages of an opened book, barcode and QR code samples
Photo
Video
Explaining that the process will happen automatically with drag and drop or other ways of adding data
Barcode recognition
Awareness of OCR process performance
For GEODI Admin
GEODI OCR vs. ABBYY: the differences, and why GEODI OCR?
Better performance
No per-transaction fee
This document covers GEODI OCR
OCR requires processing power
If you want to OCR everything, expect a long process.
OCR Setup
Activate OCR for a source in the GEODI Project
Activate OCR
Activate barcode recognition
General OCR settings in GEODI Project
What is the FastOCR setting? → Speed
How to set it
Impact
What is SPDF setting? → Capability
How to set it
SPDF benefit, what happens without it?
Impact
Why do we delete or keep TIFFs? → Space saving
They are not necessary after OCR and they take up space.
Be sure to ask the user before deleting.
Barcode recognition
OCR with videos
Mask application, why?
Removing camera traces
Elimination of edges
Which details can we capture with video OCR?
If we want to OCR all images
SPDF is not generated for TIFF files that contain geographic information (GeoTIFF).
SPDF directory and metafile awareness.
OCR in different languages.
Questions and Answers