https://www.dece.com.tr/geodi-moduller#ocr

The GEODI OCR module works not only on scanned documents but also on images and even videos. It makes the text, barcodes, and QR codes in these data sources searchable.

You need a GEODI OCR Module license to use the OCR module. GEODI OCR offers two methods: the first uses the GEODI OCR infrastructure, and the second uses the ABBYY engine. The ABBYY option may require an additional license and per-use fees. The GEODI OCR engine performs better and incurs no usage fees beyond the license.

How do I set up OCR for sources?

  • You need to complete the following settings for each source you want to OCR.

    • You can convert scanned documents to SPDF (Searchable PDF). SPDF creation requires additional space and time. When a search result is an SPDF, the word you searched for is highlighted in the PDF.

    • OCR of very large documents (such as scanned projects) is optional as well.

    • Barcode and QR code recognition can be enabled.

    • Generating SPDF can increase the total time by around 50%.

  • General OCR settings are made on the last page of the project wizard. These settings affect all sources.

    • You can specify which engine will be used for OCR (GEODI or ABBYY).

    • You can add additional languages according to the languages in your documents.

    • Fast OCR can save 50%-70% of the processing time, at the cost of a slight decrease in recognition quality.

OCR Settings for Source

OCR setting for the whole project

A few things to be aware of

OCR is a time-consuming process and can keep computers and servers busy for a long time. Take this into account if you have a large number of documents that require OCR.

The fastest option is to skip SPDF generation and enable Fast OCR.
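
To make the timing figures on this page concrete, here is a minimal sketch that estimates total OCR time for the four combinations of SPDF and Fast OCR. It is not part of GEODI: the function name `estimate_ocr_minutes`, the baseline of 600 minutes, and the decision to combine the two factors by multiplication are assumptions made for illustration. Only the rough percentages (about 50% extra for SPDF, 50%-70% saved by Fast OCR) come from this page.

```python
def estimate_ocr_minutes(base_minutes, make_spdf=False, fast_ocr=False,
                         spdf_overhead=0.5, fast_ocr_saving=0.6):
    """Rough estimate of total OCR time in minutes.

    base_minutes:    time for plain OCR (no SPDF, no Fast OCR), measured or guessed.
    spdf_overhead:   fraction of time added by SPDF generation (about 0.5 per this page).
    fast_ocr_saving: fraction of time saved by Fast OCR (0.5 to 0.7 per this page).
    Multiplying the two factors together is an assumption, not documented behavior.
    """
    if base_minutes < 0:
        raise ValueError("base_minutes must not be negative")
    minutes = float(base_minutes)
    if make_spdf:
        minutes *= 1.0 + spdf_overhead
    if fast_ocr:
        minutes *= 1.0 - fast_ocr_saving
    return minutes


if __name__ == "__main__":
    base = 600.0  # hypothetical baseline: 10 hours of plain OCR
    for make_spdf in (False, True):
        for fast_ocr in (False, True):
            minutes = estimate_ocr_minutes(base, make_spdf, fast_ocr)
            print(f"SPDF={make_spdf!s:5} FastOCR={fast_ocr!s:5} -> {minutes:.0f} min")
```

Measuring a small sample of your own documents and using that as `base_minutes` will give a far better estimate than any generic figure, because OCR time depends heavily on scan resolution and quality.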

The success of the OCR process depends on the scan resolution and quality of the documents. Success decreases with very old documents such as blueprints, skewed pictures taken with a cell phone, distorted images such as pages of an incompletely opened book, etc.

The Visual C++ 2015-2019 runtime package must be installed in the environment where the GEODI OCR module will run.

Detail OCR Settings

The OCR module also has technical settings. They are set in the OCRSettings.json file in the GEODI/Settings/Geodi.OCR directory. If no settings have been made yet, this file has a .sample extension; remove that extension so that the file is saved with the .json extension. The content must follow JSON syntax rules.

(GEODI must be restarted to apply these changes)
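
The activation step described above can be sketched as a small script. This is an illustration under stated assumptions, not a GEODI tool: the helper name `activate_ocr_settings` is hypothetical, the exact name of the sample file is not given on this page (the sketch accepts any file matching `OCRSettings*.sample`), and it assumes the file is plain JSON in UTF-8. It copies the sample instead of renaming it so that the original stays available as a backup.

```python
import json
import shutil
from pathlib import Path


def activate_ocr_settings(settings_dir):
    """Create OCRSettings.json from the .sample file if needed and parse it.

    Returns the parsed settings. Raises FileNotFoundError if neither file
    exists, and json.JSONDecodeError if the content is not valid JSON.
    """
    settings_dir = Path(settings_dir)
    target = settings_dir / "OCRSettings.json"
    if not target.exists():
        # Assumption: the sample file starts with "OCRSettings" and ends in ".sample".
        samples = sorted(settings_dir.glob("OCRSettings*.sample"))
        if not samples:
            raise FileNotFoundError(f"no OCRSettings sample file in {settings_dir}")
        shutil.copyfile(samples[0], target)  # the sample is kept as a backup
    # "utf-8-sig" reads UTF-8 files with or without a byte order mark.
    return json.loads(target.read_text(encoding="utf-8-sig"))


if __name__ == "__main__":
    # Path relative to the GEODI installation, as given on this page.
    try:
        print(activate_ocr_settings("GEODI/Settings/Geodi.OCR"))
    except (FileNotFoundError, json.JSONDecodeError) as error:
        print(f"OCR settings problem: {error}")
```

Parsing the file right after editing it catches syntax mistakes such as a trailing comma before GEODI is restarted.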

  • "NoSpellCheck": true/false controls spell checking, which corrects misspelled words using the dictionaries in GEODI.

  • "NoConnectedWordCheck": true/false controls the handling of words split across lines. When a word is broken with a "-" sign at the end of a line, this check ensures that the word keeps its meaning.

  • "NoEnhance": true/false controls enhancement, which corrects defects in the documents to be OCRed. OCR is then performed and the characters found are placed on the original page.

  • "Quality" sets the quality of the images in the document. It does not change the OCR quality. The higher the value, the longer the OCR time and the larger the SPDF.

  • "MaxPageCount" (default 64) limits the number of pages to be OCRed. With the default value, the first 64 pages are OCRed; the remaining pages are not processed and are added as they are. Increasing the value increases the processing time.
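
As a rough illustration of the settings listed above, the sketch below checks a settings dictionary before it is written to OCRSettings.json and shows what the MaxPageCount limit means for a long document. The setting names come from this page; the helper names `check_ocr_settings` and `split_pages` are hypothetical, and the assumption that "Quality" is an integer is a guess, since this page does not state its type or range.

```python
import json

# Setting names are taken from this page; the expected types are assumptions.
EXPECTED_TYPES = {
    "NoSpellCheck": bool,
    "NoConnectedWordCheck": bool,
    "NoEnhance": bool,
    "Quality": int,
    "MaxPageCount": int,
}


def check_ocr_settings(settings):
    """Return a list of problems found in a settings dict (empty if none)."""
    problems = []
    for key, value in settings.items():
        expected = EXPECTED_TYPES.get(key)
        if expected is None:
            problems.append(f"{key}: not described on this page")
        # bool is a subclass of int in Python, so reject it for integer settings.
        elif expected is int and isinstance(value, bool):
            problems.append(f"{key}: expected a number, got true/false")
        elif not isinstance(value, expected):
            problems.append(f"{key}: expected {expected.__name__}")
    return problems


def split_pages(total_pages, max_page_count=64):
    """Return (pages that are OCRed, pages added as they are)."""
    ocr_pages = min(total_pages, max_page_count)
    return ocr_pages, total_pages - ocr_pages


if __name__ == "__main__":
    settings = {"NoSpellCheck": False, "MaxPageCount": 128}  # example values only
    print(check_ocr_settings(settings))
    print(json.dumps(settings, indent=2))  # text suitable for OCRSettings.json
    print(split_pages(200))  # a 200-page document with the default limit of 64
```

Using `json.dumps` to produce the file content avoids hand-written syntax errors, for example writing `True` instead of the JSON literal `true`.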

Scope of Training

For GEODI User

  1. What OCR is, how OCR performance affects search, expectations

  2. Going over document samples to be OCRed

    1. Good document, dirty document, document photographed with a cell phone, page of a partly opened book, barcode and QR code sample

    2. Photo

    3. Video?

  3. Explaining that the process will happen automatically with drag and drop or other ways of adding data

  4. Barcode recognition

  5. Awareness of OCR process performance

Module:OCR

For GEODI Admin

  1. GEODI OCR: difference from ABBYY, and why GEODI OCR?

    1. Better

    2. No price per transaction

    3. This document covers GEODI OCR

  2. OCR requires processing power

    1. If you want to OCR everything, expect a long process.

  3. OCR Setup

    1. Module:OCR

    2. Activate OCR for a source in the GEODI Project

      1. Activate OCR

      2. Activate barcode recognition

    3. General OCR settings in GEODI Project

      1. What is the FastOCR setting? → Speed

        1. How to make

        2. Impact

      2. What is SPDF setting? → Capability

        1. How to make

        2. SPDF benefit, what happens without it?

        3. Impact

      3. Why do we delete or keep TIFFs? → Saving

        1. They are not necessary after OCR and they take up space.

        2. Be sure to ask the user first.

  4. Barcode recognition

  5. OCR with videos

    1. Mask application, why?

      1. Removing camera traces

      2. Elimination of edges

      3. Which details can we capture with video OCR?

  6. If we want to OCR all images

    1. SPDF is not generated for TIFF files that contain geometric information (GeoTIFF).

    2. SPDF directory and metafile awareness.

    3. OCR in different languages.

  7. Questions and Answers
