Info

Continuous Discovery

Once you index data sources, GEODI repeats the process automatically for new data (rows, files, emails, etc.). You do not need to intervene; just set the recheck period in the Project Wizard.

Info

Index Storage

  1. The index needs storage of its own. It is much smaller than the data, but its exact size is hard to predict. As a rule of thumb, assume 10% to 20% of the data size.

  2. Options like sampling mode or similarity indexing affect the index size.

  3. Reserve backup space for the index as well, to keep the service uninterrupted.
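
The rule of thumb above can be turned into a quick capacity estimate. The following is a minimal sketch, not a GEODI tool: the function name `estimate_index_storage` and the `backup_copies` parameter are hypothetical, and it assumes each backup copy needs as much space as the live index.

```python
def estimate_index_storage(data_size_gb, low=0.10, high=0.20, backup_copies=1):
    """Return (min_gb, max_gb) to reserve for the index plus its backups.

    low/high are the 10% to 20% rule of thumb from this page.
    backup_copies is an assumption: each backup is taken to be as
    large as the live index.
    """
    if data_size_gb < 0:
        raise ValueError("data_size_gb must be non-negative")
    factor = 1 + backup_copies  # live index plus each backup copy
    return (data_size_gb * low * factor, data_size_gb * high * factor)


# Example: 2 TB of source data, one backup copy of the index.
min_gb, max_gb = estimate_index_storage(2000)
print(f"Reserve roughly {min_gb:.0f} to {max_gb:.0f} GB")
```

Treat the result as a starting point only; options such as sampling mode or similarity indexing will move the real figure.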

Info

Indexing Speed

  1. The indexing time depends heavily on the CPU, memory, disk, and other resources of the server that GEODI runs on.

  2. The data throughput of the network and of the data sources' disks is also very important.

  3. Options like OCR or FacePro greatly affect performance.

  4. The indexing speed setting, explained on the settings page, also affects the speed. We suggest that you set it to maximum. GEODI will use as much CPU as possible but leaves at least one core for user interaction.
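
To illustrate the "leave at least one core" policy in the last item, here is a minimal sketch of how such a worker count could be derived. It is an illustration only and not GEODI's actual implementation; `indexing_workers` and `reserved_for_ui` are hypothetical names.

```python
import os


def indexing_workers(total_cores=None, reserved_for_ui=1):
    """Number of cores to use for indexing, keeping some free for users.

    Hypothetical helper: always returns at least 1 so that indexing
    can still run on a single-core machine.
    """
    if total_cores is None:
        total_cores = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, total_cores - reserved_for_ui)


print(indexing_workers(8))  # 7 cores for indexing on an 8-core server
```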

[Image: image-20240919-142743.png]

Info

Options

Start with “Index all Content”. After that, you may use any of the other options. If you set up scheduled indexing and take periodic backups, you will not need the other options except for maintenance.

[Image: image-20240728-074018.png]

Info

Monitoring Indexing

GEODI informs you about the progress of indexing. Note that the progress bar is not linear. GEODI cannot know how long the remaining documents will take to index, so the progress bar is only an estimate based on the indexing time of the previous documents.
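
The following sketch shows why an estimate of this kind jumps around. It assumes a simple average over the documents indexed so far; GEODI's real formula is not documented here, and `estimate_remaining_seconds` is a hypothetical name.

```python
def estimate_remaining_seconds(durations, total_docs):
    """Estimate the time left from the documents indexed so far.

    durations: seconds spent on each document already indexed.
    Assumes remaining documents take the average time seen so far,
    which is exactly why the estimate is not linear.
    """
    done = len(durations)
    if done == 0:
        return None  # nothing indexed yet, so no basis for an estimate
    average = sum(durations) / done
    return average * max(0, total_docs - done)


# Three quick documents suggest little time is left...
print(estimate_remaining_seconds([1, 1, 1], 10))      # 7.0
# ...until one slow document (for example, one needing OCR) shifts the average.
print(estimate_remaining_seconds([1, 1, 1, 21], 10))  # 36.0
```

More documents are done in the second call, yet the estimated time left is higher, so the progress bar can appear to slow down or stall.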

Sampling

Sampling is possible for both structured and unstructured data. Each data source asks you for the sampling values. Sampling saves a great deal of time in discovery projects. We suggest that you always use sampling for DB discovery. For unstructured data, sampling is also a good starting point: start in sampled mode and see what is in the data, whether there are any unnecessary types, and whether there are any permission problems.

...