...



Indexing Speed

  1. GEODI has an indexing speed setting. We suggest that you set it to maximum. See the Settings page for details.

  2. Other factors also affect indexing speed:

    1. Indexing time depends heavily on the CPU, memory, disk and other resources of the server that GEODI runs on.

    2. The data throughput of the network and of the data sources' disks is also very important.

    3. Options such as OCR or FacePro greatly affect performance.

    4. The indexing speed setting on the Settings page also affects speed. We suggest that you set it to maximum: GEODI will then use the CPU as much as possible but leave at least one core free for user interaction.
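The "leave at least one core free" behaviour of the maximum speed setting can be pictured with a minimal Python sketch. GEODI does not document how it schedules indexing work, so the function name `max_speed_workers` and the one-worker-per-core model are assumptions for illustration only.

```python
import os
from typing import Optional


def max_speed_workers(cores: Optional[int] = None) -> int:
    """Hypothetical: number of parallel indexing workers at maximum speed.

    Uses every core but one, so one core stays free for user interaction,
    and never returns fewer than one worker.
    """
    if cores is None:
        cores = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(1, cores - 1)


print(max_speed_workers(8))  # 7: one core left for the user
print(max_speed_workers(1))  # 1: a single-core machine still indexes
```

On a single-core machine the sketch still returns one worker, so no core is left free. That edge case is a choice made in the sketch, not documented GEODI behaviour.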

image-20240919-142743.png


Options

Start with “Index all Content”. After that, you may use any of the other options. If you set up scheduled indexing and take periodic backups, you will only need the other options for maintenance.

image-20240728-074018.png


Monitoring Indexing

Once you start indexing, GEODI reports its progress so you can monitor the process.

  1. The progress bar shows the approximate percentage of indexing completed. Read it with care because it is not linear. GEODI cannot know how long future documents will take to index, so the progress bar is only an estimate based on the indexing time of previous documents.

  2. This area shows the counts. The graph shows the trend in documents per second.

  3. This area shows whether any errors or warnings were generated. You can click it to download the report. The report is stored in the %appdata%\logs folder.

  4. This link downloads a detailed indexing log named timelog. Timelogs can also be found in the %appdata%\logs folder.

  5. Error reports and detailed reports about the project.
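An estimate built only from past indexing times behaves the way the progress bar note warns: one slow document changes the outlook sharply. The minimal Python sketch below illustrates this. It is not GEODI's actual estimator. The function name, the use of a plain average, and the assumption that documents are indexed one after another are all hypothetical.

```python
from typing import List, Tuple


def estimate_progress(durations: List[float], total_docs: int) -> Tuple[float, float, float]:
    """Hypothetical progress estimate built only from past indexing times.

    durations  -- seconds spent on each document indexed so far
    total_docs -- number of documents discovered in total
    Returns (percent_done, docs_per_second, estimated_seconds_left).
    Assumes documents are indexed one after another.
    """
    done = len(durations)
    elapsed = sum(durations)
    if done == 0 or elapsed == 0:
        return 0.0, 0.0, float("inf")  # nothing measured yet: no estimate possible
    percent = 100.0 * done / total_docs
    rate = done / elapsed                        # the documents/second trend
    left = (total_docs - done) * elapsed / done  # assumes future docs cost the past average
    return percent, rate, left


fast = [0.5] * 9  # nine quick documents
print(estimate_progress(fast, 100))           # (9.0, 2.0, 45.5)
print(estimate_progress(fast + [20.0], 100))  # one slow (e.g. OCR) document: time left jumps to 220.5 s
```

After a single slow document, the remaining-time estimate grows almost fivefold even though only one more document has been processed. This is why the bar cannot advance at a steady pace.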

image-20240920-105836.png


Index Storage

  1. The index needs some storage. Its size will be much smaller than the data, but it cannot be predicted exactly. As a general rule, assume 10% to 20% of the data size.

  2. Options such as sampling mode or similarity indexing affect the index size.

  3. Space for an index backup should also be reserved to keep the service uninterrupted.
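The rule of thumb above can be turned into a quick disk budget. The Python sketch below assumes, hypothetically, that the backup needs as much space as the index itself. The 10% to 20% ratios are exposed as parameters because sampling mode or similarity indexing can shift them. Treat the result as a planning range, not a guarantee.

```python
from typing import Tuple


def index_storage_gb(data_gb: float,
                     low_ratio: float = 0.10,
                     high_ratio: float = 0.20,
                     reserve_backup: bool = True) -> Tuple[float, float]:
    """Rough (low, high) disk budget in GB for indexing data_gb of data.

    Default ratios follow the 10%-20% rule of thumb; real index size varies.
    reserve_backup assumes one backup copy the same size as the index.
    """
    copies = 2 if reserve_backup else 1
    return data_gb * low_ratio * copies, data_gb * high_ratio * copies


low, high = index_storage_gb(500)
print(f"Reserve {low:.0f}-{high:.0f} GB")  # Reserve 100-200 GB
```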

...

Filters (%100)

Explanation

IgnoreRules

Ignore rules contain file extensions, directory names and some patterns.

The default list contains *.DLL, *.SYS, “Program Files” and similar entries.

Any file that matches an ignore rule is neither indexed nor logged.

Settings are in:

  • <geodi>\Settings\IgnoreFileTypes

  • <geodi>\Settings\IgnoreFolders

Warning: If you need to override the defaults, do it in %appdata%.
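The following Python sketch shows how an ignore check against extension patterns and folder names might work. It is hypothetical. The rule lists are modelled on the documented defaults, but the real IgnoreFileTypes and IgnoreFolders files may use a different syntax. Case-insensitive matching and the function name `is_ignored` are also assumptions.

```python
import fnmatch
from pathlib import PureWindowsPath
from typing import Iterable

# Hypothetical rule lists modelled on the documented defaults.
IGNORE_FILE_TYPES = ["*.dll", "*.sys"]
IGNORE_FOLDERS = ["program files"]


def is_ignored(path: str,
               file_patterns: Iterable[str] = IGNORE_FILE_TYPES,
               folder_names: Iterable[str] = IGNORE_FOLDERS) -> bool:
    """Return True if the path matches an ignore rule (case-insensitive)."""
    p = PureWindowsPath(path)
    name = p.name.lower()
    # Extension/pattern rules are checked against the file name only.
    if any(fnmatch.fnmatchcase(name, pat.lower()) for pat in file_patterns):
        return True
    # Folder rules match any parent directory name along the path.
    parents = {part.lower() for part in p.parts[:-1]}
    return any(folder.lower() in parents for folder in folder_names)


print(is_ignored(r"C:\Windows\System32\kernel32.DLL"))  # True
print(is_ignored(r"D:\Shares\Reports\q3.pdf"))          # False
```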

KnownFiles

These are the file types GEODI has a reader for, such as PDF and DOCX. The full list is on the Supported Formats page.

These files are processed normally unless an IgnoreRule or ProtectRule applies. An IgnoreRule makes the file type invisible; a ProtectRule may impose a size limitation.

UnknownFiles

By default, unknown file types are ignored.

You may override this setting in the Project Wizard advanced settings.

If you choose “only name and date”, all unknown extensions will be indexed by name and date.

You may add any unwanted extensions to the ignore list, but doing so requires running discovery again from scratch.

image-20240919-115040.png

ProtectRules

These rules protect the system and network against files that are too large. Protect rules apply to both known and unknown files.

Content is grouped as local or far. There is no limitation for local content, which resides in local and network folders. Far content means files from GDE, e-mail attachments and files from web pages.

By default, far content files larger than 100 MB, and compressed files larger than 500 MB, are indexed by name only. You will know these files exist, but not their content.

Settings are in:

<geodi>\Settings\Engine\ResourceBalancing

Warning: If you need to override the defaults, do it in %appdata%.
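The interplay of IgnoreRules, known and unknown types and ProtectRules can be summarised as one decision function. This Python sketch is hypothetical. The `Item` fields, the return labels, the order of checks and the use of binary megabytes are assumptions. The sketch also reads the 500 MB limit as the allowance for compressed far files and 100 MB as the limit for other far files, which is only one reading of the text above. The actual rules are in <geodi>\Settings\Engine\ResourceBalancing.

```python
from dataclasses import dataclass

MB = 1024 * 1024  # assumption: limits are in binary megabytes


@dataclass
class Item:
    size_bytes: int
    known_type: bool   # GEODI has a reader for this format
    ignored: bool      # matches an IgnoreRule
    far: bool          # from GDE, an e-mail attachment or a web page
    compressed: bool   # ZIP, RAR and similar


def index_mode(item: Item, unknown_as_name_and_date: bool = False) -> str:
    """Hypothetical decision order returning 'skip', 'name_only' or 'full'."""
    if item.ignored:
        return "skip"  # IgnoreRule: not indexed (and, per this page, not logged)
    if not item.known_type:
        # Unknown types are ignored unless "only name and date" is chosen.
        return "name_only" if unknown_as_name_and_date else "skip"
    if item.far:
        limit = 500 * MB if item.compressed else 100 * MB
        if item.size_bytes > limit:
            return "name_only"  # ProtectRule: name and date, no content
    return "full"  # local content has no size limit


print(index_mode(Item(200 * MB, True, False, True, False)))  # name_only
print(index_mode(Item(200 * MB, True, False, False, False)))  # full
```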

Querying content and file status

There are special queries that show how content was indexed.

  1. status:OnlyName → Content indexed with name and date only. These files come from UnknownFiles and ProtectRules.

  2. status:HasScanError → Files that could not be read because of a file error, encryption or similar. These files are marked with an ! in their name.

  3. status:IsContainer → Files within a folder or ZIP/RAR.

  4. status:IsCompletedIndex → Content indexed successfully.

  5. status:Crashed → After a system crash, GEODI recovers the index. This query shows the content that was not indexed successfully. To avoid this, use index backups.

  6. status:PartialRead → Content read only partially because of ProtectRules.

  7. GEODICryptedContent → Encrypted content.

  8. GEODICryptedContentPart → Partially encrypted content.

Troubleshooting

Indexing is slow

The GEODI discovery engine is among the fastest discovery engines. Slow indexing may be caused by the machine, the settings or the environment.

  1. Check the indexing speed setting and make sure it is set high.

  2. Check engine errors. If a source is throwing too many errors, this may slow down indexing.

  3. Another task may be using too many resources.

  4. Too many recognizers may slow down indexing.

  5. A slow disk may slow down indexing. Consider dividing the index and putting part of it on a fast disk, such as an SSD.

  6. Use sampling mode if you need quicker results.

...

Index errors

GEODI generates error logs during indexing. Most of these logs concern content and should be treated as warnings or informational messages. There may also be real system errors; GEODI will inform you about them.

The most common content errors are:

  1. Unreadable content

  2. Encrypted content

  3. Unreachable content (because of permissions)

FAQ

My document count and GEODI content count do not match

The GEODI content count includes all folders, as well as the files inside compressed files such as ZIP and RAR, so it is normal for the counts not to match.

Also, some file types may be in the ignore lists.

Apart from these differences, you can rely on GEODI to cover all of your content.
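To see why the counts differ, compare a plain file count with a count that also includes folders and archive members. The Python sketch below illustrates the idea and is not GEODI's counting logic. It handles only `.zip` files because the standard library has no RAR reader. It does not descend into nested archives, and it skips other ZIP-based formats such as DOCX on purpose. The function names are hypothetical.

```python
import os
import tempfile
import zipfile


def plain_file_count(root: str) -> int:
    """Files only, roughly what a file manager reports for a folder tree."""
    return sum(len(files) for _, _, files in os.walk(root))


def geodi_like_count(root: str) -> int:
    """Folders + files + files stored inside .zip archives (illustrative)."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        total += len(dirnames) + len(filenames)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if name.lower().endswith(".zip") and zipfile.is_zipfile(path):
                with zipfile.ZipFile(path) as zf:
                    # count file entries only, not directory entries in the archive
                    total += sum(1 for info in zf.infolist() if not info.is_dir())
    return total


def demo() -> tuple:
    """Build a tiny sample tree and return (plain count, GEODI-like count)."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "sub"))
        open(os.path.join(root, "a.txt"), "w").close()
        with zipfile.ZipFile(os.path.join(root, "sub", "b.zip"), "w") as zf:
            zf.writestr("x.txt", "x")
            zf.writestr("y/z.txt", "z")
        return plain_file_count(root), geodi_like_count(root)


print(demo())  # (2, 5): 2 files on disk vs 1 folder + 2 files + 2 archived files
```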