Filters (%100)

Explanation

IgnoreRules

Ignore rules contain file extensions, directory names and some patterns.

The default list contains *.DLL, *.SYS, “Program Files”, and similar entries.

Any file that matches an ignore rule is neither indexed nor logged.

Settings are in:

  • <geodi>\Settings\IgnoreFileTypes

  • <geodi>\Settings\IgnoreFolders

Warning: If you need to override the defaults, do so in %appdata%.
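
The matching behavior described above can be pictured with a minimal Python sketch. It is not GEODI's implementation: the rule lists, the function name `is_ignored`, and the case-insensitive matching are assumptions made for illustration, and the real format of the IgnoreFileTypes and IgnoreFolders settings is not shown here.

```python
from fnmatch import fnmatchcase
from pathlib import PureWindowsPath

# Hypothetical rule lists, loosely modeled on the defaults named above.
IGNORED_FILE_PATTERNS = ["*.dll", "*.sys"]
IGNORED_FOLDERS = ["program files"]


def is_ignored(path: str) -> bool:
    """Return True if the path matches a file pattern or sits under an ignored folder."""
    p = PureWindowsPath(path)
    name = p.name.lower()  # assumption: matching is case-insensitive
    if any(fnmatchcase(name, pattern) for pattern in IGNORED_FILE_PATTERNS):
        return True
    folders = [part.lower() for part in p.parent.parts]
    return any(folder in IGNORED_FOLDERS for folder in folders)


if __name__ == "__main__":
    for sample in (r"C:\Windows\System32\kernel32.DLL",
                   r"C:\Program Files\App\readme.txt",
                   r"D:\Docs\report.pdf"):
        print(sample, "->", "ignored" if is_ignored(sample) else "indexed")
```

A file that matches either list is skipped entirely, which mirrors the statement that ignored files are neither indexed nor logged.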

KnownFiles

These are the file types for which GEODI has a reader, such as PDF and DOCX. The full list is in Supported Formats.

These files are processed as expected unless an IgnoreRule or ProtectRule applies. An IgnoreRule makes the file type invisible. A ProtectRule may impose a size limit.

UnknownFiles

By default, unknown file types are ignored.

You may override this setting in the Project Wizard advanced settings.

If you use “only name and date”, all unknown extensions will be indexed.

You may add any unwanted extensions to the ignore list, but this change requires running discovery all over again.

ProtectRules

These rules protect the system and network against overly large files. Protect rules apply to both known and unknown files.

Content is grouped as local and far. There is no limit for local content, which resides in local folders and network folders. Far content means files from GDE, e-mail attachments, and files from web pages.

By default, far content larger than 100 MB and compressed files larger than 500 MB are indexed by name only. You will know that these files exist, but not their content.

Settings are in:

<geodi>\Settings\Engine\ResourceBalancing

Warning: If you need to override the defaults, do so in %appdata%.
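
As a rough illustration of the default limits, the following Python sketch decides whether a file would be fully indexed or indexed by name only. It reflects one possible reading of the defaults: local content is never limited, far files use the 100 MB limit, and compressed far files use the 500 MB limit instead. The names `Origin`, `index_mode`, the extension list, and the choice of 1 MB = 1024 × 1024 bytes are all assumptions, not GEODI settings.

```python
import os
from enum import Enum

MB = 1024 * 1024                      # assumption: binary megabytes
FAR_FILE_LIMIT = 100 * MB             # default limit for far content
FAR_COMPRESSED_LIMIT = 500 * MB       # default limit for compressed files
COMPRESSED_EXTENSIONS = {".zip", ".rar", ".7z"}  # illustrative list only


class Origin(Enum):
    LOCAL = "local"   # local folders and network folders
    FAR = "far"       # GDE, e-mail attachments, web pages


def index_mode(file_name: str, size_bytes: int, origin: Origin) -> str:
    """Return "full" or "name_only" under the assumed reading of the defaults."""
    if origin is Origin.LOCAL:
        return "full"  # no size limit for local content
    extension = os.path.splitext(file_name)[1].lower()
    limit = FAR_COMPRESSED_LIMIT if extension in COMPRESSED_EXTENSIONS else FAR_FILE_LIMIT
    return "name_only" if size_bytes > limit else "full"


if __name__ == "__main__":
    print(index_mode("scan.pdf", 150 * MB, Origin.FAR))
    print(index_mode("archive.zip", 150 * MB, Origin.FAR))
```

If the actual behavior differs, for example if the compressed-file limit also applies to local content, only the branch order in `index_mode` would need to change.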

Troubleshooting

Indexing is slow

The GEODI discovery engine is one of the fastest discovery engines. Slow indexing may be caused by the machine, the settings, or the environment.

  1. Check the indexing speed and make sure it is set to high.

  2. Check engine errors. If a source is throwing too many errors, this may slow down indexing.

  3. Another task may be using too many resources.

  4. Too many recognizers may slow down indexing.

  5. A slow disk may slow down indexing. Consider splitting the index and putting part of it on a fast disk, such as an SSD.

Too much CPU is used

High CPU usage should be expected for an engine like GEODI. GEODI's CPU usage never makes the machine unresponsive, because GEODI always leaves one core for other tasks.

  1. CPU usage may be temporary. Wait to see if it drops.

  2. If CPU usage stays consistently high, try decreasing the indexing speed.

  3. OCR and FacePro need CPU. If you are using these options, decrease the indexing speed or wait.

Index size is too high

GEODI compresses the index as much as possible. An index size of up to 20% of the corpus size should be expected. If it looks too high to you, you may try the following.

  1. The similarity index may be enabled. This index needs disk space.

  2. Some files may have too much information (logs, CSV, etc.). You may exclude them.
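
To judge whether an index is larger than expected, the 20% guideline can be turned into a quick check. The Python sketch below is only arithmetic on that guideline; the function names and the sample sizes are made up for illustration.

```python
GB = 1024 ** 3
EXPECTED_MAX_PERCENT = 20  # guideline from the text: index up to 20% of corpus size


def expected_max_index_bytes(corpus_bytes: int) -> int:
    """Upper bound of the expected index size for a given corpus size."""
    return corpus_bytes * EXPECTED_MAX_PERCENT // 100


def index_is_larger_than_expected(index_bytes: int, corpus_bytes: int) -> bool:
    """True if the index exceeds the 20% guideline."""
    return index_bytes > expected_max_index_bytes(corpus_bytes)


if __name__ == "__main__":
    corpus = 500 * GB  # hypothetical corpus size
    index = 130 * GB   # hypothetical index size
    print("expected maximum (GB):", expected_max_index_bytes(corpus) // GB)
    print("larger than expected:", index_is_larger_than_expected(index, corpus))
```

For a 500 GB corpus the guideline gives an expected maximum of 100 GB, so a 130 GB index would be worth checking against the two points above.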

Index errors

GEODI generates error logs during indexing. These logs are mostly about content and should be treated as warnings or information. There may also be real system errors, and you will be informed about them.

Most content-related entries are:

  1. Unreadable content

  2. Encrypted content

  3. Unreachable content (because of permissions)