In most organizations, up to 40% of documents are duplicates of, or similar to, other documents. Such content can:

Create confusion
Complicate search results
Waste storage space

Duplicate or similar content can be scattered across multiple sources. A single file may have dozens of copies, either in the same source or spread across different folders, drives, emails, and systems.

GEODI automatically detects duplicate or similar content across data sources and helps you focus on clean, deduplicated content, leading to more efficient discovery, classification, and reporting.

A duplicate is a document with the same content as another document. Duplication and similarity are calculated from content, not format, so a PDF and a DOCX file can be duplicates or similar. Images are included.
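To illustrate what content-based (rather than format-based) comparison means, here is a minimal Python sketch. It is not GEODI's implementation: the normalization rule, the SHA-256 fingerprint, and the sample file names and texts are all assumptions made for the example.

```python
import hashlib
import re


def content_fingerprint(text: str) -> str:
    """Hash extracted text after collapsing whitespace and letter case."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Hypothetical text extracted from files in different formats.
extracted = {
    "report.pdf": "Quarterly  Report\nRevenue grew 5%.",
    "report.docx": "Quarterly Report Revenue grew 5%.",
    "notes.txt": "Meeting notes for Monday.",
}

# Group file names by fingerprint; the file format plays no role.
groups = {}
for name, text in extracted.items():
    groups.setdefault(content_fingerprint(text), []).append(name)

duplicate_sets = [names for names in groups.values() if len(names) > 1]
print(duplicate_sets)  # [['report.pdf', 'report.docx']]
```

The PDF and the DOCX end up in the same set because only their extracted text is compared.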

1 Find All Duplicates
2 Deduplication with GEODI - Identifying and Managing Duplicate Content
3 Find Similars
4 Deduplication of Similar Content

Find All Duplicates

These queries find all duplicate content. They do not distinguish originals from copies, so they are not suitable for remediation. For remediation, use duplicate2 or original2 instead.

duplicate → Lists all documents that have duplicates.
-duplicate → Lists only unique documents (those without any duplicates).
duplicate:(doc:a.pdf) → Finds copies of a.pdf.
duplicate:"Georgia Aquarium" → Finds duplicated documents that contain these words.

Deduplication with GEODI - Identifying and Managing Duplicate Content

The following queries separate the originals from the duplicates. Assume a document exists in 10 identical copies.

duplicate2: → Returns all duplicates within the target dataset, leaving out the original → returns 9 documents
original2: → Returns only the originals, one per duplicate set → returns 1 document

This approach helps you clean your content, reduce storage, and improve the precision of search and reporting. When used with a Destroy Workflow, duplicate2 effectively removes duplicate content.
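The split between the two queries can be pictured with a small Python sketch. It only mirrors the counting in the 10-copy example above; the fingerprints, the file names, and the rule "first copy is the original" are placeholders, not GEODI's actual behavior.

```python
from collections import defaultdict

# Hypothetical index of (document path, content fingerprint):
# 10 copies of one document and 3 copies of another.
documents = [(f"a_copy_{n}.pdf", "fp-A") for n in range(10)]
documents += [(f"b_copy_{n}.docx", "fp-B") for n in range(3)]

duplicate_sets = defaultdict(list)
for path, fingerprint in documents:
    duplicate_sets[fingerprint].append(path)

originals, duplicates = [], []
for paths in duplicate_sets.values():
    originals.append(paths[0])    # one original per duplicate set
    duplicates.extend(paths[1:])  # every remaining copy

print(len(originals), len(duplicates))  # 2 11
```

For the set with 10 copies this yields 1 original and 9 duplicates, matching the numbers given above.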

Query	Description
duplicate2:(order:New|Old)	Specifies how the original is chosen among duplicate files when no source list is defined. `order:Old` → the oldest copy is treated as the original. `order:New` → the newest copy is kept and older ones are marked as duplicates.
duplicate2:(i:Source1,Source2,...)	`i:` stands for Important. Copies in the listed sources are kept; copies that exist elsewhere are marked for deletion or another action. Example: `duplicate2:(i:HR_Drive,Legal_Archive)` → keep the copies found in `HR_Drive` and `Legal_Archive`, remove the others.
duplicate2:(ni:Source1,Source2,...)	`ni:` stands for Not Important. If a copy exists outside the listed sources, the copies in the listed sources can be safely deleted. Example: `duplicate2:(ni:Temp, Shared_Folder)` → if a copy exists somewhere else, delete it from `Temp` or `Shared_Folder`.
original2:(..)	Use original2 when you only want to find the originals. original2 is the exact opposite of duplicate2 and takes the same parameters with the same interpretation.

This query gives you complete control over how GEODI handles duplicates, making cleanup safer, smarter, and aligned with your organizational priorities.
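The following Python sketch shows one possible reading of the `order` and `i:` rules from the table. The `Copy` class, the `split_set` function, and the fallback from the source list to `order` are assumptions made for illustration; GEODI's actual selection logic may differ. The `ni:` parameter is not modeled here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Copy:
    path: str
    source: str
    modified: date


def split_set(copies, order="Old", important=()):
    """Return (kept, marked) for one duplicate set.

    Assumed rules:
    - if any copy lives in an `important` source, those copies are kept
      and all other copies are marked;
    - otherwise the oldest (order="Old") or newest (order="New") copy is
      kept and the rest are marked.
    """
    in_important = [c for c in copies if c.source in important]
    if in_important:
        kept = in_important
    else:
        pick = min if order == "Old" else max
        kept = [pick(copies, key=lambda c: c.modified)]
    marked = [c for c in copies if c not in kept]
    return kept, marked


# Hypothetical duplicate set spread over three sources.
copies = [
    Copy("HR_Drive/policy.pdf", "HR_Drive", date(2023, 1, 10)),
    Copy("Temp/policy.pdf", "Temp", date(2022, 6, 1)),
    Copy("Shared_Folder/policy.docx", "Shared_Folder", date(2024, 3, 5)),
]

kept, marked = split_set(copies, important=("HR_Drive", "Legal_Archive"))
print([c.path for c in kept])    # ['HR_Drive/policy.pdf']
print([c.path for c in marked])  # ['Temp/policy.pdf', 'Shared_Folder/policy.docx']
```

In this sketch `kept` corresponds to what original2 would return and `marked` to what duplicate2 would return for the same parameters.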

Find Similars

As with duplicates, GEODI finds similarities in both text and image content. Unlike duplication, similarity has a degree. All duplicates are also similar, but you can exclude them from the results.

maxcount:<n> - Limits the number of similar documents returned to n.
minsimilarity:0.7 - Sets the minimum similarity index. The default is 0.7.
excludeDuplicates:true - Excludes exact copies. The default is false, which means copies are listed among the similar documents.

similar:(doc:a.pdf) → similar content to a.pdf

similar:"Georgia Aquarium" → similar documents containing these words
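To show how the three parameters interact, here is a small Python sketch. It uses Jaccard similarity on word sets as a stand-in measure; the source does not say how GEODI computes its similarity index, so the measure, the `find_similar` function, and the sample texts are assumptions. A score of 1.0 is treated as a duplicate, which is a simplification.

```python
def jaccard(a: str, b: str) -> float:
    """Share of distinct words the two texts have in common."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa and not wb:
        return 1.0
    return len(wa & wb) / len(wa | wb)


def find_similar(target, corpus, minsimilarity=0.7, maxcount=None,
                 exclude_duplicates=False):
    """Return (name, score) pairs, best match first."""
    hits = []
    for name, text in corpus.items():
        score = jaccard(target, text)
        if score < minsimilarity:
            continue  # below the similarity threshold
        if exclude_duplicates and score == 1.0:
            continue  # identical word set, treated as a copy here
        hits.append((name, score))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits if maxcount is None else hits[:maxcount]


# Hypothetical corpus.
target = "the georgia aquarium opens at nine every day"
corpus = {
    "copy.pdf": "The Georgia Aquarium opens at nine every day",
    "v2.docx": "the georgia aquarium opens at ten every day",
    "other.txt": "lunch menu for friday",
}

print([name for name, _ in find_similar(target, corpus)])
# ['copy.pdf', 'v2.docx']
print([name for name, _ in find_similar(target, corpus, exclude_duplicates=True)])
# ['v2.docx']
```

With the defaults, the exact copy is listed among the similar documents; setting `exclude_duplicates=True` leaves only the variant.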

Deduplication of Similar Content

When dealing with large sets of similar (but not identical) documents, you often want to retrieve only one representative rather than going through every variant. This improves search efficiency and helps you clean up redundant content through actions like deletion or quarantine.

Query	Description
`benzer2`	Returns one document per group with 70% or more similarity (default threshold)
`benzer2:(distance:0.9)`	Only returns one document from groups with 90%+ similarity
`benzer2:(<query>)`	Filters within the results of your custom query (e.g., a folder or date range)
`benzer2:(getnonsimilar:true)`	Also includes documents that do not have any similar matches in the result set
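The idea of returning one representative per group can be sketched in Python as follows. This is a simple greedy grouping with Jaccard word-set similarity, chosen only for illustration; the `representatives` function, its parameter names, and the sample documents are hypothetical and do not describe how GEODI forms its groups.

```python
def jaccard(a: str, b: str) -> float:
    """Share of distinct words the two texts have in common."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa and not wb:
        return 1.0
    return len(wa & wb) / len(wa | wb)


def representatives(corpus, threshold=0.7, include_nonsimilar=False):
    """Greedy grouping: the first document of each group represents it."""
    groups = []
    for name, text in corpus.items():
        for group in groups:
            if jaccard(text, corpus[group[0]]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])  # no close match: start a new group
    # Groups of size 1 have no similar match; include them only on request.
    return [g[0] for g in groups if include_nonsimilar or len(g) > 1]


# Hypothetical corpus: two contract variants and one unrelated memo.
corpus = {
    "contract_v1.docx": "supplier agrees to deliver goods within thirty days of order",
    "contract_v2.docx": "supplier agrees to deliver goods within sixty days of order",
    "memo.txt": "team lunch moved to friday",
}

print(representatives(corpus))                           # ['contract_v1.docx']
print(representatives(corpus, include_nonsimilar=True))  # ['contract_v1.docx', 'memo.txt']
print(representatives(corpus, threshold=0.9))            # []
```

Raising the threshold to 0.9 splits the two contract variants apart, so no group with more than one member remains.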
