Dataset configuration governs every aspect of how data is displayed, how duplicates are detected, and how merges are handled in Dedupe.
Accessing Dataset Configuration
Existing Datasets: Configuration can be accessed from the Dedupe Dashboard or directly within the dataset.
New Datasets: The configuration panel appears automatically during dataset creation.
You can open configuration in several ways:
From the dataset list view — select the three dots next to a dataset and choose Edit.
From the dashboard tiles — click the three dots on a dataset card and select Edit.
From inside an open dataset — use the three-dot menu next to Analyze, then select Edit Dataset.
Configuration Tabs
The dataset configuration panel is organized into six tabs, each serving a specific function.
1. General
This tab allows you to name your dataset and select the Salesforce object that will be analyzed for duplicates.
2. Fields
Use this tab to choose which fields will be visible during the deduplication process.
Fields can be reordered via drag-and-drop to change how they're displayed in side-by-side comparisons.
3. Filter
This section lets you apply custom filters to focus deduplication efforts on specific subsets of data.
You can:
Add filters by selecting a field, setting criteria, and providing values.
Use SOQL queries for more advanced filtering.
Example showing multiple field-based filters applied.
Example showing filters written directly in SOQL mode.
New — IN / NOT IN Filter Support
You can now filter datasets using multiple values for a single field.
Select the IN or NOT IN operator, then enter the values in the multi-line text area, which supports scrolling and resizing. This is ideal for bulk inclusion or exclusion filters (e.g., multiple states or industries).
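To make the mapping to SOQL concrete, the sketch below turns the contents of a multi-line text area into an IN or NOT IN condition. It assumes one value per line; the helper name `build_in_clause` and the sample field `BillingState` are hypothetical and only illustrate the idea, not how DataGroomr builds its queries internally.

```python
def build_in_clause(field: str, raw_values: str, negate: bool = False) -> str:
    """Build a SOQL IN / NOT IN condition from one value per line (hypothetical helper)."""
    # Keep non-empty lines only, trimming surrounding whitespace.
    values = [line.strip() for line in raw_values.splitlines() if line.strip()]
    if not values:
        raise ValueError("at least one value is required")
    # Escape backslashes and single quotes, as SOQL string literals require.
    quoted = ["'" + v.replace("\\", "\\\\").replace("'", "\\'") + "'" for v in values]
    operator = "NOT IN" if negate else "IN"
    return f"{field} {operator} ({', '.join(quoted)})"


states = "CA\nNY\n\nTX\n"  # sample text-area content, blank line included on purpose
print(build_in_clause("BillingState", states))
# BillingState IN ('CA', 'NY', 'TX')
print(build_in_clause("BillingState", states, negate=True))
# BillingState NOT IN ('CA', 'NY', 'TX')
```

The resulting condition could be pasted into a SOQL-mode filter or combined with other conditions using AND / OR.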
4. Match
This tab specifies the models used for duplicate detection.
Key Features
Matching Model: Select the primary model used to detect duplicates.
Minimum Match Confidence: Set a threshold to define when two records are considered duplicates.
Additional Matching Models: Assign up to two more models to a single dataset for broader detection coverage.
Each model is color-coded and has its own confidence slider.
Multi-Model Execution Options
Then Run – Sequential Execution
Runs models one after another. Each subsequent model evaluates only the records not matched by an earlier one, which is ideal for layered precision (e.g., exact matching first, ML second).
And Run – Parallel Execution
Runs all selected models simultaneously for maximum coverage. Results are merged with confidence prioritization.
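The difference between the two execution options can be sketched in Python. This is a toy illustration under stated assumptions, not DataGroomr's implementation: the models are simple field-equality stand-ins, every name (`field_model`, `then_run`, `and_run`) is hypothetical, and "confidence prioritization" is interpreted here as keeping the highest confidence reported for each pair.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Pair = Tuple[str, ...]                      # sorted pair of record ids
Model = Callable[[List[Record]], Dict[Pair, float]]


def field_model(field: str, confidence: float) -> Model:
    """Toy model: two records match when `field` is equal and non-empty."""
    def run(records: List[Record]) -> Dict[Pair, float]:
        matches: Dict[Pair, float] = {}
        for i, a in enumerate(records):
            for b in records[i + 1:]:
                if a[field] and a[field] == b[field]:
                    matches[tuple(sorted((a["id"], b["id"])))] = confidence
        return matches
    return run


def then_run(records: List[Record], models: List[Tuple[Model, float]]) -> Dict[Pair, float]:
    """Sequential: each later model sees only records not matched so far."""
    results: Dict[Pair, float] = {}
    remaining = list(records)
    for model, minimum in models:
        found = {p: c for p, c in model(remaining).items() if c >= minimum}
        results.update(found)
        matched_ids = {rid for pair in found for rid in pair}
        remaining = [r for r in remaining if r["id"] not in matched_ids]
    return results


def and_run(records: List[Record], models: List[Tuple[Model, float]]) -> Dict[Pair, float]:
    """Parallel: every model sees all records; keep the highest confidence per pair."""
    results: Dict[Pair, float] = {}
    for model, minimum in models:
        for pair, conf in model(records).items():
            if conf >= minimum:
                results[pair] = max(conf, results.get(pair, 0.0))
    return results


records = [
    {"id": "1", "email": "a@x.com", "last": "Smith"},
    {"id": "2", "email": "a@x.com", "last": "Smith"},
    {"id": "3", "email": "b@x.com", "last": "Smith"},
    {"id": "4", "email": "c@x.com", "last": "Jones"},
    {"id": "5", "email": "d@x.com", "last": "Jones"},
]
exact = field_model("email", 1.0)   # stands in for an exact-match model
fuzzy = field_model("last", 0.7)    # stands in for a looser model
models = [(exact, 0.9), (fuzzy, 0.6)]  # (model, minimum match confidence)

print(then_run(records, models))
# {('1', '2'): 1.0, ('4', '5'): 0.7}
print(and_run(records, models))
# {('1', '2'): 1.0, ('1', '3'): 0.7, ('2', '3'): 0.7, ('4', '5'): 0.7}
```

In the sequential run, record 3 is never paired with records 1 and 2 because they were already matched by the first model; the parallel run reports those extra pairs, which is the broader coverage described above.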
5. Merge
Here you can define how merges are handled.
Master Record Rule: Determines which record remains after a merge (e.g., Most Recently Modified).
Field Merge Rule: Specifies how to populate master record fields (e.g., Fill Empty Fields).
Undo / Rollback Merge: Enable options to reverse merges or restore records from encrypted backups.
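As a rough illustration of how the two example rules interact, the following Python sketch picks a master record with "Most Recently Modified" and then applies "Fill Empty Fields". The function names, the sample records, and the choice to fill from duplicates in list order are assumptions made for this example; the product's actual merge logic may differ.

```python
from datetime import datetime
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def pick_master(records: List[Record]) -> Record:
    """'Most Recently Modified': the record with the latest LastModifiedDate wins."""
    return max(records, key=lambda r: datetime.fromisoformat(r["LastModifiedDate"]))


def fill_empty_fields(master: Record, others: List[Record]) -> Record:
    """'Fill Empty Fields': copy a value from a duplicate only where the master has none."""
    merged = dict(master)
    for other in others:  # assumption: duplicates are consulted in list order
        for field, value in other.items():
            if merged.get(field) in (None, "") and value not in (None, ""):
                merged[field] = value
    return merged


# Sample duplicate group (simplified ISO timestamps, invented values).
duplicates = [
    {"Id": "001A", "Name": "Acme Inc", "Phone": "", "Website": "acme.example",
     "LastModifiedDate": "2024-01-10T09:00:00"},
    {"Id": "001B", "Name": "Acme", "Phone": "555-0100", "Website": None,
     "LastModifiedDate": "2024-03-02T14:30:00"},
]

master = pick_master(duplicates)
merged = fill_empty_fields(master, [r for r in duplicates if r is not master])

print(merged["Id"], merged["Name"], merged["Phone"], merged["Website"])
# 001B Acme 555-0100 acme.example
```

The master keeps its own non-empty values (Name stays "Acme"), and only the empty Website field is populated from the other record.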
6. Live Dedupe
Enable real-time duplicate detection, AI recommendations, and automatic merging.
When Live Dedupe is enabled:
DataGroomr continuously monitors record changes (create, update, delete, undelete).
It runs incrementally by default, scanning only new or modified records for faster results.
Full dataset analysis remains available when required.
Custom transformations can be applied before matching for improved accuracy.
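The incremental behavior and the pre-match transformation step can be pictured with a small Python sketch. Everything here is hypothetical (the function names, the `LastModifiedDate` comparison, and the email normalization); it only illustrates the idea of scanning changed records and cleaning values before matching.

```python
from datetime import datetime
from typing import Dict, List

Record = Dict[str, str]


def records_to_scan(records: List[Record], last_run: datetime, full: bool = False) -> List[Record]:
    """Return every record for a full analysis, else only those changed since last_run."""
    if full:
        return list(records)
    return [r for r in records if datetime.fromisoformat(r["LastModifiedDate"]) > last_run]


def normalize_email(record: Record) -> Record:
    """Example transformation: trim and lowercase the Email field before matching."""
    cleaned = dict(record)
    cleaned["Email"] = cleaned["Email"].strip().lower()
    return cleaned


records = [
    {"Id": "003A", "Email": "Pat@Example.com ", "LastModifiedDate": "2024-05-01T08:00:00"},
    {"Id": "003B", "Email": "PAT@example.com", "LastModifiedDate": "2024-05-03T12:00:00"},
    {"Id": "003C", "Email": "lee@example.com", "LastModifiedDate": "2024-05-04T16:45:00"},
]
last_run = datetime(2024, 5, 2)

incremental = [normalize_email(r) for r in records_to_scan(records, last_run)]
print([r["Id"] for r in incremental])                      # ['003B', '003C']
print(incremental[0]["Email"])                             # pat@example.com
print(len(records_to_scan(records, last_run, full=True)))  # 3
```

Only the two records changed after the last run are scanned incrementally, while the `full=True` path returns the whole dataset.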
The Get AI Recommendations toggle activates DataGroomr’s AI to automatically tag potential duplicates in real time. AI-labeled duplicates appear as tags, allowing quick review and merge actions. These tags can also be included in the Auto Merge criteria.
Auto Merge
Automatically merge records that meet or exceed the defined match confidence threshold.