Configuring Datasets : DataGroomr Support

Dataset configuration governs every aspect of how data is displayed, how duplicates are detected, and how merges are handled in Trimmr.

Accessing Dataset Configuration

Existing Datasets: Configuration can be accessed from the Trimmr dashboard or directly within the dataset.
New Datasets: The configuration panel appears during dataset creation.

Configuration Tabs

The dataset configuration panel is organized into five tabs, each serving a specific function:

1. General

This tab allows you to name your dataset and select the Salesforce object that will be analyzed for duplicates.

Advanced Options:

Bypass standard Salesforce assignment rules: Salesforce assignment rules determine record ownership when records are created or updated. If this option is enabled, those rules are ignored.
Bypass Salesforce duplicate rules: Allows saving records even if Salesforce duplicate rules would normally block them.

2. Fields

Use this tab to choose which fields will be visible during the deduplication process. Fields can be reordered via drag-and-drop to change how they're displayed in side-by-side comparisons.

3. Filter

This section lets you apply custom filters to focus deduplication efforts on specific subsets of data. You can:

Add filters by selecting a field, setting criteria, and providing values.
Use SOQL queries for more advanced filtering.
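
As a rough illustration of how field, criteria, and value selections relate to a SOQL condition, here is a minimal Python sketch. The helper names (soql_literal, build_condition) are hypothetical and are not part of Trimmr. Whether the Filter tab expects a bare condition or a full query is not stated here, so treat the output format as an assumption. BillingCountry and NumberOfEmployees are standard Account fields, used only as examples.

```python
def soql_literal(value):
    """Render a Python value as a SOQL literal (strings, booleans, numbers only)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    # SOQL string literals use single quotes; escape backslashes and quotes.
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_condition(filters):
    """Join (field, operator, value) triples with AND (hypothetical helper)."""
    return " AND ".join(
        f"{field} {op} {soql_literal(value)}" for field, op, value in filters
    )


condition = build_condition([
    ("BillingCountry", "=", "USA"),
    ("NumberOfEmployees", ">", 50),
])
print(condition)
```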

4. Match

This tab specifies the models used for duplicate detection.

Key Features:

Matching Model: Select the primary model used to detect duplicates.
Minimum Match Confidence: Set the confidence level at which the system considers two records to be duplicates. Raise this threshold to make duplicate detection stricter, or lower it to make detection more permissive.
Additional matching models: Assign up to two additional models to a single dataset. This is particularly useful when classic and ML models are combined for comprehensive duplicate detection. Each model is represented by a color-coded tag and has its own match confidence slider.
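
To make the effect of the confidence threshold concrete, here is a minimal Python sketch. It assumes each candidate pair carries a score between 0 and 1. The record IDs, scores, and the function name pairs_above_threshold are hypothetical and do not describe how Trimmr scores records internally.

```python
# Hypothetical scored pairs: (record_id_a, record_id_b, confidence in [0, 1]).
scored_pairs = [
    ("001A", "001B", 0.97),
    ("001A", "001C", 0.82),
    ("001D", "001E", 0.64),
]


def pairs_above_threshold(pairs, min_confidence):
    """Keep only the pairs whose confidence meets the threshold."""
    return [(a, b) for a, b, score in pairs if score >= min_confidence]


# A higher threshold is stricter and yields fewer duplicate pairs.
strict = pairs_above_threshold(scored_pairs, 0.90)
lenient = pairs_above_threshold(scored_pairs, 0.60)
print(len(strict), len(lenient))
```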

Multi-Model Execution Options: "Then Run" vs "And Run"

When assigning multiple matching models to a dataset, Trimmr offers two execution strategies to enhance flexibility and control: Then Run and And Run. These methods define how models are applied and how their results are interpreted.

Then Run – Sequential Model Execution

The Then Run method executes models one after the other in a pipeline. It works as follows:

Model 1 is applied to the entire dataset to identify and group duplicate records.
Matched duplicates are removed from the dataset.
Model 2 then analyzes the remaining unmatched records and the master records from the first round.

This approach ensures that each later model works only on the master records and on records that earlier models left unmatched. It’s particularly useful when models are specialized, for example, one model for exact matching and another for machine learning matching. The sequential process minimizes duplicate overlap and helps refine results across progressively cleaner data.
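
The sequential steps can be sketched in Python as follows. This is an illustration under stated assumptions, not Trimmr's implementation: both models are hypothetical stand-ins (exact email match, then last-name match), and the first record in each group is assumed to be the master.

```python
def group_by(records, key):
    """Group records by a case-insensitive field; keep groups of 2 or more."""
    groups = {}
    for rec in records:
        groups.setdefault(rec[key].lower(), []).append(rec)
    return [g for g in groups.values() if len(g) > 1]


def exact_email_model(records):
    """Hypothetical model 1: exact match on email."""
    return group_by(records, "email")


def last_name_model(records):
    """Hypothetical model 2: simple stand-in for a fuzzier model."""
    return group_by(records, "last_name")


def then_run(records, first_model, second_model):
    first_groups = first_model(records)
    # Assumption: the first record of each group is its master.
    non_master_ids = {rec["id"] for g in first_groups for rec in g[1:]}
    # Model 2 sees unmatched records plus the masters from round one.
    remaining = [rec for rec in records if rec["id"] not in non_master_ids]
    return first_groups, second_model(remaining)


records = [
    {"id": 1, "email": "ann@example.com", "last_name": "Lee"},
    {"id": 2, "email": "ANN@example.com", "last_name": "Lee"},
    {"id": 3, "email": "a.lee@example.com", "last_name": "Lee"},
    {"id": 4, "email": "bob@example.com", "last_name": "Ray"},
]
first, second = then_run(records, exact_email_model, last_name_model)
```

In this example the first model groups records 1 and 2, record 2 is set aside, and the second model then groups master record 1 with record 3.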

And Run – Parallel Model Execution

The And Run method applies all assigned models to the entire dataset simultaneously. Here's how it works:

All selected models analyze the full dataset in parallel.
Each model identifies its own set of potential duplicates.
Trimmr merges the results, prioritizing the matches based on confidence scores from each model.

This approach is ideal for combining the strengths of multiple models to maximize coverage. Because results from all models are retained and merged, this method provides broader visibility into potential matches—even if different models flag the same records in different ways.
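
One possible reading of the parallel strategy is sketched below in Python. It assumes each model returns a confidence score per candidate pair, and that "prioritizing by confidence" means keeping the highest score for each pair and ordering pairs from most to least confident. The function and model names are hypothetical. Trimmr's actual merging logic may differ.

```python
def and_run(model_results):
    """Combine per-model pair scores, keeping the highest score per pair."""
    combined = {}
    for model_name, pairs in model_results.items():
        for pair, score in pairs.items():
            key = tuple(sorted(pair))  # treat (a, b) and (b, a) as one pair
            if key not in combined or score > combined[key][0]:
                combined[key] = (score, model_name)
    # Most confident matches first.
    return sorted(combined.items(), key=lambda item: item[1][0], reverse=True)


results = and_run({
    "classic": {("001A", "001B"): 0.95, ("001C", "001D"): 0.70},
    "ml": {("001B", "001A"): 0.88, ("001E", "001F"): 0.91},
})
for pair, (score, model) in results:
    print(pair, score, model)
```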

5. Merge

Here, you can define rules that determine:

Master Record Selection: Automatically choose which record survives during a merge.
Field Value Rules: Specify how data from child (non-surviving) records should be retained in the master record.
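
As an illustration of how these two kinds of rules interact, here is a minimal Python sketch. Both rules are hypothetical examples and are not Trimmr's defaults: the most recently modified record becomes the master, and blank master fields are filled from the non-surviving records.

```python
def select_master(group):
    """Hypothetical rule: the most recently modified record survives."""
    # ISO-formatted dates compare correctly as strings.
    return max(group, key=lambda rec: rec["last_modified"])


def merge_group(group):
    """Apply the master rule, then a fill-blanks field value rule."""
    master = dict(select_master(group))
    for rec in group:
        for field, value in rec.items():
            # Hypothetical field rule: only fill fields that are blank in the master.
            if master.get(field) in (None, "") and value not in (None, ""):
                master[field] = value
    return master


group = [
    {"id": "001A", "last_modified": "2024-01-10", "phone": "555-0100", "title": ""},
    {"id": "001B", "last_modified": "2024-03-02", "phone": "", "title": "CTO"},
]
merged = merge_group(group)
print(merged)
```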

Additional Option:

Undo/Rollback Merges: Enable this feature to allow users to reverse merge actions.
