Matching Models are algorithms used to detect duplicate records. DataGroomr provides two options for duplicate detection: a machine learning (ML) matching model and a classic matching model.


Accessing Matching Models


To access matching models, select the Objects/Matching item in the Navigation Menu and click the desired object.




Creating a Matching Model


There are two types of Matching Models in DataGroomr: Machine Learning Models and Classic Matching Models.

Press the ADD MODEL button and select the desired model type:




Classic Matching Model

  • Rule-based approach: You manually define which fields to compare (e.g., First Name, Email, Company) and assign weights to each.

  • Deterministic logic: Matches are found based on exact or fuzzy logic using predefined thresholds.

  • Transparent scoring: You can clearly see which fields influenced a match and by how much.

  • Best for: Simple or predictable data scenarios where you want full control over match logic.


Machine Learning Matching Model

  • AI-assisted approach: Uses a machine learning algorithm trained on labeled examples of duplicates and non-duplicates.

  • Pattern recognition: Learns complex relationships between fields that may not be obvious or linear.

  • Adaptive: Gets better with more training data (you label examples, the model adjusts).

  • Confidence scores: Matches are scored based on statistical confidence, not rigid rules.

  • Best for: Large or messy datasets where manual rules may miss patterns or create false positives.


Selecting either model option will open a dialog that includes the following elements:


  • Name - enter a unique name for your model.
  • Fields - select all fields that should be included in this model.
  • Field Sets - contains the matching rules configured in your Salesforce org, plus sets of fields that DataGroomr recommends based on the fields most commonly matched for that object.
  • Assist - DataGroomr AI selects the best fields for deduplication based on best practices and the statistics of your data population.


Press the Save button to create the model.


Basic Fields Configuration

DataGroomr preselects a comparison type for each field based on the field's type.

The following comparison types are available for Classic Models:

  • Exact - matches if values are exactly the same
  • Similar - matches if values are similar but not necessarily identical (fuzzy matching)

The following comparison types are available for ML models:

  • Text - compares text values (the default comparison type);
  • Short Text - compares short text values and is faster than Text; good candidates are City names and Zip Codes;
  • Long Text - compares long text values such as Description; preselected for TextArea field types;
  • Name - compares person or company names;
  • List - compares values in a list; preselected for Picklists;
  • Date/Time - compares values as dates and times; preselected for Date and DateTime field types;
  • Number - compares numbers; preselected for price and number field types;
  • Exact - checks if values are exactly the same.
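The "preselected for" notes above amount to a lookup from a field's type to its default ML comparison type, with Text as the fallback. The sketch below is hypothetical and is not DataGroomr's code. The keys are Salesforce describe field types (`textarea`, `picklist`, `date`, `datetime`, `currency`, `double`, `int`), and treating "price" as `currency` is an assumption.

```python
# Hypothetical mapping from Salesforce field type to the default ML comparison
# type, reconstructed from the "preselected for" notes; not DataGroomr's code.
DEFAULT_ML_COMPARISON = {
    "textarea": "Long Text",
    "picklist": "List",
    "date": "Date/Time",
    "datetime": "Date/Time",
    "currency": "Number",  # assumption: "price" fields are Currency fields
    "double": "Number",
    "int": "Number",
}


def default_comparison(field_type: str) -> str:
    # Any type not listed falls back to Text, the default comparison type.
    return DEFAULT_ML_COMPARISON.get(field_type.lower(), "Text")


print(default_comparison("picklist"))  # List
print(default_comparison("string"))    # Text
```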



Note: All data is cleaned up before comparison, for all field types:
- text fields are converted to lowercase and all special characters are removed
- phone numbers are cleaned up to contain only digits
- emails are converted to lowercase
- websites are normalized to exclude the protocol and a leading www
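Here is a minimal sketch of these cleanup rules, useful for predicting how two values will look once they are compared. It is an illustration and not DataGroomr's implementation. The function names are hypothetical, and "special characters" is assumed to mean anything other than letters, digits, and whitespace.

```python
import re


def normalize_text(value: str) -> str:
    # Lowercase, then remove special characters (assumption: anything that is
    # not a letter, digit, or whitespace; underscores count as special).
    return re.sub(r"[^\w\s]|_", "", value.lower())


def normalize_phone(value: str) -> str:
    # Keep digits only.
    return re.sub(r"\D", "", value)


def normalize_email(value: str) -> str:
    # Emails are only lowercased.
    return value.lower()


def normalize_website(value: str) -> str:
    # Drop a leading protocol such as "https://", then a leading "www.".
    without_protocol = re.sub(r"^[A-Za-z][A-Za-z0-9+.-]*://", "", value.strip())
    return re.sub(r"^www\.", "", without_protocol, flags=re.IGNORECASE)


print(normalize_text("Acme, Inc."))                       # acme inc
print(normalize_phone("+1 (555) 123-4567"))               # 15551234567
print(normalize_website("https://www.example.com/about"))  # example.com/about
```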


In addition to the settings mentioned above, classic models let you specify Field Weights. Click a percentage value to set how important a similarity match on that field is. The higher the percentage (set with the slider), the more a match on that field contributes to the match confidence score.



Note: Classic models are based on an OR condition, which means records do not need to match on every field in the model. Matched groups are formed from the match confidence calculated across all fields in the model.


With the example model above, DataGroomr will generate groups where records
- matching on all fields will have 100% match confidence;
- matching on Full Name and Business Phone will have 70% match confidence;
- matching on Full Name, Phone and Email will have 90% match confidence;
- matching on Full Name and Email will have 60% match confidence;
- and so forth as long as group match confidence is more than the minimum confidence selected in a dataset.
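The percentages above are consistent with a simple weighted sum. In that model, a pair's confidence is the share of total field weight carried by the fields that matched. The sketch below is illustrative only and is not DataGroomr's scoring code, and it rests on several assumptions:

- The weights are Full Name = 40%, Business Phone = 30%, and Email = 20%, the values implied by the examples if "Phone" is read as Business Phone.
- The remaining 10% belongs to a hypothetical `Other` field.
- Each field either matches or does not, even though real similarity matches may be partial.

```python
# Hypothetical weights (percent) implied by the example percentages; "Other"
# stands in for whatever remaining field(s) carry the last 10%.
WEIGHTS = {"Full Name": 40, "Business Phone": 30, "Email": 20, "Other": 10}


def match_confidence(matched_fields, weights=WEIGHTS):
    # Share of the total weight carried by the fields that matched, in percent.
    total = sum(weights.values())
    matched = sum(w for field, w in weights.items() if field in matched_fields)
    return 100 * matched / total


def forms_group(matched_fields, min_confidence, weights=WEIGHTS):
    # The text says confidence must be "more than" the dataset minimum,
    # so this sketch uses a strict comparison.
    return match_confidence(matched_fields, weights) > min_confidence


print(match_confidence({"Full Name", "Business Phone"}))           # 70.0
print(match_confidence({"Full Name", "Business Phone", "Email"}))  # 90.0
print(match_confidence({"Full Name", "Email"}))                    # 60.0
```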



Advanced Fields Configuration (Gear icon)


1. Blank Values

Matching behavior on blank values can be customized so that blank values are considered matches. You can specify whether records should match when both fields are blank or when either field is blank, or whether blank values should be disregarded entirely. This setting affects match confidence values.
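The sketch below shows one way these options could feed into a confidence score. The policy names are hypothetical, and the non-blank comparison is a placeholder exact check. It also assumes that an ignored blank field is left out of the confidence calculation entirely, rather than counted as a mismatch. That is an assumption, not documented DataGroomr behavior.

```python
from enum import Enum
from typing import Dict, Optional


class BlankPolicy(Enum):
    # Hypothetical names for the three options described above.
    MATCH_IF_BOTH_BLANK = "both"
    MATCH_IF_EITHER_BLANK = "either"
    IGNORE_BLANKS = "ignore"


def compare_with_blanks(a: str, b: str, policy: BlankPolicy) -> Optional[bool]:
    """True/False for match/no match; None means 'leave this field out'."""
    a_blank, b_blank = not a.strip(), not b.strip()
    if not (a_blank or b_blank):
        return a == b  # placeholder comparison for non-blank values
    if policy is BlankPolicy.IGNORE_BLANKS:
        return None
    if policy is BlankPolicy.MATCH_IF_EITHER_BLANK:
        return True
    return a_blank and b_blank  # MATCH_IF_BOTH_BLANK


def confidence(results: Dict[str, Optional[bool]], weights: Dict[str, float]) -> float:
    # Assumption: ignored fields drop out of both the numerator and denominator.
    scored = {f: r for f, r in results.items() if r is not None}
    total = sum(weights[f] for f in scored)
    if total == 0:
        return 0.0
    return 100 * sum(weights[f] for f, r in scored.items() if r) / total
```

With the ignore option, a blank Email no longer lowers the score. A pair matching on Name alone then scores 100% rather than a partial value.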


2. Field Groups

Fields can be configured as a group, which tells DataGroomr to compare them as a collective group rather than one field at a time. This is useful when the same value can be stored in different fields; for example, an account's address could be populated in one of several places. Assigning the same group tag to several fields enables cross-field comparison, so duplicates are detected even when a value was entered in the wrong field.


Good to know: Fields of type Phone and Email are pre-configured as a group.
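Here is a minimal sketch of cross-field comparison for a field group. Every non-blank value in the group is pooled for each record, and the group matches if the two records share any value, whichever field it was entered in. The field names `Phone` and `MobilePhone` are examples only. Values are assumed to be cleaned up already, and exact equality stands in for DataGroomr's actual similarity logic.

```python
def group_match(record_a: dict, record_b: dict, group_fields: list) -> bool:
    # Collect every non-blank value across the grouped fields of each record.
    values_a = {record_a[f] for f in group_fields if record_a.get(f)}
    values_b = {record_b[f] for f in group_fields if record_b.get(f)}
    # The group matches if any value appears in both records, regardless of field.
    return not values_a.isdisjoint(values_b)


a = {"Phone": "5551234567", "MobilePhone": ""}
b = {"Phone": "", "MobilePhone": "5551234567"}
print(group_match(a, b, ["Phone", "MobilePhone"]))  # True
```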


3. Synonyms

When selected, words that appear in the same dictionary list are considered to be the same word. A common example is the contact name Robert, which might alternatively be entered as Rob, Bob, or Robbie.
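A minimal sketch of synonym handling: each word is replaced with a canonical form before comparison, so Bob and Robert compare as equal. The dictionary contents and the canonicalize-first approach are assumptions for illustration. They are not the contents or mechanics of DataGroomr's dictionaries.

```python
# Hypothetical synonym dictionary: each variant maps to one canonical word.
SYNONYMS = {"rob": "robert", "bob": "robert", "robbie": "robert"}


def canonicalize(text: str, synonyms=SYNONYMS) -> str:
    # Replace every word that has a synonym entry with its canonical form.
    return " ".join(synonyms.get(w, w) for w in text.lower().split())


print(canonicalize("Bob") == canonicalize("Robert"))  # True
```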


4. Ignore Words

When selected, words contained in a dictionary list are ignored during comparison, so they do not affect the similarity between two field values. Examples include Corporation, Corp, Incorporated, and Inc.
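A minimal sketch of ignore-word handling: punctuation is removed and listed words are dropped before the values are compared. As a result, "Acme Corp." and "Acme Incorporated" reduce to the same text. The word list and helper name are hypothetical.

```python
import re

# Hypothetical ignore-word list based on the examples above.
IGNORE_WORDS = {"corporation", "corp", "incorporated", "inc"}


def strip_ignored(text: str, ignore=IGNORE_WORDS) -> str:
    # Lowercase and remove punctuation first so "Corp." matches "corp".
    words = re.sub(r"[^\w\s]", "", text.lower()).split()
    return " ".join(w for w in words if w not in ignore)


print(strip_ignored("Acme Corp.") == strip_ignored("Acme Incorporated"))  # True
```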


To add words to your lists of synonyms and ignore words, go to Supervisr: Dictionaries.


5. First N Characters 

The First N Characters setting lets you compare only the first N characters of a field value instead of the entire text. For example, you could use it to compare only the area code of a phone number or the prefix of a zip code.
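Here is a minimal sketch of the First N Characters idea applied to phone numbers. It assumes that the cleanup step (digits only) runs before truncation and that numbers are stored without a country code. Both the helper names and that ordering are assumptions.

```python
import re


def first_n_chars(value: str, n: int) -> str:
    # Keep only the first n characters of the value.
    return value[:n]


def phone_prefix(phone: str, n: int = 3) -> str:
    # Assumption: clean to digits first, then truncate, so formatting
    # differences do not affect the comparison.
    return first_n_chars(re.sub(r"\D", "", phone), n)


print(phone_prefix("(555) 123-4567") == phone_prefix("555.987.6543"))  # True
```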



Machine Learning Matching Model


Machine Learning Models use machine learning algorithms to detect duplicates. Before an ML model can be used in a dataset, it must be trained. Click the Train button to start training.


There are two options available for training machine learning models:


  • Train by AI Assistant (recommended): the AI trains and fine-tunes the machine learning model autonomously. Once the AI has profiled your data and trained the model, you can apply the model directly to a specified dataset.
  • Train Manually: you tell DataGroomr which records are duplicates and which are not, teaching the algorithm to identify patterns in your data based on the fields you specified in the Set Up process.


Tip: A model may be trained multiple times to improve accuracy.
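DataGroomr does not describe its training algorithm here. The sketch below illustrates, in general terms, what teaching an algorithm with labeled duplicate and non-duplicate pairs means. It fits a plain logistic regression over per-field string similarity computed with Python's `difflib.SequenceMatcher`. This is a generic technique and not DataGroomr's model. The fields, example records, and hyperparameters are hypothetical.

```python
import math
from difflib import SequenceMatcher

FIELDS = ["Name", "Email"]  # hypothetical model fields


def features(a: dict, b: dict, fields: list) -> list:
    # A bias term plus one string-similarity score in [0, 1] per field.
    return [1.0] + [SequenceMatcher(None, a.get(f, ""), b.get(f, "")).ratio() for f in fields]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(pairs, labels, fields, epochs=2000, lr=0.5):
    # Batch gradient descent on the logistic (log) loss.
    X = [features(a, b, fields) for a, b in pairs]
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(X, labels):
            error = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for i, xi in enumerate(x):
                grad[i] += error * xi
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad)]
    return w


def duplicate_probability(w, a, b, fields):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, features(a, b, fields))))


# Hypothetical labeled examples: 1 = duplicate, 0 = not a duplicate.
pairs = [
    ({"Name": "Robert Smith", "Email": "rsmith@acme.com"}, {"Name": "Robert Smyth", "Email": "rsmith@acme.com"}),
    ({"Name": "Jane Doe", "Email": "jane@x.org"}, {"Name": "Jane  Doe", "Email": "jane@x.org"}),
    ({"Name": "Robert Smith", "Email": "rsmith@acme.com"}, {"Name": "Alice Jones", "Email": "alice@beta.io"}),
    ({"Name": "Jane Doe", "Email": "jane@x.org"}, {"Name": "John Roe", "Email": "jroe@y.net"}),
]
labels = [1, 1, 0, 0]
weights = train(pairs, labels, FIELDS)
```

In this sketch, retraining as the tip above suggests just means calling `train` again on a larger set of labeled pairs.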



Learn more: Training machine learning model


Editing an Existing Model


To edit an existing model, select it and then press the Open button.



A classic matching model can be edited at any time. A machine learning model can be edited only while it is in the Draft state. Once it is Trained, it can only be retrained, and each retraining creates a new version of the trained model.


Learn more: Training machine learning model


Cloning a Model


Occasionally you may need to modify or retrain an existing matching model, for example to add or remove a field. When an existing model cannot be changed this way, you can create a copy and edit the copy instead.


To do this, select the model and then press the CLONE button.




Deleting a Model


To delete a model, select it and then press the Trash button.



Assigning to Datasets


Classic matching models and trained machine learning models can be assigned to datasets or designated as the default model for your organization.


In Matching Models, select a model and press the Assign button. Then choose the datasets to apply it to and press the Assign button again.




Good to Know: Alternatively, the same can be done using the Dataset Configuration window.


To set a model as the default, select the model and then press the SET DEFAULT button. A green 'Default' label will be displayed next to that model.