Matching models are algorithms used to detect duplicate records. DataGroomr provides two options for duplicate detection: a machine learning model and a classic matching model.


Accessing Matching Models


To access matching models, press the Objects/Matching item in the Navigation Menu and then press the desired object.




Creating a Matching Model


Press the ADD MODEL button and select the desired model type:




Classic Matching Model


Selecting the Classic Matching option opens a dialog that includes the following elements:

  • Name - enter a unique name for your model
  • Fields - select all fields that should be included as part of this model. Matched records will be identified based on exact match on these fields.
  • Field Weights - click a percentage value to set how much a match on that field contributes to the match confidence score. The higher the percentage set with the slider, the greater the influence a match on that field has on the score.



Press the Save button to create the model.


Note: Classic models are based on the OR condition. This means that matched groups are formed based on the match confidence calculated across all fields in the model. For example:


Using the example model above will generate groups in which records
- matching on all fields have 100% match confidence;
- matching on Full Name and Business Phone have 70% match confidence;
- matching on Full Name, Phone and Email have 90% match confidence;
- matching on Full Name and Email have 60% match confidence;
- and so forth, as long as the group match confidence is higher than the minimum confidence selected in the dataset.
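
The percentages listed above are consistent with a simple weighted sum, in which each matching field adds its weight to the confidence. The sketch below illustrates that reading only. The weights (40/30/20/10), the fourth field `Company`, and the functions `match_confidence` and `is_reported` are assumptions chosen to reproduce the listed figures, not values or names taken from DataGroomr. It also assumes that Phone and Business Phone refer to the same field.

```python
# Hypothetical field weights (percent); assumed values, not taken from DataGroomr.
FIELD_WEIGHTS = {
    "Full Name": 40,
    "Business Phone": 30,
    "Email": 20,
    "Company": 10,  # assumed fourth field so that the weights total 100
}


def match_confidence(matching_fields, weights=FIELD_WEIGHTS):
    """Sum the weights of the fields on which two records match."""
    return sum(weights[name] for name in set(matching_fields))


def is_reported(matching_fields, minimum_confidence):
    """A group is shown only if its confidence exceeds the dataset minimum."""
    return match_confidence(matching_fields) > minimum_confidence


print(match_confidence(["Full Name", "Business Phone"]))           # 70
print(match_confidence(["Full Name", "Business Phone", "Email"]))  # 90
print(match_confidence(["Full Name", "Email"]))                    # 60
print(match_confidence(FIELD_WEIGHTS))                             # 100
```

With a minimum confidence of 60, for instance, `is_reported(["Full Name", "Email"], 60)` is false under these assumed weights, because the group confidence must be more than the minimum.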


Gear Icon (Additional Options)


1. Field Groups

Fields can be assigned to a group, which tells DataGroomr to compare the grouped fields collectively rather than field by field. This is useful when the same value can be stored in different fields; for example, an account's address could be entered in one of several address fields. Assigning a group tag to these fields enables cross-comparison, so duplicates are detected even when a value was entered in the wrong field.


Good to know: Fields of type Phone and Email are pre-configured as a group.
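
The cross-comparison idea can be sketched as follows. This is an illustration only: the field names in `ADDRESS_GROUP` and the helpers `group_values` and `group_matches` are hypothetical, and the exact, case-insensitive comparison is a simplification rather than DataGroomr's actual logic.

```python
def group_values(record, group_fields):
    """Collect the non-empty values stored in any field of the group."""
    return {record[f].strip().lower() for f in group_fields if record.get(f)}


def group_matches(record_a, record_b, group_fields):
    """True if any grouped value of one record appears in the other record."""
    shared = group_values(record_a, group_fields) & group_values(record_b, group_fields)
    return bool(shared)


ADDRESS_GROUP = ["Billing Street", "Shipping Street"]  # hypothetical field names

a = {"Billing Street": "12 Main St", "Shipping Street": ""}
b = {"Billing Street": "", "Shipping Street": "12 main st"}

# The address sits in different fields, but the group comparison still finds it.
print(group_matches(a, b, ADDRESS_GROUP))  # True
```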


2. First N Characters 

The First N Characters setting compares only the specified number of leading characters of a value instead of the entire text. For example, it can be used to compare only the area codes in a phone number field.
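
A minimal sketch of this comparison, assuming both phone numbers are stored in the same format; the function name `first_n_equal` is hypothetical.

```python
def first_n_equal(value_a, value_b, n):
    """Compare only the first n characters of two values."""
    return value_a[:n] == value_b[:n]


# Same area code, different subscriber numbers.
print(first_n_equal("215-555-0100", "215-555-0199", 3))  # True
# Different area codes.
print(first_n_equal("215-555-0100", "610-555-0100", 3))  # False
```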


3. Match on Blank Values

Identifies duplicate records by matching against blank fields. This gives the model the ability to match records if one or both values in the records being compared are empty. 
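
One way to picture this switch is shown below. The helper names and the `match_on_blank` parameter are hypothetical, and treating a blank value as a non-match when the switch is off is an assumption.

```python
def is_blank(value):
    """Treat None, empty strings and whitespace-only strings as blank."""
    return value is None or str(value).strip() == ""


def field_matches(value_a, value_b, match_on_blank=False):
    """Compare two values; blanks count as a match only when the switch is on."""
    if is_blank(value_a) or is_blank(value_b):
        return match_on_blank
    return value_a == value_b


print(field_matches("Acme", "", match_on_blank=True))   # True
print(field_matches("Acme", "", match_on_blank=False))  # False
print(field_matches("Acme", "Acme"))                    # True
```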


4. Ignore Common Words

When this switch is on, all characters that are not letters or numbers, as well as common words such as "Corp", "Inc", and "and", are ignored when values are compared.
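
The effect can be approximated with a small normalization step, sketched below. The word list in `COMMON_WORDS` contains only the three examples named above and is not DataGroomr's actual list; the function `normalize` is hypothetical.

```python
import re

COMMON_WORDS = {"corp", "inc", "and"}  # assumed sample list, not the real one


def normalize(value):
    """Drop punctuation and common words before values are compared."""
    # Replace every run of characters that are neither letters nor digits.
    tokens = re.sub(r"[\W_]+", " ", value).lower().split()
    return " ".join(t for t in tokens if t not in COMMON_WORDS)


print(normalize("Smith & Jones, Inc."))   # smith jones
print(normalize("Smith and Jones Corp"))  # smith jones
```

Both company names reduce to the same string, so they would compare as equal under this sketch.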


Machine Learning Matching Model


Selecting the Machine Learning option opens a dialog that includes the following elements:

  • Name - enter a unique name for your model.
  • Fields - select all the fields that should be included as part of this model. Notice that a pencil icon is shown within each field. Pressing this icon will allow you to control how the matching is done for this field.


Machine learning models are based on algorithms powered by machine learning. When a new model is created, it needs to be trained. Supervised training is a process in which you provide examples of duplicate and distinct records so that DataGroomr can learn to identify patterns in the data, based on the fields you specified.


Tip: A model may be trained multiple times to improve accuracy.


Fields

The prefilled matching type is auto selected by DataGroomr based on the type of object field and it is generally not 

The following comparison types are available:

  • Text - compares text values; this is the default comparison type.
  • Short Text - compares short text values and is faster than Text; good candidates are city names and zip codes.
  • Long Text - compares long text values such as Description; preselected for TextArea field types.
  • Name - compares person or company names.
  • List - compares values in a list; preselected for Picklists.
  • Date/Time - compares values as date and time; preselected for Date and DateTime field types.
  • Number - compares numbers; preselected for price and number field types.
  • Exact - checks whether values are exactly the same.
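
The preselection rules in the list above can be summarized as a lookup table. The sketch below restates only what the list says; the field-type keys are assumed spellings, and `preselected_comparison` is a hypothetical helper, not part of DataGroomr.

```python
# Field type -> preselected comparison type, as described in the list above.
DEFAULT_COMPARISON = {
    "TextArea": "Long Text",
    "Picklist": "List",
    "Date": "Date/Time",
    "DateTime": "Date/Time",
    "Price": "Number",
    "Number": "Number",
}


def preselected_comparison(field_type):
    """Return the preselected comparison type, falling back to Text."""
    return DEFAULT_COMPARISON.get(field_type, "Text")


print(preselected_comparison("Picklist"))  # List
print(preselected_comparison("Other"))     # Text
```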


Press Save to save the model in Draft status, or press the Train button to advance to the Training stage.


Learn more: Training machine learning model


Editing an Existing Model


To edit an existing model, select it and then press the Open button.



A classic matching model can be edited at any time. A machine learning model can be edited only while it is in the Draft state. Once it is Trained, it can be trained again, which creates another version of the trained model.


Learn more: Training machine learning model

Cloning a Model


Occasionally you may need to modify or retrain an existing matching model, for example to add or remove a field. An existing model cannot be changed in this way, but you can create a copy that can be edited.


To do this, select the model and then press the CLONE button.




Deleting a Model


A model can be deleted by selecting it and then pressing the Trash button.



Assigning to Datasets


Classic matching models and trained machine learning models can be assigned to datasets or designated as the default model for your organization.


In Matching Models, select a model and press the Assign button. Then choose the datasets to assign it to and press the Assign button.




Good to know: Alternatively, the same can be done in the Dataset Configuration window.


To set a model as the default, select it and then press the SET DEFAULT button. A green 'Default' label will be displayed next to that model.