Matching models are algorithms used to detect duplicate records. DataGroomr provides two options for duplicate detection: a machine learning model and a classic matching model.
Accessing Matching Models
To access matching models, press the Objects/Matching item in the Navigation Menu and then press the desired object.
Creating a Matching Model
Press the ADD MODEL button and select the desired model type:
Classic Matching Model
Selecting the Classic Matching option opens a dialog that includes the following elements:
- Name - enter a unique name for your model.
- Fields - select all fields that should be included in this model. Records are identified as matches based on exact matches on these fields. The match confidence is calculated as the number of matching fields divided by the total number of fields in the model. For example, with a four-field model, records that match on all four fields have 100% match confidence, and records that match on only two fields have 50% match confidence.
Press the Save button to create the model.
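The match confidence calculation described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not DataGroomr's implementation: the function name, field names, and sample records are hypothetical, and the sketch assumes that a field counts as matched only when both values are present and identical.

```python
def match_confidence(record_a, record_b, fields):
    """Return the fraction of model fields on which two records match exactly."""
    if not fields:
        raise ValueError("a model needs at least one field")
    # Assumption: a field matches only when both values are present and equal.
    matched = sum(
        1
        for name in fields
        if record_a.get(name) is not None and record_a.get(name) == record_b.get(name)
    )
    return matched / len(fields)


# Hypothetical four-field model and sample records.
model_fields = ["FirstName", "LastName", "Email", "Phone"]
rec_1 = {"FirstName": "Ann", "LastName": "Lee", "Email": "ann@example.com", "Phone": "555-0100"}
rec_2 = {"FirstName": "Ann", "LastName": "Lee", "Email": "a.lee@example.com", "Phone": "555-0199"}

print(f"{match_confidence(rec_1, rec_1, model_fields):.0%}")  # all four fields match: 100%
print(f"{match_confidence(rec_1, rec_2, model_fields):.0%}")  # two of four fields match: 50%
```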
Machine Learning Matching Model
Selecting the Machine Learning option opens a dialog that includes the following elements:
- Name - enter a unique name for your model.
- Fields - select all the fields that should be included in this model. A pencil icon is shown within each field. Pressing this icon lets you control how matching is done for that field.
- Continuous Training - if enabled, new training answers are added to the existing training data (if available). If disabled, the model is trained from scratch.
Machine learning models are based on machine learning algorithms. When a new model is created, it needs to be trained. Supervised training is a process in which you provide examples of duplicate and distinct records so that DataGroomr can learn to identify patterns in the data based on the fields you specified.
Tip: A model may be trained multiple times to improve accuracy.
For new models or those in Draft status, users may edit the matching (or comparison) type used for each field. This is done by clicking the pencil icon inside a field, which brings up a drop-down dialog.
The prefilled matching type is auto selected by DataGroomr based on the type of object field and it is generally not
The following comparison types are available:
- Text - compares text values. This is the default comparison type.
- Short Text - compares short text values. It is faster than Text and works well for fields such as city names and zip codes.
- Long Text - compares long text values such as Description. Preselected for TextArea field types.
- Name - compares person or company names.
- List - compares values in a list. Preselected for picklists.
- Date/Time - compares values as dates and times. Preselected for Date and DateTime field types.
- Number - compares numbers. Preselected for price and number field types.
- Exact - checks whether values are exactly the same.
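The preselection rules in the list above amount to a lookup from field type to comparison type, with Text as the fallback. The sketch below is a hypothetical illustration only: the field-type strings and the function name are assumptions, and only the pairings themselves are taken from the list.

```python
# Hypothetical field-type names; only the pairings come from the list above.
PRESELECTED_COMPARISON = {
    "TextArea": "Long Text",
    "Picklist": "List",
    "Date": "Date/Time",
    "DateTime": "Date/Time",
    "Number": "Number",
    "Price": "Number",
}


def preselected_comparison_type(field_type):
    """Return the comparison type preselected for a field type (Text by default)."""
    return PRESELECTED_COMPARISON.get(field_type, "Text")


for field_type in ("TextArea", "Picklist", "DateTime", "Phone"):
    print(field_type, "->", preselected_comparison_type(field_type))
```

Any field type not listed in the mapping falls back to Text. The preselected type can still be changed with the pencil icon while the model is new or in Draft status.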
Press Save to save the model in Draft status, or press the Train button to advance to the Training stage.
Learn more: Training machine learning model
To edit an existing model, select it and then press the Open button.
A classic matching model can be edited at any time. A machine learning model can be edited only while it is in Draft status. If it is Trained, it can be re-trained, and another version of the trained model will be created.
Learn more: Training machine learning model
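The editing rules above can be summarized as a small decision function. This is a hypothetical sketch for illustration: the model-type and status strings, the action names, and the function itself are assumptions and do not represent a DataGroomr API.

```python
def allowed_actions(model_type, status=None):
    """Return the actions available for a model, following the rules above.

    model_type: "classic" or "machine_learning" (hypothetical identifiers).
    status: "Draft" or "Trained"; ignored for classic models.
    """
    if model_type == "classic":
        # Classic matching models can be edited at any time.
        return {"edit"}
    if model_type == "machine_learning":
        if status == "Draft":
            # Draft models can still be edited, or sent to training.
            return {"edit", "train"}
        if status == "Trained":
            # Trained models cannot be edited; re-training creates a new version.
            return {"train"}
        raise ValueError(f"unknown status: {status!r}")
    raise ValueError(f"unknown model type: {model_type!r}")


print(sorted(allowed_actions("classic")))
print(sorted(allowed_actions("machine_learning", "Draft")))
print(sorted(allowed_actions("machine_learning", "Trained")))
```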
Occasionally you may need to modify or retrain an existing matching model, for example to add or remove a field. An existing model cannot be changed this way, but you can create a copy that can be edited.
To do this, select the model and then press the CLONE button.
To delete a model, select it and then press the Trash button.
Assigning to Datasets
Classic matching models and trained machine learning models can be assigned to datasets or designated as the default model for your organization.
From the Matching Models feature, select a model and press the Assign button. Then choose the datasets to apply it to and press the Assign button.
Good to Know: Alternatively, the same can be done using the Dataset Configuration window.
To set a model as the default, select the model and then press the SET DEFAULT button. The Default label will be displayed next to that model.