Training Matching Rules : DataGroomr Support

Machine Learning Models use machine learning algorithms to identify duplicate records. When a new model is created, it needs to be trained. There are two options for training a Machine Learning model.

AI Assistant replaces the need for manual training by autonomously training and fine-tuning the machine learning model for you.

Manually Training is a process in which you tell DataGroomr which records are duplicates and which are not. This helps the algorithm train and identify patterns in the data, based on the fields you specify in the Set Up process.

Tip: A model may be trained multiple times to improve accuracy.

Set Up

First, name your Machine Learning model, then add the fields that you would like to match on. Keep in mind that the fewer fields a model has, the fewer duplicates it requires in order to be trained. Start with a small model, then create larger models with more fields as needed.

Train

When a user presses the TRAIN button, the dialog to initiate training opens. DataGroomr will then analyze your existing data to identify duplicate (and non-duplicate) sets of records. The dialog offers the following options:

AI Assistant - There is no need to train or tune the model yourself. The AI Assistant automates the entire process, so you can focus on outcomes, not algorithms.
Manually Training - Teaches DataGroomr which records are duplicates by using your input to recognize patterns in the fields you set up.
Train on - Lets you select the training data: all Salesforce records, or specific datasets from Trimmr or Importr.
Continuous Training - If enabled, the new training answers are added to the existing training data (if available). If disabled, the model is trained from scratch.
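
The effect of the Continuous Training option can be pictured with a short sketch. This is only an illustration of the behavior described above, not DataGroomr's actual implementation: the function name `build_training_set` and the shape of a labeled pair are assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical shape of one training answer:
# (record_id_a, record_id_b, is_duplicate)
LabeledPair = Tuple[str, str, bool]


def build_training_set(
    existing: List[LabeledPair],
    new_answers: List[LabeledPair],
    continuous: bool,
) -> List[LabeledPair]:
    """Return the answers a training run would learn from (illustrative only)."""
    if continuous:
        # Continuous Training enabled: new answers are added to
        # whatever training data already exists.
        return list(existing) + list(new_answers)
    # Continuous Training disabled: train from scratch on the new answers.
    return list(new_answers)


previous = [("001A", "001B", True)]
current = [("001C", "001D", False)]

print(len(build_training_set(previous, current, continuous=True)))
print(len(build_training_set(previous, current, continuous=False)))
```

With Continuous Training enabled, earlier answers keep contributing to the model. With it disabled, only the answers from the current session are used.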

The amount of time required for the model to initialize depends on the number of records in your Salesforce environment and the number of fields selected. If you chose AI Assistant, you may apply the model directly to a dataset once the model analysis is complete. You may exit this window and return at any time.

After the model initializes, you will be shown sets of records along with three options:

YES - the records are duplicates
NO - the records are not duplicates
NOT SURE - you cannot determine whether the records are duplicates

Coverage

As record groups are reviewed, the Coverage score will change. Coverage indicates the percentage of the data for which the model is confident in its prediction. Higher coverage means the model has learned the patterns in the data, while lower coverage suggests the model requires more training. We recommend providing examples until the desired Coverage value is achieved.
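
As a rough illustration of what a coverage percentage means, the sketch below counts how many predictions fall outside an "uncertain" band. How DataGroomr actually computes Coverage is not documented here, so the `coverage` function, the 0-to-1 duplicate scores, and the `low`/`high` thresholds are all assumptions chosen for the example.

```python
from typing import Sequence


def coverage(scores: Sequence[float], low: float = 0.2, high: float = 0.8) -> float:
    """Percentage of predictions the model is confident about.

    Each score is a hypothetical probability (0.0 to 1.0) that a pair of
    records is a duplicate. A prediction counts as confident when its score
    is at or below `low` (confident non-duplicate) or at or above `high`
    (confident duplicate). Both thresholds are illustrative assumptions.
    """
    if not scores:
        return 0.0
    confident = sum(1 for s in scores if s <= low or s >= high)
    return 100.0 * confident / len(scores)


# Three of the four scores are outside the uncertain band.
print(coverage([0.05, 0.95, 0.5, 0.9]))  # 75.0
```

Under this reading, additional YES and NO answers help the model move uncertain pairs toward one end of the scale, which raises the percentage.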

When the minimum required number of duplicate sets has been reviewed, the FINISH button will become active. Pressing this button will open a confirmation window with additional information.
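
The gating of the FINISH button can be sketched as a simple count of reviewed sets. This is a hypothetical model of the behavior, not DataGroomr code: the `Answer` enum, the `can_finish` function, the `min_required` value, and the choice to exclude NOT SURE answers from the count are all assumptions.

```python
from enum import Enum
from typing import Iterable


class Answer(Enum):
    YES = "yes"            # the records are duplicates
    NO = "no"              # the records are not duplicates
    NOT_SURE = "not sure"  # could not be determined


def can_finish(answers: Iterable[Answer], min_required: int) -> bool:
    """Return True once enough sets have been reviewed (illustrative only).

    Assumption: NOT SURE answers do not count toward the minimum,
    because they give the model nothing to learn from.
    """
    decisive = sum(1 for a in answers if a is not Answer.NOT_SURE)
    return decisive >= min_required


session = [Answer.YES, Answer.NO, Answer.NOT_SURE]
print(can_finish(session, min_required=2))  # True
print(can_finish(session, min_required=3))  # False
```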

Press the CONFIRM button to activate the model.

Retraining Existing Model

Occasionally an existing model needs to be retrained. This sometimes happens when a model is not detecting all the duplicates, or a new important field has been added to Salesforce.

Notice that an existing model has a version number. To re-train this model, select it from the list and press OPEN.

Then select the TRAIN button and CONFIRM to create a new version.

On the next screen, you are asked to evaluate whether a pair of records is a duplicate. The process is the same as training a new model, except that the training picks up where the previous version stopped.

Press the FINISH button to complete the re-training. Note that a new version of the model will be created.

Good to know: All versions of a model are available for selection by a dataset.
