Machine Learning Models are matching models built on machine learning algorithms. When a new model is created, it needs to be trained. There are two options for training a machine learning model:
AI Assistant replaces the need for manual training by autonomously training and fine-tuning the machine learning model for you.
Manually Training is a process in which you tell DataGroomr which records are duplicates and which are not. Your answers help the algorithm train and identify patterns in the data, based on the fields you specify during the Set Up process.
Tip: A model may be trained multiple times to improve accuracy.
Set Up
First, name your Machine Learning model, then add the fields that you would like to match on. Keep in mind that the fewer fields a model has, the fewer duplicates it requires in order to be trained. So start with a small model, then create larger models with more fields as needed.
Train
When you press the TRAIN button, DataGroomr will analyze your existing data to identify duplicate (and non-duplicate) sets of records. First, the dialog to initiate training opens with the following options:
- AI Assistant - There is no need to train or tune the model yourself. The AI Assistant automates the entire process, so you can focus on outcomes, not algorithms.
- Manually Training - Teaches DataGroomr which records are duplicates, using your input to recognize patterns in the fields you set up.
- Train on - Lets you select the training data: all Salesforce records, or specific datasets from Trimmr or Importr.
- Continuous Training - If enabled, the new training answers will be added to the existing training data (if available). If disabled, the model will be trained from scratch.
The amount of time required for the model to initialize depends on the number of records in your Salesforce environment and the number of fields selected. If you chose AI Assistant, you can apply the model directly to a dataset once the model analysis is complete. You may exit this window and return at any time.

After the model initializes, you will be shown sets of records along with three options:
- YES - the records are duplicates
- NO - the records are not duplicates
- NOT SURE - you cannot determine whether the records are duplicates
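To illustrate how these answers could combine with the Continuous Training option, here is a minimal Python sketch. It is not DataGroomr's implementation. The function name build_training_set, the (pair_id, answer) tuple format, and the choice to skip NOT SURE answers are all assumptions made for illustration.

```python
def build_training_set(existing, new_answers, continuous):
    """Return the labeled record pairs a training run would use.

    existing, new_answers: lists of (pair_id, answer) tuples, where answer
    is "YES", "NO", or "NOT SURE" (hypothetical format for this sketch).
    """
    # Assumption: NOT SURE answers carry no usable label, so they are skipped.
    usable = [a for a in new_answers if a[1] in ("YES", "NO")]
    if continuous:
        # Continuous Training enabled: add new answers to the existing data.
        return list(existing) + usable
    # Continuous Training disabled: train from scratch on the new answers only.
    return usable


previous = [("pair-1", "YES")]
session = [("pair-2", "NO"), ("pair-3", "NOT SURE")]

print(build_training_set(previous, session, continuous=True))
print(build_training_set(previous, session, continuous=False))
```

With Continuous Training enabled, the sketch keeps the earlier answer and adds the new usable one. With it disabled, only the answers from the current session remain.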
Coverage
As record groups are reviewed, the Coverage score will change. Coverage indicates the percentage of data for which the model is confident in its prediction. Higher coverage means the model has learned the patterns in the data, while lower coverage suggests the model requires more training. We recommend providing examples until the desired Coverage value is achieved.
When the minimum required number of duplicate sets has been reviewed, the FINISH button will become active. Pressing this button opens a confirmation window with additional information.
Press the CONFIRM button to activate the model.
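One way to picture Coverage is as the share of predictions that fall outside an "uncertain" band. The Python sketch below is only an illustration of that idea, not DataGroomr's actual formula. The function name coverage, the probability scores, and the 0.1 and 0.9 cutoffs are assumptions.

```python
def coverage(scores, low=0.1, high=0.9):
    """Percentage of predictions that are confidently duplicate or non-duplicate.

    scores: predicted duplicate probabilities between 0 and 1.
    low, high: hypothetical cutoffs; anything between them counts as uncertain.
    """
    if not scores:
        return 0.0
    confident = sum(1 for p in scores if p <= low or p >= high)
    return 100.0 * confident / len(scores)


# Three of these five predictions are confident (0.98, 0.03, 0.91).
print(coverage([0.98, 0.03, 0.55, 0.91, 0.40]))  # prints 60.0
```

Under this reading, each additional reviewed example that moves an uncertain score toward 0 or 1 raises the Coverage value.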
Good to know: All versions of a model are available for selection by a dataset.
Occasionally you may need to modify or retrain an existing matching model. For example, you may need to remove or add a field. An existing ML model cannot be edited, but you can clone an existing model, make changes, and retrain it.
Please review this article for more information.