Machine Learning models in DataGroomr use machine learning algorithms to identify duplicate records. When a new model is created, it needs to be trained. Supervised training is a process in which you tell DataGroomr which records are duplicates and which are not, so that the algorithm can learn to identify patterns in your data based on the fields you specify during the Set Up process.


Tip: A model may be trained multiple times to improve accuracy.


Set Up

First, name your Machine Learning model, then add the fields you would like to match on. Keep in mind that the fewer fields you select, the fewer duplicates the model needs in order to be trained. Start with a small model, then create larger models with more fields as needed.



Train


When you press the TRAIN button, DataGroomr analyzes your existing data to identify duplicate (and non-duplicate) sets of records. The time required depends on the number of records in your Salesforce environment and the number of fields selected. You may exit this window and return at any time.



You will be shown sets of potentially duplicate records along with three options:

  1. YES - the records are duplicates
  2. NO - the records are not duplicates
  3. NOT SURE - if you cannot determine if the records are duplicates
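Each answer can be thought of as a label on a pair of records. The following is a minimal Python sketch, assuming (the article does not say this) that YES and NO answers become duplicate and non-duplicate training examples while NOT SURE answers are simply left out of training. The `Answer` enum and the `collect_training_pairs` function are hypothetical names for illustration only, not part of DataGroomr.

```python
from enum import Enum


class Answer(Enum):
    # Hypothetical representation of the three review options.
    YES = "yes"
    NO = "no"
    NOT_SURE = "not_sure"


def collect_training_pairs(reviewed):
    """Split reviewed record pairs into duplicate and non-duplicate examples.

    `reviewed` is a list of (record_a, record_b, Answer) tuples.
    Pairs answered NOT_SURE are skipped (an assumption for this sketch).
    """
    duplicates, distinct = [], []
    for record_a, record_b, answer in reviewed:
        if answer is Answer.YES:
            duplicates.append((record_a, record_b))
        elif answer is Answer.NO:
            distinct.append((record_a, record_b))
        # Answer.NOT_SURE: no label is recorded for this pair.
    return {"match": duplicates, "distinct": distinct}


# Illustrative records, not real Salesforce data.
reviewed = [
    ({"Name": "Acme Corp"}, {"Name": "ACME Corporation"}, Answer.YES),
    ({"Name": "Acme Corp"}, {"Name": "Apex Ltd"}, Answer.NO),
    ({"Name": "J. Smith"}, {"Name": "John Smith"}, Answer.NOT_SURE),
]
labels = collect_training_pairs(reviewed)
print(len(labels["match"]), len(labels["distinct"]))
```

Under this assumption, answering NOT SURE costs nothing: the pair just does not contribute an example, which is why it is safer than guessing YES or NO.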



Coverage

As record groups are reviewed, the Coverage score changes. Coverage indicates the percentage of data for which the model is confident in its prediction. Higher coverage means the model has learned the patterns in the data, while lower coverage suggests the model requires more training. We recommend providing examples until the desired Coverage value is achieved.
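To make the idea of "percentage of data where the model is confident" concrete, here is a minimal Python sketch. It assumes the model assigns each candidate pair a duplicate probability and treats a prediction as confident when that probability is near 0 or near 1; the `coverage` function and its `low`/`high` thresholds are hypothetical illustrations, not DataGroomr's actual calculation.

```python
def coverage(probabilities, low=0.2, high=0.8):
    """Return the percentage of predictions the model is confident about.

    A prediction counts as confident when its duplicate probability is
    at or below `low` (confident "not a duplicate") or at or above `high`
    (confident "duplicate"). Both thresholds are illustrative assumptions.
    """
    if not probabilities:
        return 0.0
    confident = sum(1 for p in probabilities if p <= low or p >= high)
    return 100.0 * confident / len(probabilities)


# Predicted duplicate probabilities for five candidate pairs (illustrative values).
scores = [0.95, 0.05, 0.50, 0.85, 0.40]
print(coverage(scores))  # 60.0
```

In this sketch, three of the five pairs fall outside the uncertain band, giving 60% coverage. Reviewing more pairs would be expected to push more probabilities toward 0 or 1, which raises the score.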


When the minimum required number of duplicate sets has been reviewed, the FINISH button becomes active. Pressing this button opens a confirmation window with additional information.


Press the CONFIRM button to activate the model.


Retraining an Existing Model


Occasionally, an existing model needs to be retrained, for example when the model is not detecting all the duplicates or when an important new field has been added to Salesforce.


Notice that an existing model has a version number. To retrain this model, select it from the list and press OPEN.



Then press the TRAIN button, followed by CONFIRM, to create a new version.



On the next screen, you are asked to evaluate whether a pair of records is a duplicate. The process is the same as training a new model, except that training picks up where the previous version stopped.


Press the FINISH button to complete the retraining. Note that a new version of the model will be created.


Good to know: All versions of a model are available for selection by a dataset.