Data Quality Models score the quality of data using user-defined formula rules. Each rule generates an integer score that gauges attributes of the values contained within a dataset. Rules can score datasets on completeness, formatting, timeliness, or any custom metric.


The Data Quality Model Editor consists of 6 sections:

1. Name and Description

2. Color and Buckets

3. Sync Settings

4. Selected Fields

5. Rule Designer

6. Rule preview


1. Name and Description

DataGroomr references the name and description fields when a Data Quality Model is associated with a dataset, and displays them within column views and hover-overs in Brushr. A short name combined with a descriptive description is recommended.

2. Color and Buckets

Users can choose the base color of the background shown behind each score value when displayed in Brushr as well as in the rule preview section. The integer values in the Bucket Min/Mid/Max fields specify the color threshold of each score. Swapping the values for Min-point and Max-point makes it possible to show a darker color for a minimum value, or vice versa.
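One way to picture how the bucket values might translate into shading is the Python sketch below. It is an illustration only: the function name color_intensity, the 0.0 to 1.0 intensity scale, and the straight-line interpolation through the Mid value are all assumptions, not DataGroomr's documented behavior.

```python
def color_intensity(score, bucket_min, bucket_mid, bucket_max):
    """Map a score to a shade between 0.0 (lightest) and 1.0 (darkest).

    Hypothetical model: Min maps to 0.0, Mid to 0.5 and Max to 1.0, with
    linear interpolation in between and clamping outside the range.
    """
    def lerp(value, lo, hi, out_lo, out_hi):
        if hi == lo:
            return out_hi
        t = (value - lo) / (hi - lo)
        t = max(0.0, min(1.0, t))  # clamp scores that fall outside the buckets
        return out_lo + t * (out_hi - out_lo)

    # Swapped Min/Max values reverse the direction of the scale.
    ascending = bucket_min <= bucket_max
    below_mid = score <= bucket_mid if ascending else score >= bucket_mid
    if below_mid:
        return lerp(score, bucket_min, bucket_mid, 0.0, 0.5)
    return lerp(score, bucket_mid, bucket_max, 0.5, 1.0)
```

Under this model, color_intensity(25, 0, 50, 100) gives a light shade for a low score, while swapping the end points, as in color_intensity(25, 100, 50, 0), gives the same low score a dark shade.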


3. Sync Settings

This setting allows users to synchronize the computed Data Quality Model scores back into Salesforce for additional analysis within dashboards and reports, or for display on individual Salesforce record views.


Push to Salesforce [Enabled/Disabled]

Toggle whether the field specified in Target Field should be synchronized to Salesforce whenever an analysis event is triggered.


Target Field [Text]

Select an existing field within a Salesforce object to sync the computed Data Quality Model score to. If the intended field does not exist, click "+ Create new field" and name the field. A new custom field will be created within the object being scored after you click Save.


For this feature to work properly, the logged-in user needs Salesforce write access to the record field specified by Target Field.


Note: To create a new field from DataGroomr, a user account must have the Modify Metadata Through Metadata API Functions permission or the Modify All Data permission. These permissions are required to make Metadata API calls.


4. Selected Fields

This area lists the object fields that can be referenced within the rule designer. A field must be added to the list of Selected Fields before it can be referenced in a rule.


5. Rule Designer

Similar in usage and function to the Rule Designer for Master Record Rules and Field Merge Rules, the Data Quality Model rule designer is used to create rules that define the scores of field values contained in a dataset.


Scenarios

Formatting - This example rule ensures that phone numbers, country names, zip codes and states/provinces match a specified format.

Timeliness - This example rule assigns a score based on the Created Date/Age of an account.

Completeness - This example rule determines a score based on the number of non-empty fields within a dataset.


Scoring

This section contains blocks that work with the data quality score.



Modify Score [OPERATOR] - Modifies the resulting score of a model by assigning, or applying an operator against, the value on the right side:

  • + Addition
  • - Subtraction
  • * Multiplication
  • / Division
  • = Assignment

Score - Returns the current value of the score
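The effect of chaining several Modify Score blocks can be sketched in Python as follows. The OPERATORS table and the modify_score helper are hypothetical names used for illustration; they are not part of DataGroomr.

```python
import operator

# Hypothetical mapping from the block's operator symbol to a Python function.
OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    "=": lambda current, value: value,  # assignment replaces the score
}


def modify_score(score, op, value):
    """Return the score after applying one Modify Score block."""
    return OPERATORS[op](score, value)


# Three blocks executed from top to bottom.
score = 0
score = modify_score(score, "=", 100)  # Modify Score = 100
score = modify_score(score, "-", 20)   # Modify Score - 20
score = modify_score(score, "/", 2)    # Modify Score / 2
```

Note that this sketch uses Python's true division, so the last step yields a floating point value. Whether the rule designer converts the result of a division back to an integer is not covered here.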


Patterns

This section includes blocks that check a value's adherence to certain formats or patterns.



  • Phone format - Verifies that a phone number field of the record currently being evaluated follows E.164, International, National or RFC 3966 phone number formatting
  • Country format - Verifies that a country field of the record currently being evaluated is fully spelled out, or abbreviated to two or three characters
  • State/Province format - Verifies that a state field of the record currently being evaluated is fully spelled out or abbreviated to two letters. The state abbreviation is cross-referenced for validity against the country being referenced.
  • Zip/postal code format - Verifies that a postal code field of the record currently being evaluated is formatted correctly. The format is cross-referenced against the country of the address.
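As an illustration of what a format check involves, the Python sketch below tests a string against the E.164 shape: a leading plus sign, a non-zero first digit, and no more than 15 digits in total. The function name is_e164 is hypothetical, and the sketch only checks the shape of the string. It does not show how the Phone format block is implemented, and it does not validate country codes.

```python
import re

# E.164 shape: "+", a first digit from 1 to 9, then 1 to 14 further digits.
E164_PATTERN = re.compile(r"\+[1-9][0-9]{1,14}")


def is_e164(phone):
    """Return True if the whole string has the E.164 shape."""
    return E164_PATTERN.fullmatch(phone) is not None


# A rule could lower the score when the check fails.
score = 100
if not is_e164("(415) 555-2671"):
    score -= 10
```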


Good to know: Blocks in a rule are executed sequentially from top to bottom.


Logic

This section contains blocks that, when combined with the If/Do block, can be used to create logic that checks field values within datasets.

If/Do - If the entry criteria evaluate to true, the inner contents of the block execute; otherwise, execution proceeds to the block directly underneath.

[INPUT] Equals/Not Equal To/Less Than/Greater Than/Less or Equal/Greater or Equal/Contains/Does Not Contain/Starts With/Ends With [INPUT] - Returns a true/false value by evaluating the two inputs against the chosen comparison operator

[INPUT] and/or [INPUT] - Returns a true/false value: "and" requires both inputs to be true, while "or" requires at least one input to be true

[INPUT] is empty - Returns a true/false value that determines whether the input value is an empty field

Not [VALUE] - Negates the input value on the right and passes the result to the receiving block on the left
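The logic blocks above combine much like conditionals in a programming language. The Python sketch below mirrors two If/Do blocks executed one after the other. The names COMPARATORS, is_empty and score_record, the field names "Industry" and "Website", and the penalty values are all hypothetical. Treating blank strings as empty is also an assumption.

```python
# Hypothetical Python equivalents of a few comparison blocks.
COMPARATORS = {
    "Equals": lambda left, right: left == right,
    "Not Equal To": lambda left, right: left != right,
    "Contains": lambda left, right: str(right) in str(left),
    "Starts With": lambda left, right: str(left).startswith(str(right)),
    "Ends With": lambda left, right: str(left).endswith(str(right)),
}


def is_empty(value):
    """Assumption: None and blank strings both count as empty."""
    return value is None or str(value).strip() == ""


def score_record(record):
    score = 100
    # If [Field Value "Industry"] is empty / Do: Modify Score - 25
    if is_empty(record.get("Industry")):
        score -= 25
    # If Not ([Field Value "Website"] Starts With "https://") / Do: Modify Score - 10
    if not COMPARATORS["Starts With"](record.get("Website", ""), "https://"):
        score -= 10
    return score
```

When the first condition is false, its inner block is skipped and evaluation simply continues with the second If/Do, matching the top-to-bottom execution order described earlier.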


Values

This section allows users to pass a text, Boolean, empty field, numerical or date/time value into the receiving block

"[INPUT]" - Passes the text string of INPUT to the receiving block

True/False - Passes a true/false value into the receiving block

Empty - Passes an empty field value to the receiving block

[NUMBER] - Passes an integer or floating-point value to the receiving block

Now - Passes the current date/time into the receiving block


Fields

This section allows users to add loops and reference field names and values in their data quality models

For [SELECTED] fields of [INPUT] type - Iterates over corresponding field values of any/address/phone type in each record of a dataset.

Field Name - Returns the name of the current field being evaluated. This block may only be used within "for fields" blocks

Field Name [INPUT] - Returns the name of the field corresponding to the specified field name

Field Value - Returns the value of the current field within a record loop. This block may only be used within "for fields" blocks

Field Value [INPUT] - Returns the field value corresponding to the specified field name

Fields count [SELECTED] - Returns the number of fields inside of the record type being evaluated by the for loop
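A completeness rule built from these blocks can be pictured with the Python sketch below. It loops over the selected fields, counts the non-empty values, and scales the result to a score out of 100. The function name completeness_score, the sample field names and the 0 to 100 scale are assumptions made for illustration.

```python
def completeness_score(record, selected_fields):
    """Score a record by the share of selected fields that are filled in."""
    filled = 0
    for field_name in selected_fields:    # For [SELECTED] fields of any type
        value = record.get(field_name)    # Field Value
        if value not in (None, ""):       # Not (Field Value is empty)
            filled += 1
    fields_count = len(selected_fields)   # Fields count [SELECTED]
    if fields_count == 0:
        return 0
    return round(100 * filled / fields_count)


# Hypothetical account record with two of four fields filled in.
account = {
    "Name": "Acme",
    "Phone": "",
    "Website": "https://example.com",
    "Industry": None,
}
result = completeness_score(account, ["Name", "Phone", "Website", "Industry"])
```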


Math

This section contains mathematical operations that return a numerical value

[INPUT] [OPERATOR] [INPUT] - Returns the numerical value when the operation is computed against the two inputs

Round - Rounds the value to the nearest integer; values with a fractional part above 0.5 round up.

Round up - Returns the smallest integer greater than or equal to the right input (ceiling)

Round down - Returns the greatest integer less than or equal to the right input (floor)
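The difference between the three rounding blocks is easiest to see with negative and half-way values. The Python sketch below uses the standard math module. How the Round block treats a fractional part of exactly 0.5 is not stated above, so the helper round_half_up is a hypothetical name that shows just one possible convention.

```python
import math

up = math.ceil(2.1)              # Round up: 3
down = math.floor(2.9)           # Round down: 2
negative_up = math.ceil(-2.5)    # ceiling moves toward positive infinity: -2
negative_down = math.floor(-2.5) # floor moves toward negative infinity: -3


def round_half_up(value):
    """Round to the nearest integer, sending exact ties upward (assumption)."""
    return math.floor(value + 0.5)
```

Python's built-in round resolves exact ties to the nearest even integer, so round(2.5) returns 2. This is why the sketch uses an explicit helper instead.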


Date

This section contains an operator that allows users to add or subtract numerical values to time

add [NUMERICAL INPUT] [TIME UNIT] to [DATE/TIME INPUT] - Adds the given number of time units to a specified date/time value. A negative numerical input can be used to subtract time.
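A timeliness rule typically pairs this block with Now and a comparison. The Python sketch below shows the idea with the standard datetime module. The names add_time and timeliness_score, the 365-day threshold and the 50-point penalty are hypothetical. The sketch also assumes that the time unit is one that timedelta accepts as a keyword, such as "days", "hours" or "weeks".

```python
from datetime import datetime, timedelta


def add_time(amount, unit, moment):
    """Add `amount` units to `moment`; a negative amount subtracts time."""
    return moment + timedelta(**{unit: amount})


def timeliness_score(created_date, now):
    score = 100
    # If [Created Date] Less Than [add -365 days to Now] / Do: Modify Score - 50
    if created_date < add_time(-365, "days", now):
        score -= 50
    return score


now = datetime(2024, 1, 1)
old_account = timeliness_score(datetime(2022, 6, 1), now)
recent_account = timeliness_score(datetime(2023, 9, 1), now)
```

Passing the current time in as an argument, instead of calling datetime.now() inside the function, keeps the sketch deterministic and easy to check.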


6. Rule preview

Similar to the Rule Preview section within Supervisr Merge and Master rules, the Data Quality Model Rule Preview section shows a live view of each score generated by the rule created within the Rule Designer. The background color intensity of each score is determined by the values set in the Color and Buckets section.


The output of an example completeness rule. Note the missing fields contributing to the diminished score.