Data quality models score the quality of data using user-defined formula rules. Each rule generates an integer score that gauges attributes of the values contained within a dataset. Rules can score datasets on completeness, formatting, timeliness, or any custom metric.


The rule editor comprises five sections, arranged on two sides of the screen:

  • The left-hand side of the screen allows users to specify a rule name and description, set up color coding and bucketing ranges for score values, and select the fields that can be used in the rule designer.
  • The right-hand side consists of the rule designer and the rule preview section.



Rule Designer

Similar in usage and function to the Rule Designer for Master Record Rules and Field Merge rules, the Data Quality Model rule designer is used to create rules that define the scores of field values contained in a dataset.


Scenarios

Formatting - This example rule ensures that phone numbers, country names, zip codes and states/provinces match a specified format.

Timeliness - This example rule assigns a score based on the Created Date/Age of an account.

Completeness - This example rule determines a score based on the number of non-empty fields within a dataset.


Scoring

This section contains blocks that work with the data quality score.



Modify Score [OPERATOR] - Modifies the resulting score of a model by assigning the value on the right side to it or by applying an operator with that value:

  • + Addition
  • - Subtraction
  • * Multiplication
  • / Division
  • = Assignment

Score - Returns the current value of the score.
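
The effect of the Modify Score operators can be sketched in Python. This is only an illustration of the arithmetic: the `modify_score` function and the `OPERATORS` table are hypothetical names, not part of the product, and the sketch assumes the current score is the left operand and the block's input is the right operand.

```python
import operator

# Hypothetical mapping from the operator symbols listed above to Python functions.
OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    "=": lambda current, value: value,  # assignment discards the current score
}

def modify_score(score, op, value):
    """Return the new score after applying `op` with `value` on the right side."""
    return OPERATORS[op](score, value)

score = 0
score = modify_score(score, "=", 100)  # assign a starting score of 100
score = modify_score(score, "-", 25)   # subtract a penalty, giving 75
score = modify_score(score, "*", 2)    # double it, giving 150
print(score)
```

Because blocks run from top to bottom, the order of Modify Score blocks matters: an assignment placed after other modifications overwrites their effect.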


Patterns

This section includes blocks that check whether values adhere to certain formats or patterns.



  • Phone format - Verifies that a phone number field of the record currently being evaluated follows E164, International, National or RFC3966 phone number formatting.
  • Country format - Verifies that a country field of the record currently being evaluated is fully spelled out, abbreviated to two characters or abbreviated to three characters.
  • State/Province format - Verifies that a state field of the record currently being evaluated is fully spelled out or abbreviated to two letters. The state abbreviation is cross-referenced for validity against the country being referenced.
  • Zip/postal code format - Verifies that a postal code field of the record currently being evaluated is formatted correctly. The zip code format is cross-referenced against the country of the address for formatting correctness.
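
To give a feel for what a format check involves, here is a minimal Python sketch of an E.164 shape test. It only checks the general shape of an E.164 number (a plus sign followed by at most 15 digits, the first of which is non-zero). How the Phone format block validates numbers internally is not documented here, and the `is_e164` name is hypothetical.

```python
import re

# E.164 shape: "+", a non-zero first digit, then 1 to 14 further digits
# (no spaces, dashes or parentheses).
E164_PATTERN = re.compile(r"\+[1-9]\d{1,14}")

def is_e164(phone):
    """Return True when `phone` has the shape of an E.164 number."""
    return E164_PATTERN.fullmatch(phone) is not None

print(is_e164("+14155552671"))    # True
print(is_e164("(415) 555-2671"))  # False: national formatting, not E.164
```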


Good to know: Blocks in a rule are executed sequentially from top to bottom.


Logic

This section contains blocks that, when combined with the If/Do block, can be used to create logic that checks field values within datasets.

If/Do - If the entry criteria evaluate to true, the inner contents of the block execute; otherwise, execution skips to the block directly underneath.

[INPUT] Equals/Not Equal To/Less Than/Greater Than/Less or Equal/Greater or Equal/Contains/Does Not Contain/Starts With/Ends With [INPUT] - Returns a true/false value from evaluating the two inputs against the chosen comparison operator.

[INPUT] and/or [INPUT] - Returns a true/false value: "and" requires both inputs to be true, while "or" requires just one input to be true.

[INPUT] is empty - Returns a true/false value that determines whether the input value is an empty field.

Not [VALUE] - Negates the input value on the right and passes the result to the receiving block on the left.
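
The logic blocks above map closely onto ordinary conditionals. The following Python sketch shows one way such a rule could read when written as code. The field names (`Email`, `Website`), the penalty values, and the helper names are all assumptions made for illustration, and the sketch treats both a missing value and an empty string as "empty".

```python
import operator

# Hypothetical Python equivalents of the comparison choices listed above.
COMPARATORS = {
    "Equals": operator.eq,
    "Not Equal To": operator.ne,
    "Less Than": operator.lt,
    "Greater Than": operator.gt,
    "Less or Equal": operator.le,
    "Greater or Equal": operator.ge,
    "Contains": lambda left, right: right in left,
    "Does Not Contain": lambda left, right: right not in left,
    "Starts With": lambda left, right: left.startswith(right),
    "Ends With": lambda left, right: left.endswith(right),
}

def is_empty(value):
    """Assumption: a missing value and an empty string both count as empty."""
    return value is None or value == ""

def score_record(record):
    score = 100                        # Modify Score = 100
    if is_empty(record.get("Email")):  # If [Email] is empty
        score -= 40                    # Do: Modify Score - 40
    website = record.get("Website")
    # If (Not [Website] is empty) and (Not [Website] Starts With "https://")
    if not is_empty(website) and not COMPARATORS["Starts With"](website, "https://"):
        score -= 10                    # Do: Modify Score - 10
    return score

print(score_record({"Email": "", "Website": "http://example.com"}))  # 50
```

Each `if` corresponds to one If/Do block. When a condition is false, its body is skipped and evaluation moves on to the next block down.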


Values

This section allows users to pass a text, Boolean, empty-field, numerical or date/time value into the receiving block.

"[INPUT]" - Passes the text string of INPUT to the receiving block

True/False - Passes a true/false value into the receiving block

Empty - Passes an empty field value to the receiving block

- Passes an integer/floating-point value to the receiving block

Now - Passes the current date/time into the receiving block


Fields

This section allows users to add loops and to reference field names and values in their data quality models.

For [SELECTED] fields of [INPUT] type - Iterates over the field values of the chosen type (any, address or phone) in each record of a dataset.

Field Name - Returns the name of the current field being evaluated. This block may only be used within "for fields" blocks

Field Name [INPUT] - Returns the name of the field corresponding to the specified field name

Field Value - Returns the value of the current field within a record loop. This block may only be used within "for fields" blocks

Field Value [INPUT] - Returns the field value corresponding to the specified field name

Fields count [SELECTED] - Returns the number of fields inside of the record type being evaluated by the for loop
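
As an illustration of how the field blocks combine, a completeness rule could be sketched in Python as below. The record layout, the field names, and the choice to scale the score to 0-100 are assumptions made for this example. The comments indicate which block each line is meant to mirror.

```python
def completeness_score(record, selected_fields):
    """Score a record from 0 to 100 by the share of selected fields that are filled."""
    fields_count = len(selected_fields)       # Fields count [SELECTED]
    if fields_count == 0:
        return 0
    filled = 0
    for field_name in selected_fields:        # For [SELECTED] fields of any type
        field_value = record.get(field_name)  # Field Value
        if field_value not in (None, ""):     # Not ([Field Value] is empty)
            filled += 1
    return round(100 * filled / fields_count)

account = {"Name": "Acme", "Phone": "", "Country": "US", "Zip": "94105"}
print(completeness_score(account, ["Name", "Phone", "Country", "Zip"]))  # 75
```

Here one of the four selected fields is empty, so the record receives three quarters of the maximum score.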


Math

This section contains mathematical operations that return a numerical value

[INPUT] [OPERATOR] [INPUT] - Returns the numerical result of applying the operator to the two inputs

Round - Rounds the value to the nearest integer; fractional parts above 0.5 round up.

Round up - Returns the smallest integer greater than or equal to the right input (ceiling)

Round down - Returns the greatest integer less than or equal to the right input (floor)
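
For readers who want to check the three rounding behaviors side by side, here is a small Python comparison. It uses Python's own `round`, `math.ceil` and `math.floor` as stand-ins for the blocks. One caveat: Python's `round` sends exact halves (such as 2.5) to the nearest even integer, and this page does not state how the Round block treats an exact 0.5, so the sample values below avoid that case.

```python
import math

def rounding_summary(value):
    """Return the three roundings of `value`, keyed by the block they stand in for."""
    return {
        "round": round(value),            # nearest integer
        "round_up": math.ceil(value),     # smallest integer >= value
        "round_down": math.floor(value),  # greatest integer <= value
    }

for value in (2.4, 2.6, -2.1):
    print(value, rounding_summary(value))
```

Note that for negative inputs "up" still means toward positive infinity, so -2.1 rounds up to -2 and down to -3.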


Date

This section contains an operator that allows users to add numerical values to, or subtract them from, a date/time value.

add [NUMERICAL INPUT] [TIME UNIT] to [DATE/TIME INPUT] - Adds a numerical input value, in the chosen time unit, to a specified date/time value. A negative numerical input can be used to subtract time.
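
A timeliness rule typically compares a record's date against "now minus some period", which is where a negative input is useful. The Python sketch below illustrates the idea with `datetime.timedelta`. The unit names, the one-year and two-year cut-offs, and the penalty values are assumptions for the example, and `add_time` and `timeliness_score` are hypothetical names.

```python
from datetime import datetime, timedelta

def add_time(amount, unit, moment):
    """Add `amount` of `unit` to `moment`; a negative amount subtracts time.

    `unit` must be a keyword accepted by timedelta, such as "days" or "hours".
    """
    return moment + timedelta(**{unit: amount})

def timeliness_score(created, now):
    score = 100
    if created < add_time(-365, "days", now):  # older than about one year
        score -= 50
    if created < add_time(-730, "days", now):  # older than about two years
        score -= 30
    return score

now = datetime(2024, 1, 1)
print(timeliness_score(datetime(2023, 6, 1), now))  # 100
print(timeliness_score(datetime(2020, 1, 1), now))  # 20
```

In the rule designer, the Now block would supply the current date/time; the sketch passes `now` explicitly so the result is reproducible.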


Color and Buckets

Users can choose the base color of the background shown behind each score value when displayed in Brushr as well as in the rule preview section. The integer values entered in the Bucket Min/Mid/Max fields set the color thresholds for the scores.
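
One plausible reading of the Min/Mid/Max thresholds is that they split the score range into bands, each drawn with a different intensity of the base color. The Python sketch below illustrates that reading only. The band names, the boundary handling, and the `bucket_label` function are assumptions, and the product's actual color mapping may differ.

```python
def bucket_label(score, bucket_min, bucket_mid, bucket_max):
    """Classify `score` into a band delimited by the three thresholds (assumed behavior)."""
    if score < bucket_min:
        return "below min"
    if score < bucket_mid:
        return "min to mid"
    if score <= bucket_max:
        return "mid to max"
    return "above max"

# Hypothetical thresholds: Min = 0, Mid = 50, Max = 100
for score in (-5, 10, 50, 100, 120):
    print(score, bucket_label(score, 0, 50, 100))
```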



Selected fields

This area allows users to select the object fields that are available in the rule designer. The same fields are used by the Rule Designer blocks that reference "selected fields".


Rule preview

Similar to the Rule Preview section within Supervisr merge rules, the Data Quality Model Rule Preview section shows a live view of each score generated by the rule created within the Rule Designer. The background color intensity of each score is determined by the values specified in the Color and Buckets section.


The output of an example completeness rule. Note the missing fields contributing to the diminished score.