How does Screena handle inaccurate data?

First, let’s clarify what inaccurate data is. It can result from either human or machine error, and it takes various forms: manual data entry mistakes (e.g., transposed name fields), missing data entry controls, unstructured or free-format fields, missing or non-standardized information in databases, incompatible formats between data processing systems, etc.

Screena systematically checks the completeness and quality of imported data. For example, Screena ensures dates are always provided in ISO 8601 format. Likewise, countries must always be imported in ISO 3166-1 alpha-2 format.
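
As an illustration of this kind of import check, here is a minimal Python sketch. It is not Screena's implementation: the function names and the small KNOWN_ALPHA2 set are assumptions made for the example, and a real check would use the full ISO 3166-1 code list.

```python
import re
from datetime import date

# Strict YYYY-MM-DD shape; the calendar check is done separately below.
_DATE_SHAPE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")

# Illustrative subset only (assumption); not the full ISO 3166-1 list.
KNOWN_ALPHA2 = frozenset({"DE", "FR", "GB", "JP", "US"})


def is_iso8601_date(value: str) -> bool:
    """Return True if value is a real calendar date written as YYYY-MM-DD."""
    if not _DATE_SHAPE.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)  # rejects impossible dates such as 2023-02-29
    except ValueError:
        return False
    return True


def is_alpha2_country(value: str) -> bool:
    """Return True if value is a known two-letter, upper-case country code."""
    return value in KNOWN_ALPHA2
```

With these helpers, "2024-02-29" and "FR" pass, while "29/02/2024" and "France" are flagged for normalization.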

When the original data is not compliant with those standards, Screena tries to resolve it using specific normalization rules and libraries. This normalization process harmonizes and transforms data into a format that makes attribute matching consistent. Normalization libraries are enriched with new synonyms or alternative spellings whenever an unknown or incompatible value is provided.
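
The normalization step can be pictured with the following Python sketch. It assumes a hypothetical synonym table (COUNTRY_SYNONYMS) and a list that collects unresolved values so the table can be enriched later; neither name comes from Screena.

```python
import unicodedata

# Hypothetical synonym library (assumption): normalized spelling -> alpha-2 code.
COUNTRY_SYNONYMS = {
    "france": "FR",
    "republique francaise": "FR",
    "united kingdom": "GB",
    "great britain": "GB",
    "uk": "GB",
}


def _key(raw: str) -> str:
    """Strip accents, case-fold, and collapse whitespace for lookup."""
    decomposed = unicodedata.normalize("NFKD", raw)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_marks.casefold().split())


def normalize_country(raw, synonyms, unresolved):
    """Return the alpha-2 code for raw, or None after recording it as unresolved."""
    key = _key(raw)
    # Values that are already a known code only need upper-casing.
    if len(key) == 2 and key.upper() in set(synonyms.values()):
        return key.upper()
    code = synonyms.get(key)
    if code is None:
        unresolved.append(raw)  # candidate for a new synonym entry
    return code
```

For instance, "République  Française" resolves to "FR", while an unknown value such as "Albion" returns None and is queued for review.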

Screena’s rules-based algorithms tackle specific data quality issues such as typos, truncated names, out-of-order name elements, and split or concatenated names.
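
To make these issues concrete, the Python sketch below scores two names in a way that tolerates typos, reordered elements, and split or concatenated names. It uses the standard library's difflib as a stand-in; Screena's actual algorithms are not described here, and the name_similarity function is an assumption for illustration.

```python
from difflib import SequenceMatcher


def _ratio(a: str, b: str) -> float:
    """Similarity in [0, 1] based on matching character blocks."""
    return SequenceMatcher(None, a, b).ratio()


def name_similarity(a: str, b: str) -> float:
    """Score two names, ignoring case, element order, and spacing."""
    tokens_a = a.casefold().split()
    tokens_b = b.casefold().split()
    # Sorting the tokens makes "John Smith" and "Smith John" identical.
    sorted_score = _ratio(" ".join(sorted(tokens_a)), " ".join(sorted(tokens_b)))
    # Removing spaces makes "Abdul Rahman" and "Abdulrahman" identical.
    joined_score = _ratio("".join(tokens_a), "".join(tokens_b))
    return max(sorted_score, joined_score)
```

Here "John Smith" vs. "Smith John" and "Abdul Rahman" vs. "Abdulrahman" both score 1.0, and a typo such as "Jon Smith" still scores close to 1.0 against "John Smith".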

Screena’s data model also provides distinct fields to differentiate structured and unstructured information (e.g., parsed names vs. full names, structured addresses vs. free-format addresses).

Distinct algorithm parameters can be configured to handle these data quality nuances. For example, the nullMatch parameter specifies how a match should be handled when an attribute associated with an algorithm is either empty or not provided.
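
The effect of such a parameter can be sketched in Python as follows. The values accepted by nullMatch are not listed here, so the three policies below ("match", "no_match", "neutral") and the compare_attribute function are purely hypothetical.

```python
from typing import Optional


def _is_missing(value: Optional[str]) -> bool:
    """Treat None, empty, and whitespace-only strings as not provided."""
    return value is None or not value.strip()


def compare_attribute(
    query: Optional[str],
    candidate: Optional[str],
    null_policy: str = "neutral",  # hypothetical stand-in for nullMatch
) -> Optional[float]:
    """Return a score in [0, 1], or None to leave the attribute out of scoring."""
    if _is_missing(query) or _is_missing(candidate):
        if null_policy == "match":
            return 1.0   # a missing value counts as a match
        if null_policy == "no_match":
            return 0.0   # a missing value counts as a mismatch
        if null_policy == "neutral":
            return None  # a missing value is ignored
        raise ValueError(f"unknown null_policy: {null_policy!r}")
    # Both values present: plain case-insensitive equality for the example.
    return 1.0 if query.strip().casefold() == candidate.strip().casefold() else 0.0
```

The design point is that a missing attribute is handled explicitly rather than silently treated as a mismatch, so sparse records are not penalized unless the configuration asks for it.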

In other instances, inaccurate dates can be matched within the same year or decade.
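
A minimal Python sketch of this kind of tolerance, assuming a hypothetical tolerance setting with the values "exact", "year", and "decade":

```python
from datetime import date


def dates_match(a: date, b: date, tolerance: str = "exact") -> bool:
    """Compare two dates at the requested granularity."""
    if tolerance == "exact":
        return a == b
    if tolerance == "year":
        return a.year == b.year
    if tolerance == "decade":
        # 1980 through 1989 all share the decade index 198.
        return a.year // 10 == b.year // 10
    raise ValueError(f"unknown tolerance: {tolerance!r}")
```

With the "year" tolerance, 1985-03-01 and 1985-01-03 (a likely day/month swap) match; with "decade", 1981 and 1989 match but 1989 and 1990 do not.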

Similarly, addresses can be matched within the same region or subregion based on the United Nations geoscheme.
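
The idea can be sketched in Python with a small excerpt of the UN geoscheme. The table below covers only a handful of countries for illustration, and the countries_match function and its level argument are assumptions, not Screena parameters.

```python
# Excerpt of the UN geoscheme (illustrative subset):
# alpha-2 code -> (region, subregion)
GEOSCHEME = {
    "FR": ("Europe", "Western Europe"),
    "DE": ("Europe", "Western Europe"),
    "IT": ("Europe", "Southern Europe"),
    "ES": ("Europe", "Southern Europe"),
    "GB": ("Europe", "Northern Europe"),
    "JP": ("Asia", "Eastern Asia"),
    "CN": ("Asia", "Eastern Asia"),
    "IN": ("Asia", "Southern Asia"),
}


def countries_match(a: str, b: str, level: str = "country") -> bool:
    """Compare two alpha-2 codes at country, subregion, or region level."""
    if level == "country":
        return a == b
    if level not in ("subregion", "region"):
        raise ValueError(f"unknown level: {level!r}")
    if a not in GEOSCHEME or b not in GEOSCHEME:
        return False  # unknown countries never match at a coarser level
    index = 1 if level == "subregion" else 0
    return GEOSCHEME[a][index] == GEOSCHEME[b][index]
```

For example, France and Germany match at the subregion level (Western Europe), France and Italy match only at the region level (Europe), and France and Japan do not match at either level.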

To achieve greater precision when screening free-format fields, Screena applies advanced text analytics techniques to detect distinct objects (named entities vs. addresses) within the same field and thus prevent irrelevant matches.

When it comes to name matching, Screena will call on generic machine learning models, specifically trained on richer, more comprehensive datasets, if no valid culture can be determined with high certainty.
