EMPI Matching

A robust EMPI engine runs standardization routines such as name, title, salutation, company and address normalization. This lets OHMPI generate an aggregate score from the individual address components (e.g., “Ave.”, “Ave” and “Avenue” are all treated as equal). The engine does this automatically, so customers don’t have to write complex business rules to parse an address. For example, if the address fed to the engine is “10144 Hiawatha Ave”, the engine breaks it into components and scores each one against the address on file:

Component	Address1 (input)	Address on file	Min Score	Max Score	Actual Score
House No.	10144	10141	0	5	4.32
Street Name	Hiawatha	Hiawatha	-2	4	4
Street Type	Ave	Road	-2	3	-2
Total for Address:					6.32
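A minimal Python sketch of the standardize-then-score idea follows, under loudly stated assumptions: the street-type lookup table, the “number, name, type” address layout, and the house-number partial-credit rule (0.25 points lost per unit of numeric distance) are all hypothetical and are not OHMPI’s actual standardization tables or comparators. Only the min/max score ranges come from the table above. Because the partial-credit rule is invented, the house number scores 4.25 here rather than the 4.32 shown in the table.

```python
# Hypothetical standardization table: street-type variants -> one canonical token.
STREET_TYPES = {
    "AVE": "AVE", "AVE.": "AVE", "AV": "AVE", "AVENUE": "AVE",
    "RD": "RD", "RD.": "RD", "ROAD": "RD",
    "ST": "ST", "ST.": "ST", "STREET": "ST",
}

# (min score, max score) per component, taken from the example table above.
SCORE_RANGES = {
    "house_no": (0.0, 5.0),
    "street_name": (-2.0, 4.0),
    "street_type": (-2.0, 3.0),
}


def standardize_address(raw: str) -> dict:
    """Split a simple 'number name type' address into upper-cased components.

    Assumes the first token is the house number (if numeric) and the last token
    is the street type (if it appears in STREET_TYPES); the rest is the street name.
    """
    tokens = raw.upper().split()
    house_no = tokens[0] if tokens and tokens[0].isdigit() else ""
    rest = tokens[1:] if house_no else tokens
    street_type = ""
    if rest and rest[-1] in STREET_TYPES:
        street_type = STREET_TYPES[rest[-1]]
        rest = rest[:-1]
    return {"house_no": house_no, "street_name": " ".join(rest), "street_type": street_type}


def component_score(field: str, a: str, b: str) -> float:
    lo, hi = SCORE_RANGES[field]
    if a == b:
        return hi  # exact agreement after standardization earns the maximum
    if field == "house_no" and a and b:
        # Hypothetical partial credit: lose 0.25 points per unit of numeric
        # distance, never dropping below the component minimum.
        return max(lo, hi - 0.25 * abs(int(a) - int(b)))
    return lo  # any other disagreement earns the minimum


def address_score(raw_a: str, raw_b: str) -> float:
    """Aggregate score: the sum of the per-component scores."""
    a, b = standardize_address(raw_a), standardize_address(raw_b)
    return sum(component_score(field, a[field], b[field]) for field in SCORE_RANGES)


print(address_score("10144 Hiawatha Ave", "10144 Hiawatha Avenue"))  # 12.0
print(address_score("10144 Hiawatha Ave", "10141 Hiawatha Road"))    # 6.25
```

Standardizing before comparing is what lets “Ave” and “Avenue” agree without any customer-written parsing rule.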

Record Matching Process

All records are first filtered down to a candidate set by a selection query, and each candidate is then compared one-to-one with the incoming record. Based on the matching score, a record is classified as a match, a potential match, or not a match.
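The three-way decision can be sketched as two threshold comparisons. The threshold values below are invented for illustration (in practice they are configuration settings tuned to the data), and the `MatchResult` and `classify` names are hypothetical.

```python
from enum import Enum


class MatchResult(Enum):
    MATCH = "match"
    POTENTIAL_MATCH = "potential match"
    NO_MATCH = "no match"


def classify(score: float, match_threshold: float, potential_threshold: float) -> MatchResult:
    """Map a total matching score to one of the three outcomes.

    Scores at or above match_threshold are matches; scores between the two
    thresholds are potential matches (typically queued for human review).
    """
    if potential_threshold > match_threshold:
        raise ValueError("potential_threshold must not exceed match_threshold")
    if score >= match_threshold:
        return MatchResult.MATCH
    if score >= potential_threshold:
        return MatchResult.POTENTIAL_MATCH
    return MatchResult.NO_MATCH


for score in (12.0, 6.32, 2.0):
    print(score, classify(score, match_threshold=10.0, potential_threshold=5.0).value)
# 12.0 match
# 6.32 potential match
# 2.0 no match
```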

Most EMPIs rely on a two-step matching process. The first step, often described as “Casting the Wide Net”, generates a set of probable candidates. The second is a refined, detailed pass that runs a field-by-field comparison. Comparison algorithms (for example, phonetic encodings such as Soundex and NYSIIS) produce a score for each field, and the field scores are aggregated into a total. Only the close matches, as determined by the total score and the configured thresholds, are returned.
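Soundex, one of the phonetic algorithms named above, is simple enough to sketch in full. This is a minimal Python version of the classic American Soundex rules (keep the first letter, code the remaining consonants, collapse repeated codes, pad to four characters). It assumes ASCII input, NYSIIS is not shown, and how a given EMPI turns phonetic agreement into a field score is product-specific.

```python
# Soundex digit for each consonant group; vowels, Y, H and W have no code.
SOUNDEX_CODES = {
    letter: digit
    for letters, digit in (
        ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
        ("L", "4"), ("MN", "5"), ("R", "6"),
    )
    for letter in letters
}


def soundex(name: str) -> str:
    """Return the four-character American Soundex code for `name` ('' if no letters)."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    prev = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a code
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code after a vowel is kept
        if len(result) == 4:
            break
    return result.ljust(4, "0")


print(soundex("Smith"), soundex("Smyth"))    # S530 S530
print(soundex("Robert"), soundex("Rupert"))  # R163 R163
```

Spelling variants that sound alike collapse to the same code, which is why phonetic encodings are useful both for fuzzy candidate retrieval and for field scoring.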

Vendors use this two-step approach for efficiency and response time: running a field-by-field comparison against millions of records would make response times suffer badly. Robust EMPIs maintain indices that let candidate subsets be retrieved in milliseconds, so the expensive comparison runs only on that subset.
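A minimal sketch of why an index makes the first pass fast, assuming an in-memory dict of records and a hypothetical `last_name` field: the index is built once, after which a lookup is a single hash probe instead of a scan over every record. A real EMPI keeps such indices in its database; this only illustrates the idea.

```python
from collections import defaultdict


def build_index(records: dict, field: str) -> dict:
    """Map each upper-cased value of `field` to the set of record IDs holding it."""
    index = defaultdict(set)
    for record_id, record in records.items():
        value = record.get(field)
        if value:
            index[value.upper()].add(record_id)
    return dict(index)


def lookup(index: dict, value: str) -> set:
    """Return the candidate record IDs for `value` without scanning every record."""
    return index.get(value.upper(), set())


records = {
    "1": {"last_name": "Hansen", "dob": "1980-02-14"},
    "2": {"last_name": "Olson", "dob": "1975-07-01"},
    "3": {"last_name": "HANSEN", "dob": "1961-11-30"},
}
last_name_index = build_index(records, "last_name")
print(sorted(lookup(last_name_index, "hansen")))  # ['1', '3']
```

Only the returned IDs would move on to the detailed field-by-field pass.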

Step 1) “Casting the Wide Net”

For the first step, the EMPI runs a combination of several “Blocking” (fuzzy) queries. Each Blocking query uses a different set of fields, and some Blocks may overlap others. For example, one blocking query might use “First Name / Last Name”, a second “First Name / DOB / Gender”, and a third “Last Name / DOB / SSN”.
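The union of several blocking queries can be sketched as follows, using the three example Blocks above. Assumptions: the field names are hypothetical, keys are exact matches on upper-cased values (a production blocking query would typically apply phonetic or other fuzzy encodings to the name fields), and a Block is skipped whenever one of its fields is missing. The overlap between Blocks is the point: a record with a typo in one field can still be caught by a Block that does not use that field.

```python
from collections import defaultdict

# The three example Blocks from the text; field names are hypothetical.
BLOCKS = [
    ("first_name", "last_name"),
    ("first_name", "dob", "gender"),
    ("last_name", "dob", "ssn"),
]


def block_key(record: dict, fields: tuple):
    """Build a normalized key for one Block, or None if any field is missing."""
    values = [str(record.get(f) or "").strip().upper() for f in fields]
    return None if "" in values else tuple(values)


def build_block_indexes(records: dict) -> list:
    """One index per Block, each mapping a key to the set of record IDs."""
    indexes = [defaultdict(set) for _ in BLOCKS]
    for record_id, record in records.items():
        for fields, index in zip(BLOCKS, indexes):
            key = block_key(record, fields)
            if key is not None:
                index[key].add(record_id)
    return indexes


def candidates(query: dict, indexes: list) -> set:
    """Union of the record IDs returned by every applicable Block."""
    found = set()
    for fields, index in zip(BLOCKS, indexes):
        key = block_key(query, fields)
        if key is not None:
            found |= index.get(key, set())
    return found


RECORDS = {
    "1": {"first_name": "Maria", "last_name": "Lopez", "dob": "1980-02-14", "gender": "F", "ssn": "111-22-3333"},
    "2": {"first_name": "Maria", "last_name": "Lopes", "dob": "1980-02-14", "gender": "F"},  # last-name typo
    "3": {"first_name": "Mario", "last_name": "Lopez", "dob": "1980-02-14", "gender": "M", "ssn": "111-22-3333"},
    "4": {"first_name": "John", "last_name": "Smith", "dob": "1990-01-01", "gender": "M"},
}
query = {"first_name": "maria", "last_name": "Lopez", "dob": "1980-02-14", "gender": "F", "ssn": "111-22-3333"}
print(sorted(candidates(query, build_block_indexes(RECORDS))))  # ['1', '2', '3']
```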

These fuzzy queries come predefined in the base product. They are editable, and additional Blocks can be added through the configuration GUI.

Step 2) “Refine Pass”

After retrieving the set of probable candidates (which could number in the hundreds or thousands), the next task is to cut the list down. Because this subset is only a sliver of the overall database, field-by-field comparisons can be run on it efficiently. Each field has attributes that the algorithms use to determine a score. Some of the factors that influence the weights:

Probabilistic and Deterministic scoring
Reliability of data element / field or source
Character uncertainty (phonetic errors, transpositions, character insertion, deletion, and replacement)
Absolute difference in numbers (distance calculation)
Specific to Names:
Specific to Addresses:
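One common way to turn these considerations into numbers is the Fellegi-Sunter probabilistic model, sketched below as an assumption rather than as any particular vendor’s method. Each field gets an m-probability (how often the field agrees when two records truly belong to the same person, which reflects the reliability of the field or source) and a u-probability (how often it agrees by chance between different people). Agreement adds log2(m/u) to the total, and disagreement adds log2((1 - m)/(1 - u)). Character uncertainty is handled here with an optimal-string-alignment edit distance (insertions, deletions, substitutions and adjacent transpositions), and numeric fields with a linear distance fall-off; both interpolate between the disagreement and agreement weights. The m, u and `max_diff` values in the usage lines are invented.

```python
import math


def agreement_weight(m: float, u: float) -> float:
    """Weight added when a field agrees: log2(m / u)."""
    return math.log2(m / u)


def disagreement_weight(m: float, u: float) -> float:
    """Weight added when a field disagrees: log2((1 - m) / (1 - u))."""
    return math.log2((1 - m) / (1 - u))


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insert, delete, substitute, swap adjacent."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[len(a)][len(b)]


def interpolate(similarity: float, m: float, u: float) -> float:
    """Scale linearly from the disagreement weight (0.0) to the agreement weight (1.0)."""
    lo, hi = disagreement_weight(m, u), agreement_weight(m, u)
    return lo + similarity * (hi - lo)


def string_weight(a: str, b: str, m: float, u: float) -> float:
    """Partial agreement for strings, based on normalized edit distance."""
    a, b = a.upper(), b.upper()
    longest = max(len(a), len(b)) or 1
    return interpolate(1 - osa_distance(a, b) / longest, m, u)


def numeric_weight(x: float, y: float, m: float, u: float, max_diff: float) -> float:
    """Partial agreement for numbers; differences of max_diff or more fully disagree."""
    return interpolate(max(0.0, 1 - abs(x - y) / max_diff), m, u)


print(round(string_weight("Johnson", "Jhonson", m=0.95, u=0.02), 2))
print(round(numeric_weight(10144, 10141, m=0.9, u=0.05, max_diff=20), 2))
```

A more reliable field (higher m) yields a larger penalty on disagreement, and a field that often agrees by chance (higher u, such as a common surname) earns less credit on agreement.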

