Technical brief
The OpenEMPI Matching Engine
A high-performance, runtime-configurable entity-resolution platform that combines deterministic, probabilistic, and artificial intelligence-based matching in one engine.
Record linkage and entity resolution
OpenEMPI combines matching algorithms, configurable distance metrics, and review workflows in one engine.
What is record linkage?
Record linkage is the process of identifying records that refer to the same real-world person, organization, place, or other entity across one or more systems.
What is entity resolution?
Entity resolution goes beyond finding duplicates: it groups related records, preserves evidence, and supports a trusted master record for downstream systems.
How does OpenEMPI deduplicate records?
OpenEMPI standardizes data, generates candidate pairs, compares fields with configurable metrics, scores likely matches, and routes uncertain cases to review.
How does human review work?
Candidate matches that fall between automated thresholds can be reviewed by data stewards with confidence scores, field-level evidence, and audit logging.
Three matching paradigms, one engine
OpenEMPI doesn't force a single matching strategy. Choose the approach that fits your data — or combine them — all behind the same configuration and review workflow.
Probabilistic record linkage
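OpenEMPI implements the classic probabilistic record-linkage framework with proprietary enhancements. Agreements and disagreements on each field are converted to weights and summed into a match score, which is compared against tunable upper and lower thresholds. Pairs that score between the two thresholds are the uncertain cases that can be routed to review.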
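As a rough illustration of how field agreements and disagreements can be weighted into a score, the sketch below assumes Fellegi-Sunter-style log-likelihood weights, which is one common reading of "classic probabilistic record linkage". The field names, the m and u probabilities, the threshold values, and the function names are all hypothetical and are not OpenEMPI defaults or APIs.

```python
import math

# Hypothetical per-field parameters (illustrative values only):
#   m = P(field agrees | the two records are a true match)
#   u = P(field agrees | the two records are not a match)
FIELD_PARAMS = {
    "last_name": (0.95, 0.01),
    "birth_date": (0.97, 0.003),
    "postal_code": (0.90, 0.05),
}


def match_score(agreements):
    """Sum log2 agreement or disagreement weights over the compared fields."""
    score = 0.0
    for field, agrees in agreements.items():
        m, u = FIELD_PARAMS[field]
        if agrees:
            score += math.log2(m / u)              # positive weight
        else:
            score += math.log2((1 - m) / (1 - u))  # negative weight
    return score


def classify(score, lower=0.0, upper=10.0):
    """Map a score to a decision using assumed lower and upper thresholds."""
    if score >= upper:
        return "match"
    if score <= lower:
        return "non-match"
    return "review"
```

With these assumed values, a pair that agrees on every field scores well above the upper threshold, a pair that disagrees on every field scores below the lower one, and a pair that disagrees only on birth date lands in the review band.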
Deterministic rules
Exact and fuzzy matching rules are configured to fit your data and use case — from simple exact matches to complex, multi-field conditions.
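To make the range from simple exact matches to multi-field conditions concrete, here is a minimal sketch of an ordered rule list. The rule names, field names, and the 0.85 similarity cutoff are assumptions for illustration. The fuzzy comparison uses `difflib.SequenceMatcher` from the Python standard library as a stand-in for whatever metric a real deployment would configure.

```python
from difflib import SequenceMatcher


def similar(a, b, threshold=0.85):
    """Fuzzy string equality based on difflib's similarity ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


# Hypothetical rules, evaluated in order; each takes two record dicts.
RULES = [
    # Simple exact rule: a non-empty identifier that is identical.
    ("exact_id", lambda a, b: bool(a.get("national_id"))
        and a.get("national_id") == b.get("national_id")),
    # Multi-field rule: fuzzy last name plus exact birth date and postal code.
    ("name_dob_postal", lambda a, b: similar(a.get("last_name", ""), b.get("last_name", ""))
        and a.get("birth_date") == b.get("birth_date")
        and a.get("postal_code") == b.get("postal_code")),
]


def first_matching_rule(a, b):
    """Return the name of the first rule that fires, or None if none does."""
    for name, rule in RULES:
        if rule(a, b):
            return name
    return None
```

Returning the rule name rather than a bare boolean keeps the evidence for the decision, which is useful when a steward later reviews why two records were linked.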
Artificial intelligence
Artificial intelligence-based models learn from labelled match data to capture patterns that fixed rules miss, lifting accuracy on messy, real-world records.
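The idea of learning from labelled match data can be sketched with a very small logistic-regression classifier trained on field-similarity vectors. This is only an assumed stand-in: the brief does not say which model family OpenEMPI uses, and the feature layout (name, birth date, and postal code similarity), the training data, and the hyperparameters below are invented for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(vectors, labels, epochs=1000, lr=0.1):
    """Fit logistic-regression weights with plain stochastic gradient descent."""
    weights = [0.0] * len(vectors[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log loss with respect to the logit
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict(model, x):
    """Return the estimated probability that a similarity vector is a match."""
    weights, bias = model
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Hypothetical labelled pairs: [name_sim, birth_date_sim, postal_sim] -> 1 (match) or 0.
TRAINING_VECTORS = [
    [1.0, 1.0, 1.0], [0.9, 1.0, 0.0], [0.85, 1.0, 1.0], [1.0, 0.8, 1.0],
    [0.2, 0.0, 0.0], [0.3, 0.0, 1.0], [0.9, 0.0, 0.0], [0.1, 0.0, 0.0],
]
TRAINING_LABELS = [1, 1, 1, 1, 0, 0, 0, 0]
```

A model trained this way can then score unseen pairs with `predict(train(TRAINING_VECTORS, TRAINING_LABELS), vector)`, and the resulting probability can be fed into the same threshold and review logic as any other score.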
The OpenEMPI matching process
The workflow is transparent enough for technical evaluation and configurable enough for production data-quality constraints.
Standardize data
Normalize names, addresses, dates, phones, and other fields before comparison.
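A minimal sketch of what this normalization step might look like, using only the Python standard library. The function names and the accepted date formats are assumptions. In particular, the `%m/%d/%Y` format assumes month-first input, which would be wrong for data sources that write the day first.

```python
import re
import unicodedata
from datetime import datetime


def normalize_name(value):
    """Strip accents and punctuation, collapse whitespace, and uppercase."""
    decomposed = unicodedata.normalize("NFKD", value)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    letters_only = re.sub(r"[^A-Za-z ]+", " ", no_accents)
    return " ".join(letters_only.upper().split())


def normalize_phone(value):
    """Keep digits only, dropping spaces, brackets, dashes, and plus signs."""
    return re.sub(r"\D", "", value)


def normalize_date(value, formats=("%Y-%m-%d", "%m/%d/%Y", "%Y%m%d")):
    """Return an ISO 8601 date string, or None if no assumed format parses."""
    for fmt in formats:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None
```

Returning `None` for unparseable dates, rather than guessing, lets later comparison steps treat the field as missing instead of as a disagreement.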
Block and index
Generate likely candidate pairs without comparing every record against every other record.
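One common way to avoid the all-against-all comparison is standard blocking: records are grouped by a cheap key and only records that share a key are paired. The sketch below assumes a hypothetical key made of the first three letters of the last name plus the birth year. The key, field names, and function names are illustrative and are not taken from OpenEMPI.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(record):
    """Hypothetical key: last-name prefix plus birth year (ISO date assumed)."""
    return record["last_name"][:3].upper() + record["birth_date"][:4]


def candidate_pairs(records):
    """Return the set of record-id pairs that share a blocking key."""
    blocks = defaultdict(list)
    for record in records:
        blocks[blocking_key(record)].append(record["id"])
    pairs = set()
    for ids in blocks.values():
        # Pair up records only within the same block.
        pairs.update(combinations(sorted(ids), 2))
    return pairs
```

The trade-off is recall: a typo inside the blocking key hides a true match, which is why production systems usually combine several keys or use fuzzier indexing.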
Compare fields
Use exact, fuzzy, phonetic, numeric, date-aware, phone-aware, and address-aware metrics.
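The following sketch shows three simple examples of such metrics: a fuzzy string similarity based on Levenshtein edit distance, a date-aware comparison that gives partial credit for swapped day and month, and a phone-aware comparison on trailing digits. These are generic textbook-style comparators with assumed names and an assumed partial-credit value of 0.8, not the metrics shipped with OpenEMPI.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def string_similarity(a, b):
    """Fuzzy metric: 1.0 for identical strings, 0.0 for nothing in common."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def date_similarity(a, b):
    """Date-aware metric on ISO dates; swapped day and month earn partial credit."""
    if a == b:
        return 1.0
    year_a, month_a, day_a = a.split("-")
    year_b, month_b, day_b = b.split("-")
    if year_a == year_b and month_a == day_b and day_a == month_b:
        return 0.8  # assumed partial-credit value
    return 0.0


def phone_similarity(a, b, digits=7):
    """Phone-aware metric: compare only the last `digits` digits."""
    da = "".join(ch for ch in a if ch.isdigit())
    db = "".join(ch for ch in b if ch.isdigit())
    if len(da) < digits or len(db) < digits:
        return 0.0
    return 1.0 if da[-digits:] == db[-digits:] else 0.0
```

Each function returns a value between 0 and 1, so the results can be thresholded into agree or disagree flags or passed on as a similarity vector.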
Score and classify
Apply deterministic, probabilistic, and artificial intelligence-based models to classify each candidate pair as a match, a non-match, or an uncertain case.
Review uncertainty
Route borderline candidates to human review when business rules require stewardship.
Resolve records
Create trusted master records and keep downstream systems aligned with the resolved identity.
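One way to build a master record from a group of linked records is a survivorship rule: for each field, keep the value from the most reliable source, breaking ties by the most recent update. The sketch below assumes that approach. The source names, reliability weights, record layout, and function name are all hypothetical.

```python
# Hypothetical reliability weights per data source (illustrative values).
SOURCE_RELIABILITY = {"registration": 0.9, "lab": 0.6, "billing": 0.4}


def build_master_record(records):
    """Pick, per field, the non-empty value from the most reliable source.

    Each record is assumed to look like:
    {"source": str, "updated": "YYYY-MM-DD", "fields": {name: value}}
    """
    master = {}
    field_names = {name for record in records for name in record["fields"]}
    for name in sorted(field_names):
        candidates = [
            (SOURCE_RELIABILITY.get(record["source"], 0.0), record["updated"], record["fields"][name])
            for record in records
            if record["fields"].get(name)  # skip missing or empty values
        ]
        if candidates:
            # Highest reliability wins; the later update date breaks ties.
            master[name] = max(candidates)[2]
    return master
```

Because empty values are skipped, a less reliable source can still contribute a field that the more reliable source lacks.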
Flexible entity resolution
OpenEMPI doesn't force you into a fixed record shape. The platform is built around flexible algorithms, extensible components, and configurable entity definitions.
Cutting-edge algorithms
Deterministic, probabilistic, and machine-learning matching in one engine, so each deployment can choose the strategy that best fits its data.
Extensible architecture
Advanced matching algorithms plug into an OpenEMPI instance, and the same extension model applies to the other components that support matching workflows.
Configurable data model
Record definitions are customized to your datasets — patient demographics, providers, facilities, customers, business listings, or any entity you define.
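To show what a configurable entity definition could look like in practice, here is a small sketch using Python dataclasses. The class names, attribute names, and the example "provider" entity are invented for illustration. OpenEMPI's actual configuration format is not described in this brief.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDef:
    name: str
    comparator: str         # assumed labels such as "exact", "fuzzy", "date"
    required: bool = False
    blocking: bool = False  # whether the field contributes to blocking


@dataclass(frozen=True)
class EntityDef:
    name: str
    fields: tuple

    def blocking_fields(self):
        return [f.name for f in self.fields if f.blocking]

    def validate(self, record):
        """Return a list of problems; an empty list means the record fits."""
        known = {f.name for f in self.fields}
        problems = [
            f"missing required field: {f.name}"
            for f in self.fields
            if f.required and not record.get(f.name)
        ]
        problems += [f"unknown field: {key}" for key in sorted(record) if key not in known]
        return problems


# Hypothetical definition of a healthcare provider entity.
PROVIDER = EntityDef(
    name="provider",
    fields=(
        FieldDef("npi", "exact", required=True, blocking=True),
        FieldDef("last_name", "fuzzy", required=True),
        FieldDef("specialty", "exact"),
    ),
)
```

Keeping the definition as data rather than code means a new entity type, such as a facility or a business listing, only needs a new `EntityDef` and no change to the matching logic.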
Highly customizable
Matching components expose configuration parameters that can be tuned to your data quality, duplicate-rate tolerance, and operational review process.
Supported distance metrics
Enterprise features
- Real-time & batch processing modes
- Deterministic, probabilistic, and AI-powered matching in one engine
- Manual review workflow with audit logging
- Horizontally scalable architecture
- Data-source weighting & reliability scoring
Integration first
OpenEMPI is designed to live inside your stack, not replace it. A RESTful API and webhook system integrate with modern data lakes, warehouses, and operational systems, with HL7 and FHIR support for healthcare.
See the engine on your data
Tell us about the records you need to resolve and we'll walk you through how OpenEMPI matches, scores, and merges them.