Pick your unique key
Choose any subset of columns. Email, customer ID, full-name + DOB — whatever the business says is the dedup key. Dedup builds a normalized hash from those columns.
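A minimal sketch of what that normalized hash could look like, assuming normalization means trimming whitespace and lowercasing; the column names and normalization rules here are illustrative, not the exact ones Dedup applies.

```python
import hashlib

def dedup_key(row: dict, key_columns: list[str]) -> str:
    """Build a normalized hash from the business-chosen key columns.

    Illustrative only: assumes normalization = strip + lowercase.
    """
    parts = []
    for col in key_columns:
        value = str(row.get(col) or "")
        parts.append(value.strip().lower())
    joined = "\x1f".join(parts)  # unit separator avoids accidental collisions
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

# Example: email + DOB as the business-defined key
key = dedup_key({"email": " Ada@Example.com ", "dob": "1815-12-10"},
                ["email", "dob"])
```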
Find duplicate rows in a messy dataset. Pick the columns that should be unique; Dedup groups likely-duplicate rows into clusters with a suggested canonical row per group.
Dedup runs on the Norm pipeline. After cleaning, rows with the same normalized key cluster together. The canonical row is the one with the fewest empty cells; ties broken by lowest row index.
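A sketch of that grouping and selection rule, assuming rows are plain dicts and a key function like the dedup_key sketch above; the helper names are illustrative, not the engine's internals.

```python
from collections import defaultdict
from typing import Callable

def cluster_rows(rows: list[dict], key_fn: Callable[[dict], str]) -> dict[str, list[int]]:
    """Group row indices by their normalized key (e.g. the dedup_key sketch above)."""
    clusters: dict[str, list[int]] = defaultdict(list)
    for idx, row in enumerate(rows):
        clusters[key_fn(row)].append(idx)
    return dict(clusters)

def canonical_row(rows: list[dict], cluster: list[int]) -> int:
    """Fewest empty cells wins; ties break toward the lowest row index."""
    def empty_cells(idx: int) -> int:
        return sum(1 for v in rows[idx].values() if v in (None, ""))
    # min() keeps the first minimum, so sorting by index enforces the tie-break
    return min(sorted(cluster), key=empty_cells)
```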
Each capability runs in the shared engine — the Norm pipeline, the Trust audit chain, the Decisioning mode toggle. Same substrate as the other four products.
Per cluster, the row with the fewest empty cells wins. The selection rationale is shown alongside the cluster, so reviewers can override it.
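One way to surface that rationale, building on the sketches above; the wording and fields are illustrative, not the product's actual output.

```python
def selection_rationale(rows: list[dict], cluster: list[int], winner: int) -> str:
    """Explain why a row was chosen so a reviewer can sanity-check or override it."""
    def empty_cells(idx: int) -> int:
        return sum(1 for v in rows[idx].values() if v in (None, ""))
    counts = ", ".join(f"row {i}: {empty_cells(i)} empty" for i in cluster)
    return f"row {winner} selected (fewest empty cells; {counts})"
```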
Every cluster decision writes a chain block. Auditors get a tamper-evident record of which rows were merged, when, and why.
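A minimal sketch of what a chain block could carry; the field names and hashing choices are assumptions, not the Trust chain's actual schema.

```python
import hashlib
import json
import time

def chain_block(prev_hash: str, decision: dict) -> dict:
    """Append-only block: the hash covers the previous hash plus the decision payload."""
    body = {
        "prev_hash": prev_hash,
        "timestamp": time.time(),
        "decision": decision,  # e.g. merged row ids, canonical row, rationale
    }
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    body["hash"] = hashlib.sha256(payload).hexdigest()
    return body

# Changing any earlier block breaks every hash after it, which is what
# makes the record tamper-evident for auditors.
```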
In Decisioning mode, low-confidence clusters route to a human reviewer before merging. Staff can approve, reject, or split clusters. All actions audited.
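A sketch of that routing rule, assuming each cluster carries a confidence score; the 0.8 threshold and queue shapes are illustrative, not documented defaults.

```python
REVIEW_THRESHOLD = 0.8  # illustrative cutoff, not a documented default

def route(clusters: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split clusters into auto-merge and human-review queues by confidence."""
    auto_merge, needs_review = [], []
    for cluster in clusters:
        if cluster["confidence"] >= REVIEW_THRESHOLD:
            auto_merge.append(cluster)
        else:
            needs_review.append(cluster)  # reviewer can approve, reject, or split
    return auto_merge, needs_review
```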
The Python SDK is the most mature. TypeScript follows the same shape. Both ship with strict types and async-first APIs.
Same auth as the rest of DAVA: a bearer API key sent as Authorization: Bearer dava_live_…, or session cookie + CSRF for browser flows.
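For the API-key path, an async call from Python might look like the following; only the Authorization header format comes from the docs above, while the endpoint, payload, and use of httpx are hypothetical.

```python
import asyncio
import httpx

API_KEY = "dava_live_..."  # replace with your real key; only the prefix is documented

async def submit_dedup_job() -> dict:
    headers = {"Authorization": f"Bearer {API_KEY}"}
    async with httpx.AsyncClient(headers=headers) as client:
        # Hypothetical endpoint and body, shown only to illustrate the auth header.
        resp = await client.post(
            "https://api.example.com/v1/dedup/jobs",
            json={"dataset_id": "ds_123", "key_columns": ["email", "customer_id"]},
        )
        resp.raise_for_status()
        return resp.json()

result = asyncio.run(submit_dedup_job())
```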
We'll run your hardest dataset through DAVA Dedup during a 5-day pilot. You keep the cleaned output and the evidence pack either way.