DAVA Norm
Drop in a messy CSV and get a clean, structured table back: headers are converted to snake_case, column types are inferred and values coerced, whitespace is trimmed, and empty rows are dropped. Smart Tables adds per-column PII tags and outlier counts on top.
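The four cleaning steps can be approximated locally, for example to sanity-check what to expect from the service. The stdlib-only Python sketch below is an approximation under our own assumptions (the snake_case rule, an integer → float → string type ladder, and treating all-blank rows as empty); it is not DAVA Norm's actual implementation, and `to_snake_case`, `infer_type`, and `normalize` are hypothetical helpers.

```python
import csv
import io
import re


def to_snake_case(header: str) -> str:
    """Hypothetical rule: 'OrderID' -> 'order_id', ' Unit Price ($) ' -> 'unit_price'."""
    s = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", header.strip())  # split camelCase
    s = re.sub(r"[^0-9A-Za-z]+", "_", s)  # collapse punctuation/whitespace runs
    return s.strip("_").lower()


def infer_type(values: list[str]) -> str:
    """Narrowest type every non-empty value parses as: integer, then float, else string."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "string"
    for name, cast in (("integer", int), ("float", float)):
        try:
            for v in non_empty:
                cast(v)
            return name
        except ValueError:
            continue
    return "string"


CASTS = {"integer": int, "float": float, "string": str}


def normalize(csv_text: str):
    reader = csv.reader(io.StringIO(csv_text))
    headers = [to_snake_case(h) for h in next(reader, [])]

    # Trim every cell; drop rows whose cells are all empty (blank lines included).
    rows = []
    for row in reader:
        cells = [c.strip() for c in row]
        if any(cells):
            rows.append(cells)

    # Infer one type per column from the trimmed values.
    types = {}
    for i, name in enumerate(headers):
        types[name] = infer_type([r[i] if i < len(r) else "" for r in rows])

    # Coerce each cell to its column type; missing or empty cells become None.
    cleaned = []
    for r in rows:
        record = {}
        for i, name in enumerate(headers):
            raw = r[i] if i < len(r) else ""
            record[name] = CASTS[types[name]](raw) if raw != "" else None
        cleaned.append(record)
    return types, cleaned
```

For example, `normalize("OrderID, Unit Price ($) ,Note\n 1 , 2.50 , hi \n,,\n")` yields the columns `order_id` (integer), `unit_price` (float), and `note` (string), with the `,,` row dropped.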
Quickstart (Python)
```bash
pip install dava-norm
```

```python
import asyncio
from dava_norm import Client


async def main():
    async with Client(api_key="dava_live_…") as c:
        # Upload the raw bytes and get the normalized table back.
        with open("messy.csv", "rb") as f:
            result = await c.preview("messy.csv", f.read())

        print(f"{result.rows_in} → {result.rows_out} rows "
              f"({result.dropped_rows_empty} empty rows dropped)")

        # One entry per column: header rename, inferred type, PII tag, outliers.
        for col in result.columns:
            print(f" {col.name_in!r} → {col.name_out!r} "
                  f"({col.inferred_type}, {col.sensitivity_tag}, "
                  f"{col.outlier_count} outliers)")

        # newline="" stops Python from translating the CSV's line endings.
        with open("clean.csv", "w", newline="") as f:
            f.write(result.cleaned_csv)


asyncio.run(main())
```

Quickstart (TypeScript)
```typescript
import { Client } from "@avaresearch/dava-norm";
import { readFile } from "node:fs/promises";

// Top-level await requires ES module output (e.g. "type": "module" in package.json).
const c = new Client({ apiKey: process.env.DAVA_API_KEY! });
const result = await c.preview("messy.csv", await readFile("messy.csv"));
console.log(`${result.rows_in} rows → ${result.rows_out} rows`);
```

Smart Tables
Every column comes back tagged with an inferred sensitivity tag (PII detection) and, for numeric columns, an outlier count. The sensitivity tag is a hint surfaced in the dashboard so customers can mask data before exporting it. It is not a security boundary: access enforcement is handled by Trust Layer policies.
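This reference does not say which rule produces the outlier count. As a point of comparison only, here is one common rule, Tukey's IQR fences, sketched in Python; the `count_outliers_iqr` helper, the `k=1.5` multiplier, and the minimum of four values are our assumptions, and the service's counts may differ.

```python
import statistics


def count_outliers_iqr(values, k=1.5):
    """Hypothetical rule: count values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    nums = [float(v) for v in values if v is not None]  # skip missing cells
    if len(nums) < 4:
        return 0  # too few points for meaningful quartiles
    q1, _, q3 = statistics.quantiles(nums, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return sum(1 for x in nums if x < low or x > high)
```

With `[10, 11, 12, 13, 14, 15, 100]` the quartiles are 11.5 and 14.5, so the fences sit at 7 and 19 and only `100` is counted. IQR fences are popular because a single extreme value barely moves them, unlike a mean ± n·σ rule.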
| Tag | Detector |
|---|---|
| `email` | Standard email regex; ≥ 80% of sample matches. |
| `phone` | 10-15 digit phone-shaped values. |
| `ssn_us` | US Social Security number pattern `NNN-NN-NNNN`. |
| `credit_card` | 13-19 digits passing the Luhn check. |
| `iban` | Country code + check digits + BBAN. |
| `dob` | ISO date, DD/MM/YYYY, or MM/DD/YYYY. |
| `name_like` | 1-3 capitalized words; conservative threshold. |
| `none` | Nothing matched. Most columns end up here. |
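Three of the detectors above can be approximated in a few lines. The Python sketch below is illustrative only: the email regex is a deliberately simplified stand-in (the service's "standard email regex" is not published here), applying the 80% sample threshold to every tag is our assumption (the table states it only for `email`), and `tag_column` and `luhn_ok` are hypothetical helpers. The Luhn check itself is the standard algorithm.

```python
import re

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # simplified, not the service's regex
SSN_RE = re.compile(r"^\d{3}-\d{2}-\d{4}$")  # NNN-NN-NNNN


def luhn_ok(value: str) -> bool:
    """13-19 digits (spaces/hyphens allowed) that pass the Luhn checksum."""
    if re.fullmatch(r"[\d -]+", value) is None:
        return False
    digits = [int(ch) for ch in value if ch.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    # Double every second digit from the right; subtract 9 when the result exceeds 9.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def tag_column(sample, threshold=0.8):
    """Return the first tag whose check matches at least `threshold` of non-blank values."""
    values = [v.strip() for v in sample if v and v.strip()]
    if not values:
        return "none"
    checks = [
        ("email", lambda v: EMAIL_RE.match(v) is not None),
        ("ssn_us", lambda v: SSN_RE.match(v) is not None),
        ("credit_card", luhn_ok),
    ]
    for tag, check in checks:
        if sum(1 for v in values if check(v)) / len(values) >= threshold:
            return tag
    return "none"
```

For instance, a column of five values with four email addresses reaches exactly 80% and is tagged `email`, while `4111 1111 1111 1111` (a well-known Luhn-valid test number) is tagged `credit_card`.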
API surface
| Method | Endpoint | Purpose |
|---|---|---|
| POST | /v1/norm/preview | Multipart upload of one CSV/TSV (≤ 5 MB). Returns cleaned bytes inline + per-column stats. |
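When the SDKs are not an option, the request can be assembled by hand. This Python sketch only builds the `POST /v1/norm/preview` request and sends nothing. The host, the `Authorization: Bearer` scheme, the `file` form-field name, and reading "5 MB" as 5 × 1024 × 1024 bytes are all assumptions not stated in this reference; check them against the real API before relying on it.

```python
import urllib.request
import uuid

BASE_URL = "https://api.example.invalid"  # hypothetical placeholder host
MAX_BYTES = 5 * 1024 * 1024  # assumption: "5 MB" means 5 MiB


def build_preview_request(api_key: str, filename: str, data: bytes) -> urllib.request.Request:
    """Build (but do not send) a multipart upload of one CSV/TSV file."""
    if len(data) > MAX_BYTES:
        raise ValueError(f"{filename} is {len(data)} bytes; the endpoint accepts at most 5 MB")
    boundary = uuid.uuid4().hex
    content_type = "text/tab-separated-values" if filename.lower().endswith(".tsv") else "text/csv"
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        # Hypothetical form-field name "file".
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        f"Content-Type: {content_type}\r\n\r\n".encode(),
        data,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{BASE_URL}/v1/norm/preview",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the result to `urllib.request.urlopen` would send it; checking the size before upload saves a round trip for files that would be rejected anyway.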