DAVA Norm

Drop in a messy CSV, get a clean structured table back: snake_case headers, type inference and coercion, trimmed whitespace, and empty rows dropped. Smart Tables adds per-column PII tags and outlier counts.
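To make the cleaning steps concrete, here is a minimal local sketch of the same kinds of transformations using only the Python standard library. It is not the service's implementation: the helper names (`snake_case`, `coerce`, `normalize`) are hypothetical, and it coerces values cell by cell for brevity, whereas the service reports one inferred type per column.

```python
import csv
import io
import re


def snake_case(header: str) -> str:
    """Lower-case a header and join its alphanumeric runs with underscores."""
    # Split camelCase boundaries first: "orderTotal" -> "order Total".
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", header.strip())
    return "_".join(re.findall(r"[A-Za-z0-9]+", spaced)).lower()


def coerce(value: str):
    """Try int, then float; otherwise keep the (already trimmed) string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value


def normalize(raw_csv: str):
    """Return (headers, rows) with the four cleaning steps applied."""
    rows = list(csv.reader(io.StringIO(raw_csv)))
    headers = [snake_case(h) for h in rows[0]]
    cleaned = []
    for row in rows[1:]:
        cells = [cell.strip() for cell in row]  # whitespace trim
        if not any(cells):                      # drop rows where every cell is empty
            continue
        cleaned.append([coerce(c) for c in cells])
    return headers, cleaned


headers, rows = normalize(" First Name ,orderTotal\n  Ada , 12.50 \n , \n")
print(headers)  # ['first_name', 'order_total']
print(rows)     # [['Ada', 12.5]]
```

The hosted service handles these steps for you; the sketch is only meant to show what each step in the list does to a small input.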

Quickstart (Python)

```bash
pip install dava-norm
```

```python
import asyncio
from dava_norm import Client

async def main():
    async with Client(api_key="dava_live_…") as c:
        # Upload the raw file bytes for normalization.
        with open("messy.csv", "rb") as f:
            result = await c.preview("messy.csv", f.read())
        # Row counts before and after cleaning.
        print(f"{result.rows_in} → {result.rows_out} rows "
              f"({result.dropped_rows_empty} empty rows dropped)")
        # Per-column report: rename, inferred type, sensitivity tag, outlier count.
        for col in result.columns:
            print(f"  {col.name_in!r} → {col.name_out!r}  "
                  f"({col.inferred_type}, {col.sensitivity_tag}, "
                  f"{col.outlier_count} outliers)")
        # Save the cleaned table.
        with open("clean.csv", "w") as f:
            f.write(result.cleaned_csv)

asyncio.run(main())
```

Quickstart (TypeScript)

```typescript
import { Client } from "@avaresearch/dava-norm";
import { readFile } from "node:fs/promises";

const c = new Client({ apiKey: process.env.DAVA_API_KEY! });
const result = await c.preview("messy.csv", await readFile("messy.csv"));
console.log(`${result.rows_in} rows → ${result.rows_out} rows`);
```

Smart Tables

Every column comes back tagged with a sensitivity inference (PII detection) and, for numeric columns, an outlier count. The sensitivity tag is a hint surfaced in the dashboard so customers can mask columns before exporting. It is not a security boundary: Trust Layer policies do the actual access enforcement.

| Tag | Detector |
| --- | --- |
| `email` | Standard email regex; ≥ 80% of sample matches. |
| `phone` | 10-15 digit phone-shaped values. |
| `ssn_us` | NNN-NN-NNNN. |
| `credit_card` | 13-19 digits passing Luhn check. |
| `iban` | Country code + check digits + BBAN. |
| `dob` | ISO date or DD/MM/YYYY / MM/DD/YYYY. |
| `name_like` | 1-3 capitalized words, conservative threshold. |
| `none` | Nothing matched. Most columns end up here. |
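As an illustration of how two of these detectors could work, the sketch below implements the Luhn checksum and a sample-based threshold check. Everything here is an assumption made for illustration: the function names (`luhn_ok`, `tag_column`), the simplified email pattern, and the reuse of the 80% threshold for `credit_card` (the table states that figure for `email` only).

```python
import re

# Deliberately simple pattern; the service's actual email regex is not documented here.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def luhn_ok(number: str) -> bool:
    """Return True if `number` is 13-19 digits (spaces/hyphens allowed) passing Luhn."""
    compact = number.replace(" ", "").replace("-", "")
    if not (compact.isascii() and compact.isdigit()):
        return False
    if not 13 <= len(compact) <= 19:
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for i, ch in enumerate(reversed(compact)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def tag_column(sample: list[str], threshold: float = 0.8) -> str:
    """Tag a column when at least `threshold` of its non-empty sample matches a detector."""
    values = [v.strip() for v in sample if v.strip()]
    if not values:
        return "none"
    # Assumed detector order; the real precedence between tags is not documented.
    detectors = {"email": EMAIL_RE.match, "credit_card": luhn_ok}
    for tag, detect in detectors.items():
        hits = sum(1 for v in values if detect(v))
        if hits / len(values) >= threshold:
            return tag
    return "none"


print(tag_column(["a@x.io", "b@y.org", "c@z.net", "d@w.com", "n/a"]))  # email
print(tag_column(["4111 1111 1111 1111"]))                             # credit_card
print(tag_column(["foo", "bar"]))                                      # none
```

Requiring a share of the sample to match, rather than any single value, keeps a stray email address in a free-text column from tagging the whole column.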

API surface

| Method | Endpoint | Purpose |
| --- | --- | --- |
| POST | `/v1/norm/preview` | Multipart upload of one CSV/TSV (≤ 5 MB). Returns cleaned bytes inline + per-column stats. |
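If you are not using one of the SDKs, a raw request might be assembled roughly as follows. This is a sketch under several assumptions that the table does not confirm: the host (`api.example.invalid` is a placeholder), the form field name `file`, Bearer-token authentication, and reading "5 MB" as 5 × 1024 × 1024 bytes. The function only builds the request; it does not send it.

```python
import urllib.request
import uuid

MAX_UPLOAD_BYTES = 5 * 1024 * 1024  # assumption: "5 MB" read as binary megabytes

# Placeholder host: only the path /v1/norm/preview comes from the table above.
BASE_URL = "https://api.example.invalid"


def build_preview_request(filename: str, data: bytes, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a multipart POST for /v1/norm/preview."""
    if len(data) > MAX_UPLOAD_BYTES:
        raise ValueError(f"{filename} is {len(data)} bytes; the limit is 5 MB")
    boundary = uuid.uuid4().hex
    body = b"\r\n".join([
        f"--{boundary}".encode(),
        # "file" as the form field name is an assumption.
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode(),
        b"Content-Type: text/csv",
        b"",
        data,
        f"--{boundary}--".encode(),
        b"",
    ])
    return urllib.request.Request(
        BASE_URL + "/v1/norm/preview",
        data=body,
        method="POST",
        headers={
            "Content-Type": f"multipart/form-data; boundary={boundary}",
            # Bearer auth is an assumption; check the actual API reference.
            "Authorization": f"Bearer {api_key}",
        },
    )


req = build_preview_request("messy.csv", b"a,b\n1,2\n", "your-api-key")
print(req.get_method(), req.full_url)
```

Checking the size limit before building the body avoids uploading a file the endpoint would reject anyway.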