DAVA Connect
Discover relationships across datasets: heuristic structural matches (FK candidates, value overlap, name matches) plus an opt-in LLM second pass for semantic links.
Quickstart (Python)
```bash
pip install dava-connect
```

```python
import asyncio

from dava_connect import Client


async def main():
    async with Client(api_key="dava_live_…") as c:
        # Upload two small CSVs (filename + raw bytes); each returns a handle with a file_id.
        a = await c.upload_file(
            "customers.csv",
            b"customer_id,email\n1,a@b.c\n2,c@d.e\n",
        )
        b = await c.upload_file(
            "orders.csv",
            b"order_id,customer_id\n10,1\n11,2\n",
        )
        # Run discovery across both files and print each proposed edge.
        result = await c.discover([a.file_id, b.file_id])
        for edge in result.edges:
            print(
                f"{edge.source_column} ↔ {edge.target_column} "
                f"({edge.kind}, {edge.confidence:.0%})"
            )


asyncio.run(main())
```

Quickstart (TypeScript)
```bash
npm install @avaresearch/dava-connect
```

```typescript
// Uses top-level await, so run it as an ES module (e.g. a .mts file, or "type": "module" in package.json).
import { Client } from "@avaresearch/dava-connect";
import { readFile } from "node:fs/promises";

const c = new Client({ apiKey: process.env.DAVA_API_KEY! });

const customers = await c.uploadFile("customers.csv", await readFile("customers.csv"));
const orders = await c.uploadFile("orders.csv", await readFile("orders.csv"));

const result = await c.discover([customers.file_id, orders.file_id]);
for (const edge of result.edges) {
  console.log(
    `${edge.source_column} ↔ ${edge.target_column} (${edge.kind}, ${(edge.confidence * 100).toFixed(0)}%)`,
  );
}
```

How discovery works
Connect runs in two passes. Pass 1 (always on) is a deterministic heuristic that combines three signals (column-name Jaccard similarity, sample-value Jaccard similarity, and inferred-type compatibility) into a single confidence score in [0, 1]. Pass 2 (opt-in via enable_semantic=true) is an LLM call that reads the file profiles and proposes additional semantic relationships that the heuristic misses.
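To make Pass 1 concrete, here is a minimal Python sketch of that kind of scoring. It is not the service's implementation: the tokenization rule, the type-inference rule, and the weights W_NAME, W_VALUE and W_TYPE are hypothetical. Only the three signals and the [0, 1] range come from the description above. Because each signal lies in [0, 1] and the weights sum to 1, the blended score also stays in [0, 1].

```python
import re

# Hypothetical weights; they sum to 1.0 so the blended score stays in [0, 1].
W_NAME, W_VALUE, W_TYPE = 0.3, 0.5, 0.2


def name_tokens(column: str) -> set[str]:
    """Split 'customer_id' or 'customerId' into {'customer', 'id'}."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", column)
    return {t for t in re.split(r"[^a-z0-9]+", spaced.lower()) if t}


def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def _all_parse(values: list[str], cast) -> bool:
    """True if every sample value converts with `cast` (e.g. int or float)."""
    try:
        for v in values:
            cast(v)
    except ValueError:
        return False
    return True


def infer_type(values: list[str]) -> str:
    """Coarse type guess from sample values (hypothetical rule)."""
    if not values:
        return "unknown"
    if _all_parse(values, int):
        return "int"
    if _all_parse(values, float):
        return "float"
    return "str"


def type_compat(a: str, b: str) -> float:
    """1.0 for identical types, 0.5 for int vs float, otherwise 0.0."""
    if "unknown" in (a, b):
        return 0.0
    if a == b:
        return 1.0
    if {a, b} == {"int", "float"}:
        return 0.5
    return 0.0


def score_pair(name_a: str, values_a: list[str], name_b: str, values_b: list[str]) -> float:
    """Weighted blend of name Jaccard, sample-value Jaccard, and type compatibility."""
    name_j = jaccard(name_tokens(name_a), name_tokens(name_b))
    value_j = jaccard(set(values_a), set(values_b))
    types = type_compat(infer_type(values_a), infer_type(values_b))
    return W_NAME * name_j + W_VALUE * value_j + W_TYPE * types
```

With the quickstart data, orders.customer_id against customers.customer_id scores high on all three signals, while orders.order_id against customers.email scores near zero. Sample values are compared here as raw strings, so "1" and "01" would not match; a real implementation may normalize them first.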
Edge kinds
| Kind | When the heuristic emits it |
|---|---|
| fk_candidate | Strong value overlap AND at least one side reads as an ID. Most likely a real foreign key. |
| value_overlap | Strong value overlap, but neither side is ID-shaped. Could be a shared dimension. |
| name_match | Names align but values don't overlap. Could be a rename or a coincidence; surface it for review. |
| composite | No single signal dominates, but the combined score still clears the confidence threshold. The catch-all. |
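The table reads as a short decision procedure. The Python sketch below encodes it under explicit assumptions: the thresholds STRONG_OVERLAP, STRONG_NAME and NO_OVERLAP, the default min_confidence of 0.3, and the name-only test for "reads as an ID" are all hypothetical. The service may also look at values (e.g. uniqueness) when deciding whether a column is ID-shaped.

```python
import re
from typing import Optional

# Hypothetical thresholds; the service's real cut-offs are not documented here.
STRONG_OVERLAP = 0.5  # sample-value Jaccard that counts as "strong overlap"
STRONG_NAME = 0.5     # column-name Jaccard that counts as "names align"
NO_OVERLAP = 0.05     # below this, treat the values as not overlapping


def is_id_shaped(column: str) -> bool:
    """Name-based guess: 'id', 'customer_id', 'customerId' -> True; 'paid' -> False."""
    return bool(
        re.search(r"(?:^|_)id$", column.lower())
        or re.search(r"[a-z0-9]Id$", column)
    )


def classify_edge(col_a: str, col_b: str, name_j: float, value_j: float,
                  score: float, min_confidence: float = 0.3) -> Optional[str]:
    """Map the Pass 1 signals to an edge kind, or None when the score is below the bar."""
    if score < min_confidence:
        return None  # no edge is emitted
    if value_j >= STRONG_OVERLAP:
        # Strong overlap: an ID-looking side makes it a likely foreign key.
        if is_id_shaped(col_a) or is_id_shaped(col_b):
            return "fk_candidate"
        return "value_overlap"
    if name_j >= STRONG_NAME and value_j < NO_OVERLAP:
        return "name_match"  # names agree, values don't: flag for review
    return "composite"  # cleared the bar through a mix of weaker signals
```

Checking the overlap rules before name_match mirrors the table's ordering of confidence: value evidence is treated as stronger than a name coincidence.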
LLM-proposed edges (when enable_semantic=true) come back tagged with evidence.source = "llm_semantic" and include a reason string produced by the LLM.
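Because LLM-proposed edges are tagged rather than returned separately, a client that wants to review them apart from heuristic matches has to split the edge list itself. This Python sketch assumes the result JSON exposes each edge as an object with an optional evidence object holding source and reason; the exact nesting of reason and the sample payload are illustrative assumptions.

```python
from typing import Any

LLM_SOURCE = "llm_semantic"  # tag documented for LLM-proposed edges

Edge = dict[str, Any]


def split_edges(edges: list[Edge]) -> tuple[list[Edge], list[Edge]]:
    """Partition result edges into (heuristic, llm_proposed) by evidence.source."""
    heuristic: list[Edge] = []
    llm_proposed: list[Edge] = []
    for edge in edges:
        source = (edge.get("evidence") or {}).get("source")
        (llm_proposed if source == LLM_SOURCE else heuristic).append(edge)
    return heuristic, llm_proposed


# Illustrative payload only; column names, scores and the reason text are made up.
edges = [
    {"source_column": "customer_id", "target_column": "customer_id",
     "kind": "fk_candidate", "confidence": 0.92},
    {"source_column": "email", "target_column": "contact", "confidence": 0.41,
     "evidence": {"source": "llm_semantic",
                  "reason": "both columns appear to hold email addresses"}},
]

heuristic, llm_proposed = split_edges(edges)
for edge in llm_proposed:
    print(f"{edge['source_column']} ↔ {edge['target_column']}: "
          f"{edge['evidence'].get('reason', '')}")
```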
API surface
| Method | Endpoint | Purpose |
|---|---|---|
| POST | /v1/connect/upload | Multipart upload of one CSV/TSV (≤ 10 MB). |
| POST | /v1/connect/jobs | Submit discovery on ≥ 2 file_ids. Body: {file_ids, min_confidence?, enable_semantic?}. |
| GET | /v1/connect/jobs | List jobs for the active org. |
| GET | /v1/connect/jobs/{id} | Job header + status + counters. |
| GET | /v1/connect/jobs/{id}/result | Graph payload — nodes (one per file) + edges, sorted by confidence. |
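The endpoints imply a simple client flow: upload each file, submit a job with at least two file_ids, poll the job until it finishes, then fetch the result. The Python sketch below leaves the HTTP layer out: get_job stands for any function you supply that performs GET /v1/connect/jobs/{id} and returns the parsed JSON. It covers only request shaping and polling. The status field name, the values in TERMINAL_STATUSES, and the reading of "10 MB" as 10 × 1024 × 1024 bytes are assumptions, not documented behavior.

```python
import json
import time
from pathlib import PurePath
from typing import Any, Callable, Optional

# Assumption: "10 MB" read as 10 * 1024 * 1024 bytes; the server's exact cut-off may differ.
MAX_UPLOAD_BYTES = 10 * 1024 * 1024
ALLOWED_SUFFIXES = {".csv", ".tsv"}

# Hypothetical status values; confirm against what GET /v1/connect/jobs/{id} returns.
TERMINAL_STATUSES = {"succeeded", "failed"}


def check_upload(filename: str, data: bytes) -> None:
    """Client-side preflight for POST /v1/connect/upload: one CSV/TSV, at most 10 MB."""
    if PurePath(filename).suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError(f"{filename}: expected a .csv or .tsv file")
    if len(data) > MAX_UPLOAD_BYTES:
        raise ValueError(f"{filename}: {len(data)} bytes exceeds the upload limit")


def build_job_body(file_ids: list[str], min_confidence: Optional[float] = None,
                   enable_semantic: Optional[bool] = None) -> str:
    """JSON body for POST /v1/connect/jobs; optional fields are omitted when unset."""
    if len(file_ids) < 2:
        raise ValueError("discovery needs at least 2 file_ids")
    body: dict[str, Any] = {"file_ids": list(file_ids)}
    if min_confidence is not None:
        if not 0.0 <= min_confidence <= 1.0:
            raise ValueError("min_confidence must be in [0, 1]")
        body["min_confidence"] = min_confidence
    if enable_semantic is not None:
        body["enable_semantic"] = enable_semantic
    return json.dumps(body)


def wait_for_job(get_job: Callable[[str], dict[str, Any]], job_id: str,
                 interval: float = 1.0, timeout: float = 60.0) -> dict[str, Any]:
    """Poll the job header until its status is terminal or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        job = get_job(job_id)  # caller performs GET /v1/connect/jobs/{id}
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} not finished after {timeout}s")
        time.sleep(interval)
```

Injecting get_job keeps the polling loop independent of any particular HTTP library and makes it easy to test with a stub. The min_confidence range check mirrors the [0, 1] confidence range described above; it can be dropped if you prefer to let the API reject out-of-range values.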
Embed the graph
```bash
npm install @avaresearch/dava-connect-react @avaresearch/dava-connect react
```

```tsx
import { ConnectGraph } from "@avaresearch/dava-connect-react";

export function MyView({ jobId }: { jobId: string }) {
  return <ConnectGraph apiKey={process.env.NEXT_PUBLIC_DAVA_API_KEY!} jobId={jobId} />;
}
```