Data Lineage
Data lineage is the documented record of where data came from, how it has been transformed at every step, and where it currently flows: the complete provenance graph from source system to final consumer. In a modern AI stack, lineage answers questions like: "For this chunk in the vector index, which document did it come from, which ETL job produced that document, which source system fed that ETL job, who has modified it since ingestion, and which downstream LLM answers have cited it?" Lineage is captured at three levels: schema-level (how tables and columns flow through joins and transformations), record-level (how individual rows are traced through merges and deduplications), and field-level (how specific values are traced through computations). The tooling landscape includes OpenLineage (an open standard with integrations for Airflow, Spark, and dbt), Marquez (the reference OpenLineage backend), DataHub (originated at LinkedIn, now commercially backed by Acryl Data), Atlan, Alation, Collibra, and Microsoft Purview. A practical implementation instruments every ETL job to emit OpenLineage events on start and finish, captures inputs and outputs with their dataset URIs, stores the events in a central catalog, and exposes a UI that lets users walk forward and backward from any dataset. For AI specifically, lineage extends into the model layer: which training data produced which model checkpoint, which prompts were used at inference, and which chunks were retrieved for which user query. AI governance teams treat lineage as non-negotiable for any system subject to audit, because "we cannot tell you where this came from" is rarely an acceptable answer to a regulator or a litigant.
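The practical implementation described above (emit an event when a job starts and finishes, record its inputs and outputs, store the events centrally, then walk the graph in either direction) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production setup: the event dictionaries only mirror a simplified subset of the OpenLineage run event shape (eventType, eventTime, run, job, inputs, outputs) and omit fields that the full specification requires; the in-memory LineageCatalog stands in for a real backend such as Marquez; and make_event, run_job, LineageCatalog, the producer URI, and the dataset names are all hypothetical names chosen for this example. It does not use the official OpenLineage client library.

```python
import uuid
from collections import defaultdict
from datetime import datetime, timezone

PRODUCER = "https://example.com/lineage-sketch"  # hypothetical producer URI


def make_event(event_type, run_id, job_name, inputs, outputs, namespace="etl"):
    """Build a dict shaped like a simplified OpenLineage run event."""
    return {
        "eventType": event_type,  # "START", "COMPLETE", or "FAIL"
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "producer": PRODUCER,
        "run": {"runId": run_id},
        "job": {"namespace": namespace, "name": job_name},
        "inputs": [{"namespace": namespace, "name": n} for n in inputs],
        "outputs": [{"namespace": namespace, "name": n} for n in outputs],
    }


class LineageCatalog:
    """In-memory stand-in for a central lineage catalog (assumption)."""

    def __init__(self):
        self.events = []
        self.upstream = defaultdict(set)    # dataset -> datasets it was built from
        self.downstream = defaultdict(set)  # dataset -> datasets built from it

    def emit(self, event):
        self.events.append(event)
        if event["eventType"] != "COMPLETE":
            return  # only completed runs add edges to the lineage graph
        for src in event["inputs"]:
            for dst in event["outputs"]:
                self.downstream[src["name"]].add(dst["name"])
                self.upstream[dst["name"]].add(src["name"])

    def walk(self, dataset, direction="upstream"):
        """Return every dataset reachable from `dataset` in one direction."""
        graph = self.upstream if direction == "upstream" else self.downstream
        seen, stack = set(), [dataset]
        while stack:
            for neighbor in graph.get(stack.pop(), ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        return seen


def run_job(catalog, job_name, inputs, outputs, work=lambda: None):
    """Wrap a unit of ETL work with START and COMPLETE (or FAIL) events."""
    run_id = str(uuid.uuid4())
    catalog.emit(make_event("START", run_id, job_name, inputs, outputs))
    try:
        work()
    except Exception:
        catalog.emit(make_event("FAIL", run_id, job_name, inputs, outputs))
        raise
    catalog.emit(make_event("COMPLETE", run_id, job_name, inputs, outputs))
    return run_id


catalog = LineageCatalog()
run_job(catalog, "ingest_documents", ["crm.contracts"], ["lake.documents"])
run_job(catalog, "chunk_and_embed", ["lake.documents"], ["vector_index.chunks"])

# Walk backward from the chunk index to its sources, then forward from the source.
print(sorted(catalog.walk("vector_index.chunks", "upstream")))
print(sorted(catalog.walk("crm.contracts", "downstream")))
```

Walking upstream from the hypothetical vector_index.chunks dataset returns both the intermediate document store and the original source system, which is the "where did this chunk come from" question in miniature. Building edges only from COMPLETE events is a design choice of this sketch, so that failed or unfinished runs do not appear as established lineage.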
Lineage is the discipline that made Centralpoint AI possible: Oxcyon has been emitting audit and lineage records for 25 years across every CMS ingestion, transformation, deduplication, and publication step, and the same lineage graph now extends into the AI layer, covering chunk-level retrieval and prompt-level governance. Lineage stays on-premise, tokens are metered per skill, and lineage-aware chatbots deploy through one line of JavaScript.
Related Keywords:
Data Lineage, Oxcyon, AI, AI Governance, Generative AI, Inference, Inferencing, RAG, Prompts, Skills Manager