Schema Drift

Schema drift is an unannounced change in the structure of a data source (a renamed column, a new field, a changed type, a removed table) that breaks downstream pipelines because consuming systems were built on assumptions about the old schema. In an AI stack, schema drift breaks the ETL pipelines that feed RAG indices, breaks fine-tuning data preparation, breaks evaluation suites built on labeled examples, and silently degrades retrieval quality when changes propagate without notice.

Detection tooling includes Great Expectations (open-source, assertion-based data quality checks, widely adopted), Soda (an open-source core plus a commercial offering), Monte Carlo (a commercial data observability vendor), Datafold, Anomalo, and Bigeye. A practical pattern: define expectations on every critical dataset (column types, value ranges, null rates, distinct counts), run the suite as part of every pipeline execution, fail loudly on drift, and route alerts to the data team's incident channel.

For AI specifically, schema drift in the source corpus (a SharePoint site renames its taxonomy, or a database adds a new "internal_notes" column containing PII) can silently change what the LLM sees through retrieval: a chunk that yesterday contained only public information may today contain restricted content. The fix is contract-driven ingestion: every source system gets a versioned schema contract, drift triggers a review before the change is allowed into the indexed corpus, and the data catalog records the contract version that produced each indexed snapshot. AI governance teams treat schema drift as one of the top quiet-failure modes in AI systems because it produces no error, only degraded answers.
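As a rough illustration of contract-driven ingestion, the following is a minimal Python sketch, not the API of Great Expectations or any other tool named above. The names SchemaContract, detect_drift, gate_ingestion, and SchemaDriftError are hypothetical, and column types are assumed to be plain strings reported by the source system. It compares an observed schema against a versioned contract and refuses to ingest when columns were added, removed, or retyped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaContract:
    """Hypothetical versioned contract: the columns a source is expected to expose."""
    source: str
    version: int
    columns: dict  # column name -> type name, e.g. {"id": "int"}


class SchemaDriftError(Exception):
    """Raised when the observed schema no longer matches the contract."""

    def __init__(self, source, drift):
        super().__init__(f"schema drift detected in {source}: {drift}")
        self.drift = drift


def detect_drift(contract, observed):
    """Return added, removed, and retyped columns relative to the contract."""
    expected = contract.columns
    return {
        "added": sorted(set(observed) - set(expected)),
        "removed": sorted(set(expected) - set(observed)),
        "retyped": sorted(
            name
            for name in set(expected) & set(observed)
            if expected[name] != observed[name]
        ),
    }


def gate_ingestion(contract, observed):
    """Fail loudly on drift; otherwise return the contract version
    so it can be recorded alongside the indexed snapshot."""
    drift = detect_drift(contract, observed)
    if any(drift.values()):
        raise SchemaDriftError(contract.source, drift)
    return contract.version
```

In use, a pipeline would call gate_ingestion before indexing. With a contract covering id, title, and body, an observed schema that also carries "internal_notes" would raise SchemaDriftError with that column listed under "added", which is the point at which a review (and an alert to the incident channel) would be triggered rather than letting the new column flow into the corpus.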

Drift detection is a job Centralpoint has quietly done for 25 years: its daily ingestion pipelines monitor source-system changes across SharePoint, Office 365, JSON APIs, XML feeds, and relational databases, so schema drift is neither a new problem for Oxcyon nor a new control for Centralpoint. Drift detection runs on-premise, token usage is metered per skill, and drift-aware chatbots deploy through one line of JavaScript.


Related Keywords:
Schema Drift, Oxcyon, AI, AI Governance, Generative AI, Inference, Inferencing, RAG, Prompts, Skills Manager