Data Catalog

A data catalog is the centralized inventory of an organization's data assets. For each dataset it captures the schema, owner, description, classification, lineage, quality metrics, access policies, and usage statistics. In an AI-governed environment, the data catalog becomes the system of record for "what data exists, who can use it for which purpose, and what AI workloads have consumed it." Leading commercial catalogs include Alation, Collibra, Atlan, data.world, and Informatica IDMC; the open-source camp is led by DataHub (Acryl Data), OpenMetadata, Amundsen (originated at Lyft), and Apache Atlas.

Modern catalogs go beyond static metadata to include automated discovery (crawling sources and inferring schemas), profile-based quality (column-level null rates, value distributions, anomaly flags), and AI-specific extensions such as Croissant metadata for ML datasets and OpenLineage events for transformation tracking. A practical recipe: deploy DataHub, configure ingestion recipes for your data sources (Snowflake, Postgres, S3), enable BI metadata harvesting (Tableau, Looker), connect transformation and orchestration tools (dbt, Airflow) for lineage, and define glossary terms that get attached to datasets, such as "PII", "GDPR-sensitive", "training-data-eligible", and "audit-restricted".

For AI specifically, catalogs increasingly track which datasets fed which embeddings, which embeddings sit in which vector index, and which prompts and skills are governed by which retrieval policies. AI governance teams use the catalog as the place where regulators, auditors, and internal compliance can answer "show me everything we know about this dataset" without diving into engineering systems.
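To make the glossary terms and AI lineage described above concrete, here is a minimal in-memory sketch in Python. It is not the DataHub or OpenMetadata data model: the `CatalogEntry` and `Catalog` classes, the asset names, and the eligibility rule (tagged "training-data-eligible" and not tagged "audit-restricted") are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One catalog record. Hypothetical shape, not a real catalog's schema."""
    name: str
    owner: str
    kind: str  # "dataset", "embedding", or "vector_index"
    glossary_terms: set = field(default_factory=set)


class Catalog:
    def __init__(self):
        self.entries = {}     # asset name -> CatalogEntry
        self.downstream = {}  # asset name -> set of direct consumers

    def register(self, entry):
        self.entries[entry.name] = entry
        self.downstream.setdefault(entry.name, set())

    def add_lineage(self, upstream, consumer):
        # Only allow lineage edges between assets the catalog knows about.
        for name in (upstream, consumer):
            if name not in self.entries:
                raise KeyError(f"unregistered asset: {name}")
        self.downstream[upstream].add(consumer)

    def consumers_of(self, name):
        """Return every transitive downstream asset, sorted by name."""
        seen, stack = set(), [name]
        while stack:
            for child in self.downstream.get(stack.pop(), ()):
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        return sorted(seen)

    def eligible_for_training(self, name):
        # Assumed policy for this sketch: must be explicitly eligible
        # and must not carry the "audit-restricted" term.
        terms = self.entries[name].glossary_terms
        return "training-data-eligible" in terms and "audit-restricted" not in terms

    def describe(self, name):
        """Answer 'show me everything we know about this dataset'."""
        entry = self.entries[name]
        return {
            "name": entry.name,
            "owner": entry.owner,
            "kind": entry.kind,
            "glossary_terms": sorted(entry.glossary_terms),
            "consumed_by": self.consumers_of(name),
            "training_eligible": self.eligible_for_training(name),
        }


def build_demo_catalog():
    """Populate a catalog with made-up assets for demonstration."""
    catalog = Catalog()
    catalog.register(CatalogEntry("crm.customers", "sales-ops", "dataset",
                                  {"PII", "GDPR-sensitive"}))
    catalog.register(CatalogEntry("support.tickets", "support-eng", "dataset",
                                  {"training-data-eligible"}))
    catalog.register(CatalogEntry("emb.tickets_v1", "ml-platform", "embedding"))
    catalog.register(CatalogEntry("idx.support_search", "ml-platform", "vector_index"))
    # Dataset -> embedding -> vector index, as in the AI lineage described above.
    catalog.add_lineage("support.tickets", "emb.tickets_v1")
    catalog.add_lineage("emb.tickets_v1", "idx.support_search")
    return catalog


if __name__ == "__main__":
    print(build_demo_catalog().describe("support.tickets"))
```

Calling `describe("support.tickets")` on the demo catalog returns the owner, glossary terms, downstream embedding and vector index, and the eligibility flag in one dictionary, which is the kind of single-lookup answer an auditor would expect. A production catalog would persist this in its own metadata store and populate lineage from ingestion rather than by hand.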

The catalog is what Oxcyon built before "data catalog" was a category: Centralpoint has been the canonical inventory and governance layer for client data, including audience tags, sensitivity classifications, and lineage, for 25 years, predating the modern catalog vendors by a decade. That heritage means Centralpoint's AI layer inherits a catalog discipline rather than bolting one on. The catalog runs on-premise, tokens are metered per skill, and catalog-grounded chatbots deploy through one line of JavaScript.


Related Keywords:
Data Catalog, Oxcyon, AI, AI Governance, Generative AI, Inference, Inferencing, RAG, Prompts, Skills Manager