Data Lakehouse

A data lakehouse is an architectural pattern that combines the open-format flexibility and low storage cost of a data lake with the transactional guarantees and query performance of a data warehouse, removing the historical need to maintain two separate systems. Databricks popularized the term in 2020, and its researchers formalized the pattern in a 2021 CIDR paper (Armbrust et al.).

The breakthrough that made the lakehouse possible was open table formats: Apache Iceberg (created at Netflix, now an Apache top-level project), Apache Hudi (created at Uber), and Delta Lake (created at Databricks and donated to the Linux Foundation). These formats sit on top of Parquet files in cloud object storage (S3, GCS, Azure ADLS) and add a metadata layer that provides ACID transactions, schema evolution, time-travel queries, efficient updates and deletes, and file-level statistics for query optimization.

The execution layer can be any engine that reads the table format: Databricks SQL, Snowflake (which reads Iceberg natively as of 2023), Trino, Presto, Athena, BigQuery (via BigLake), DuckDB, Spark, Flink, and a growing share of the analytical engines in the ecosystem. The result is a single copy of data, governed once and queryable by many engines, with BI workloads (the warehouse pattern) and ML and streaming workloads (the lake pattern) reading from the same physical storage. The 2024 emergence of Iceberg as the de facto open table format, with Snowflake, AWS, Confluent, and most BI tools moving to support it natively, has substantially accelerated lakehouse adoption.

A practical recipe: land raw data as Parquet files in an Iceberg table stored in an S3 bucket; use dbt or Spark to build bronze (raw), silver (cleansed and conformed), and gold (analytics-ready) layers in the same Iceberg catalog; expose the gold layer to BI tools through Trino or Snowflake; expose the silver layer to ML pipelines through Spark; and let governance tools (Unity Catalog, Apache Polaris, Tabular, Atlan) apply unified policies across all consumers.
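
The bronze/silver/gold recipe can be illustrated without any lakehouse infrastructure. The following is a minimal Python sketch, assuming in-memory lists stand in for Iceberg tables; the sample records, the field names (`event_id`, `user`, `ts`, `amount`), and the functions `to_silver` and `to_gold` are all hypothetical. In a real deployment these transformations would be dbt models or Spark jobs writing to the Iceberg catalog.

```python
from collections import defaultdict
from datetime import datetime

# Bronze: raw records exactly as landed. Hypothetical sample data; in a real
# lakehouse these rows would live in Parquet files behind an Iceberg table.
BRONZE = [
    {"event_id": "e1", "user": " Alice ", "ts": "2024-05-01T10:00:00", "amount": "19.99"},
    {"event_id": "e1", "user": " Alice ", "ts": "2024-05-01T10:00:00", "amount": "19.99"},  # duplicate delivery
    {"event_id": "e2", "user": "bob", "ts": "2024-05-01T11:30:00", "amount": "5.00"},
    {"event_id": "e3", "user": "alice", "ts": "2024-05-02T09:15:00", "amount": "n/a"},  # malformed amount
    {"event_id": "e4", "user": "BOB", "ts": "2024-05-02T14:00:00", "amount": "12.50"},
]


def to_silver(bronze):
    """Silver: deduplicate on event_id, parse types, normalize user names, drop bad rows."""
    seen = set()
    silver = []
    for row in bronze:
        if row["event_id"] in seen:
            continue  # duplicate delivery of an event already kept
        try:
            amount = float(row["amount"])
            ts = datetime.fromisoformat(row["ts"])
        except ValueError:
            continue  # a production pipeline would quarantine, not silently drop
        seen.add(row["event_id"])
        silver.append({
            "event_id": row["event_id"],
            "user": row["user"].strip().lower(),  # conform user identifiers
            "ts": ts,
            "amount": amount,
        })
    return silver


def to_gold(silver):
    """Gold: analytics-ready revenue per (user, day), sorted for stable output."""
    totals = defaultdict(float)
    for row in silver:
        totals[(row["user"], row["ts"].date().isoformat())] += row["amount"]
    return {key: round(total, 2) for key, total in sorted(totals.items())}


if __name__ == "__main__":
    silver = to_silver(BRONZE)
    gold = to_gold(silver)
    print(len(silver))  # 3
    print(gold)  # {('alice', '2024-05-01'): 19.99, ('bob', '2024-05-01'): 5.0, ('bob', '2024-05-02'): 12.5}
```

Keeping silver at event grain and gold as aggregates mirrors the split in the recipe: ML pipelines usually want row-level cleansed data, while BI tools usually want pre-aggregated tables.
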
The lakehouse is a strong fit for Digital Experience Platforms because the same data foundation serves analytical reports, real-time personalization, ML feature pipelines, and the experience layer itself.
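
One data foundation can serve reports, personalization, and ML pipelines at once because open table formats never modify a table version in place: each write produces a new immutable snapshot and atomically swaps a pointer to it (roughly, Iceberg swaps the current metadata pointer in its catalog, while Delta Lake appends a numbered entry to its transaction log). The sketch below is a deliberately simplified in-memory model of that idea. The class and method names (`ToyTable`, `commit`, `scan`) are hypothetical and are not the Iceberg, Delta Lake, or Hudi API; real formats persist snapshot metadata as files in object storage.

```python
import threading
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Snapshot:
    """An immutable table version: which data files make up the table at that point."""
    snapshot_id: int
    parent_id: Optional[int]
    data_files: Tuple[str, ...]


class ToyTable:
    """Toy model of a table format: immutable snapshots plus one atomically swapped pointer."""

    def __init__(self) -> None:
        self._snapshots: Dict[int, Snapshot] = {0: Snapshot(0, None, ())}
        self._current_id = 0
        # Stands in for the catalog's atomic compare-and-swap of the metadata pointer.
        self._lock = threading.Lock()

    def current_snapshot_id(self) -> int:
        return self._current_id

    def commit(self, base_id: int, add: Sequence[str] = (), remove: Sequence[str] = ()) -> int:
        """Optimistic commit: succeeds only if the table is still at base_id."""
        with self._lock:
            if base_id != self._current_id:
                raise RuntimeError(f"commit conflict: table moved past snapshot {base_id}; retry")
            removed = set(remove)
            kept = tuple(f for f in self._snapshots[base_id].data_files if f not in removed)
            new_id = max(self._snapshots) + 1
            self._snapshots[new_id] = Snapshot(new_id, base_id, kept + tuple(add))
            self._current_id = new_id  # readers of older snapshots are unaffected
            return new_id

    def scan(self, snapshot_id: Optional[int] = None) -> Tuple[str, ...]:
        """List the files visible in a snapshot; an older id gives a time-travel read."""
        sid = self._current_id if snapshot_id is None else snapshot_id
        return self._snapshots[sid].data_files


if __name__ == "__main__":
    table = ToyTable()
    s1 = table.commit(0, add=["part-000.parquet"])
    report_view = table.current_snapshot_id()  # a BI report pins snapshot s1
    s2 = table.commit(s1, add=["part-001.parquet"])  # new data lands for ML and personalization
    # Compaction rewrites both files into one; the pinned report still sees its snapshot.
    table.commit(s2, add=["part-compacted.parquet"], remove=["part-000.parquet", "part-001.parquet"])
    print(table.scan(report_view))  # ('part-000.parquet',)
    print(table.scan())  # ('part-compacted.parquet',)
    try:
        table.commit(s1, add=["part-002.parquet"])  # a stale writer still based on s1
    except RuntimeError as err:
        print(err)
```

Because snapshots are never changed after commit, each consumer can read a different consistent version of the same table without copying data. The cost is that superseded files accumulate until they are cleaned up, which real formats handle with operations such as Iceberg snapshot expiration and Delta Lake `VACUUM`.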

Lakehouse aggregation for a Magic Quadrant DXP: Centralpoint operates lakehouse-style aggregation (one governed copy of the truth, projected into multiple experiences), which is precisely the architecture Gartner rewards in the Magic Quadrant for Digital Experience Platforms. Twenty-five years of unified content and data discipline inform Centralpoint's approach to the modern lakehouse pattern. Aggregation runs on-premises, lineage is audit-grade, and the served experience deploys through one line of JavaScript.

Related Keywords:
Data Lakehouse, Oxcyon, AI, AI Governance, Generative AI, Inference, Inferencing, RAG, Prompts, Skills Manager
