Dremio’s open lakehouse now supports SQL DML and DDL operations on Apache Iceberg
Dremio has unveiled support for DML operations (insert, update and delete) on Apache Iceberg tables, along with time travel for in-place querying of historical data. These features enable key data lakehouse use cases that were previously available only in database and data warehousing technologies.
Apache Iceberg, an open-source table format for analytics on the lakehouse, is a core component of Dremio’s open lakehouse and is used at companies like Netflix, Apple and Adobe. Iceberg’s major 1.0 release, combined with Dremio’s new GA capabilities, enables customers to build enterprise-grade data lakehouses with full data warehousing capabilities.
Apache Iceberg 1.0 means an enterprise-ready open lakehouse
As an open-source table format that’s vendor agnostic, Apache Iceberg allows businesses to run more workloads, with more flexibility, on the data lake. Iceberg unlocks new functionality such as data mutations, data versioning, schema evolution, transparent partitioning and scalable metadata.
The expanding adoption of the Apache Iceberg project, and broad contributions to it from AWS, Apple, Netflix, Dremio, Snowflake and others, demonstrates that Iceberg is ushering in a new era for data lakes.
With its milestone 1.0 release, Apache Iceberg introduces substantial performance and usability improvements, including a long-term API, high-performance updates and deletes (merge-on-read), multi-dimensional sorting (Z-order) and statistics, among other capabilities.
“All of Apache Iceberg’s new functionality means there’s never been a better time to adopt it to build your data lakehouse,” said Mark Lyons, vice president of product management at Dremio.
“With the broadest ecosystem of community contributions and deployment, Iceberg is the fastest growing table format and the industry standard for managing data in data lakes. It’s essential to the foundation of an open lakehouse, and Dremio has been in step with Iceberg from the start,” Lyons continued.
DML and time travel support for Apache Iceberg
With GA support for DML operations and time travel, powered by Apache Iceberg, Dremio provides functionality previously only found in database and data warehouse technologies. These new Dremio capabilities support use cases like deletes for privacy and compliance, updates for customer information changes, and inserts for late-arriving supply chain records directly in the data lakehouse.
With these capabilities, companies can leverage their data lakehouse for workloads that previously required complex ETL pipelines.
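To make the idea concrete, here is a minimal Python sketch of how snapshot-based DML and time travel fit together. It is a toy in-memory model, not Iceberg's metadata layout or Dremio's engine: the class `ToySnapshotTable`, its methods and the sample rows are all hypothetical names introduced for illustration. The only point it shows is that each insert, update or delete commits a new snapshot, and a reader can query either the latest snapshot or an earlier one.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]


@dataclass
class ToySnapshotTable:
    """Hypothetical toy table: every DML operation commits a new snapshot."""

    # Snapshot 0 is the empty table; later snapshots are appended, never edited.
    snapshots: List[List[Row]] = field(default_factory=lambda: [[]])

    def _commit(self, rows: List[Row]) -> int:
        """Store a new snapshot and return its id (its index in the log)."""
        self.snapshots.append(rows)
        return len(self.snapshots) - 1

    def insert(self, row: Row) -> int:
        return self._commit(self.snapshots[-1] + [dict(row)])

    def update(self, predicate: Callable[[Row], bool], changes: Row) -> int:
        # Matching rows are rebuilt as new dicts, so older snapshots stay intact.
        return self._commit(
            [{**r, **changes} if predicate(r) else r for r in self.snapshots[-1]]
        )

    def delete(self, predicate: Callable[[Row], bool]) -> int:
        return self._commit([r for r in self.snapshots[-1] if not predicate(r)])

    def scan(self, snapshot_id: Optional[int] = None) -> List[Row]:
        """Read the latest snapshot, or an earlier one for time travel."""
        rows = self.snapshots[-1] if snapshot_id is None else self.snapshots[snapshot_id]
        return [dict(r) for r in rows]


# Illustrative data only: a customer update and a privacy-driven delete.
customers = ToySnapshotTable()
s1 = customers.insert({"id": 1, "email": "a@example.com"})
s2 = customers.insert({"id": 2, "email": "b@example.com"})
s3 = customers.update(lambda r: r["id"] == 1, {"email": "new@example.com"})
s4 = customers.delete(lambda r: r["id"] == 2)

latest = customers.scan()      # state after all four operations
as_of_s2 = customers.scan(s2)  # "time travel" to the state before the update
```

In this sketch, `latest` holds only customer 1 with the updated email, while `as_of_s2` still returns both customers as originally inserted. A real table format does not copy all rows on every change; it tracks data files through metadata, which this toy deliberately omits.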
“Until recently, using robust DML operations and accessing historical data within any defined period were only available in data warehouses and other databases,” explained Lyons.
“Now it’s easier than ever to put aside expensive and proprietary cloud data warehouses and run workloads on an open lakehouse, with the full power of SQL at your fingertips and without the need to copy your data into a closed proprietary system. Data mutations and leveraging historical snapshots are possible directly on the data lake. The result is lower costs, more flexibility, significantly reduced time-to-insight and increased productivity and innovation for data engineers and business analysts, without vendor lock-in,” Lyons added.
“Fivetran is excited about Dremio’s recent release that enables customers to leverage the features of Apache Iceberg 1.0,” said Fraser Harris, vice president of product at Fivetran.
“We are impressed by the broad ecosystem adoption and performance that Iceberg offers. For customers who desire the open architecture approach, Fivetran looks forward to providing automated and reliable pipelines to open data lakehouses built on Apache Iceberg tables as an alternative to data warehouses,” Harris continued.
Innovation on the open lakehouse
When open data lakehouses are the architecture of choice for enterprises, they decrease data movement and copying, and in turn, decrease complexity and cost, while still offering full and direct access to petabyte data sets.
Dremio is committed to creating an open, independent data tier that enables a major paradigm shift in data analytics.
“Along with general availability of DML operations on Iceberg tables, we’re excited to announce new GA features on Dremio’s platform that accelerate the transition from data warehouse to data lakehouse and accelerate enterprise adoption,” said Lyons.
Those features include:
- Native row and column role-based access policies;
- SQL User Defined Functions (UDFs);
- A new SQL IDE with autocomplete and multi-statement support;
- New Azure data sources; and
- BI integration updates including Tableau SSO and Power BI Azure Active Directory.
“We were drawn to Dremio Cloud for its performance at scale and for the ability of the semantic layer to provide easy, efficient access to our data in Amazon S3,” said Angelo Slawik, data engineer at Moonfare.
“After our initial implementation is complete, we are eager to explore capabilities enabled by Dremio Arctic such as Git-like version control for our datasets,” Slawik continued.
Moonfare adopted Dremio Cloud on AWS to enable interactive analytics and dashboards for all of its employees.