Apache Iceberg

data

Open table format for huge analytic datasets. Won the format wars against Delta Lake and Apache Hudi; even Databricks now supports Iceberg natively.

From Wikipedia

Apache Iceberg is a high-performance, open-source format for large analytic tables. Iceberg enables the use of SQL tables for big data while allowing engines such as Spark, Trino, Flink, Presto, Hive, Impala, and Pig to work safely with the same tables at the same time. It was originally developed at Netflix in 2017 to overcome the scalability, consistency, performance, and usability limitations of Apache Hive tables in large and demanding data lake environments. Iceberg was donated to the Apache Software Foundation in 2018, graduated to a top-level Apache project in 2020, and is released under the Apache License. Vendors that support Apache Iceberg tables include Cloudera, IBM watsonx, Oracle, Snowflake, Teradata, AWS, Google Cloud, and Databricks.

