Unleash the Power of Dynamic Tables in Snowflake

Data Modeling

Date: 09/02/2024

Learn how to automate data updates and streamline your data pipelines with Snowflake Dynamic Tables. Discover their benefits, use cases, and how to choose the right tool for your needs. Improve data freshness and optimize performance for faster, more reliable insights.

Asish Patro
Senior Manager, Data Engineering

In today's data-driven world, having access to up-to-date information is crucial. However, manually refreshing tables is time-consuming and error-prone. Enter dynamic tables: a powerful option that automates data updates, ensuring your analyses are always based on the latest information.

What are Dynamic Tables?

Think of dynamic tables as self-updating spreadsheets. You define a query that transforms data from one or more base tables, and a built-in refresh process takes care of the rest. Whenever the underlying data changes, the dynamic table automatically updates itself (based on the target lag), reflecting the latest information.

Example Analytics Use Case

A data analyst at a global retail company needs to calculate the quarter-to-date order amount and the number of customers in each country for market research. The required data is loaded into Snowflake through multiple ETL (Extract, Transform, and Load) pipelines with different refresh frequencies. The analyst accesses the data via role-based access control and develops a SQL query to produce the desired aggregates by joining several tables (i.e., customer, orders, nation). With a dynamic table, there is no need to worry about CDC (Change Data Capture), pipeline building, or dependency management.


Dynamic Table (Syntax)

CREATE DYNAMIC TABLE REPORT.QUARTERLY_CUSTOMER_AND_ORDER_BY_COUNTRY
    TARGET_LAG = 'DOWNSTREAM'   -- Or '<num> seconds | minutes | hours | days' (specifies the lag for the dynamic table)
    WAREHOUSE = SF_US_DE_WH_01
    INITIALIZE = ON_CREATE      -- Other option: ON_SCHEDULE (specifies the behavior of the initial refresh)
    REFRESH_MODE = INCREMENTAL  -- Other options: FULL or AUTO (specifies the refresh mode for the dynamic table)
AS
SELECT
    nat.N_NAME AS Country_Name,
    YEAR(ord.O_ORDERDATE) AS Year,
    QUARTER(ord.O_ORDERDATE) AS Quarter,
    COUNT(DISTINCT cus.C_NAME) AS Number_Of_Unique_Customer,
    SUM(ord.O_TOTALPRICE) AS Total_Order_Amount
FROM STN.STN_NATION nat
INNER JOIN STN.STN_CUSTOMER cus
    ON cus.C_NATIONKEY = nat.N_NATIONKEY
INNER JOIN STN.STN_ORDERS ord
    ON cus.C_CUSTKEY = ord.O_CUSTKEY
GROUP BY 1, 2, 3
ORDER BY 1, 2, 3;
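
Once created, the dynamic table can be queried like any other table, and a refresh can also be triggered on demand outside the target lag schedule (handy while testing). A minimal sketch, assuming the table defined above:

-- Trigger an on-demand refresh (outside the TARGET_LAG schedule)
ALTER DYNAMIC TABLE REPORT.QUARTERLY_CUSTOMER_AND_ORDER_BY_COUNTRY REFRESH;

-- Query it like a regular table
SELECT *
FROM REPORT.QUARTERLY_CUSTOMER_AND_ORDER_BY_COUNTRY
WHERE Year = 2024 AND Quarter = 1
ORDER BY Total_Order_Amount DESC;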

DAG View of Dynamic Tables

Dynamic tables can be chained together to create a Directed Acyclic Graph (DAG), allowing you to use the output of one table as the input for another. This enables you to build complex data transformations in a modular and efficient way.


In the DAG above, a new dynamic table (YTD_ORDER_BY_COUNTRY) with a target lag of 1 minute was created on top of the QUARTERLY_CUSTOMER_AND_ORDER_BY_COUNTRY dynamic table. This lets Snowflake detect changes in the underlying dynamic and base tables every minute. A sketch of how such a chained table could be defined follows.
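
A minimal sketch of this downstream dynamic table, assuming the upstream table created earlier; the YTD aggregation logic here is an illustrative assumption, as the example above only names the table and its 1-minute target lag:

CREATE DYNAMIC TABLE REPORT.YTD_ORDER_BY_COUNTRY
    TARGET_LAG = '1 minute'
    WAREHOUSE = SF_US_DE_WH_01
AS
-- The upstream dynamic table is consumed just like a regular base table,
-- which is what chains the two tables into a DAG.
SELECT
    Country_Name,
    Year,
    SUM(Total_Order_Amount) AS YTD_Order_Amount
FROM REPORT.QUARTERLY_CUSTOMER_AND_ORDER_BY_COUNTRY
GROUP BY 1, 2;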

The Refresh Process Explained

The magic behind dynamic tables lies in the automated refresh process. This process works in two key steps:

  1. Change Detection: Snowflake analyzes the dynamic table’s underlying query and builds a DAG from its dependencies. To identify incremental changes to the base tables, Snowflake creates a stream behind the scenes that captures the row ID and the insert/update/delete action. (Since this is just three metadata columns, it results in only a small storage cost.)
  2. Merging Updates: Only the changes are incorporated into the dynamic table, not the entire dataset. This keeps the refresh process efficient and minimizes resource consumption. Each refresh run is recorded and can be inspected, as sketched below.
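
A minimal sketch for observing these refresh runs, assuming the dynamic table created earlier; DYNAMIC_TABLE_REFRESH_HISTORY is the information-schema table function Snowflake provides for this:

-- Inspect recent refreshes: when each ran, whether it was incremental or full,
-- and whether any rows actually changed
SELECT *
FROM TABLE(
    INFORMATION_SCHEMA.DYNAMIC_TABLE_REFRESH_HISTORY(
        NAME => 'REPORT.QUARTERLY_CUSTOMER_AND_ORDER_BY_COUNTRY'
    )
)
ORDER BY REFRESH_START_TIME DESC;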


When we loaded new records into the orders table for today's order date and refreshed the dynamic table, it added only 24 new records (i.e., the aggregated records for 24 countries).

When a source record for today's order date was deleted, it affected only one record in the dynamic table (the quarterly dataset): Snowflake deleted that record and added a new, recomputed one.

Balancing Freshness and Cost

When setting up a dynamic table, you get to define the ideal "freshness" for your data, also known as the target lag. This tells the system how far behind the base tables the data in the dynamic table can be.

Here is the trade-off:

  • High Freshness (Low Target Lag): Get the most recent data, ideal for real-time analytics. This, however, requires more frequent refreshes and consumes more compute resources.
  • Lower Freshness (Higher Target Lag): Reduce refresh frequency and save on resources. This might be suitable for scenarios where near real-time data is not critical.

The key is to find the sweet spot between data freshness and cost-effectiveness. 
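
Because the target lag can be changed at any time with an ALTER statement, tuning this trade-off does not require rebuilding the table. A minimal sketch, assuming the dynamic table created earlier:

-- Tighten freshness while analysts need near real-time numbers
ALTER DYNAMIC TABLE REPORT.QUARTERLY_CUSTOMER_AND_ORDER_BY_COUNTRY
    SET TARGET_LAG = '5 minutes';

-- Relax it afterwards to cut refresh frequency and compute cost
ALTER DYNAMIC TABLE REPORT.QUARTERLY_CUSTOMER_AND_ORDER_BY_COUNTRY
    SET TARGET_LAG = '12 hours';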

Here are the different cost components of a dynamic table (a monitoring sketch follows the list):

  • Compute cost:
    • Usage of the virtual warehouse defined in the table DDL.
    • Cloud services compute to identify changes in the underlying base objects and to decide whether the virtual warehouse needs to be invoked.
  • Storage cost: Active, Time Travel, and Fail-safe storage; cloning; replication; and the metadata that tracks source table record changes.
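
For the compute side, credit consumption of the refresh warehouse can be tracked through the standard SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY view. A minimal sketch, assuming the warehouse from the DDL above (note that this counts all work on that warehouse, not only dynamic table refreshes):

-- Daily credits consumed by the warehouse that runs the refreshes
SELECT
    DATE_TRUNC('day', START_TIME) AS usage_day,
    SUM(CREDITS_USED) AS credits_used
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
WHERE WAREHOUSE_NAME = 'SF_US_DE_WH_01'
GROUP BY 1
ORDER BY 1 DESC;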

Use Cases

  • Letting analysts process data from various tables in Snowflake without depending on the Data Engineering team to build ETL pipelines.
  • Continuously merging CDC (Change Data Capture) data into target tables for both batch and streaming.
  • Generating Slowly Changing Dimension (SCD) tables without depending on stored procedures and ETL (Extract, Transform, and Load) jobs; a sketch follows this list.
  • Continuous data aggregation and transformation from multiple source tables.
  • Building functional subsets by filtering data from one table into separate tables at low latency.
  • Materializing the results of complex views (SQL on top of one or more tables) for improved performance.
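
As a taste of the SCD use case mentioned above, here is a minimal type-2-style sketch; the STG.CUSTOMER_CHANGES source table and its columns are hypothetical, and a production version would also need to handle deletes:

CREATE DYNAMIC TABLE DIM.CUSTOMER_SCD2
    TARGET_LAG = '1 hour'
    WAREHOUSE = SF_US_DE_WH_01
AS
SELECT
    C_CUSTKEY,
    C_NAME,
    C_ADDRESS,
    UPDATED_AT AS effective_from,
    -- The next change for the same customer closes out this version
    LEAD(UPDATED_AT) OVER (PARTITION BY C_CUSTKEY ORDER BY UPDATED_AT) AS effective_to,
    -- No later change means this row is the current version
    LEAD(UPDATED_AT) OVER (PARTITION BY C_CUSTKEY ORDER BY UPDATED_AT) IS NULL AS is_current
FROM STG.CUSTOMER_CHANGES;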

Advantages of Dynamic Tables

  • Simplified data engineering: Reduces complexity and manual effort compared to traditional methods like streams and tasks.
  • Automated data freshness: Maintains desired data recency without manual intervention. Dynamic Tables adapt to changing latency needs. Easily adjust data freshness from hours to seconds with a simple ALTER statement.
  • Auto Incremental Update: Snowflake leverages streams to track source table changes and performs incremental updates to the dynamic table.
  • Efficient query performance: Pre-computes complex queries for faster access and analysis.
  • Flexible data pipelines: Supports both batch and streaming data processing within a unified framework.
  • Reduced development time: Streamlines the ETL process and accelerates time-to-market for insights.
  • Improved data quality: Ensures data consistency and accuracy through automated updates.
  • Cost-effective: Optimizes resource utilization by automating data management tasks. 

How to choose between Dynamic Tables, Streams & Tasks, and Materialized Views