Lakeflow Declarative Pipelines: Deep Dive into the Evolution of Delta Live Tables (DLT)

Data Engineering

Date: 06/26/2025

Discover Lakeflow Declarative Pipelines, Databricks' next-generation data pipeline platform. Learn how DLT evolved into this unified, declarative approach for scalable ETL, featuring built-in governance, observability, real-time ingestion with Zerobus, and AI-powered design.

Shakti Prasad Mohapatra
Senior Solutions Architect


At the 2025 Databricks Data + AI Summit (DAIS), Delta Live Tables (DLT) was reintroduced under a broader umbrella as Lakeflow Declarative Pipelines — a unified, declarative approach to building data pipelines that empowers data engineers and SQL analysts to build scalable, production-grade ETL pipelines effortlessly.

DLT is no longer a standalone feature — it is now the declarative transformation layer at the heart of the Lakeflow platform.

Lakeflow: Unified ETL for the AI Era

Lakeflow is Databricks’ next-generation data pipeline platform and is composed of three key components:

  1. Lakeflow Connect — High-throughput, low-latency data ingestion (e.g., Zerobus)
  2. Lakeflow Declarative Pipelines — Formerly DLT, now the core engine for transformations
  3. Lakeflow Jobs — Workflow orchestration and production management

Lakeflow Declarative Pipelines simplify how teams express data transformations using SQL, Python, or a no-code interface, with full support for streaming, batch, and incremental workloads.

What Makes Declarative Pipelines Powerful?

1. Simplicity with Declarative Syntax

You write what you want, not how to do it.

CREATE OR REFRESH STREAMING TABLE cleaned_sales
AS SELECT * FROM STREAM(raw_sales) WHERE status = 'valid';

Lakeflow Declarative Pipelines automatically handles:

  • Dependency resolution
  • Execution order
  • Incremental computation
  • Change propagation
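The same table can be declared in Python; a minimal sketch, where the raw_sales source and the status column are illustrative:

import dlt
from pyspark.sql.functions import col

# Declares a streaming table equivalent to the SQL example above; the engine
# infers the dependency on raw_sales and keeps cleaned_sales incrementally up to date.
@dlt.table(comment="Sales records that passed validation")
def cleaned_sales():
    return spark.readStream.table("raw_sales").where(col("status") == "valid")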

2. Unified Streaming & Batch

With a single codebase, you can run your pipeline in:

  • Continuous mode (streaming)
  • Triggered mode (batch)

This enables hybrid use cases like:

  • Real-time fraud detection
  • Daily revenue aggregation
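Switching between the two modes does not require code changes. As an illustration, a minimal sketch of the relevant pipeline setting, assuming the standard pipeline settings JSON (the pipeline name is illustrative and other fields are omitted):

{
  "name": "sales_pipeline",
  "continuous": true
}

Setting "continuous" to false (the default) runs the same pipeline in triggered mode.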

When configuring pipelines for continuous mode, trigger intervals can be set to control how frequently the pipeline starts an update for each flow.

Use pipelines.trigger.interval to control the trigger interval for a flow updating a table or for an entire pipeline. Because a triggered pipeline processes each table once, pipelines.trigger.interval is used only with continuous pipelines.

Databricks recommends setting pipelines.trigger.interval on individual tables because streaming and batch queries have different defaults. Set the value on a pipeline only when processing requires controlling updates for the entire pipeline graph.

import dlt
from pyspark.sql.functions import col

@dlt.table(spark_conf={"pipelines.trigger.interval": "10 seconds"})
def my_table():
    return spark.readStream.format("json").load("/path/to/input/data")

SET pipelines.trigger.interval=10 seconds;

CREATE OR REFRESH STREAMING TABLE my_table
AS SELECT * FROM STREAM read_files('/path/to/input/data', format => 'json');

To set pipelines.trigger.interval on a pipeline, add it to the configuration object in the pipeline settings:

{  "configuration": {    "pipelines.trigger.interval": "10 seconds"  }}

3. Built-in Data Quality Expectations

Define expectations directly in your pipeline to enforce data quality rules:

CREATE OR REFRESH STREAMING TABLE valid_customers
(CONSTRAINT valid_email EXPECT (email IS NOT NULL))
AS SELECT * FROM STREAM(raw_customers);

Depending on the configured policy, invalid records can be:

  • Quarantined
  • Dropped
  • Treated as failures that stop the update

The available expectation actions are:

  • warn (default): SQL syntax EXPECT, Python syntax dlt.expect. Invalid records are written to the target.
  • drop: SQL syntax EXPECT ... ON VIOLATION DROP ROW, Python syntax dlt.expect_or_drop. Invalid records are dropped before data is written to the target, and the count of dropped records is logged alongside other dataset metrics.
  • fail: SQL syntax EXPECT ... ON VIOLATION FAIL UPDATE, Python syntax dlt.expect_or_fail. Invalid records prevent the update from succeeding, and manual intervention is required before reprocessing. This action fails only the affected flow; other flows in the pipeline are not affected.
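For illustration, the three actions expressed as Python decorators; a minimal sketch with hypothetical table, column, and rule names:

import dlt

# warn: log violations but keep the rows; drop: discard violating rows;
# fail: stop the flow update when a violation is found.
@dlt.table(comment="Customers that passed validation")
@dlt.expect("non_negative_age", "age >= 0")
@dlt.expect_or_drop("valid_email", "email IS NOT NULL")
@dlt.expect_or_fail("valid_id", "customer_id IS NOT NULL")
def valid_customers_checked():
    return spark.readStream.table("raw_customers")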

Developers can also implement advanced logic to quarantine invalid records without failing the update or dropping data; see the Databricks documentation on quarantining invalid records.

Pipeline dashboards show expectation metrics for transparency.

4. Native Lineage and Observability

With Unity Catalog integration, every pipeline execution:

  • Captures full data lineage
  • Supports column-level tracking
  • Logs expectations and data quality metrics
  • Publishes query history for debugging and optimization

Event logs can now be written to Delta tables for SQL-based observability.

5. Autoscaling and Monitoring

Lakeflow Declarative Pipelines offers:

  • Auto-scaling compute resources
  • Real-time monitoring dashboards
  • Built-in metrics: throughput, data quality, error tracking

This leads to quicker troubleshooting and more stable production operations.
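As an illustration, autoscaling bounds can be declared in the pipeline's cluster settings; a minimal sketch, assuming the standard pipeline settings JSON (values are illustrative):

{
  "clusters": [
    {
      "label": "default",
      "autoscale": {
        "min_workers": 1,
        "max_workers": 5,
        "mode": "ENHANCED"
      }
    }
  ]
}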

Enhanced Governance with Unity Catalog

The deep integration of Lakeflow Declarative Pipelines with Unity Catalog enhances governance, lineage, and multi-catalog operations.

Publish to Multiple Catalogs and Schemas

A single pipeline can now consolidate Bronze, Silver, and Gold layers, or build related aggregates, while publishing to different catalogs and schemas. This reduces operational complexity, lowers costs, and simplifies data management by letting you run your entire medallion architecture in one pipeline while maintaining organizational and governance best practices. The example below creates two aggregate tables of different grains in a single pipeline.

import dlt
from pyspark.sql.functions import current_timestamp

# Retrieve pipeline parameters
catalog = spark.conf.get("catalog")
schema = spark.conf.get("schema")
table1 = spark.conf.get("table1")
table2 = spark.conf.get("table2")

# Define the DLT table with fully qualified table name
@dlt.table(
    name=f"{catalog}.{schema}.{table1}",
    comment="Aggregated price by category"
)
def create_aggregated_table1():
    return (
        spark.table("items")
        .groupBy("category")
        .agg({"price": "sum"})
        .withColumnRenamed("sum(price)", "agg_price")
        .withColumn("created_ts", current_timestamp())
    )
# Define the second DLT table with a fully qualified table name
@dlt.table(
    name=f"{catalog}.{schema}.{table2}",
    comment="Aggregated price by brand"
)
def create_aggregated_table2():
    return (
        spark.table("items")
        .groupBy("brand")
        .agg({"price": "sum"})
        .withColumnRenamed("sum(price)", "agg_price")
        .withColumn("created_ts", current_timestamp())
    )
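The catalog, schema, and table names read via spark.conf.get above are supplied as pipeline parameters in the configuration object; a minimal sketch with illustrative values:

{
  "configuration": {
    "catalog": "main",
    "schema": "gold",
    "table1": "agg_price_by_category",
    "table2": "agg_price_by_brand"
  }
}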

After a successful run, the pipeline graph shows both aggregate tables.

Row-Level Security (RLS) & Column Masking

Enforce governance by dynamically restricting access to rows and columns based on user roles:

  • Precision access control
  • Improved data security
  • Compliance with GDPR, HIPAA, etc.
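For illustration, here is how a Unity Catalog row filter and column mask can be defined on governed tables; a minimal sketch issued via spark.sql, where the catalog, schema, table, and group names are hypothetical:

# Row filter: members of 'admins' see all rows, everyone else only the 'US' region.
spark.sql("""
  CREATE OR REPLACE FUNCTION main.governance.us_only(region STRING)
  RETURN IS_ACCOUNT_GROUP_MEMBER('admins') OR region = 'US'
""")
spark.sql("ALTER TABLE main.sales.orders SET ROW FILTER main.governance.us_only ON (region)")

# Column mask: redact email addresses for users outside the 'pii_readers' group.
spark.sql("""
  CREATE OR REPLACE FUNCTION main.governance.mask_email(email STRING)
  RETURN CASE WHEN IS_ACCOUNT_GROUP_MEMBER('pii_readers') THEN email ELSE '***' END
""")
spark.sql("ALTER TABLE main.sales.customers ALTER COLUMN email SET MASK main.governance.mask_email")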

Seamless Migration from Hive Metastore to Unity Catalog

Transitioning to Unity Catalog is simple and non-disruptive:

  • Clones existing pipeline configurations
  • Updates materialized views (MVs) and streaming tables (STs) to Unity Catalog
  • Resumes STs without data loss

Copy-less migration via API coming soon

Key Benefits

  • Seamless transition — Copies pipeline configurations and updates tables to align with UC requirements.
  • Minimal downtime — STs resume processing from their last state without manual intervention.
  • Enhanced governance — UC provides improved security, access control, and data lineage tracking.

Once migration is complete, both the original and new pipelines can run independently, allowing teams to validate UC adoption at their own pace. This is the best approach for migrating DLT pipelines today. While it does require a data copy, Databricks plans to introduce an API for copy-less migration later this year.

Write to Any Destination with foreachBatch

With foreachBatch, you can write streaming data to any batch-compatible sink, expanding integration options:

Use Cases:

  • MERGE INTO with Delta Lake (sketched below)
  • Writing to systems without native streaming support (e.g., Cassandra, Synapse Analytics)
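A minimal sketch of the MERGE INTO pattern using the standard Structured Streaming foreachBatch API (the table names, join key, and checkpoint path are hypothetical):

from delta.tables import DeltaTable

def upsert_batch(microbatch_df, batch_id):
    # Merge each micro-batch into the target Delta table on the business key.
    target = DeltaTable.forName(spark, "main.sales.cleaned_sales")
    (target.alias("t")
        .merge(microbatch_df.alias("s"), "t.order_id = s.order_id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

(spark.readStream.table("main.sales.raw_sales")
    .writeStream
    .foreachBatch(upsert_batch)
    .option("checkpointLocation", "/tmp/checkpoints/cleaned_sales")
    .trigger(availableNow=True)
    .start())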

Key Benefits:

  • Unrestricted sink support — Write streaming data to virtually any batch-compatible system, beyond just Kafka and Delta.
  • More flexible transformations — Use MERGE INTO and other batch operations that aren’t natively supported in streaming mode.
  • Multi-sink writes — Send processed data to multiple destinations, enabling broader downstream integrations.

Lakeflow Declarative Pipelines Observability Enhancements

Gain deeper operational insights with enhanced observability:

  • Query History — Debug queries, identify performance bottlenecks, and optimize pipeline runs.
  • Structured Event Logs — Stored in Delta tables via Unity Catalog
  • Run As Feature — Use service principals or app identities for executions
  • Pipeline Filtering — Filter by tags, identity, etc.


The event log can now be published to UC as a Delta table, providing a powerful way to monitor and debug pipelines with greater ease. By storing event data in a structured format, users can leverage SQL and other tools to analyze logs, track performance, and troubleshoot issues efficiently.
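For example, once the event log is published as a Delta table, recent errors can be inspected with an ordinary query; a minimal sketch, where the table name is hypothetical and the standard event log columns (timestamp, level, event_type, message) are assumed:

from pyspark.sql import functions as F

# Show the 20 most recent error events from the published event log table.
(spark.read.table("main.monitoring.pipeline_event_log")
    .where(F.col("level") == "ERROR")
    .select("timestamp", "event_type", "message")
    .orderBy(F.desc("timestamp"))
    .limit(20)
    .show(truncate=False))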

The Run As feature in Lakeflow Declarative Pipelines allows users to execute a pipeline as a service principal or an application user.

Users can filter pipelines based on various criteria, including run as identities and tags. These filters enable more efficient pipeline management and tracking, ensuring that users can quickly find and manage the pipelines they are interested in.

Lakeflow Designer: No-Code AI-Powered Pipeline Builder

The new Lakeflow Designer is a visual ETL builder tailored for both business users and engineers:

  • Drag-and-drop canvas
  • Natural language interface
  • Outputs are Lakeflow Declarative Pipelines
  • Enables seamless collaboration
  • Direct integration with Unity Catalog and AI functions

Designer leverages data intelligence to guide users in building optimized, scalable, governed pipelines — without writing code.

This lets business analysts and data engineers co-build governed pipelines without handoffs.

Every visual pipeline in Designer is powered by the same Lakeflow Declarative Pipeline engine — ensuring scalability, governance, and consistency.

Zerobus: Real-Time Ingestion at Massive Scale

Zerobus, introduced at DAIS, is a Lakeflow Connect API for streaming external events to the Lakehouse with:

  • <5 seconds latency
  • 100 MB/s throughput

Ideal for:

  • IoT streams
  • Clickstream ingestion
  • Message bus integrations

It delivers ingestion at scale without requiring external Kafka or Flink infrastructure.

Summary: DLT is Now Lakeflow Declarative Pipelines

Key features:

  • Unified Interface: One pipeline engine for batch, streaming, SQL, Python, and Designer
  • Enterprise-Grade Governance: Powered by Unity Catalog with RLS, column masking, and lineage
  • Real-Time Ingestion: Via Zerobus and Lakeflow Connect
  • Built-in Observability: Logs, metrics, expectations, query history
  • AI-Native Experience: Via Lakeflow Designer with no-code and NL interface

Conclusion

The evolution of Lakeflow Declarative Pipelines represents a major leap forward in simplifying and scaling data pipelines on the Databricks platform. With the latest enhancements in governance, observability, ingestion, and AI tooling, Databricks is redefining how modern data teams build and operate trusted, real-time data pipelines.

Whether you’re a data engineer, analyst, or architect — Lakeflow Declarative Pipelines offer a unified, future-proof foundation for all your data transformation needs.
