In the rapidly evolving landscape of Databricks data platforms, the line between data engineering and application development is increasingly blurred. Recently, we undertook a compelling Proof of Concept (POC) for a client. Their requirement was deceptively simple yet architecturally significant: a web-based UI that could visualize datasets and accept user input to update the backend.
The client was at a crossroads, evaluating two distinct paths:
- The Traditional Route: Developing a custom UI using frameworks like React/Node.js, connecting to backend APIs.
- The Databricks Native Route: Leveraging the newly evolved Databricks Apps to build a data-centric application directly on the Lakehouse.
At Tredence, with deep experience in data architecture and application development, we chose to evaluate the latter to stress-test its capabilities, limitations, and architectural nuances. Here is a deep dive into our journey, the hurdles we overcame, and our roadmap for the future.
Image 1: Roadmap
Phase 1: The Quick Win - Streamlit on Databricks Apps
Our first objective was speed-to-value. We needed to demonstrate that we could ingest data and rapidly spin up a functional UI.
- The Architecture: We ingested a 5-million-record dataset into a Lakebase instance (the storage layer for Databricks Apps). On top of this, we deployed a Databricks App using a Streamlit template.
- The Result: Within a short sprint, we had a functional dashboard. It was responsive and interacted directly with the data in the Lakebase. This validated the core premise: you can build a data application without managing separate infrastructure for the frontend and the API layer.
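A key reason the dashboard stayed responsive at 5 million records is that the UI never pulls the full table; it fetches one page at a time. The sketch below shows that pagination idea as plain Python rather than our exact implementation; the table name, `id` ordering column, and page size are illustrative placeholders.

```python
# Sketch: page-at-a-time access to a large Lakebase table.
# Table name, "id" ordering column, and page size are illustrative.

def page_query(table: str, page: int, page_size: int = 100) -> str:
    """Build a paginated SELECT so the UI never pulls all rows at once."""
    if page < 0:
        raise ValueError("page must be >= 0")
    offset = page * page_size
    return f"SELECT * FROM {table} ORDER BY id LIMIT {page_size} OFFSET {offset}"

# In the Streamlit app, this query would run against the Lakebase
# (Postgres-compatible) instance, e.g. with a database cursor:
#   cur.execute(page_query("products", st.session_state.page))

if __name__ == "__main__":
    print(page_query("products", page=3, page_size=50))
```

Keeping pagination in SQL (rather than slicing a fully loaded DataFrame) is what keeps both the app's memory footprint and Lakebase round-trips small.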
Image 1.1: A 5-million-record dataset rendered with Databricks Apps, closely replicating a traditional web UI
Phase 2: The Architectural Shift - Syncing from the "Golden Layer"
In a real-world production environment, data doesn't just live in a single location. The client had a medallion architecture, with a "Golden Layer" of curated, production-grade data residing in the Lakehouse (specifically, in a Catalog).
The challenge was clear: How do we make this high-quality golden data available to our new Databricks App without duplicating ETL logic?
Our solution was to use Sync Tables.
- The Concept: We created a sync from the Lakehouse table (the source of truth) to the Lakebase instance.
- The Mechanism: This created a pointer or a synchronized copy within Lakebase. The beauty here is that the data physically resides in the Lakehouse and runs on the Lakehouse engine, but it appears as a first-class citizen within the Lakebase environment.
- The Outcome: This pattern allowed us to build the same dashboard on the production-grade data while keeping the application layer decoupled from the core transformation logic. It provided a clean separation of concerns: the Lakehouse remains the system of record, and Lakebase becomes the system of engagement.
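Conceptually, registering a synced table just means declaring which Lakehouse table should be mirrored into which Lakebase instance. In the POC we configured this through the Databricks UI; the sketch below only builds that declaration as a plain payload, and the field names are assumptions for illustration, not the exact Databricks API.

```python
# Sketch: describing a Lakehouse -> Lakebase sync as a plain payload.
# Field names are assumptions for illustration; in the POC we configured
# the synced table through the Databricks UI.

def build_sync_spec(source_table: str, instance: str, target_schema: str) -> dict:
    """Validate the three-level Unity Catalog name and describe the sync."""
    if source_table.count(".") != 2:
        raise ValueError("expected catalog.schema.table, got: " + source_table)
    table = source_table.rsplit(".", 1)[-1]
    return {
        "source_table": source_table,          # system of record (Lakehouse)
        "database_instance": instance,         # system of engagement (Lakebase)
        "target": f"{target_schema}.{table}",  # how the app addresses it
    }

if __name__ == "__main__":
    spec = build_sync_spec("prod.gold.products", "poc-lakebase", "public")
    print(spec["target"])
```

Capturing the sync as data like this also makes it easy to drive the same setup from a deployment script later, rather than clicking through the UI per environment.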
Image 1.2: Databricks UI of sync table
Phase 3: Adding Intelligence - Integrating Databricks Genie
A dashboard is great, but conversational AI is better. To elevate the POC, we integrated Databricks Genie—a chatbot that allows users to ask questions about the data in natural language.
- The Integration: We embedded the Genie experience directly in the UI.
- The Experience: This turned the application from a passive reporting tool into an interactive data companion. Users could now look at a chart and ask, "What were the trends last month?" without needing to know SQL.
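Genie answers asynchronously: the app starts a conversation and then polls until the message completes. The helper below sketches the backoff schedule such a polling loop can use; `get_genie_message` in the comment is a hypothetical wrapper, and the delay values are illustrative, not tuned numbers from the POC.

```python
# Sketch: Genie responds asynchronously, so the app polls the message
# status with capped exponential backoff instead of hammering the API.
# Delay parameters are illustrative, not tuned values from the POC.

def poll_delays(base: float = 1.0, cap: float = 8.0, attempts: int = 6):
    """Yield capped exponential backoff delays: 1, 2, 4, 8, 8, 8."""
    for i in range(attempts):
        yield min(base * (2 ** i), cap)

# The polling loop itself (simplified; get_genie_message is hypothetical):
#   for delay in poll_delays():
#       msg = get_genie_message(space_id, conversation_id, message_id)
#       if msg["status"] == "COMPLETED":
#           break
#       time.sleep(delay)

if __name__ == "__main__":
    print(list(poll_delays()))
```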
The Devil in the Details: Challenges Faced (and Lessons Learned)
The solution surfaced a few valuable lessons that helped shape the next steps. In particular, we encountered two critical challenges worth understanding for anyone considering this architecture.
Image 1.3: Genie view
Challenge 1: The "Two-Level" Permission Paradox
This was our most significant roadblock.
- The Scenario: When you create a Databricks App and attach a resource (like a Lakebase instance), you grant the app’s service principal permission to access that instance.
- The Issue: Access to the instance is not the same as read permissions on the underlying tables.
- The Symptom: Despite successfully connecting to the instance, our app code (and the Genie integration) returned empty results or permission errors. The app could "see" the database but couldn't "read" the data.
- The Solution: We had to manually grant explicit SELECT permissions on the specific tables to the app's service principal.
- The Takeaway: When architecting these solutions, table-level ACLs (Access Control Lists) must be part of your automated deployment script or Terraform plan. Relying solely on instance-level permissions will lead to runtime failures.
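To make those grants repeatable rather than a one-off manual fix, our deployment now emits them explicitly for the app's service principal. A minimal sketch of that generator (the principal and table names are placeholders, not the client's real identifiers):

```python
# Sketch: generate table-level grants for the app's service principal so
# deployments never rely on instance-level permissions alone.
# Principal and table names are placeholders, not real identifiers.

def grant_statements(principal: str, tables: list[str]) -> list[str]:
    """One explicit SELECT grant per table, ready for the deploy script."""
    return [f'GRANT SELECT ON TABLE {t} TO "{principal}";' for t in tables]

if __name__ == "__main__":
    for stmt in grant_statements("app-service-principal",
                                 ["public.products", "public.orders"]):
        print(stmt)
```

The same list of tables can feed a Terraform plan instead, as long as table-level ACLs are versioned alongside the app, not applied by hand.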
Challenge 2: The Bloat Factor - Compute Isn't Just for Queries
As we integrated more features (rich Streamlit visualizations + Genie), we noticed significant performance degradation.
- The Symptom: The app became sluggish; querying and rendering slowed down.
- The Investigation: The initial assumption was that the data volume (5 million records) was too high. It wasn't. The issue was the weight of the installed packages and the compute allocated to the app itself.
- The Resolution:
- Optimization: We had to scale back on overly complex visualization libraries within the same process.
- Scaling: We realized that the Databricks App is not "serverless" in the sense of infinite scaling. It runs on a configurable compute instance (e.g., Medium with 2 CPUs). Just like any application server, if you increase the complexity of the code (heavy Python packages) or the number of concurrent features, you must scale the app's compute resources.
- The Lesson: Treat the Databricks App as a lightweight application server. Monitor its resource consumption just as you would a Node.js or Spring Boot service.
The Road Ahead: Closing the Loop with Writes
Currently, our POC is read-heavy. However, Lakebase is designed for both reads and writes. Our next phase is to tackle the Reverse Sync.
- The Use Case: The client needs to input data via the UI (e.g., updating metadata, submitting new entries).
- The Architecture Plan:
  - The user input will hit the Databricks App.
  - The app will write this data to the Lakebase table.
- The Challenge: Downstream systems (like Power BI) depend on the Lakehouse (the Golden Layer).
- The Solution: We plan to implement a Reverse Sync - a process that takes the user input from Lakebase and propagates it back to the Lakehouse table.
Example:
Imagine a scenario where business users update product metadata (e.g., correcting product descriptions or adding new attributes) directly through the Databricks App UI. These updates are first written into the Lakebase table. Without Reverse Sync, downstream systems like Power BI would continue showing outdated product information because they rely on the Lakehouse Golden Layer. By implementing Reverse Sync, the updated metadata flows back into the Lakehouse, ensuring that Power BI dashboards immediately reflect the latest product details.
This makes the benefit tangible: user-driven changes in the application layer seamlessly propagate to the analytical layer, closing the loop between operational input and reporting accuracy.
The pattern of bi-directional sync is critical for building true transactional applications on Lakehouse. It ensures that operational data entered by users is immediately available for analytical workloads, closing the loop between action and insight.
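At its core, a reverse sync is an upsert of Lakebase changes into the Lakehouse table. The sketch below builds the MERGE statement that would run on the Lakehouse side; the target table, staging source, key, and column names are illustrative placeholders, and in practice this would run inside a scheduled job with change tracking on the Lakebase side.

```python
# Sketch: reverse sync as an upsert into the Lakehouse Golden Layer.
# Target table, staging source, key, and column names are placeholders.

def build_merge(target: str, staging: str, key: str, columns: list[str]) -> str:
    """MERGE Lakebase changes (staged as a view/table) into the Lakehouse."""
    set_clause = ", ".join(f"t.{c} = s.{c}" for c in columns)
    cols = ", ".join([key] + columns)
    vals = ", ".join(f"s.{c}" for c in [key] + columns)
    return (
        f"MERGE INTO {target} t USING {staging} s ON t.{key} = s.{key} "
        f"WHEN MATCHED THEN UPDATE SET {set_clause} "
        f"WHEN NOT MATCHED THEN INSERT ({cols}) VALUES ({vals})"
    )

if __name__ == "__main__":
    print(build_merge("gold.products", "lakebase_changes",
                      "product_id", ["description", "attributes"]))
```

Because MERGE is idempotent per key, the job can safely re-run after a failure, which matters once Power BI and other consumers depend on the propagated rows.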
Conclusion:
- Choose Databricks Apps if: Your application is data-first. If the primary value is visualizing, exploring, and lightly interacting with data that already lives in the Databricks ecosystem, this is a game changer, and the tight integration with Genie and Unity Catalog is unparalleled. Lakebase is well suited to capturing what-if scenario inputs from users and handling concurrent write requests, while the Gold Layer in the Lakehouse with a serverless SQL Warehouse offers faster read performance on large datasets. A hybrid approach, serving data from the Lakehouse or Lakebase depending on volume, is the better fit for millisecond SLAs.
- Choose Traditional Dev if: Your application requires complex state management, highly customized user authentication flows beyond simple SSO, or heavy frontend-backend logic that doesn't revolve around data transformation.
From Tredence’s perspective, it was a resounding success in understanding the new paradigm of the Data Application. The future isn't just about storing data in a Lakehouse; it's about building applications on top of it that are intelligent, responsive, and deeply integrated.