Exploring Databricks Serverless SQL and Understanding the Benefits of Transitioning

Databricks

Date: 05/14/2024

Discover the advantages of transitioning to Databricks Serverless SQL for enhanced analytics. Learn how serverless computing boosts productivity and reduces costs effectively.

Srinivasan Shanmugasundaram
Senior Manager, Data Engineering, Tredence Inc.


Databricks Serverless SQL is a robust, fully managed offering capable of powering next-generation analytics and data applications. It provides a unified platform where data analysts and data scientists can collaborate.

Before delving into the serverless architecture, let’s first understand the control plane and compute plane, which together form the operational foundation of Databricks.

  • Control Plane: This encompasses all the backend services managed by Databricks within its own Azure subscription. It also stores Databricks SQL queries, notebooks, and other workspace configurations, all of which are encrypted at rest for added security.
  • Compute Plane: This contains the clusters of compute resources used to process your data.

Serverless Compute Compared with Classic Compute

The image above illustrates how the serverless and classic compute planes operate; the key differences are summarized in the table below.

Feature | Serverless compute plane | Classic compute plane
Location of compute resources | The compute layer stays within the Databricks account | The compute layer stays within the customer’s Azure subscription
Public IP address for nodes | No | No if secure cluster connectivity is enabled; otherwise each VM gets a public IP
Unallocated VM costs | Covered by Databricks | Customer pays Azure
Data protection | Operates within the workspace network boundary, bolstered by extra layers of security to ensure isolation | Natural isolation, as compute stays in the customer’s environment

What are the primary hurdles we often encounter with the classic compute plane?

  • Reduced Productivity: Analysts, data engineers, and data scientists lose productive time waiting for clusters to start up.
  • Always-on: Customers often keep the warehouse active on a fixed cluster to avoid start-up delays. However, this approach is inefficient and costly (see the sketch after this list).
  • Limited Scalability: Fixed cluster sizes become a bottleneck as data volumes and processing needs grow. Dynamic scaling can be challenging and may lead to over- or under-provisioning.
  • Infrastructure Overhead: Platform administrators must provision, configure, and manage clusters. This process can be time-consuming and error-prone, mainly due to diverse user requirements and fluctuating workloads.
  • Added Cost: The warehouse remains operational on fixed clusters even during periods of low or no activity, leading to cost inefficiency.
  • Resource Contention: Sharing the same cluster among multiple users or applications can lead to resource contention, resulting in poor performance or longer query execution times.
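To make the always-on pattern concrete, here is a minimal sketch that creates a fixed-size, non-serverless SQL warehouse with auto stop disabled through the Databricks SQL Warehouses REST API. The workspace URL, token, and sizing values are placeholders and illustrative assumptions, not recommendations.

```python
import requests

# Placeholders -- substitute your own workspace URL and personal access token.
WORKSPACE_URL = "https://adb-1234567890123456.7.azuredatabricks.net"
TOKEN = "<personal-access-token>"

# A classic-style warehouse kept permanently on: one fixed Medium cluster,
# auto stop disabled (auto_stop_mins = 0), no serverless compute.
always_on_warehouse = {
    "name": "reporting-warehouse-always-on",
    "cluster_size": "Medium",
    "min_num_clusters": 1,
    "max_num_clusters": 1,           # fixed size: no room to scale out
    "auto_stop_mins": 0,             # 0 disables auto stop, so the cluster never shuts down
    "enable_serverless_compute": False,
    "warehouse_type": "CLASSIC",
}

resp = requests.post(
    f"{WORKSPACE_URL}/api/2.0/sql/warehouses",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=always_on_warehouse,
)
resp.raise_for_status()
print("Created warehouse:", resp.json()["id"])
```

Because this cluster never stops and cannot scale beyond its fixed size, the customer pays for idle capacity and still hits contention at peak times, which is exactly the combination of problems listed above.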

Why should we consider transitioning to serverless compute?

  • Increased Productivity: Serverless SQL provides on-demand, near-instant compute for SQL workloads, so there is no waiting for cluster start-up.
  • Improved Performance: Serverless compute incorporates optimization techniques such as query caching, automatic indexing, and adaptive query execution, reducing latency and improving productivity.
  • Auto Scaling: In serverless, scaling is automatic and based on the workload, so customers need not worry about over- or under-provisioning compute resources.
  • Managed Infrastructure: Databricks fully manages the infrastructure, handling tasks such as provisioning, scaling clusters, monitoring performance, and ensuring high availability.
  • Reduced Cost: Customers pay only for the compute resources their workloads actually use, so there is no need to keep an idle cluster running (see the sketch after this list).
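One quick way to observe the pay-for-what-you-use behavior is to poll a warehouse’s state through the SQL Warehouses REST API: a serverless warehouse reports STOPPED once the auto-stop window elapses and starts again on demand when the next query arrives. The sketch below assumes placeholder values for the workspace URL, token, and warehouse ID.

```python
import requests

WORKSPACE_URL = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder
TOKEN = "<personal-access-token>"                                      # placeholder
WAREHOUSE_ID = "<serverless-warehouse-id>"                             # placeholder

# Fetch the warehouse definition and its current lifecycle state.
resp = requests.get(
    f"{WORKSPACE_URL}/api/2.0/sql/warehouses/{WAREHOUSE_ID}",
    headers={"Authorization": f"Bearer {TOKEN}"},
)
resp.raise_for_status()
warehouse = resp.json()

# Once the warehouse has been idle past auto_stop_mins, the state drops to
# STOPPED and no compute is billed until the next query wakes it up.
print(warehouse["name"], warehouse["state"], "auto_stop_mins =", warehouse["auto_stop_mins"])
```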

What are the essential requirements for implementing Serverless SQL?

  • The Databricks workspace must be on the Premium plan.
  • The workspace must be in a supported region (see the Azure Databricks regions list).
  • If the source data lives in Azure Storage and is protected by an Azure Storage firewall, additional configuration is required to permit access from serverless compute nodes.

Guidelines to Implement Serverless SQL

Step 1: Create a Serverless warehouse

  • Log in to the workspace and click SQL Warehouses in the sidebar.

  • Click Create SQL Warehouse and complete the following details:
    • Name
    • Cluster size
    • Auto stop time limit
    • Scaling option
    • Type (select Serverless)

  • The Advanced options section is optional and includes details such as tags and Unity Catalog (if enabled in the workspace).
  • Provide access to the serverless SQL warehouse:
    • Search for the AD group or user and add it
    • Select the required permission level (Can use or Can manage)

  • Once created successfully, the serverless SQL warehouse appears in the SQL Warehouses list. The same setup can also be scripted, as sketched below.
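For teams that prefer automation over the UI, here is a minimal sketch that performs Step 1 through the Databricks SQL Warehouses REST API. The workspace URL and token are placeholders, and the sizing values are illustrative assumptions.

```python
import requests

WORKSPACE_URL = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder
TOKEN = "<personal-access-token>"                                      # placeholder

# Serverless warehouse: Databricks-managed compute, autoscaling between
# 1 and 4 clusters, and auto stop after 10 idle minutes.
serverless_warehouse = {
    "name": "analytics-serverless",
    "cluster_size": "Small",
    "min_num_clusters": 1,
    "max_num_clusters": 4,               # scaling option
    "auto_stop_mins": 10,                # auto stop time limit
    "enable_serverless_compute": True,   # type = Serverless
    "warehouse_type": "PRO",             # serverless warehouses use the PRO type
}

resp = requests.post(
    f"{WORKSPACE_URL}/api/2.0/sql/warehouses",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=serverless_warehouse,
)
resp.raise_for_status()
warehouse_id = resp.json()["id"]
print("Serverless warehouse created with id:", warehouse_id)
```

Access for an AD group or user can then be granted from the warehouse’s permissions dialog (Can use or Can manage), as described above.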

Step 2: Use the Serverless option for data analysis

  • Click SQL Editor in the sidebar.
  • In the new query window, select the serverless warehouse as the compute (users may see other options as well, depending on their permissions).

  • Analysts can start running their queries immediately; compute is available almost instantly. Queries can also be run programmatically, as the sketch below shows.
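Outside the SQL editor, the same serverless warehouse can be queried from Python using the databricks-sql-connector package. The hostname, HTTP path, and token below are placeholders copied from the warehouse’s Connection details tab, and the sample table is an assumption.

```python
# pip install databricks-sql-connector
from databricks import sql

# Placeholders -- copy these values from the warehouse's Connection details tab.
SERVER_HOSTNAME = "adb-1234567890123456.7.azuredatabricks.net"
HTTP_PATH = "/sql/1.0/warehouses/<warehouse-id>"
ACCESS_TOKEN = "<personal-access-token>"

# Connecting triggers the serverless warehouse to start on demand;
# there is no cluster to pre-provision or keep warm.
with sql.connect(
    server_hostname=SERVER_HOSTNAME,
    http_path=HTTP_PATH,
    access_token=ACCESS_TOKEN,
) as connection:
    with connection.cursor() as cursor:
        # samples.nyctaxi.trips is a sample dataset; swap in your own table.
        cursor.execute("SELECT COUNT(*) AS trip_count FROM samples.nyctaxi.trips")
        for row in cursor.fetchall():
            print(row[0])
```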

Step 3: Use Serverless SQL in Databricks Workflows

  • Click Workflows in the sidebar.

  • Click Create Job and complete the following details:
    • Task name
    • Type (Notebook, SQL, Python script, etc.)
      • The remaining options vary based on the type
      • To start with, I have selected SQL as the type
    • SQL task (Query, Dashboard, Alert, or File)
    • SQL query
    • SQL warehouse (select the serverless warehouse)
    • Other optional settings include parameters, notifications, retries, and duration thresholds. The same job can also be created through the Jobs API, as sketched below.
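As with the warehouse itself, the workflow can be defined in code. The sketch below creates a one-task job that runs a saved SQL query on the serverless warehouse through the Databricks Jobs API (version 2.1). The query ID, warehouse ID, workspace URL, and token are all placeholders.

```python
import requests

WORKSPACE_URL = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder
TOKEN = "<personal-access-token>"                                      # placeholder

# A single SQL task: run a saved query on the serverless warehouse.
job_definition = {
    "name": "daily-sales-refresh",
    "tasks": [
        {
            "task_key": "refresh_sales_query",
            "sql_task": {
                "query": {"query_id": "<saved-query-id>"},    # the query created in the SQL editor
                "warehouse_id": "<serverless-warehouse-id>",  # the warehouse from Step 1
            },
        }
    ],
    # Optional extras such as schedules, notifications, and retries can be
    # added to this payload, mirroring the optional fields in the UI.
}

resp = requests.post(
    f"{WORKSPACE_URL}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=job_definition,
)
resp.raise_for_status()
print("Job created with job_id:", resp.json()["job_id"])
```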