Exploring Databricks Serverless SQL and Understanding the Benefits of Transitioning

Databricks

Date: 05/14/2024

Discover the advantages of transitioning to Databricks Serverless SQL for enhanced analytics. Learn how serverless computing boosts productivity and reduces costs effectively.

Srinivasan Shanmugasundaram
Senior Manager, Data Engineering, Tredence Inc.


Databricks Serverless SQL is a robust, fully managed offering capable of powering next-generation analytics and data applications. It provides a unified platform where data analysts and data scientists can collaborate.

Before delving into the serverless architecture, let’s first understand the control plane and compute plane, which together form the operational foundation of Databricks.

  • Control Plane: This encompasses all the backend services managed by Databricks within its own Azure subscription. It also stores Databricks SQL queries, notebooks, and other workspace configurations, all of which are encrypted at rest for added security.
  • Compute Plane: This contains the clusters of compute resources used to process your data.

Serverless Compute Compared with Classic Compute

The image above illustrates how the serverless and classic compute planes operate; the key differences are summarized in the table below.

Feature | Serverless compute plane | Classic compute plane
Location of compute resources | The compute layer stays within the Databricks account | The compute layer stays within the customer’s Azure subscription
Public IP address for nodes | No | No if secure cluster connectivity is enabled; otherwise each VM gets a public IP
Unallocated VM costs | Covered by Databricks | Customer pays Azure
Data protection | Operates within the workspace network boundary, bolstered by extra layers of security to ensure isolation | Natural isolation, as compute stays in the customer’s environment

What are the primary hurdles we often encounter with the classic compute plane?

  • Reduced Productivity: Analysts, data engineers, and data scientists lose productive time waiting for clusters to start up.
  • Always-on: Customers often keep the warehouse active on a fixed cluster to avoid start-up delays. However, this approach is inefficient and costly (see the sketch after this list).
  • Limited Scalability: Fixed cluster sizes become a bottleneck as data volumes and processing needs grow. Dynamic scaling can be challenging and may lead to over- or under-provisioning.
  • Infrastructure Overhead: Platform administrators must provision, configure, and manage clusters. This process can be time-consuming and error-prone, mainly due to diverse user requirements and fluctuating workloads.
  • Added Cost: The warehouse remains operational on fixed clusters even during periods of low or no activity, leading to cost inefficiency.
  • Resource Contention: Sharing the same cluster among multiple users or applications can lead to resource contention, resulting in poor performance or longer query execution times.
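To make the always-on pattern concrete, here is a minimal sketch that creates a fixed-size, non-serverless SQL warehouse with auto stop disabled through the Databricks SQL Warehouses REST API. The workspace URL, token, and sizing values are placeholders and illustrative assumptions, not recommendations.

```python
import requests

# Placeholders -- substitute your own workspace URL and personal access token.
WORKSPACE_URL = "https://adb-1234567890123456.7.azuredatabricks.net"
TOKEN = "<personal-access-token>"

# A classic-style warehouse kept permanently on: one fixed Medium cluster,
# auto stop disabled (auto_stop_mins = 0), no serverless compute.
always_on_warehouse = {
    "name": "reporting-warehouse-always-on",
    "cluster_size": "Medium",
    "min_num_clusters": 1,
    "max_num_clusters": 1,           # fixed size: no room to scale out
    "auto_stop_mins": 0,             # 0 disables auto stop, so the cluster never shuts down
    "enable_serverless_compute": False,
    "warehouse_type": "CLASSIC",
}

resp = requests.post(
    f"{WORKSPACE_URL}/api/2.0/sql/warehouses",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=always_on_warehouse,
)
resp.raise_for_status()
print("Created warehouse:", resp.json()["id"])
```

Because this cluster never stops and cannot scale beyond its fixed size, the customer pays for idle capacity and still hits contention at peak times, which is exactly the combination of problems listed above.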

Why should we consider transitioning to serverless compute?

  • Increased Productivity: Serverless SQL provides on-demand, near-instant compute for SQL workloads, so there is no waiting for cluster start-up.
  • Improved Performance: Serverless compute incorporates optimization techniques such as query caching, automatic indexing, and adaptive query execution, reducing latency and improving productivity.
  • Auto Scaling: In serverless, scaling is automatic and based on the workload, so customers need not worry about over- or under-provisioning compute resources.
  • Managed Infrastructure: Databricks fully manages the infrastructure, handling tasks such as provisioning, scaling clusters, monitoring performance, and ensuring high availability.
  • Reduced Cost: Customers pay only for the compute resources their workloads actually use, so there is no need to keep an idle cluster running (see the sketch after this list).
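One quick way to observe the pay-for-what-you-use behavior is to poll a warehouse’s state through the SQL Warehouses REST API: a serverless warehouse reports STOPPED once the auto-stop window elapses and starts again on demand when the next query arrives. The sketch below assumes placeholder values for the workspace URL, token, and warehouse ID.

```python
import requests

WORKSPACE_URL = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder
TOKEN = "<personal-access-token>"                                      # placeholder
WAREHOUSE_ID = "<serverless-warehouse-id>"                             # placeholder

# Fetch the warehouse definition and its current lifecycle state.
resp = requests.get(
    f"{WORKSPACE_URL}/api/2.0/sql/warehouses/{WAREHOUSE_ID}",
    headers={"Authorization": f"Bearer {TOKEN}"},
)
resp.raise_for_status()
warehouse = resp.json()

# Once the warehouse has been idle past auto_stop_mins, the state drops to
# STOPPED and no compute is billed until the next query wakes it up.
print(warehouse["name"], warehouse["state"], "auto_stop_mins =", warehouse["auto_stop_mins"])
```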

What are the essential requirements for implementing Serverless SQL?

  • The Databricks workspace must be on the Premium plan.
  • The workspace must be in a supported region (see the Azure Databricks regions list).
  • If the source data lives in Azure Storage and is protected by an Azure Storage firewall, additional configuration is required to permit access from serverless compute nodes.

Guidelines to Implement Serverless SQL

Step 1: Create a Serverless warehouse

  • Log in to the workspace and click SQL Warehouses in the sidebar.

  • Click Create SQL Warehouse and complete the following details:
    • Name
    • Cluster size
    • Auto stop time limit
    • Scaling option
    • Type (select Serverless)

  • The Advanced options section is optional and includes details such as tags and Unity Catalog (if enabled in the workspace).
  • Provide access to the serverless SQL warehouse:
    • Search for the AD group or user and add it
    • Select the required permission level (Can use or Can manage)

  • Once created successfully, the serverless SQL warehouse appears in the SQL Warehouses list. The same setup can also be scripted, as sketched below.
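For teams that prefer automation over the UI, here is a minimal sketch that performs Step 1 through the Databricks SQL Warehouses REST API. The workspace URL and token are placeholders, and the sizing values are illustrative assumptions.

```python
import requests

WORKSPACE_URL = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder
TOKEN = "<personal-access-token>"                                      # placeholder

# Serverless warehouse: Databricks-managed compute, autoscaling between
# 1 and 4 clusters, and auto stop after 10 idle minutes.
serverless_warehouse = {
    "name": "analytics-serverless",
    "cluster_size": "Small",
    "min_num_clusters": 1,
    "max_num_clusters": 4,               # scaling option
    "auto_stop_mins": 10,                # auto stop time limit
    "enable_serverless_compute": True,   # type = Serverless
    "warehouse_type": "PRO",             # serverless warehouses use the PRO type
}

resp = requests.post(
    f"{WORKSPACE_URL}/api/2.0/sql/warehouses",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=serverless_warehouse,
)
resp.raise_for_status()
warehouse_id = resp.json()["id"]
print("Serverless warehouse created with id:", warehouse_id)
```

Access for an AD group or user can then be granted from the warehouse’s permissions dialog (Can use or Can manage), as described above.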

Step 2: Use the Serverless option for data analysis

  • Click SQL Editor in the sidebar.
  • In the new query window, select the serverless warehouse as the compute (users may see other options as well, depending on their permissions).

  • Analysts can start running their queries immediately; compute is available almost instantly. Queries can also be run programmatically, as the sketch below shows.
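Outside the SQL editor, the same serverless warehouse can be queried from Python using the databricks-sql-connector package. The hostname, HTTP path, and token below are placeholders copied from the warehouse’s Connection details tab, and the sample table is an assumption.

```python
# pip install databricks-sql-connector
from databricks import sql

# Placeholders -- copy these values from the warehouse's Connection details tab.
SERVER_HOSTNAME = "adb-1234567890123456.7.azuredatabricks.net"
HTTP_PATH = "/sql/1.0/warehouses/<warehouse-id>"
ACCESS_TOKEN = "<personal-access-token>"

# Connecting triggers the serverless warehouse to start on demand;
# there is no cluster to pre-provision or keep warm.
with sql.connect(
    server_hostname=SERVER_HOSTNAME,
    http_path=HTTP_PATH,
    access_token=ACCESS_TOKEN,
) as connection:
    with connection.cursor() as cursor:
        # samples.nyctaxi.trips is a sample dataset; swap in your own table.
        cursor.execute("SELECT COUNT(*) AS trip_count FROM samples.nyctaxi.trips")
        for row in cursor.fetchall():
            print(row[0])
```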

Step 3: Use Serverless SQL in Databricks Workflows

  • Click Workflows in the sidebar.

  • Click Create Job and complete the following details:
    • Task name
    • Type (Notebook, SQL, Python script, etc.)
      • The remaining options vary based on the type
      • To start with, I have selected SQL as the type
    • SQL task (Query, Dashboard, Alert, or File)
    • SQL query
    • SQL warehouse (select the serverless warehouse)
    • Other optional settings include parameters, notifications, retries, and duration thresholds. The same job can also be created through the Jobs API, as sketched below.
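As with the warehouse itself, the workflow can be defined in code. The sketch below creates a one-task job that runs a saved SQL query on the serverless warehouse through the Databricks Jobs API (version 2.1). The query ID, warehouse ID, workspace URL, and token are all placeholders.

```python
import requests

WORKSPACE_URL = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder
TOKEN = "<personal-access-token>"                                      # placeholder

# A single SQL task: run a saved query on the serverless warehouse.
job_definition = {
    "name": "daily-sales-refresh",
    "tasks": [
        {
            "task_key": "refresh_sales_query",
            "sql_task": {
                "query": {"query_id": "<saved-query-id>"},    # the query created in the SQL editor
                "warehouse_id": "<serverless-warehouse-id>",  # the warehouse from Step 1
            },
        }
    ],
    # Optional extras such as schedules, notifications, and retries can be
    # added to this payload, mirroring the optional fields in the UI.
}

resp = requests.post(
    f"{WORKSPACE_URL}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=job_definition,
)
resp.raise_for_status()
print("Job created with job_id:", resp.json()["job_id"])
```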