The great wall of Deep Research: Tongyi Deep Research

Data Engineering

Date : 10/09/2025


This article details the process of onboarding and running Alibaba's open-source Tongyi Deep Research agent on the Databricks platform. It covers the use of Databricks Serverless GPU for efficient inference, the integration of external tools like Serper.dev and Jina.ai, and the implementation of MLflow for crucial observability and tracing.

Jason Yip
Director of Data and AI, Tredence Inc.


In the past couple of posts, I have discussed some Deep Research techniques, including LangGraph’s Open Deep Research and building Deep Research on top of Databricks’ Genie. Those are great initiatives, but in this article we will make our boldest move yet: onboarding an Alibaba model to Databricks and running their open-source research agent there. Let’s meet Tongyi Deep Research.

The challenges

Tongyi Deep Research is different from Open Deep Research, which consists only of prompts and a workflow. Alibaba released not just the workflow but also a model: a 30B-parameter model available on Hugging Face. The Tongyi model was trending #1 on Hugging Face when it was first released, and it has even been regarded as the “DeepSeek moment” for Deep Research. Despite that, it is not available on Qwen yet 🤔

End-to-end Databricks workflow

The challenge is: how do I run everything on Databricks without paying for external hosting? Challenge accepted! Let’s first look at Tongyi’s architecture. The reason I call this a Great Wall is that Tongyi has a “heavy mode”, which keeps doing research until it is satisfied, and the default setting in the code is 100 LLM calls! It can easily run 50–60 iterations of research, with each iteration gathering more information.
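Conceptually, heavy mode is a ReAct-style loop with a call budget. Here is a minimal sketch of that idea, with hypothetical function names and an illustrative stop condition; the real control flow lives in the Tongyi Deep Research repo:

MAX_LLM_CALLS = 100  # the default budget mentioned above

def heavy_mode(question, llm, run_tool):
    """Illustrative research loop: think, act, observe, until satisfied or out of budget."""
    messages = [{"role": "user", "content": question}]
    for _ in range(MAX_LLM_CALLS):
        reply = llm(messages)  # one LLM call per iteration
        messages.append({"role": "assistant", "content": reply})
        if "<answer>" in reply:  # illustrative: the agent decided it has enough evidence
            return reply
        observation = run_tool(reply)  # e.g. a web search or a page visit
        messages.append({"role": "user", "content": observation})
    return messages[-1]["content"]  # budget exhausted; return the last draft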

Serverless GPU to the rescue

Databricks Serverless GPU is still in beta, but it has proven to solve one of the biggest headaches: inference! The code below is a sample inference that we can run directly on Databricks Serverless GPU. The surprising part is that downloading this model takes 1–2 minutes on a GPU instance, whereas without a GPU it can take hours.

# Load model directly from Hugging Face
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Alibaba-NLP/Tongyi-DeepResearch-30B-A3B")
model = AutoModelForCausalLM.from_pretrained(
    "Alibaba-NLP/Tongyi-DeepResearch-30B-A3B",
    torch_dtype="auto",   # load weights in the checkpoint's native precision
    device_map="auto",    # place the model on the GPU (requires accelerate)
)

# Build a chat-formatted prompt and run a quick sanity-check generation
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the echoed prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

What we need

Now that we have a basic understanding of the actions required, we can start planning how to set this up on Databricks. Below are the minimum requirements:

  • A predict function based on the above code, serving as a drop-in replacement for the OpenAI-style research function (a sketch follows this list)
  • SERPER_KEY_ID: Get your key from Serper.dev for web search and Google Scholar
  • JINA_API_KEYS: Get your key from Jina.ai for web page reading
  • API_KEY/API_BASE: An OpenAI-compatible API for page summarization; we will leverage Databricks’ FMAPI. Review the code in tool_visit.py to ensure you are passing the right parameters
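As a starting point, here is a minimal sketch of that setup. It assumes the environment variable names listed above and reuses the tokenizer and model loaded earlier; the key placeholders and the predict signature are illustrative, not the exact contract the repo expects:

import os

# Hypothetical placeholder values; the variable names are the ones described above
os.environ["SERPER_KEY_ID"] = "<your-serper-key>"
os.environ["JINA_API_KEYS"] = "<your-jina-key>"
os.environ["API_KEY"] = "<your-databricks-token>"
os.environ["API_BASE"] = "https://<workspace-host>/serving-endpoints"  # Databricks FMAPI base

def predict(messages, max_new_tokens=1024):
    """Drop-in stand-in for the OpenAI-style chat call, backed by the local model."""
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Return only the newly generated text, not the echoed prompt
    return tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )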

Observability

The problem with most open-source projects is that it is very difficult to understand what’s going on behind the scenes. The two steps below are a must in every project, because they add tracing easily without affecting how the code functions:

MLflow experiment

Just as we have migrated our data to Unity Catalog, we also need to ensure our MLflow experiments live in UC. The code below will become instrumental:

import mlflow  # Storing artifacts in a volume requires MLflow 2.15.0 or above

EXP_NAME = "/Users/first.last@databricks.com/my_experiment_name"
CATALOG = "my_catalog"
SCHEMA = "my_schema"
VOLUME = "my_volume"
ARTIFACT_PATH = f"dbfs:/Volumes/{CATALOG}/{SCHEMA}/{VOLUME}"  # can be a managed or external volume

mlflow.set_tracking_uri("databricks")
mlflow.set_registry_uri("databricks-uc")

# Create the experiment on first run, then make it the active experiment
if mlflow.get_experiment_by_name(EXP_NAME) is None:
    mlflow.create_experiment(name=EXP_NAME, artifact_location=ARTIFACT_PATH)
mlflow.set_experiment(EXP_NAME)
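To verify that artifacts actually land in the volume, a quick smoke test helps; the run name and payload here are arbitrary:

with mlflow.start_run(run_name="uc-volume-smoke-test"):
    mlflow.log_dict({"status": "ok"}, "healthcheck.json")
# The file should now appear under the volume path configured in ARTIFACT_PATH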

LLM decorators

Tracing is the most important part of the migration; without it, progress can easily be lost. Because we are not using LangChain or LangGraph in this case, we won’t be able to run autolog(). To customize the flow and messages, decorators are the key to uncovering insights.

Decorators & Fluent APIs (recommended) | Databricks on AWS (docs.databricks.com)
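For example, wrapping the agent’s LLM and tool calls with the mlflow.trace decorator captures each heavy-mode iteration as a span in the experiment configured above. The function bodies below are illustrative stand-ins, not the actual functions in the Tongyi repo:

import mlflow

@mlflow.trace(span_type="LLM")
def call_model(messages):
    # Delegate to the local predict function sketched earlier
    return predict(messages)

@mlflow.trace(span_type="TOOL")
def visit_page(url):
    # Stand-in for the page-reading call made by tool_visit.py via Jina.ai
    ...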

Tongyi Deep Research in action

Conclusion

In this article, we successfully tackled the complex challenge of bringing Alibaba’s state-of-the-art, open-source Tongyi Deep Research agent onto the Databricks platform. We proved that by leveraging Serverless GPU, even in beta, we can efficiently handle the 30B-parameter model, drastically cutting download and inference time from hours to minutes.

The project delivers a fully integrated, self-contained research environment, including the necessary external web tools (Serper and Jina) and internal Databricks services (FMAPI). Most importantly, we solved the “black box” problem inherent in the model’s heavy-mode, 100-call workflow: by diligently applying MLflow Experiments and custom LLM decorators instead of relying on standard Python logging, we established crucial observability.
