Plug in now! Open Deep Research with Genie MCP!!

Data Engineering

Date : 09/04/2025

This post provides a step-by-step guide on how to integrate Databricks' Managed MCP Server with Open Deep Research. Learn how to connect these two powerful tools, overcome common challenges, and leverage them to perform targeted, in-depth research on your own enterprise data.

Jason Yip
Director of Data and AI, Tredence Inc.

In my previous posts, I have discussed Databricks’ Managed MCP Servers as well as Open Deep Research on Databricks.

These are useful applications but questions remain:

  1. How does the Managed MCP Server work, and why should I care?
  2. Open Deep Research searches the web; can we use it on our own data?

In this post, we will answer these questions but first let’s look at some partial output to see what we want to achieve:


Databricks Genie Deep Research

As seen in the screenshot above, the research isn’t generic. In fact, it specifically quotes 422,580 customer records, and it further states that the dataset came from unitygo.telebricks.churn_prediction_raw. This is clear evidence that we are working with a specific dataset rather than just general opinions.

Let the journey begin

This post walks through the journey of repurposing Open Deep Research to research our own dataset. As of early September 2025, these are still two separate components: Open Deep Research doesn’t have access to any enterprise dataset, and Genie still can’t produce reports. Because the world of AI changes incredibly fast, there is a chance that by the time you read this post, Genie will already have produced a dozen reports on your data.

We still can’t steer with English descriptions

Every time I open a vibe coding tool, I expect that this time I can complete the coding project purely in English. Unfortunately, both Genie and MCP are still very new, and there isn’t much documentation.

Tools we need:

This journey would not be possible without the tools below:

  1. My previous post on setting up Open Deep Research on Databricks 🥳
    As they say, you can’t jump without first learning how to walk; setting up the tool was absolutely necessary.
  2. GitHub Copilot 🤩
    I used it to understand the repo, because vibe coding tools nowadays can be so “agentic” that they change the code without being asked.
  3. Jules 👑
    Once I had a good understanding of the current limitations, I could ask Jules to code it up.
  4. MCP Inspector 😼
    It let me look at the input and output of the MCP Server and steer the architecture.
  5. Coding skills 😎
    Vibe coding can give high-level ideas, but some of the very small nuances we still need to troubleshoot ourselves.

Ideas 💡

My idea first came from David Huang’s blog and his diagram:


Diagram from David’s blog

However, I want it to look like the diagram below, with Genie as one of the tools:


Open Deep Research architecture

Challenges

There are a number of challenges and uncertainties. I will list them below, and hopefully they will help you create your own Deep Research agent:

  1. How do we connect to the Genie MCP Server? The documentation on the Databricks website is limited and far from enough. The solution is to test thoroughly using MCP Inspector.
  2. How do we connect the Genie MCP tool to Open Deep Research? Surprisingly, it’s not as easy as it sounds: it turns out the “newest” implementation does not support multiple MCP Servers, despite using MultiServerMCPClient. Confused? Talk to GitHub Copilot to find out why!
  3. Where do we provide our PAT token to Open Deep Research? It turns out that despite the auth_required flag, we CANNOT supply a PAT token. Super confused? Me too!
  4. The tool is connected, but how do we use it? Without hard-wiring the tool into the research prompt inside Open Deep Research, we need to be very specific about our ask: Use the data provided in the MCP tool ‘query_space_01f08636c357189e81d584d6bbfef815’.
  5. Token limit reached?! Sometimes error messages can be confusing. If our research question is too open, Genie returns a raw dataset, which makes the tool output far too long. To verify this, simply take the output from MCP Inspector and paste it into ChatGPT.
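To make the wiring concrete, here is a minimal sketch of how the Genie MCP server could be registered with langchain's MultiServerMCPClient, passing the PAT as a Bearer header since the auth_required path did not accept one. The URL pattern, environment-variable names, and the helper itself are assumptions for illustration; check them against your workspace.

```python
# Hedged sketch: building the per-server config dict that
# MultiServerMCPClient expects for a Databricks Managed Genie MCP server.
# The URL pattern and helper name are assumptions -- verify against the
# Databricks managed MCP documentation for your workspace.

def genie_mcp_config(workspace_url: str, space_id: str, pat: str) -> dict:
    """Config for one Genie MCP server, with the PAT supplied directly
    as a Bearer header (since the auth_required flag did not let us
    provide a token)."""
    return {
        "genie": {
            "transport": "streamable_http",
            "url": f"{workspace_url}/api/2.0/mcp/genie/{space_id}",
            "headers": {"Authorization": f"Bearer {pat}"},
        }
    }

# Usage (requires `pip install langchain-mcp-adapters` and a live workspace):
#
# from langchain_mcp_adapters.client import MultiServerMCPClient
# config = genie_mcp_config(
#     os.environ["DATABRICKS_HOST"],
#     "01f08636c357189e81d584d6bbfef815",
#     os.environ["DATABRICKS_TOKEN"],
# )
# client = MultiServerMCPClient(config)
# tools = await client.get_tools()
```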


Genie response is too long
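If you would rather guard against this in code than rely on narrower questions, one option is to clip oversized tool output before it reaches the model. The 4-characters-per-token heuristic and the helper below are my own assumptions, not part of Genie or Open Deep Research:

```python
# Hedged sketch: a simple guard for oversized Genie tool output.
# The ~4 chars/token heuristic and the default budget are assumptions.

def clip_tool_output(text: str, max_tokens: int = 4000) -> str:
    """Roughly clip a tool result to a token budget so a raw dataset
    dump cannot blow past the model's context window."""
    budget = max_tokens * 4  # crude chars-per-token estimate
    if len(text) <= budget:
        return text
    return text[:budget] + "\n...[output truncated to fit the context window]"
```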

The final solution

Congratulations if you have made it thus far and have connected Genie to Open Deep Research. Now we need to construct the right prompt so Open Deep Research can do its job.

Use statistical analysis and the data provided in the MCP tool 'query_space_01f08636c357189e81d584d6bbfef815' to determine what are the factors that customers are churning in the telecom dataset provided?

The prompt is a bit awkward. However, there are a few things that are very important:

  1. statistical analysis: we want Genie to give condensed output without writing custom code to handle it. This is key to getting a short output, because our goal is not to look at individual records.
  2. Genie space tool name: experiments show there really isn’t any guarantee the agent will check the tool. We can try to hint, but being specific guarantees it, so why not?
  3. Re-emphasize the ask at the end: I provided a bit more clarification and stated that the report must be based on the provided data, so the agent won’t just go off and search the world wide web. No shortcuts please!
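The three ingredients above can be folded into a small prompt builder, so the tool name is always spelled out exactly. This helper is a hypothetical convenience, not part of Open Deep Research:

```python
# Hedged sketch: assembling the research prompt with the Genie space tool
# name spelled out, so the agent reliably calls the right tool instead of
# searching the web. The helper name and wording are assumptions.

def build_research_prompt(tool_name: str, question: str) -> str:
    return (
        "Use statistical analysis and the data provided in the MCP tool "
        f"'{tool_name}' to answer: {question} "
        "Base the report strictly on the provided data; do not search the web."
    )
```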

Now from the traces, we can see that the Genie space got hit multiple times, confirming that our research is working:


Open Deep Agent traces in Databricks

Conclusion

Now that Open Deep Research supports multiple MCP tools, we can certainly plug in a lot of useful tools to enhance our research power!

And the full report can be found below:

Access the report
