Game Arena — DeepMind on Databricks

Data Engineering

Date : 08/18/2025

Explore how Google DeepMind and Databricks partnered to create the Kaggle Game Arena, a new platform for AI models to compete. Learn how Databricks’ features, like MLflow and AI Gateway, were used to manage tournaments and analyze AI thinking in this groundbreaking AI and chess collaboration.

Jason Yip
Director of Data and AI, Tredence Inc.

OpenAI released GPT-5 to the public in the first week of August 2025, and it was one of the biggest AI news stories. Meanwhile, in the chess world, Google quietly partnered with five-time world chess champion Magnus Carlsen to launch Kaggle Game Arena, a new benchmarking platform where AI models and agents compete head-to-head in a variety of strategic games to help chart new frontiers for trustworthy AI evaluation. The code is also open-sourced here: https://github.com/google-deepmind/game_arena/


Kaggle Game Arena Launch

Onboarding Game Arena to Databricks

Although Google released the source code, it is not always easy to manage the infrastructure needed to run the tournaments, review not only the results but also the models’ reasoning behind the scenes, and manage the various AI models involved.

Fortunately, Databricks has all the components readily available:

  • Mosaic AI Gateway: lets us register external models and manage their performance as well as API keys
  • Foundation Model API: comes with Llama, Claude and gpt-oss models out of the box
  • Managed MLflow: experiment tracking and agent tracing
  • Databricks Apps: securely hosts the Game Arena interface
  • Databricks SQL: post-game analysis
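To make the split between the first two components concrete, here is a minimal Python sketch of endpoint resolution that prefers a Foundation Model API endpoint and falls back to an AI Gateway endpoint for external models. The function, the `EndpointChoice` type and every endpoint name below are hypothetical and for illustration only; in a real workspace the resolved name would be passed to something like the Databricks SDK's `WorkspaceClient().serving_endpoints.query(...)`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointChoice:
    """Which serving endpoint to call for a model, and where it comes from."""
    endpoint_name: str
    source: str  # "foundation_model_api" or "ai_gateway"


def resolve_endpoint(
    model: str,
    fm_endpoints: dict[str, str],
    gateway_endpoints: dict[str, str],
) -> EndpointChoice:
    """Prefer a Foundation Model API endpoint; fall back to an AI Gateway one."""
    if model in fm_endpoints:
        return EndpointChoice(fm_endpoints[model], "foundation_model_api")
    if model in gateway_endpoints:
        return EndpointChoice(gateway_endpoints[model], "ai_gateway")
    raise KeyError(f"no serving endpoint registered for model {model!r}")


# Hypothetical registries: model alias -> serving endpoint name.
FM_ENDPOINTS = {"gpt-oss-120b": "databricks-gpt-oss-120b"}
GATEWAY_ENDPOINTS = {
    "gpt-oss-120b": "gateway-gpt-oss-120b",  # also on the gateway, but FM API wins
    "gemini-2.5-pro": "gateway-gemini-2-5-pro",
}

print(resolve_endpoint("gpt-oss-120b", FM_ENDPOINTS, GATEWAY_ENDPOINTS))
print(resolve_endpoint("gemini-2.5-pro", FM_ENDPOINTS, GATEWAY_ENDPOINTS))
```

Keeping the lookup a pure function makes the priority rule easy to test without a workspace; only the final query call needs credentials.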

Vibe coding with GPT-5 in GitHub Copilot

With so much hype around GPT-5, let’s see whether it lives up to expectations. We started with the following prompt:

Can you migrate the code so that it uses Databricks’ features

  1. Foundation model API if available
  2. AI Gateway that encapsulates external models
  3. MLflow tracing: https://mlflow.org/docs/latest/genai/tracing/ for agent chain of thoughts, pls create one experiment per game, aka no autolog()
  4. chess move logging in databricks sql, along with the mlflow experiment id and a unique gameid, format is one game per PGN per row
  5. create a databricks app to host game areana (https://docs.databricks.com/aws/en/dev-tools/databricks-apps/get-started)
  6. create a UI wrapper on top of game areana for parameters selection
  7. Allow tournament setup in UI
  8. Allow live boardcast in lichess: https://github.com/lichess-org/broadcaster
  9. For each round, display a game bracket in the UI: https://www.kaggle.com/benchmarks/kaggle/chess-text/tournament
  10. Update ReadMe for Databricks Apps deployment instructions
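Requirement 4 asks for one game per row, stored as PGN together with the MLflow experiment ID and a unique game ID. A minimal sketch of building such a row in plain Python follows, assuming moves arrive as SAN strings and using a hypothetical three-column schema (`game_id`, `mlflow_experiment_id`, `pgn`); it is not the schema used in the repository. The tag pairs follow the PGN Seven Tag Roster, with `?` for unknown values.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class GameRow:
    """One finished game per row (hypothetical schema)."""
    game_id: str
    mlflow_experiment_id: str
    pgn: str


def to_pgn(moves_san: list[str], white: str, black: str,
           result: str = "*", event: str = "?") -> str:
    """Render SAN moves as a PGN string with the Seven Tag Roster.

    Assumes player names contain no double quotes (PGN would require escaping).
    """
    tags = [
        f'[Event "{event}"]',
        '[Site "?"]',
        '[Date "????.??.??"]',
        '[Round "?"]',
        f'[White "{white}"]',
        f'[Black "{black}"]',
        f'[Result "{result}"]',
    ]
    tokens = []
    for ply, move in enumerate(moves_san):
        # White's moves carry the move number ("1. e4"); Black's do not.
        tokens.append(f"{ply // 2 + 1}. {move}" if ply % 2 == 0 else move)
    tokens.append(result)  # PGN movetext ends with the game result
    return "\n".join(tags) + "\n\n" + " ".join(tokens)


def build_game_row(moves_san: list[str], white: str, black: str,
                   experiment_id: str, result: str = "*") -> GameRow:
    """Attach a fresh random game ID and the per-game MLflow experiment ID."""
    return GameRow(
        game_id=str(uuid.uuid4()),
        mlflow_experiment_id=experiment_id,
        pgn=to_pgn(moves_san, white, black, result),
    )


row = build_game_row(["e4", "e5", "Nf3", "Nc6"], "model-a", "model-b",
                     experiment_id="1234567890", result="*")
print(row.pgn)
```

From a notebook, rows like this could then be appended to a Delta table (for example with `spark.createDataFrame` over `dataclasses.asdict(row)`) and queried from Databricks SQL for post-game analysis.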

We still need to understand the code and Databricks

I’d estimate that GPT-5 wrote 50% of the code in no time, which was a great start, but the rest required Databricks expertise to make it work.

Let’s start by examining what GPT-5 got right:

  1. It understood Game Arena and was able to write a wrapper.
  2. It created a very bare-bones UI that works but is not impressive.
  3. It understood how to use MLflow and set up the code correctly.
  4. It updated the documentation for me, although it followed the flavor of Game Arena (we host it on Databricks Apps, so the steps are a little different).

Now where do we go from here?

It was a good start because I quickly understood what needed to change in the codebase, so I didn’t need to spend much time learning Game Arena. Unfortunately, driving this from start to finish in plain English remains a dream. Perhaps GPT-5 still needs to read some Databricks books to learn how things are done.

Below are the things I had to teach GPT-5:

  1. Prioritize the Foundation Model API endpoints and use the Databricks SDK to query them.
  2. Create a dedicated class in Game Arena (model_generation_sdk) to handle all Databricks requests.
  3. Logging the thoughts into managed MLflow is the trickiest part because it involves both the client side and the server side, and it requires a thorough understanding of automatic versus manual tracing. In this case, I opted for the decorator approach.
  4. GPT-5 does not understand the folder structure or the app.yml file for Databricks Apps, so we need to provide the documentation.
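The decorator approach for tracing is easiest to picture with a small example. In MLflow, wrapping a function with `@mlflow.trace` records each call as a span with its inputs, outputs and timing. The sketch below is a stdlib-only stand-in that imitates that idea so it runs without a workspace; the `trace` decorator, the in-memory `SPANS` list and the `choose_move` function are all hypothetical. In the real app you would use `@mlflow.trace` itself and select the per-game experiment with `mlflow.set_experiment`.

```python
import functools
import time
from typing import Any, Callable, Optional

# Hypothetical in-memory span store standing in for the MLflow tracking server.
SPANS: list[dict[str, Any]] = []


def trace(name: Optional[str] = None) -> Callable:
    """Decorator that records each call of the wrapped function as a span."""
    def decorator(fn: Callable) -> Callable:
        span_name = name or fn.__name__

        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            span: dict[str, Any] = {"name": span_name,
                                    "inputs": {"args": args, "kwargs": kwargs}}
            start = time.perf_counter()
            try:
                result = fn(*args, **kwargs)
                span.update(status="OK", outputs=result)
                return result
            except Exception as exc:
                span.update(status="ERROR", error=repr(exc))
                raise
            finally:
                # Runs on success and failure, so every call leaves a span.
                span["duration_s"] = time.perf_counter() - start
                SPANS.append(span)
        return wrapper
    return decorator


@trace(name="choose_move")
def choose_move(fen: str, legal_moves: list[str]) -> dict[str, str]:
    # Hypothetical stand-in for the LLM call: returns its "thought" and a move.
    return {"thought": f"{len(legal_moves)} legal moves, picking the first",
            "move": legal_moves[0]}


START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
choose_move(START_FEN, ["e4", "d4", "Nf3"])
print(SPANS[-1]["name"], SPANS[-1]["status"], SPANS[-1]["outputs"]["move"])
```

The appeal of the decorator is that tracing stays out of the game logic: the move-selection code is unchanged, and removing the decorator removes the tracing.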

After some back-and-forth with the AI, we can capture the Game Arena output in Databricks:


The results align with the recordings from Game Arena:


Finally, we can also replay the game in our Databricks Chess App! The possibilities are endless!
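Replaying a stored game mostly comes down to turning its PGN back into an ordered list of moves that the board can step through. A minimal sketch follows, assuming export-style PGN without variations or `;` comments; `movetext_plies` and `replay_positions` are hypothetical helpers, not code from the app, and drawing each position would still be left to a chess library in the UI.

```python
import re

RESULT_TOKENS = {"1-0", "0-1", "1/2-1/2", "*"}


def movetext_plies(pgn: str) -> list[str]:
    """Return the SAN moves of a PGN game in playing order."""
    # Drop tag-pair lines such as [White "..."], keeping only the movetext.
    movetext = " ".join(line for line in pgn.splitlines()
                        if not line.lstrip().startswith("["))
    movetext = re.sub(r"\{[^}]*\}", " ", movetext)  # brace comments
    movetext = re.sub(r"\d+\.+", " ", movetext)      # move numbers: "1." or "3..."
    # Skip the result marker and numeric annotation glyphs such as "$1".
    return [tok for tok in movetext.split()
            if tok not in RESULT_TOKENS and not tok.startswith("$")]


def replay_positions(plies: list[str]) -> list[list[str]]:
    """Move prefixes a UI could render one step at a time (index 0 = start)."""
    return [plies[:i] for i in range(len(plies) + 1)]


SAMPLE_PGN = (
    '[Event "?"]\n[White "model-a"]\n[Black "model-b"]\n[Result "1/2-1/2"]\n\n'
    "1. e4 e5 2. Nf3 {develops} Nc6 3.Bb5 1/2-1/2"
)

for step in replay_positions(movetext_plies(SAMPLE_PGN)):
    print(step)
```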

The source code can be found on GitHub:

https://github.com/rwforest/game_arena/tree/databricks_apps
