
In recommendation systems, the two-tower architecture has emerged as a powerful approach for tackling the complex task of matching users with relevant items. This architecture is particularly effective for large-scale systems where the ability to sift through vast amounts of data quickly is crucial.
What Is the Two-Tower Architecture?
The two-tower architecture employs two separate neural networks, referred to as “towers,” each responsible for processing different types of information. One tower is dedicated to encoding user data, such as past behavior and preferences, into a vector space. The other tower focuses on item data, encoding attributes like descriptions, categories, and other metadata.
The magic of the two-tower architecture lies in its ability to project both users and items into the same vector space, where the similarity between vectors can be measured. Typically, this similarity is calculated using the dot product, which serves as a proxy for an item's relevance to a user.
Advantages of the Two-Tower Architecture
- Scalability: Because user and item encodings are computed independently, item embeddings can be precomputed and indexed, letting the system handle millions of items and users efficiently (a minimal retrieval sketch follows this list).
- Flexibility: It allows for the incorporation of various types of data, including contextual information, into the recommendation process.
- Performance: The architecture is known for producing high-quality embeddings that capture the complex relationships between users and items.
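To make the scalability point concrete, here is a minimal retrieval sketch. All sizes and tensors below are made up for illustration; in a real system the item embeddings come from a trained item tower. Because the item side does not depend on the user, every item embedding can be computed once, stored, and then scored against a fresh user embedding with a single matrix-vector product.

import tensorflow as tf

num_items, d = 100_000, 32                            # illustrative sizes

item_embeddings = tf.random.normal([num_items, d])    # precomputed once by the item tower
user_embedding = tf.random.normal([d])                 # computed at query time by the user tower

scores = tf.linalg.matvec(item_embeddings, user_embedding)   # one score per item
top_scores, top_item_ids = tf.math.top_k(scores, k=10)        # candidate shortlist

At larger scales the exhaustive scoring step is usually replaced by an approximate nearest-neighbour index, but it is the separation of the two towers that makes precomputing the item side possible in the first place.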
Understanding the Mathematical Framework
At its core, the two-tower architecture relies on a few key mathematical ideas to model user-item interactions and generate personalized recommendations. Let's delve into the components that constitute this architecture:
Embedding Space:
Central to the two-tower architecture is the notion of an embedding space: a mathematical representation where users and items are mapped to low-dimensional vectors, known as embeddings. These embeddings capture latent features and characteristics that encapsulate user preferences and item attributes. Mathematically, we can represent the embedding space as follows:
- Let U denote the set of users, and I denote the set of items.
- Each user u ∈ U is associated with a user embedding vector u ∈ R^d, where d represents the dimensionality of the embedding space.
- Similarly, each item i ∈ I is associated with an item embedding vector i ∈ R^d.
- The goal is to learn these embeddings such that the dot product between a user embedding and an item embedding reflects their compatibility or preference alignment (a minimal lookup sketch follows this list).
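As a minimal sketch of this idea (all sizes and names are made up for illustration), users and items can each be given a learnable d-dimensional vector via an embedding table, and the dot product of two such vectors gives their compatibility:

import tensorflow as tf

num_users, num_items, d = 1_000, 5_000, 32   # illustrative sizes

# One learnable d-dimensional vector per user and per item.
user_table = tf.keras.layers.Embedding(input_dim=num_users, output_dim=d)
item_table = tf.keras.layers.Embedding(input_dim=num_items, output_dim=d)

u = user_table(tf.constant(42))   # embedding of user 42, shape [d]
i = item_table(tf.constant(7))    # embedding of item 7, shape [d]

score = tf.reduce_sum(u * i)      # dot product: higher means a better match

In practice, the towers described next replace these plain lookup tables, so that rich user and item features (not just IDs) feed into the embeddings.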
Tower Networks:
The two-tower architecture comprises two neural networks, referred to as "towers," dedicated to learning user and item embeddings separately. Each tower applies operations such as matrix multiplications and nonlinear activations (e.g., ReLU) to transform raw input data into meaningful embeddings, and its weights are learned with optimization techniques such as stochastic gradient descent. Mathematically, each tower can be expressed as a function:
- User Tower: f_u: R^m → R^d, where m represents the dimensionality of the user input space.
- Item Tower: f_i: R^n → R^d, where n represents the dimensionality of the item input space (example dimensions are shown in the sketch below).
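As a quick illustration, with assumed sizes m = 50 raw user features and d = 32 embedding dimensions (and placeholder layer widths), a tower is simply a learned function from a feature vector to a point in the shared embedding space:

import tensorflow as tf

# f_u : R^m -> R^d, here with m = 50 input features and d = 32 (illustrative values).
user_tower = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),   # hidden transformation
    tf.keras.layers.Dense(32),                      # output lives in the shared R^d space
])

user_embedding = user_tower(tf.random.normal([1, 50]))   # shape: [1, 32]

The item tower f_i has the same structure but its own weights and its own input dimensionality n.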
Mathematical Formulation of Recommendation:
Once the user and item embeddings are obtained through the tower networks, recommendation generation involves computing the compatibility score between a user and an item. This score is typically derived from the dot product or cosine similarity between their respective embeddings. Mathematically, the recommendation score r_ui for user u and item i can be expressed as:
r_ui = u^T i
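For intuition, take d = 3 with made-up numbers u = (0.3, −1.2, 0.8) and i = (0.5, −0.4, 1.0). Then r_ui = 0.3·0.5 + (−1.2)·(−0.4) + 0.8·1.0 = 0.15 + 0.48 + 0.80 = 1.43, while an item j with embedding (−0.5, 0.4, −1.0) would score −1.43, so item i would be ranked above item j for this user. Cosine similarity is the same computation after normalizing both vectors to unit length.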
Implementing the Two-Tower Architecture
To illustrate the implementation of the two-tower architecture, let's consider an open-source repository that provides a practical example: kirajano/two_tower_recommenders on GitHub ("Building Recommender System with the Two-Tower Architecture").
User Tower
The user tower is designed to understand the user's profile and preferences. It takes inputs such as past interactions, search queries, and demographic information. The neural network then processes this data into a dense vector representing the user. A minimal, illustrative Keras skeleton (the layer choices here are placeholders, not the repository's exact code):
import tensorflow as tf

class UserTower(tf.keras.Model):
    def __init__(self, embedding_dim, **kwargs):
        super().__init__(**kwargs)
        # Initialization code: layers mapping user features to the embedding space (illustrative)
        self.dense = tf.keras.layers.Dense(embedding_dim)

    def call(self, inputs):
        # Forward pass for user data
        user_embedding = self.dense(inputs)
        return user_embedding
Item Tower
Similarly, the item tower encodes information about the items. This could include text descriptions, images, price, and more. The goal is to create a vector that encapsulates the essence of each item. An analogous illustrative skeleton:
class ItemTower(tf.keras.Model):
    def __init__(self, embedding_dim, **kwargs):
        super().__init__(**kwargs)
        # Initialization code: layers mapping item features to the same embedding space (illustrative)
        self.dense = tf.keras.layers.Dense(embedding_dim)

    def call(self, inputs):
        # Forward pass for item data
        item_embedding = self.dense(inputs)
        return item_embedding
Training and Serving
During training, the model adjusts the weights of both towers so that the embeddings of users and of the items they prefer end up closer together in the vector space. For serving, item embeddings can be precomputed and indexed, so a shortlist of relevant items can be retrieved for each user query in real time, as in the scalability sketch earlier. One common training objective is sketched below.
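One common way to train such a model (a sketch under assumed names and sizes, not necessarily the approach used in the repository above) is in-batch sampled softmax: each row of the batch is a positive (user, item) pair, every other item in the batch serves as a negative, and the loss rewards matching pairs for having the highest dot product in their row.

import tensorflow as tf

# Toy setup: 256 positive (user, item) pairs, 50 user / 40 item features, d = 32.
batch, d = 256, 32
user_features = tf.random.normal([batch, 50])
item_features = tf.random.normal([batch, 40])
user_tower = tf.keras.layers.Dense(d)   # stand-ins for the towers sketched earlier
item_tower = tf.keras.layers.Dense(d)

user_emb = user_tower(user_features)    # shape: [batch, d]
item_emb = item_tower(item_features)    # shape: [batch, d]

# Score every user in the batch against every item in the batch.
logits = tf.matmul(user_emb, item_emb, transpose_b=True)   # shape: [batch, batch]

# Row k's positive item sits at column k; the rest of the row act as negatives.
labels = tf.range(batch)
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits)
)

Minimizing this loss with a standard optimizer pulls each user's embedding toward the item they interacted with and away from the other items in the batch, which is exactly the "closer in the vector space" behaviour described above.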
Conclusion
The two-tower architecture is a robust solution for modern recommendation systems, offering scalability and precision. By leveraging deep learning, it opens possibilities for personalized and context-aware recommendations that can significantly enhance user experience.

AUTHOR
Hazim Bashir
Associate Manager, Data Science