Google AI Introduces Multi-Agent System Search (MASS): A New AI Agent Optimization Framework for Better Prompts and Topologies

Introduction to Multi-Agent Systems

Multi-agent systems are becoming increasingly important in artificial intelligence because they can coordinate multiple large language models (LLMs) to solve complex problems. Instead of relying on a single model’s perspective, these systems distribute roles among agents, each contributing a distinct function. This division of labor makes the system’s analysis, responses, and actions more robust. Whether applied to code debugging, data analysis, retrieval-augmented generation, or interactive decision-making, LLM-driven agents can achieve results that single models cannot consistently match.

The power of these systems lies in their design, particularly the configuration of inter-agent connections, known as topologies, and the specific instructions given to each agent, referred to as prompts. As this model of computation matures, the challenge has shifted from proving feasibility to optimizing architecture and behavior for superior results.

Challenges in Designing Multi-Agent Systems

A significant challenge in designing these systems efficiently is prompt sensitivity. When prompts (the structured inputs that define each agent’s role) are altered even slightly, performance can swing dramatically. This sensitivity makes scaling risky, especially when agents are linked in workflows where one agent’s output serves as another’s input: errors can propagate or even amplify. Moreover, topological decisions, such as the number of agents involved, their interaction style, and the task sequence, often rely on manual configuration and trial and error. The design space is vast and nonlinear, combining numerous options for both prompt engineering and topology construction. Optimizing both simultaneously has largely been out of reach for traditional design methods.
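A rough back-of-the-envelope model shows why chained workflows are fragile. The sketch below assumes, purely for illustration, that each agent in a sequential chain succeeds independently with the same accuracy; real agents are neither independent nor identical, and the function name is hypothetical rather than anything taken from the paper.

```python
def chain_success_rate(per_agent_accuracy: float, num_agents: int) -> float:
    """Chance that every agent in a sequential chain produces a usable output.

    Illustrative assumption: agents fail independently and share one accuracy.
    """
    if not 0.0 <= per_agent_accuracy <= 1.0:
        raise ValueError("per_agent_accuracy must be between 0 and 1")
    if num_agents < 1:
        raise ValueError("num_agents must be at least 1")
    return per_agent_accuracy ** num_agents


if __name__ == "__main__":
    # Show how the end-to-end rate shrinks as the chain gets longer.
    for n in (1, 3, 5):
        print(n, round(chain_success_rate(0.9, n), 3))
```

Under this simplified model, even agents that are individually reliable yield a noticeably weaker pipeline once several are chained, which is one reason a small prompt-induced drop at a single step can matter so much.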

Introducing the Multi-Agent System Search (MASS) Framework

Researchers at Google and the University of Cambridge have introduced a new framework named Multi-Agent System Search (MASS). This method automates MAS design by interleaving the optimization of prompts and topologies in a staged approach. Unlike earlier attempts that treated the two components independently, MASS identifies which prompts and topological structures are most likely to influence performance. By narrowing the search to this influential subspace, the framework operates more efficiently while delivering higher-quality outcomes.

The method progresses in three phases:

Localized prompt optimization.
Selection of effective workflow topologies based on the optimized prompts.
Global optimization of prompts at the system-wide level.

This approach reduces computational overhead and removes the burden of manual tuning from researchers.
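The three phases can be pictured as a small orchestration routine. The following is a minimal sketch, not the authors' implementation: every function name, type alias, and the toy scores in the demo are hypothetical stand-ins, and each phase is passed in as a callable so the skeleton stays independent of any particular optimizer or model.

```python
from typing import Callable, Dict, List, Tuple

Prompts = Dict[str, str]    # block name -> prompt text
Topology = Tuple[str, ...]  # ordered block names forming one workflow


def staged_search(
    blocks: List[str],
    optimize_block: Callable[[str], str],
    candidate_topologies: Callable[[Prompts], List[Topology]],
    score: Callable[[Topology, Prompts], float],
    optimize_workflow: Callable[[Topology, Prompts], Prompts],
) -> Tuple[Topology, Prompts]:
    # Phase 1: tune each block's prompt in isolation.
    prompts = {block: optimize_block(block) for block in blocks}
    # Phase 2: pick the best-scoring topology given those tuned prompts.
    best = max(candidate_topologies(prompts), key=lambda t: score(t, prompts))
    # Phase 3: re-tune the prompts jointly for the chosen workflow.
    return best, optimize_workflow(best, prompts)


def toy_demo() -> Tuple[Topology, Prompts]:
    # Made-up per-block gains, used only to exercise the skeleton.
    gains = {"predictor": 0.0, "debate": 0.03, "reflect": -0.15}
    return staged_search(
        blocks=list(gains),
        optimize_block=lambda b: f"[tuned] instructions for {b}",
        candidate_topologies=lambda p: [
            ("predictor",),
            ("predictor", "debate"),
            ("predictor", "reflect"),
        ],
        score=lambda t, p: 0.70 + sum(gains[b] for b in t),
        optimize_workflow=lambda t, p: {b: p[b] + " (workflow-tuned)" for b in t},
    )
```

Passing the phases as callables is a design choice made for this sketch: it keeps the staging order explicit while leaving open how each phase is actually carried out.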

Technical Implementation of MASS

MASS begins by refining the prompt of each building block of a MAS. These blocks are agent modules with specific responsibilities, such as aggregation, reflection, or debate. A prompt optimizer generates variations that combine instructional guidance (e.g., “think step by step”) with example-based learning (e.g., one-shot or few-shot demos), then scores each variation on a validation metric and uses those scores to guide further improvements.
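Block-level prompt refinement can be sketched as a search over instruction and demonstration combinations scored by a validation metric. The example below swaps in a plain exhaustive grid for whatever prompt optimizer the authors actually use, and the metric is a toy stand-in for running an agent on held-out examples; all names here are assumptions made for illustration.

```python
from itertools import product
from typing import Callable, Sequence, Tuple

Demo = Tuple[str, str]  # (question, answer) pair used as an in-context example


def build_prompt(instruction: str, demos: Sequence[Demo]) -> str:
    """Join an instruction with zero or more worked examples."""
    shots = "\n".join(f"Q: {q}\nA: {a}" for q, a in demos)
    return f"{instruction}\n{shots}".strip()


def optimize_block_prompt(
    instructions: Sequence[str],
    demo_sets: Sequence[Sequence[Demo]],
    validation_metric: Callable[[str], float],
) -> Tuple[str, float]:
    """Return the (prompt, score) pair with the highest validation score."""
    candidates = [build_prompt(i, d) for i, d in product(instructions, demo_sets)]
    return max(((c, validation_metric(c)) for c in candidates), key=lambda pair: pair[1])


def toy_metric(prompt: str) -> float:
    # Hypothetical scoring rule; a real metric would run the agent on a validation set.
    score = 0.5
    if "step by step" in prompt:
        score += 0.2
    return score + 0.1 * prompt.count("Q:")


best_prompt, best_score = optimize_block_prompt(
    instructions=["Answer the question.", "Think step by step."],
    demo_sets=[[], [("What is 2 + 2?", "4")]],
    validation_metric=toy_metric,
)
```

With this toy metric the search favors the candidate that pairs the step-by-step instruction with the one-shot demo, mirroring the idea that instructions and examples are tuned together rather than separately.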

Once optimized, the system explores valid combinations of agents to form topologies, informed by earlier results and constrained to a pruned search space identified as most influential. Finally, the best topology undergoes global-level prompt tuning, where instructions are fine-tuned in the context of the entire workflow to maximize collective efficiency.
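One way to picture the pruning step is to keep only the blocks whose tuned version beats a single-agent baseline and then enumerate workflows over the survivors. This is a sketch under stated assumptions: the threshold rule, the function names, and the scores are hypothetical, and the paper's actual influence measure and search procedure may differ.

```python
from itertools import combinations
from typing import Dict, Iterator, List, Tuple


def prune_blocks(
    block_scores: Dict[str, float], base_score: float, min_gain: float = 0.0
) -> List[str]:
    """Keep blocks whose validation score exceeds the baseline by more than min_gain."""
    return [b for b, s in block_scores.items() if s - base_score > min_gain]


def enumerate_topologies(blocks: List[str], max_blocks: int) -> Iterator[Tuple[str, ...]]:
    """Yield every combination of up to max_blocks distinct blocks."""
    for size in range(1, max_blocks + 1):
        yield from combinations(blocks, size)


# Made-up validation scores for each block after block-level prompt tuning.
block_scores = {"debate": 0.73, "reflect": 0.55, "aggregate": 0.72, "summarize": 0.70}
kept = prune_blocks(block_scores, base_score=0.70)

pruned_space = list(enumerate_topologies(kept, max_blocks=2))
full_space = list(enumerate_topologies(list(block_scores), max_blocks=2))
```

In this toy setting, pruning shrinks the candidate list from ten workflows to three, which illustrates how restricting the search to influential blocks can cut the number of expensive evaluations.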

Performance Results

In tasks such as reasoning, multi-hop understanding, and code generation, the optimized MAS consistently outperformed existing baselines. In tests using Gemini 1.5 Pro on the MATH dataset, prompt-optimized agents achieved an average accuracy of around 84%, compared to 76–80% for agents scaled through self-consistency or multi-agent debate. On the HotpotQA benchmark, using the debate topology within MASS yielded a 3% improvement. In contrast, other topologies, such as reflect or summarize, failed to yield gains or even led to a degradation of up to 15%.
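For readers unfamiliar with the self-consistency baseline mentioned above, it amounts to sampling several answers and keeping the most frequent one. The snippet below is a minimal sketch of that voting step only; the function name is hypothetical, and the sampled answers would in practice come from repeated model calls rather than a hard-coded list.

```python
from collections import Counter
from typing import Sequence


def self_consistency_vote(answers: Sequence[str]) -> str:
    """Return the most common answer after trimming surrounding whitespace.

    Ties are resolved in favor of the answer that appeared first.
    """
    if not answers:
        raise ValueError("at least one sampled answer is required")
    counts = Counter(answer.strip() for answer in answers)
    return counts.most_common(1)[0][0]
```

Scaling this baseline means drawing more samples per question, which adds agents or calls without changing the prompt itself; that is the contrast the reported numbers draw with prompt optimization.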

Key Takeaways from the Research

MAS design complexity is significantly influenced by prompt sensitivity and topological arrangement.
Prompt optimization, at both the block and system level, is more effective than agent scaling alone: enhanced prompts reached about 84% accuracy on MATH, versus 76–80% for scaling through self-consistency or multi-agent debate.
Not all topologies are beneficial; debate added 3% on HotpotQA, while reflection caused a drop of up to 15%.
The MASS framework integrates prompt and topology optimization in three phases, drastically reducing computational and design burden.
Topologies like debate and executor are effective, while others, such as reflect and summarize, can degrade system performance.
MASS avoids full search complexity by pruning the design space based on early influence analysis, improving performance while saving resources.
The approach is modular and supports plug-and-play agent configurations, making it adaptable to various domains and tasks.
The final multi-agent systems produced by MASS outperform state-of-the-art baselines across multiple benchmarks, including MATH, HotpotQA, and LiveCodeBench.

Conclusion

This research identifies prompt sensitivity and topology complexity as major bottlenecks in multi-agent system (MAS) development and proposes a structured solution that strategically optimizes both areas. The MASS framework demonstrates a scalable, efficient approach to MAS design, minimizing the need for human input while maximizing performance. The findings provide compelling evidence that better prompt design is more effective than merely adding agents and that targeted search within influential topology subsets leads to meaningful gains in real-world tasks.

