DeepAgent: A Deep Reasoning AI Agent that Performs Autonomous Thinking, Tool Discovery, and Action Execution within a Single Reasoning Process

Understanding the Target Audience

The target audience for DeepAgent includes AI researchers, business managers integrating AI tools, and tech-savvy professionals looking to optimize their workflows. Their pain points include:

  • Difficulty managing complex tasks with large toolsets.
  • Challenges in adapting strategies when reasoning processes change.
  • Need for efficient and stable tool utilization in AI solutions.

Goals of this audience involve:

  • Implementing AI solutions that enhance decision-making.
  • Improving operational efficiency through advanced tool integration.
  • Staying updated with the latest advancements in AI technologies.

Interests include autonomous AI agents, reinforcement learning, and scalable business applications of AI. They prefer clear, concise communication that offers actionable insights and technical specifications.

Overview of DeepAgent

Most agent frameworks run a fixed Reason, Act, Observe loop and can only use the tools listed in the prompt. This works for small tasks, but it breaks down when the toolset is large or when the task requires a change of strategy partway through. A team from Renmin University of China and Xiaohongshu has developed DeepAgent, an end-to-end deep reasoning agent that performs autonomous thinking, tool discovery, and action execution within a single coherent reasoning process.

Key Features of DeepAgent

Unified Reasoning with On-Demand Tool Discovery

DeepAgent enables the model to output four action types: internal thoughts, tool searches, tool calls, and memory folds. When the agent searches, it queries a dense index containing tool descriptions from extensive registries, such as over 16,000 RapidAPI tools and 3,900 ToolHop tools. This dynamic tool access means the model does not rely on a pre-loaded tool list, staying aligned with real environments where tools frequently change.
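
The search step described above can be illustrated with a small retrieval sketch. This is not DeepAgent's code: the registry contents, the `search_tools` function, and the bag-of-words `embed` function are all hypothetical, and `embed` is only a toy stand-in for a real dense encoder. The point is the shape of the operation: tool descriptions are indexed ahead of time, and a query issued mid-reasoning returns the closest tool names.

```python
import math
import re
from collections import Counter

# Hypothetical mini-registry; a real one would hold thousands of tool descriptions.
TOOL_DOCS = {
    "weather_lookup": "Get the current weather forecast for a city",
    "flight_search": "Search airline flights between two airports by date",
    "currency_convert": "Convert an amount of money between two currencies",
}

def embed(text: str) -> Counter:
    """Toy stand-in for a dense encoder: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# The index is built once, before the agent starts reasoning.
INDEX = {name: embed(doc) for name, doc in TOOL_DOCS.items()}

def search_tools(query: str, top_k: int = 2) -> list[str]:
    """Return the names of the top_k tools whose descriptions best match the query."""
    q = embed(query)
    ranked = sorted(INDEX, key=lambda name: cosine(q, INDEX[name]), reverse=True)
    return ranked[:top_k]

print(search_tools("weather forecast for Paris", top_k=1))
```

Because the agent only sees the few tools a search returns, the prompt stays small no matter how large the registry grows, and the index can be rebuilt when tools change without retraining the model.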

Autonomous Memory Folding for Long Horizon Tasks

To manage long sequences of tool calls and responses that might overflow context, DeepAgent employs an autonomous memory folding step. Upon emitting a fold token, it compresses interaction histories into three memory types:

  • Episodic Memory, which records task events,
  • Working Memory, which tracks the current sub-goal and recent issues, and
  • Tool Memory, documenting tool names, arguments, and outcomes.

The folded memory is stored as structured text, so the agent can continue from a compact yet information-rich state instead of the full interaction history.
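
As a rough illustration of what a folded state could look like, the sketch below sorts a raw history into the three memory types and serializes them as JSON. The `FoldedMemory` class, the `fold` function, the history record format, and the field names are all assumptions made for this example. The rule-based sorting is only a stand-in for however the agent actually compresses its history.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class FoldedMemory:
    episodic: list = field(default_factory=list)  # key task events so far
    working: dict = field(default_factory=dict)   # current sub-goal and recent issues
    tool: list = field(default_factory=list)      # tool names, arguments, outcomes

def fold(history: list[dict], sub_goal: str) -> str:
    """Compress a raw interaction history into compact structured text (JSON)."""
    mem = FoldedMemory()
    mem.working = {"sub_goal": sub_goal, "recent_issues": []}
    for step in history:
        if step["type"] == "event":
            mem.episodic.append(step["text"])
        elif step["type"] == "tool_call":
            mem.tool.append(
                {"name": step["name"], "args": step["args"], "ok": step["ok"]}
            )
            if not step["ok"]:
                # Failed calls are surfaced so the agent can avoid repeating them.
                mem.working["recent_issues"].append(f"{step['name']} failed")
    return json.dumps(asdict(mem), indent=2)

# Hypothetical history in an assumed record format.
history = [
    {"type": "event", "text": "User asked for a flight to Shanghai"},
    {"type": "tool_call", "name": "flight_search", "args": {"to": "SHA"}, "ok": False},
]
print(fold(history, sub_goal="find a bookable flight"))
```

After a fold, the serialized text would replace the long history in the context, and reasoning would resume from there.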

Tool Policy Optimization (ToolPO)

Traditional supervised traces do not effectively teach robust tool usage, as correct tool calls often only represent a few tokens within a lengthy generation. The research team introduces Tool Policy Optimization (ToolPO) to address this issue. ToolPO runs rollouts on simulated APIs, ensuring stable and cost-effective training while attributing rewards to specific tool call tokens. This method enables the agent to learn not just how to call tools but also when to search for them and when to fold memory.
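
The idea of attributing reward to specific tool call tokens can be sketched with a per-token mask. The function below is a simplified illustration, not the ToolPO objective itself: the additive combination of a task-level and a tool-level advantage, the `tool_weight` parameter, and the `<call>` markers in the example are all assumptions.

```python
def token_advantages(
    tokens: list[str],
    tool_call_mask: list[bool],
    task_advantage: float,
    tool_advantage: float,
    tool_weight: float = 1.0,
) -> list[float]:
    """Assign a training signal to every generated token.

    Every token receives the task-level advantage. Tokens flagged in the mask
    (those belonging to a tool call) additionally receive the tool-level
    advantage, so credit for a correct call is not diluted across the whole
    generation.
    """
    if len(tokens) != len(tool_call_mask):
        raise ValueError("tokens and tool_call_mask must have the same length")
    return [
        task_advantage + (tool_weight * tool_advantage if is_tool else 0.0)
        for is_tool in tool_call_mask
    ]

# Hypothetical rollout: three of seven tokens make up the tool call.
tokens = ["I", "need", "weather", "<call>", "weather_lookup", "</call>", "done"]
mask = [False, False, False, True, True, True, False]
print(token_advantages(tokens, mask, task_advantage=0.5, tool_advantage=1.0))
```

Under this scheme the same mask could also cover search and fold actions, which is one way an agent could receive a direct signal about when to search for tools and when to fold memory.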

Performance Evaluation

DeepAgent has been evaluated on five general tool use benchmarks and four downstream tasks. In the labeled tool setting, DeepAgent with a 32B model achieved the following scores:

  • 69.0 on ToolBench,
  • 75.3 on API Bank,
  • 89.0 on TMDB,
  • 75.4 on Spotify,
  • 51.3 on ToolHop.

These are the strongest 32B-level results on all five datasets. Workflow baselines such as ReAct and CodeAct do not sustain comparable performance across the board.

In the open-set retrieval setting, which better reflects realistic scenarios, DeepAgent reached 64.0 on ToolBench and 40.6 on ToolHop. The strongest baselines achieved 55.0 on ToolBench and 36.2 on ToolHop, so DeepAgent leads by 9.0 and 4.4 points respectively.

The downstream tasks are ALFWorld, WebShop, GAIA, and HLE. DeepAgent reported success rates of 91.8% on ALFWorld and 34.4% on WebShop, and a score of 53.3 on GAIA. These results suggest that the combination of memory folding and ToolPO contributes to performance on longer and noisier tasks.

Conclusion

DeepAgent integrates autonomous thinking, dynamic tool retrieval, structured tool calling, and memory folding into one continuous reasoning stream. This makes extensive toolsets more usable for LLM agents and represents a significant advancement in agent architectures. Training ToolPO on simulated APIs also addresses the latency and instability issues associated with previous tool agents.

Explore further through the original research paper and the GitHub repository for tutorials and resources.