
Together AI Releases DeepSWE: A Fully Open-Source RL-Trained Coding Agent Based on Qwen3-32B Achieving 59% on SWE-Bench Verified

Together AI has released DeepSWE, a fully open-source software engineering agent trained using reinforcement learning (RL). Built on the Qwen3-32B language model, the agent reaches 59% accuracy on the SWE-Bench Verified benchmark with test-time scaling and a 42.2% Pass@1 score, placing it at the top among open-weight models. The release signals a shift for Together AI, moving from traditional pretraining pipelines toward autonomous language agents that learn continuously from real-world feedback.

Reinforcement Learning Meets Code Generation

DeepSWE is the outcome of post-training the Qwen3-32B foundation model with rLLM, Agentica's modular reinforcement learning framework for language agents. Unlike conventional supervised fine-tuning, rLLM lets agents adapt to real-world workflows through experiential learning. DeepSWE was trained specifically to tackle complex software engineering tasks through a feedback-driven loop rather than static datasets.
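To make the feedback-driven loop concrete, here is a self-contained toy sketch of the agent-environment-reward cycle that RL post-training relies on. It is purely illustrative: the environment, agent, and update rule below (a simplified gradient-bandit-style preference update over candidate patches) are stand-ins of my own, not rLLM's actual API or DeepSWE's training algorithm.

```python
import math
import random

# Toy stand-in for an execution environment: the "task" is to choose the
# candidate patch whose hidden tests pass. Real environments such as
# R2E-Gym wrap an actual repository, shell, and test harness.
class ToyPatchEnv:
    def __init__(self, num_candidates=5):
        self.num_actions = num_candidates
        self.correct = random.randrange(num_candidates)

    def step(self, action):
        # Sparse binary reward: the tests either pass or they don't.
        return 1.0 if action == self.correct else 0.0

# Tabular policy over candidate patches, improved from reward feedback
# with a simplified gradient-bandit preference update.
class ToyAgent:
    def __init__(self, num_actions, lr=0.1):
        self.prefs = [0.0] * num_actions
        self.lr = lr

    def act(self):
        # Sample an action from a softmax over preferences.
        exps = [math.exp(p) for p in self.prefs]
        r = random.random() * sum(exps)
        for action, e in enumerate(exps):
            r -= e
            if r <= 0:
                return action
        return len(exps) - 1

    def update(self, action, reward, baseline=0.2):
        # Reinforce actions that beat the baseline, discourage the rest.
        self.prefs[action] += self.lr * (reward - baseline)

env = ToyPatchEnv()
agent = ToyAgent(env.num_actions)
for _ in range(500):
    action = agent.act()
    agent.update(action, env.step(action))
print("learned preferences:", [round(p, 2) for p in agent.prefs])
```

After a few hundred episodes, the preference for the correct patch dominates: the same experiential pattern, scaled up to an LLM policy and real repositories, is what the rLLM loop described above provides.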

The training pipeline incorporates Agentica's R2E-Gym dataset, a software engineering benchmark built for RL-style agent development. The framework trains language models on action-oriented objectives, such as fixing bugs, completing functions, and editing code, aligning DeepSWE's behavior more closely with how human engineers iterate and learn from feedback on their work.
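The essential ingredient in such a setup is an execution-based reward. Here is a minimal sketch of what that signal can look like; the function name, paths, and commands are illustrative assumptions, not R2E-Gym's actual harness:

```python
import os
import subprocess
import tempfile

# Sketch of an execution-based reward: apply a candidate patch, run the
# repository's test suite, and score the outcome. Names and commands here
# are illustrative assumptions, not R2E-Gym's real harness.
def compute_reward(repo_dir: str, patch: str) -> float:
    patch_file = os.path.join(tempfile.mkdtemp(), "candidate.diff")
    with open(patch_file, "w") as f:
        f.write(patch)

    # Try to apply the agent's proposed edit.
    applied = subprocess.run(
        ["git", "apply", patch_file], cwd=repo_dir, capture_output=True
    )
    if applied.returncode != 0:
        return 0.0  # a patch that does not apply earns no reward

    # Run the tests; a sparse binary reward mirrors pass/fail feedback.
    tests = subprocess.run(
        ["python", "-m", "pytest", "-q"], cwd=repo_dir, capture_output=True
    )
    return 1.0 if tests.returncode == 0 else 0.0
```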

Performance Benchmarks and Capabilities

On SWE-Bench Verified, one of the most rigorous benchmarks for software engineering agents, DeepSWE scores 59% with test-time scaling, significantly outperforming previous open-weight models. In Pass@1 evaluations, which measure the probability that the agent solves a problem correctly on the first attempt, DeepSWE achieves 42.2%. These results underscore the effectiveness of RL-based training for agent behavior in domains that demand iterative reasoning and precise outputs, such as code synthesis. The architecture inherited from Qwen3-32B scales effectively while remaining practical for real-world use.
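For reference, Pass@k numbers like these are conventionally computed with the unbiased estimator from Chen et al. (2021): given n sampled solutions per problem, of which c pass, estimate the chance that at least one of k samples succeeds. A short sketch follows, with the caveat that DeepSWE's exact test-time scaling procedure (how it selects among multiple rollouts) is not reproduced here:

```python
from math import comb

# Unbiased pass@k estimator (Chen et al., 2021): with n samples per
# problem and c of them passing, the probability that at least one of
# k drawn samples is correct is 1 - C(n-c, k) / C(n, k).
def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:
        return 1.0  # too few failures left for all k draws to miss
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical example: 16 samples on one problem, 7 pass the hidden tests.
print(f"pass@1 = {pass_at_k(16, 7, 1):.3f}")  # 0.438: first-try success rate
print(f"pass@8 = {pass_at_k(16, 7, 8):.3f}")  # ~0.999: more attempts help
```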

Open Source and Reproducibility at Its Core

A key feature of this release is its full transparency. Together AI and Agentica have open-sourced not only the DeepSWE model but also the entire training recipe, including the rLLM framework, the R2EGym dataset, and training configuration scripts. This commitment to openness promotes reproducibility and encourages the broader research and development communities to extend or build upon DeepSWE without restrictions.

Developers can access the DeepSWE model weights on Hugging Face, and the rLLM framework and training scripts on GitHub.
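As a concrete starting point, the checkpoint can be loaded with standard Hugging Face tooling. A minimal sketch, assuming the repository id agentica-org/DeepSWE-Preview (verify against the official model card):

```python
# Minimal sketch of loading the released weights with Hugging Face
# Transformers. The repository id is an assumption based on the public
# release; confirm it against the official model card. A 32B model needs
# multiple GPUs (or aggressive quantization) to run.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "agentica-org/DeepSWE-Preview"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # keep the checkpoint's native precision
    device_map="auto",   # shard across available devices
)

prompt = "Fix the off-by-one error in the following function:\n..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```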

From Language Reasoners to Language Agents

DeepSWE represents a philosophical and practical transition: from developing models that reason about language to creating agents that learn through interaction. Traditional large language models (LLMs) demonstrate strong reasoning but often lack the ability to adapt from feedback or improve over time. Reinforcement learning enables these models not only to perform well at launch but also to keep improving as they encounter new problem distributions and domains.

This approach also opens the door to local deployment. Because DeepSWE is fully open-source and modular, it can be extended and retrained for organization-specific use cases. Developers and researchers can build on DeepSWE through rLLM to serve diverse applications, such as web navigation, robotics, and autonomous research assistance.

Conclusion

DeepSWE marks a significant advancement in the evolution of generative AI for software engineering. By applying reinforcement learning to large language models such as Qwen3-32B and providing the entire training infrastructure openly, Together AI is paving the way for a future where agents are not only pretrained and deployed but also continuously trained and improved. This shift from language understanding to action-oriented agency has profound implications across programming, automation, and intelligent system design.
