MiniMax Open-Sources MiniMax M2: A Mini Model Built for Max Coding and Agentic Workflows
Understanding the Target Audience
The target audience for MiniMax Open-Sources MiniMax M2 includes software developers, data scientists, and AI researchers who are engaged in coding and agentic workflows. These professionals often face challenges such as:
- High costs associated with flagship AI models.
- Latency in interactive coding and development loops.
- The need to keep memory usage low on complex, long-running tasks.
- A preference for open-source models that support collaboration and customization.
Their goals include improving coding efficiency, reducing operational costs, and leveraging advanced AI capabilities to streamline workflows. They prefer clear, technical communication that focuses on performance metrics and practical applications.
Overview of MiniMax M2
Can an open-source Mixture of Experts (MoE) model truly enhance agentic coding workflows at a fraction of flagship model costs while sustaining long-horizon tool use across platforms? The MiniMax team recently released MiniMax-M2, a MoE model optimized for coding and agent workflows. The model is available on Hugging Face under the MIT license, featuring:
- 229 billion total parameters with approximately 10 billion active parameters per token.
- Optimized for lower memory usage and reduced latency during agent loops.
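Assuming the weights live under a Hugging Face repo such as `MiniMaxAI/MiniMax-M2` (the exact repo id should be confirmed against the official model card), a minimal sketch for pulling the MIT-licensed checkpoint with `huggingface_hub` might look like this:

```python
# Minimal sketch: fetch the open weights locally before serving them.
# The repo id below is an assumption; confirm it on the official model card.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="MiniMaxAI/MiniMax-M2",   # assumed repo id
    local_dir="./minimax-m2",         # destination for the sharded MoE checkpoint
)
print(f"Weights downloaded to {local_dir}")
```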
Architecture and Importance of Activation Size
MiniMax-M2 employs a compact MoE architecture that activates about 10 billion parameters per token. This design reduces memory pressure and tail latency in planning, acting, and verifying loops, enabling more concurrent runs in continuous integration (CI), browsing, and retrieval chains. The smaller activation budget underpins the claimed speed and cost advantages over dense models of comparable quality.
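To make the activation-size point concrete, here is a toy top-k routing sketch (not MiniMax's actual router; all sizes are made up) showing why only a small slice of the total parameters is touched per token:

```python
# Toy MoE routing sketch: total capacity spans num_experts weight matrices, but
# each token only reads top_k of them, which is what keeps per-token compute and
# memory traffic low. Sizes here are illustrative, not MiniMax-M2's real config.
import numpy as np

rng = np.random.default_rng(0)
num_experts, top_k, d_model = 8, 2, 16            # hypothetical sizes

experts = [rng.standard_normal((d_model, d_model)) for _ in range(num_experts)]
router_w = rng.standard_normal((d_model, num_experts))

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route one token vector to its top_k experts and gate-mix their outputs."""
    logits = x @ router_w
    chosen = np.argsort(logits)[-top_k:]           # indices of the selected experts
    gates = np.exp(logits[chosen])
    gates /= gates.sum()                           # softmax over the selected experts
    # Only top_k of the num_experts matrices are read for this token.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, chosen))

token = rng.standard_normal(d_model)
print(moe_forward(token).shape)                    # (16,)
```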
Internal Reasoning and Interaction Format
MiniMax-M2 is an interleaved thinking model: it encapsulates internal reasoning in `<think>...</think>` segments, and these segments are expected to stay in the conversation history across turns rather than being stripped, since the model relies on its earlier reasoning during multi-step agent work.
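A minimal sketch of that rule, assuming an OpenAI-compatible endpoint (for example, a local vLLM or SGLang server); the base URL, API key, and model name below are placeholders rather than official values:

```python
# Sketch: carry the assistant's <think>...</think> content forward verbatim.
# Endpoint, API key, and model name are assumptions for illustration only.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # assumed local server

history = [{"role": "user", "content": "Plan the refactor, then list the files to touch."}]
reply = client.chat.completions.create(model="MiniMax-M2", messages=history)
assistant_text = reply.choices[0].message.content  # includes <think>...</think> segments

# Do NOT strip the <think>...</think> spans: the model expects its prior
# reasoning to remain in the history for the next turn.
history.append({"role": "assistant", "content": assistant_text})
history.append({"role": "user", "content": "Good. Now apply step 1."})
followup = client.chat.completions.create(model="MiniMax-M2", messages=history)
print(followup.choices[0].message.content)
```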
Benchmarking Performance
The MiniMax team has conducted evaluations that focus on coding and agent workflows, providing results that are more representative of developer activities than static question-answering tasks. Key benchmarks include:
- Terminal-Bench: 46.3
- Multi SWE-Bench: 36.2
- BrowseComp: 44.0
- SWE-bench Verified: 69.4
Comparison of MiniMax M1 and M2
| Aspect | MiniMax M1 | MiniMax M2 |
|---|---|---|
| Total parameters | 456 billion | 229 billion |
| Active parameters per token | 45.9 billion | 10 billion |
| Core design | Hybrid Mixture of Experts with Lightning Attention | Sparse Mixture of Experts targeting coding and agent workflows |
| Thinking format | Variants in RL training with no specific protocol | Interleaved thinking, with `<think>...</think>` segments retained across turns |
| Benchmarks highlighted | AIME, LiveCodeBench, SWE-bench Verified | Terminal-Bench, Multi SWE-Bench, BrowseComp |
| Inference defaults | Temperature 1.0, Top-p 0.95 | Temperature 1.0, Top-p 0.95, Top-k 20 |
| Serving guidance | vLLM recommended | vLLM and SGLang recommended |
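As a rough sketch of the serving guidance and sampling defaults in the table above, an offline vLLM run might look like the following; the repo id, parallelism degree, and prompt are placeholders, and a checkpoint of this size requires multi-GPU parallelism in practice:

```python
# Sketch only: wire the table's recommended sampling defaults into vLLM.
# Repo id and tensor_parallel_size are assumptions; adjust to your hardware.
from vllm import LLM, SamplingParams

llm = LLM(
    model="MiniMaxAI/MiniMax-M2",    # assumed repo id
    tensor_parallel_size=8,          # a ~229B-total-parameter MoE needs several GPUs
    trust_remote_code=True,
)
params = SamplingParams(temperature=1.0, top_p=0.95, top_k=20, max_tokens=1024)

outputs = llm.generate(["Write a shell one-liner that counts TODO comments in a repo."], params)
print(outputs[0].outputs[0].text)
```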
Key Takeaways
MiniMax M2 is released as open weights on Hugging Face under the MIT license, with a compact MoE design: 229 billion total parameters and approximately 10 billion active per token. The model is tailored to agent loops and coding tasks, prioritizing lower memory usage and consistent latency. The release also includes API documentation and deployment notes covering local serving and benchmarking.
For more information, check out the API Doc, Weights, and Repo.