
MiniMax Open-Sources MiniMax M2: A Mini Model Built for Max Coding and Agentic Workflows at 8% of Claude Sonnet's Price and ~2x Faster


Understanding the Target Audience

The target audience for MiniMax M2 includes software developers, data scientists, and AI researchers engaged in coding and agentic workflows. These professionals often face challenges such as:

  • High costs associated with flagship AI models.
  • Latency issues in coding and development processes.
  • Need for efficient memory usage while working on complex tasks.
  • Desire for open-source solutions that encourage collaboration and customization.

Their goals include improving coding efficiency, reducing operational costs, and leveraging advanced AI capabilities to streamline workflows. They prefer clear, technical communication that focuses on performance metrics and practical applications.

Overview of MiniMax M2

Can an open-source Mixture of Experts (MoE) model meaningfully improve agentic coding workflows at a fraction of flagship-model cost while sustaining long-horizon tool use across platforms? The MiniMax team has released MiniMax-M2, an MoE model optimized for coding and agent workflows. The model is available on Hugging Face under the MIT license, featuring:

  • 229 billion total parameters with approximately 10 billion active parameters per token.
  • Optimized for lower memory usage and reduced latency during agent loops.
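For orientation, here is a minimal sketch of pulling the open weights with Hugging Face transformers. The repo id MiniMaxAI/MiniMax-M2 matches the Hugging Face release; the dtype, device mapping, and the substantial GPU memory needed to hold all 229B parameters are assumptions to adapt to your own setup.

```python
# Minimal sketch: loading the MiniMax-M2 open weights with transformers.
# Assumes the Hugging Face repo id "MiniMaxAI/MiniMax-M2" and a
# transformers build that supports the architecture; even with only
# ~10B active parameters per token, all 229B weights must fit in memory.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MiniMaxAI/MiniMax-M2"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's native dtype
    device_map="auto",    # shard across available GPUs
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Write a binary search in Python."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```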

Architecture and Importance of Activation Size

MiniMax-M2 employs a compact MoE architecture that activates about 10 billion parameters per token. This design reduces memory pressure and tail latency in plan-act-verify loops, enabling more concurrent runs in continuous integration (CI), browsing, and retrieval chains. That activation budget is what underpins the claimed speed and cost advantages over dense models of similar quality.
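To make the sparse-activation idea concrete, the toy PyTorch layer below routes each token to its top-k experts, so only a small slice of the total parameter pool runs per token. Every dimension and count here is invented for illustration and does not reflect M2's actual router or expert configuration.

```python
# Toy top-k MoE routing: each token activates only k of n_experts,
# so active parameters per token stay far below the total count.
# All sizes are illustrative, NOT MiniMax-M2's real configuration.
import torch
import torch.nn as nn


class ToyMoELayer(nn.Module):
    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x):                       # x: (tokens, d_model)
        scores = self.router(x)                 # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)
        weights = weights.softmax(dim=-1)       # normalize over chosen experts
        out = torch.zeros_like(x)
        for t in range(x.size(0)):              # per-token dispatch
            for j in range(self.k):
                e = int(idx[t, j])              # expert index for this slot
                out[t] += weights[t, j] * self.experts[e](x[t])
        return out


x = torch.randn(4, 64)
print(ToyMoELayer()(x).shape)  # torch.Size([4, 64])
```

Production MoE kernels batch the dispatch rather than looping per token; the loop here simply makes the per-token sparsity explicit.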

Internal Reasoning and Interaction Format

MiniMax-M2 is an interleaved thinking model that encapsulates its internal reasoning within <think>...</think> blocks. Users are instructed to retain these segments in the conversation history across turns; stripping them out can degrade performance on multi-step tasks and tool chains. This requirement is stated explicitly on the model page on Hugging Face.
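Below is a minimal sketch of honoring that requirement in a multi-turn loop, assuming an OpenAI-compatible endpoint (the base URL, API key, model name, and prompts are placeholders): each assistant reply is appended to the history verbatim, <think> segments included.

```python
# Sketch: keep <think>...</think> content in the conversation history.
# Endpoint, key, and model name are placeholders for a local deployment;
# whether think content appears inline depends on how the model is served.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

history = [{"role": "user", "content": "Plan the first step of the task."}]

for _ in range(3):  # a few agent turns
    resp = client.chat.completions.create(
        model="MiniMaxAI/MiniMax-M2",
        messages=history,
    )
    reply = resp.choices[0].message.content
    # Append the assistant turn verbatim; do NOT strip the
    # <think>...</think> blocks, per the model card's guidance.
    history.append({"role": "assistant", "content": reply})
    history.append({"role": "user", "content": "Continue with the next step."})
```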

Benchmarking Performance

The MiniMax team has conducted evaluations that focus on coding and agent workflows, providing results that are more representative of developer activities than static question-answering tasks. Key benchmarks include:

  • Terminal-Bench: 46.3
  • Multi-SWE-Bench: 36.2
  • BrowseComp: 44.0
  • SWE-bench Verified: 69.4

Comparison of MiniMax M1 and M2

| Aspect | MiniMax M1 | MiniMax M2 |
| --- | --- | --- |
| Total parameters | 456 billion | 229 billion |
| Active parameters per token | 45.9 billion | 10 billion |
| Core design | Hybrid Mixture of Experts with Lightning Attention | Sparse Mixture of Experts targeting coding and agent workflows |
| Thinking format | Varied formats in RL training, no fixed protocol | Interleaved thinking; <think>...</think> segments must be retained across turns |
| Benchmarks highlighted | AIME, LiveCodeBench, SWE-bench Verified | Terminal-Bench, Multi-SWE-Bench, BrowseComp |
| Inference defaults | Temperature 1.0, top-p 0.95 | Temperature 1.0, top-p 0.95, top-k 20 |
| Serving guidance | vLLM recommended | vLLM and SGLang recommended |
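Since the table lists M2's inference defaults, here is a short sketch of applying them with vLLM's offline API; the model id, tensor-parallel degree, and prompt are assumptions, and you should confirm that your installed vLLM version supports the architecture.

```python
# Sketch: M2's published sampling defaults (temperature 1.0, top-p 0.95,
# top-k 20) applied via vLLM. Model id and parallelism are assumptions.
from vllm import LLM, SamplingParams

sampling = SamplingParams(temperature=1.0, top_p=0.95, top_k=20)
llm = LLM(model="MiniMaxAI/MiniMax-M2", tensor_parallel_size=8)

outputs = llm.generate(["Refactor this function to be iterative: ..."], sampling)
print(outputs[0].outputs[0].text)
```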

Key Takeaways

MiniMax M2 is released as open weights on Hugging Face under the MIT license, featuring a compact MoE design with 229 billion total parameters and approximately 10 billion active per token. The model is tailored for agent loops and coding tasks, with an emphasis on lower memory usage and consistent latency. The release also includes deployment notes covering API documentation, local serving, and benchmarking.

For more information, check out the API Doc, Weights, and Repo.