
MiroMind-M1: Advancing Open-Source Mathematical Reasoning via Context-Aware Multi-Stage Reinforcement Learning


Understanding the Target Audience for MiroMind-M1

The MiroMind-M1 initiative targets a range of professionals involved in mathematics, AI, and machine learning. This includes researchers, data scientists, and AI developers who are seeking robust and transparent tools for mathematical reasoning. Their pain points often include a lack of transparency and reproducibility in proprietary models, as well as the complexity of multi-step reasoning tasks.

Key goals for this audience include:

  • Access to open-source tools for advanced mathematical reasoning.
  • Improving their own model performance in mathematical problem-solving.
  • Ensuring reproducibility in research and development across various applications.

Interests for this audience typically center around innovations in AI, new methodologies for training reinforcement learning models, and data integrity in machine learning. Communication preferences generally lean toward technical documentation, peer-reviewed articles, and community discussions on platforms like GitHub and relevant forums.

MiroMind-M1 Overview

The MiroMind-M1 series, developed by MiroMind AI, offers a fully open-source pipeline focusing on mathematical reasoning powered by advanced multi-stage reinforcement learning techniques. It aims to set new standards for transparency and effectiveness in the field.

Architectural Foundation

MiroMind-M1 leverages the Qwen-2.5 model backbone, incorporating:

  • Supervised Fine-Tuning (SFT): Utilizing a dataset of 719K curated mathematical problems.
  • Reinforcement Learning with Verifiable Rewards (RLVR): Involving 62K challenging math problems and external verification for rewards.

This two-stage approach strengthens both logical rigor and multi-step reasoning, mirroring methodologies used by current leading models.
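The RLVR idea — scoring a rollout by externally verifying its final answer rather than by a learned reward model — can be sketched minimally. The helper names and normalization rules below are illustrative, not MiroMind-M1's actual verifier:

```python
from fractions import Fraction

def normalize(answer: str) -> str:
    """Canonicalize a final-answer string so equivalent forms compare equal."""
    answer = answer.strip().rstrip(".").replace(" ", "")
    try:
        # Reduce numeric answers (e.g. "6/8" and "0.75") to one canonical fraction.
        return str(Fraction(answer))
    except ValueError:
        # Non-numeric answers fall back to a case-insensitive string match.
        return answer.lower()

def verifiable_reward(model_answer: str, reference: str) -> float:
    """Binary reward: 1.0 if the verified answer matches the reference, else 0.0."""
    return 1.0 if normalize(model_answer) == normalize(reference) else 0.0
```

Because the reward is checked against ground truth rather than predicted, it cannot be gamed by a reward model's blind spots, which is the core appeal of RLVR for math.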

Data Transparency and Quality

Central to MiroMind-M1 are rigorous transparency standards:

  • SFT Corpus Composition: Built from high-quality open datasets such as OpenR1 and Light-R1.
  • Deduplication and Decontamination: N-gram filtering keeps the training data free of duplicates and benchmark overlap.
  • Preference for Long Trajectories: Favoring deeper reasoning paths improves downstream benchmark performance.
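N-gram decontamination can be illustrated with a short sketch: a training sample is dropped if it shares any word-level n-gram with a benchmark problem. The window size `n = 8` and the whitespace tokenization are assumptions for illustration; the source does not specify MiroMind-M1's exact filtering parameters:

```python
def ngrams(text: str, n: int = 8) -> set:
    """Word-level n-grams of a lowercased text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def is_contaminated(sample: str, benchmark_problems: list, n: int = 8) -> bool:
    """Flag a training sample that shares any n-gram with a benchmark problem."""
    sample_grams = ngrams(sample, n)
    return any(sample_grams & ngrams(problem, n) for problem in benchmark_problems)
```

In practice the benchmark side would be pre-indexed once rather than re-tokenized per sample, but the matching logic is the same.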

Model Performance

MiroMind-SFT-7B has demonstrated outstanding results, achieving the following benchmark scores:

  • AIME24: 60.4
  • AIME25: 45.0
  • MATH500: 94.6

This performance underscores the effectiveness of the project's selective data curation and training design.

CAMPO: Innovative Reinforcement Learning

One notable advancement in MiroMind-M1 involves the CAMPO algorithm, designed to address common reinforcement learning challenges:

  • Implementing multi-stage training with gradually increasing context limits.
  • Utilizing a dynamic repetition penalty to reduce output redundancy.
  • Enhancing external verification systems to ensure accurate model scoring.
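A dynamic repetition penalty of this kind can be sketched as a term subtracted from the trajectory reward that grows with the fraction of repeated n-grams in the sampled output. The window size, weight, and exact formulation below are illustrative assumptions, not the published CAMPO penalty:

```python
def repetition_penalty(tokens: list, n: int = 4, weight: float = 0.1) -> float:
    """Penalty proportional to the fraction of repeated n-grams in an output.

    Returns a non-negative value to subtract from the trajectory reward;
    0.0 means no n-gram appears more than once.
    """
    total = len(tokens) - n + 1
    if total <= 0:  # output shorter than the n-gram window
        return 0.0
    seen, repeats = set(), 0
    for i in range(total):
        gram = tuple(tokens[i:i + n])
        if gram in seen:
            repeats += 1
        seen.add(gram)
    return weight * repeats / total
```

Scaling the penalty with the degree of repetition (rather than applying a fixed discount) lets the trainer discourage degenerate loops without punishing legitimate restatements of intermediate results.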

Benchmark Performance

The MiroMind-M1 models show comparable or superior performance to peer open models:

  • MiroMind-RL-7B: AIME24 — 73.4, AIME25 — 57.8, MATH500 — 96.7
  • MiroMind-RL-32B: AIME24 — 77.5, AIME25 — 65.6, MATH500 — 96.4

Commitment to Open Research

MiroMind-M1 is committed to reproducibility by providing:

  • Open model weights for various scales.
  • Comprehensive datasets, including 719K SFT and 62K RLVR samples.
  • Training scripts optimized for multi-node distributed setups.
  • Standardized evaluation code for community use.

This openness not only encourages replication but also propels further research and innovation.

Conclusion

MiroMind-M1 highlights the potential of collective effort in advancing open-source AI models for rigorous mathematical reasoning, presenting a robust alternative to proprietary systems.

For further details, you can explore the project's GitHub page and check out the models on Hugging Face.
