Meta AI Released MobileLLM-R1: An Edge Reasoning Model with Fewer Than 1B Parameters That Achieves a 2x–5x Performance Boost Over Other Fully Open-Source AI Models

Meta has launched MobileLLM-R1, a family of lightweight edge reasoning models now available on Hugging Face. The release spans models from 140M to 950M parameters, targeting efficient mathematical, coding, and scientific reasoning at a sub-billion-parameter scale.

Target Audience Analysis

The target audience for MobileLLM-R1 primarily consists of:

  • Data Scientists and AI Researchers: Interested in the technical specifications and performance metrics of the model.
  • Business Decision-Makers: Seeking scalable, cost-effective AI solutions for deploying on edge devices.
  • Developers and Engineers: Looking for lightweight models to integrate into applications that require efficient reasoning capabilities.

Pain Points: These audiences may struggle with high computational costs, long training times, and the need for efficient models that can operate under resource constraints.

Goals: They aim to improve AI functionality in applications, enable faster deployment, and achieve optimal performance without incurring heavy resource demands.

Interests: Topics related to model efficiency, edge computing, industry applications of AI, and advancements in machine learning technologies.

Communication Preferences: They prefer technical content that is concise, data-driven, and includes practical examples or case studies of implementation.

Architectural Overview of MobileLLM-R1

The largest model in the family, MobileLLM-R1-950M, employs several architectural optimizations:

  • 22 Transformer layers with 24 attention heads and 6 grouped KV heads
  • Embedding dimension: 1,536; hidden dimension: 6,144
  • Grouped-Query Attention (GQA) to reduce compute and memory usage
  • Block-wise weight sharing to decrease parameter count without significant latency increases
  • SwiGLU activations for improved representation in smaller models
  • Context length: 4K for base models, 32K for post-trained models
  • 128K vocabulary with shared input/output embeddings

This architecture focuses on minimizing compute and memory requirements, making it suitable for deployment on constrained devices.
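To make these numbers concrete, the sketch below approximates the MobileLLM-R1-950M layout with a generic Llama-style config in Hugging Face transformers. The hyperparameters come from the list above; the actual MobileLLM-R1 config class may differ, block-wise weight sharing is not modeled here, and the exact vocabulary size is an assumption, so treat the printed parameter count as a rough consistency check rather than an official figure.

```python
# Rough sketch: a Llama-style config with the hyperparameters listed above.
# This is an illustration, not the official MobileLLM-R1 implementation;
# block-wise weight sharing is not modeled, and vocab_size=128_256 is an assumption.
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    hidden_size=1536,              # embedding dimension
    intermediate_size=6144,        # SwiGLU hidden dimension
    num_hidden_layers=22,          # Transformer layers
    num_attention_heads=24,        # query heads (head_dim = 1536 / 24 = 64)
    num_key_value_heads=6,         # grouped KV heads (GQA)
    vocab_size=128_256,            # ~128K vocabulary
    max_position_embeddings=4096,  # 4K context for the base model
    tie_word_embeddings=True,      # shared input/output embeddings
    hidden_act="silu",             # SiLU gating, as used in SwiGLU MLPs
)

model = LlamaForCausalLM(config)   # randomly initialized, CPU-only
total = sum(p.numel() for p in model.parameters())
print(f"approx. parameters: {total / 1e6:.0f}M")  # lands near the reported ~950M
```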

Training Efficiency

MobileLLM-R1 stands out for its efficiency in training:

  • Trained on approximately 4.2T (trillion) tokens
  • Uses only about 11.7% of the training tokens of Qwen3-0.6B, which was trained on 36T tokens

This efficiency leads to lower training costs and reduced resource demands, making it a compelling option for enterprises.
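As a quick sanity check, the back-of-the-envelope calculation below reproduces both the 11.7% figure and the roughly 8.6× token reduction cited later in this article, using the token counts quoted above.

```python
# Data-efficiency check using the token counts quoted above.
mobilellm_tokens = 4.2e12  # ~4.2T training tokens for MobileLLM-R1
qwen3_tokens = 36.0e12     # ~36T training tokens reported for Qwen3-0.6B

print(f"fraction of Qwen3's data: {mobilellm_tokens / qwen3_tokens:.1%}")  # ~11.7%
print(f"token reduction factor:  {qwen3_tokens / mobilellm_tokens:.1f}x")  # ~8.6x
```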

Performance Benchmarking

In benchmark tests, MobileLLM-R1-950M demonstrates substantial performance gains:

  • On the MATH dataset (MATH500), it achieved approximately 5× higher accuracy than Olmo-1.24B and about 2× higher accuracy than SmolLM2-1.7B.
  • In reasoning and coding tasks (GSM8K, AIME, LiveCodeBench), it matches or surpasses Qwen3-0.6B despite being trained on significantly fewer tokens.

This allows the model to deliver results typically associated with larger architectures while maintaining a smaller footprint.
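For readers who want to try the model themselves, a minimal generation sketch with Hugging Face transformers is shown below. The repo id facebook/MobileLLM-R1-950M and the chat-template call are assumptions based on the announced Hugging Face release; check the model card for the exact identifier, recommended prompt format, and any extra loading flags.

```python
# Minimal sketch: running the post-trained model on a short math prompt.
# The repo id and prompt format are assumptions; consult the model card on Hugging Face.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/MobileLLM-R1-950M"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

messages = [{"role": "user", "content": "Compute 12 * (7 + 5). Show your reasoning."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```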

Limitations of MobileLLM-R1

Despite its strengths, MobileLLM-R1 has some limitations:

  • It excels in structured reasoning, math, and coding but is weaker in general conversation, commonsense, and creative tasks compared to larger models.
  • The model is available under a FAIR NC (non-commercial) license, limiting its usage in production settings.
  • Longer context lengths (32K) increase KV-cache and memory demands during inference.
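To put the last point in perspective, the snippet below estimates the KV-cache footprint at the full 32K context using the architecture figures quoted earlier (22 layers, 6 KV heads, head dimension 64) and assuming bf16 storage. This is a rough estimate, not a measured number from Meta.

```python
# KV-cache size estimate for MobileLLM-R1-950M at 32K context (bf16 assumed).
layers, kv_heads, head_dim = 22, 6, 64   # from the architecture overview
seq_len, bytes_per_value = 32_768, 2     # 32K tokens, 2 bytes per value in bf16

kv_cache_bytes = 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value  # K and V
print(f"KV cache at 32K context: ~{kv_cache_bytes / 1e9:.1f} GB per sequence")  # ~1.1 GB
```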

Comparison with Other Models

Here is a comparative performance summary of MobileLLM-R1 against other open models:

  • MobileLLM-R1-950M: 0.949B parameters, trained on ~4.2T tokens; scores 74.0 on MATH500, 67.5 on GSM8K, and 15.5 on AIME’24.
  • Qwen3-0.6B: 0.596B parameters, trained on ~36T tokens; scores 73.0 on MATH500, 79.2 on GSM8K, and 11.3 on AIME’24.
  • SmolLM2-1.7B: 1.71B parameters, trained on an estimated ~11T tokens; scores 19.2 on MATH500, 41.8 on GSM8K, and 0.3 on AIME’24.
  • OLMo-2-1B: 1.48B parameters, trained on an estimated ~3.95T tokens; scores 19.2 on MATH500, 69.7 on GSM8K, and 0.6 on AIME’24.

Key Insights:

  • MobileLLM-R1-950M matches Qwen3-0.6B in math while requiring approximately 8.6× fewer tokens.
  • Performance disparities are significant across reasoning tasks when compared to SmolLM2 and OLMo.

Conclusion

Meta’s MobileLLM-R1 highlights a shift toward smaller, domain-optimized models that deliver competitive reasoning without massive training budgets. By achieving 2×–5× performance gains over larger open models while using only a fraction of their training data, it shows that efficiency, not just scale, will shape the next phase of LLM deployment, especially for math, coding, and scientific applications on edge devices.

Check out the model on Hugging Face.