Tencent Hunyuan Open-Sources Hunyuan-MT-7B and Hunyuan-MT-Chimera-7B: State-of-the-Art Multilingual Translation Models
Introduction
Tencent’s Hunyuan team has released Hunyuan-MT-7B and Hunyuan-MT-Chimera-7B, both designed for multilingual machine translation. These models were introduced during Tencent’s participation in the WMT2025 General Machine Translation shared task, where Hunyuan-MT-7B ranked first in 30 out of 31 language pairs.
Model Overview
Hunyuan-MT-7B
A 7B-parameter translation model that supports mutual translation across 33 languages, including Chinese ethnic minority languages such as Tibetan, Mongolian, Uyghur, and Kazakh. It is optimized for both high-resource and low-resource settings and achieves state-of-the-art results among models of comparable size.
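For orientation, here is a minimal sketch of running the model through the standard Hugging Face transformers causal-LM interface. The repo id tencent/Hunyuan-MT-7B matches the release, but the prompt wording below is an illustrative assumption, not the official template:

```python
# Minimal sketch: translating with Hunyuan-MT-7B via transformers.
# Assumes the checkpoint is published as "tencent/Hunyuan-MT-7B" and
# exposes the standard causal-LM + chat-template interface.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tencent/Hunyuan-MT-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Illustrative instruction; the exact template shipped with the model may differ.
messages = [{
    "role": "user",
    "content": "Translate the following segment into English, "
               "without additional explanation.\n\n它最近涨粉很快。",
}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```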
Hunyuan-MT-Chimera-7B
A weak-to-strong fusion model that combines multiple candidate translations at inference time and produces a single refined output, trained with reinforcement learning and aggregation techniques. It is described as the first open-source translation model of this type, improving translation quality beyond any single-system output.
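Conceptually, Chimera inference is a two-step pipeline: generate several candidates with a base translator, then ask the fusion model to synthesize a refined translation from them. The prompt construction below is a hypothetical illustration; the real template ships with the Hunyuan-MT-Chimera-7B checkpoint and may differ:

```python
# Hypothetical prompt construction for weak-to-strong fusion at inference time.
def build_fusion_prompt(source: str, candidates: list[str], target_lang: str) -> str:
    numbered = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(candidates))
    return (
        f"Given the source text and candidate translations into {target_lang}, "
        "produce a single refined translation without additional explanation.\n\n"
        f"Source:\n{source}\n\n"
        f"Candidates:\n{numbered}\n\n"
        "Refined translation:"
    )

# Example: candidates could come from Hunyuan-MT-7B sampled at different temperatures.
prompt = build_fusion_prompt(
    source="它最近涨粉很快。",
    candidates=[
        "It has recently gained followers quickly.",
        "Its follower count has been growing fast lately.",
    ],
    target_lang="English",
)
```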
Training Framework
The models were trained using a five-stage framework designed for translation tasks:
- General Pre-training: 1.3 trillion tokens covering 112 languages and dialects, ensuring diversity through disciplinary, industry, and thematic tagging systems.
- MT-Oriented Pre-training: Utilized monolingual corpora from mC4 and OSCAR, filtered for quality and relevance.
- Supervised Fine-Tuning (SFT): Two stages; the first trains on approximately 3M parallel pairs, and the second refines the model on high-quality pairs selected through automated scoring and manual verification.
- Reinforcement Learning (RL): Applies RL with quality-aware reward signals to further improve translation quality.
- Weak-to-Strong RL: Generates multiple candidate outputs and aggregates them via reward-based selection into a stronger final translation; this stage produces Hunyuan-MT-Chimera-7B (see the sketch after this list).
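The weak-to-strong stage is described only at a high level; as one hedged illustration, the reward-based aggregation it relies on can be approximated by best-of-N selection with a learned quality metric. The reward_fn below is a stand-in for whatever reward model the team actually used:

```python
# Hedged sketch: reward-based aggregation over N sampled candidate translations.
# reward_fn is a placeholder for a quality-estimation reward model
# (e.g., a COMET-style scorer); the paper's exact reward is not reproduced here.
from typing import Callable

def best_of_n(source: str,
              candidates: list[str],
              reward_fn: Callable[[str, str], float]) -> str:
    """Return the candidate translation with the highest reward score."""
    return max(candidates, key=lambda cand: reward_fn(source, cand))

# The fusion model goes a step further: rather than picking one candidate,
# it conditions on all of them to synthesize a new, stronger output.
```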
Benchmark Results
Automatic Evaluation
In the WMT24pp evaluation, Hunyuan-MT-7B achieved a score of 0.8585 (XCOMET-XXL), outperforming larger models such as Gemini-2.5-Pro (0.8250) and Claude-Sonnet-4 (0.8120). In the FLORES-200 evaluation, it scored 0.8758, surpassing open-source baselines including Qwen3-32B (0.7933).
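For readers who want to reproduce this kind of scoring, XCOMET-XXL is available through Unbabel's COMET toolkit. The snippet below is a minimal sketch of reference-based scoring with that package (the data values are illustrative, and the XXL checkpoint is large and gated on Hugging Face):

```python
# Minimal sketch: scoring translations with XCOMET-XXL via the unbabel-comet package.
# pip install unbabel-comet
from comet import download_model, load_from_checkpoint

model_path = download_model("Unbabel/XCOMET-XXL")
model = load_from_checkpoint(model_path)

data = [{
    "src": "它最近涨粉很快。",                                # source (illustrative)
    "mt":  "It has been gaining followers quickly lately.",   # system output
    "ref": "It has gained followers rapidly of late.",        # reference
}]
result = model.predict(data, batch_size=8, gpus=1)
print(result.system_score)  # corpus-level XCOMET score in [0, 1]
```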
Comparative Results
Hunyuan-MT-7B outperformed Google Translate by 15–65% across evaluation categories, and also surpassed specialized translation models such as Tower-Plus-9B and Seed-X-PPO-7B despite having fewer parameters. Hunyuan-MT-Chimera-7B added approximately a further 2.3% improvement on FLORES-200.
Human Evaluation
A custom evaluation set covering various domains showed that Hunyuan-MT-7B achieved an average score of 3.189, approaching the quality of larger proprietary models.
Case Studies
The report highlights several real-world cases demonstrating the model’s capabilities:
- Cultural References: Correctly translates “小红薯” (literally “little sweet potato,” a colloquial name for the Xiaohongshu app) as the platform “REDnote.”
- Idioms: Renders “You are killing me” as “你真要把我笑死了” (“you’re really making me die laughing”), avoiding a literal misinterpretation.
- Medical Terms: Precisely translates “uric acid kidney stones.”
- Minority Languages: Produces coherent translations for Kazakh and Tibetan.
- Chimera Enhancements: Improves translations in gaming jargon and sports terminology.
Conclusion
Tencent’s release of Hunyuan-MT-7B and Hunyuan-MT-Chimera-7B establishes a new standard for open-source translation. By combining a carefully designed training framework with a specialized focus on low-resource and minority language translation, these models achieve quality on par with or exceeding larger closed-source systems. The launch provides the AI research community with accessible, high-performance tools for multilingual translation research and deployment.
Check out the Paper, GitHub Page, and Model on Hugging Face. All credit for this research goes to the researchers of this project.