Liquid AI Open-Sources LFM2: A New Generation of Edge LLMs

The landscape of on-device artificial intelligence has taken a significant step forward with Liquid AI’s release of LFM2, its second generation of Liquid Foundation Models. The new series of generative AI models marks a shift in edge computing, pairing performance optimizations designed for on-device deployment with competitive output quality.

Performance Breakthroughs

LFM2 sets new efficiency marks in the edge AI space across several dimensions. The models deliver 2x faster decode and prefill performance than Qwen3 on CPU architectures, which is crucial for real-time applications, and training is 3x faster than for the previous LFM generation, making LFM2 an effective path to building capable, general-purpose AI systems.
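As a rough illustration of how such CPU prefill and decode figures might be measured, the sketch below times a single forward pass over the prompt (prefill) and a greedy generation loop (decode) with Hugging Face transformers; the model IDs, token counts, and thread settings are illustrative assumptions, and absolute numbers depend entirely on the machine.

```python
# Rough sketch of separating prefill from decode timing on CPU with
# transformers; the model IDs are illustrative and results vary by machine.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def time_prefill_and_decode(model_id: str, prompt: str, new_tokens: int = 64):
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id).eval()
    ids = tok(prompt, return_tensors="pt")

    with torch.no_grad():
        t0 = time.perf_counter()
        model(**ids, use_cache=True)              # prefill: one pass over the prompt
        prefill_s = time.perf_counter() - t0

        t0 = time.perf_counter()
        model.generate(**ids, max_new_tokens=new_tokens, do_sample=False)
        decode_tps = new_tokens / (time.perf_counter() - t0)  # rough proxy; includes a second prefill

    return prefill_s, decode_tps

# Hypothetical comparison on the same CPU (repository names assumed):
for name in ("LiquidAI/LFM2-1.2B", "Qwen/Qwen3-1.7B"):
    print(name, time_prefill_and_decode(name, "On-device AI matters because"))
```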

These improvements are fundamental in making powerful AI accessible on resource-constrained devices. The models are engineered to unlock millisecond latency, offline resilience, and data-sovereign privacy, which are essential for devices like phones, laptops, cars, robots, wearables, and satellites that require real-time reasoning.

Hybrid Architecture Innovation

The technical foundation of LFM2 lies in its hybrid architecture, which combines convolution and attention mechanisms. The model employs a 16-block structure consisting of 10 double-gated short-range convolution blocks and 6 blocks of grouped query attention (GQA). This design builds on Liquid AI’s work on Liquid Time-constant Networks (LTCs), which introduced continuous-time recurrent neural networks modulated by nonlinear, input-dependent interlinked gates.

The architecture utilizes the Linear Input-Varying (LIV) operator framework, which generates weights on the fly from the input they act on, allowing diverse layer types to be expressed within a single unified framework. LFM2 convolution blocks implement multiplicative gates and short convolutions, creating linear first-order systems that converge to zero after a finite time.
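To make the convolution-block description concrete, here is a minimal PyTorch sketch of a double-gated short (depthwise, causal) convolution plus the reported 10-conv/6-attention block mix. The layer names, hidden size, kernel length, and block ordering are assumptions, and standard multi-head attention stands in for GQA purely for illustration; this is not Liquid AI’s actual implementation.

```python
# Minimal sketch of a double-gated short-convolution block, assuming PyTorch.
# Dimensions, kernel length, and gating details are illustrative assumptions.
import torch
import torch.nn as nn

class GatedShortConv(nn.Module):
    def __init__(self, d_model: int, kernel_size: int = 3):
        super().__init__()
        self.in_gate = nn.Linear(d_model, d_model)    # first (input) multiplicative gate
        self.out_gate = nn.Linear(d_model, d_model)   # second (output) multiplicative gate
        # Depthwise causal convolution over a short local window.
        self.conv = nn.Conv1d(d_model, d_model, kernel_size,
                              padding=kernel_size - 1, groups=d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, d_model)
        gated = torch.sigmoid(self.in_gate(x)) * x
        y = self.conv(gated.transpose(1, 2))[..., : x.size(1)].transpose(1, 2)
        return torch.sigmoid(self.out_gate(x)) * y

# Reported mix of 10 convolution blocks and 6 attention blocks; the ordering is
# assumed, and nn.MultiheadAttention stands in for grouped query attention here.
backbone = nn.ModuleList(
    [GatedShortConv(d_model=1024) for _ in range(10)]
    + [nn.MultiheadAttention(embed_dim=1024, num_heads=16, batch_first=True) for _ in range(6)]
)
```

Because the kernel is short and the gates are simple element-wise products, each convolution block stays cheap per token at decode time, which is consistent with the CPU-first framing above.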

The architecture selection process utilized STAR, Liquid AI’s neural architecture search engine, modified to evaluate language modeling capabilities beyond traditional metrics. It employs over 50 internal evaluations assessing knowledge recall, multi-hop reasoning, understanding of low-resource languages, instruction following, and tool use.

Comprehensive Model Lineup

LFM2 is available in three configurations: 350M, 700M, and 1.2B parameters, optimized for different deployment scenarios while maintaining efficiency benefits. All models were trained on 10 trillion tokens, with a corpus comprising approximately 75% English, 20% multilingual content, and 5% code data sourced from web and licensed materials.

The training methodology incorporates knowledge distillation using the existing LFM1-7B as a teacher model. Cross-entropy between the LFM2 student’s outputs and the LFM1-7B teacher’s outputs serves as the primary signal throughout training. The context length was extended to 32k during pretraining, enabling the models to handle longer sequences effectively.
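A minimal sketch of that distillation signal, assuming PyTorch logits of shape (batch, seq, vocab), looks like the following; the temperature parameter is an added assumption rather than something stated in the announcement.

```python
# Cross-entropy of the student's predictions against the teacher's output
# distribution; a generic distillation loss, not Liquid AI's exact recipe.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 1.0) -> torch.Tensor:
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # H(teacher, student), averaged over batch and sequence positions.
    return -(teacher_probs * student_log_probs).sum(dim=-1).mean()
```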

Superior Benchmark Performance

Evaluation results show that LFM2 significantly outperforms similarly sized models across multiple benchmark categories. The LFM2-1.2B model competes with Qwen3-1.7B despite having 47% fewer parameters. Similarly, LFM2-700M outperforms Gemma 3 1B IT, while the smallest LFM2-350M remains competitive with Qwen3-0.6B and Llama 3.2 1B Instruct.

Beyond automated benchmarks, LFM2 demonstrates superior conversational capabilities in multi-turn dialogues. Using the WildChat dataset and LLM-as-a-Judge evaluation framework, LFM2-1.2B shows significant preference advantages over Llama 3.2 1B Instruct and Gemma 3 1B IT while matching Qwen3-1.7B performance despite being smaller and faster.
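The pairwise LLM-as-a-Judge pattern behind such preference numbers can be sketched as follows; the judge prompt, verdict parsing, and win-rate aggregation are generic illustrations rather than Liquid AI’s exact evaluation protocol.

```python
# Generic pairwise LLM-as-a-judge sketch; `judge` is any callable that maps a
# text prompt to a text completion (e.g., a wrapper around a larger model).
def judge_pair(judge, prompt: str, answer_a: str, answer_b: str) -> str:
    instruction = (
        "You are comparing two assistant replies to the same user message.\n"
        f"User message:\n{prompt}\n\nReply A:\n{answer_a}\n\nReply B:\n{answer_b}\n\n"
        "Answer with exactly 'A' or 'B' for the better reply."
    )
    verdict = judge(instruction).strip().upper()
    return "A" if verdict.startswith("A") else "B"

def win_rate(judge, triples) -> float:
    # `triples` holds (prompt, reply_from_model_a, reply_from_model_b) tuples,
    # e.g., built from WildChat prompts answered by the two models under test.
    wins = sum(judge_pair(judge, p, a, b) == "A" for p, a, b in triples)
    return wins / len(triples)
```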

Edge-Optimized Deployment

The models excel in real-world deployment scenarios, having been exported to multiple inference frameworks, including PyTorch’s ExecuTorch and the open-source llama.cpp library. Testing on target hardware like Samsung Galaxy S24 Ultra and AMD Ryzen platforms demonstrates that LFM2 dominates the Pareto frontier for prefill and decode inference speed relative to model size.
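For the llama.cpp path specifically, a quantized GGUF export can be run on CPU through the llama-cpp-python bindings along these lines; the file name, context size, and thread count below are placeholders for whatever the target device supports.

```python
# Hedged sketch of CPU inference on a GGUF export via llama-cpp-python;
# the model file path and runtime settings are illustrative assumptions.
from llama_cpp import Llama

llm = Llama(model_path="lfm2-1.2b-q4_k_m.gguf",  # assumed local GGUF export
            n_ctx=4096,                          # context window for this session
            n_threads=8)                         # CPU threads on the target device
out = llm("Explain why on-device inference helps privacy.", max_tokens=64)
print(out["choices"][0]["text"])
```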

This strong CPU performance translates effectively to accelerators such as GPUs and NPUs after kernel optimization, making LFM2 suitable for various hardware configurations. This flexibility is crucial for the diverse ecosystem of edge devices requiring on-device AI capabilities.

Conclusion

The release of LFM2 addresses a critical gap in the AI deployment landscape where the shift from cloud-based to edge-based inference is accelerating. By enabling millisecond latency, offline operation, and data-sovereign privacy, LFM2 unlocks new possibilities for AI integration across consumer electronics, robotics, smart appliances, finance, e-commerce, and education sectors.

The technical achievements in LFM2 signal a maturation of edge AI technology, where the trade-offs between model capability and deployment efficiency are being successfully optimized. As enterprises pivot from cloud LLMs to fast, private, and on-premises intelligence, LFM2 positions itself as a foundational technology for the next generation of AI-powered devices and applications.

Check out the Technical Details and Model on Hugging Face. All credit for this research goes to the researchers of this project.
