
Researchers from the National University of Singapore Introduce ‘Thinkless,’ an Adaptive Framework that Reduces Unnecessary Reasoning by up to 90% Using DeGRPO


Researchers from the National University of Singapore have developed a new framework, Thinkless, designed to improve the efficiency of language models by reducing unnecessary reasoning by up to 90%. It addresses a core weakness of current reasoning models: they often launch into lengthy chains of thought even for simple queries, inflating token usage and response latency.

Current approaches to optimizing reasoning in language models often rely on static heuristics or external router models that fail to exploit the target model’s own capabilities. Static prompt cues such as “reasoning on/off” do not provide the adaptive control real-world applications need. Thinkless tackles these limitations by letting the model itself decide when to answer briefly and when to engage in long-form reasoning.
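At inference time, the idea is that the model emits a control token first and the rest of generation follows that mode. The sketch below illustrates this routing pattern with a toy stand-in model; the interface (`choose_mode`, `generate`) and the length heuristic are purely illustrative assumptions, not the Thinkless API, where the model itself learns to emit the control token.

```python
def route_query(model, query):
    """Pick a reasoning mode via a control token, then generate in that mode."""
    mode = model.choose_mode(query)  # "<short>" or "<think>"
    if mode == "<think>":
        # Long-form reasoning: allow a large chain-of-thought budget.
        return model.generate(query, max_new_tokens=2048)
    # Short mode: concise answer with a small token budget.
    return model.generate(query, max_new_tokens=256)

class ToyModel:
    """Stand-in model that routes by a crude query-length heuristic,
    for illustration only (Thinkless learns this decision via RL)."""
    def choose_mode(self, query):
        return "<think>" if len(query.split()) > 8 else "<short>"

    def generate(self, query, max_new_tokens):
        return f"[answer to: {query}] (budget={max_new_tokens})"

print(route_query(ToyModel(), "What is 2 + 2?"))
```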

Technical Overview

Thinkless uses Decoupled Group Relative Policy Optimization (DeGRPO) to separate the training signal for reasoning-mode selection from the signal for response accuracy. Training proceeds in two stages:

  • Warm-up Distillation: The model is initially trained using outputs from two expert models, one for short responses and another for detailed reasoning. This phase establishes a direct connection between control tokens and the intended reasoning format.
  • Reinforcement Learning: In this stage, the model refines its ability to choose reasoning modes dynamically. DeGRPO decouples the two learning objectives so that the single control token (&lt;short&gt; or &lt;think&gt;) is not drowned out by the many response tokens, yielding balanced updates and stable learning.
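The decoupling above can be sketched as a policy-gradient loss with separately normalized terms for the mode token and the response tokens. This is a simplified, assumed form for illustration, not the paper’s exact DeGRPO objective; the names `alpha` and `advantage` are illustrative.

```python
def decoupled_pg_loss(mode_logprob, response_logprobs, advantage, alpha=1.0):
    """Sketch of a decoupled policy-gradient loss (assumed form, not the
    paper's exact objective). The lone control token gets its own term, so
    its gradient is not averaged away across a long generated answer."""
    # Mode-selection term: a single token, normalized on its own.
    mode_loss = -advantage * mode_logprob
    # Response term: averaged over the answer tokens.
    resp_loss = -advantage * sum(response_logprobs) / max(len(response_logprobs), 1)
    # alpha trades off mode-selection learning against answer accuracy.
    return alpha * mode_loss + resp_loss
```

Without the separate `mode_loss` term, one control-token log-probability averaged together with thousands of response tokens would contribute a vanishing share of the gradient, which is the imbalance the decoupling is meant to fix.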

Performance Metrics

In evaluations, Thinkless significantly lowered the frequency of long-form reasoning while maintaining high accuracy levels:

  • On the Minerva Algebra benchmark, Thinkless invoked the &lt;think&gt; token only 25.88% of the time while reaching 94.59% accuracy.
  • On the AIME 2024 dataset, it reached 27.33% accuracy while using the reasoning mode on every problem, demonstrating robustness on hard reasoning tasks.
  • On the GSM8K dataset, the model used the &lt;think&gt; token just 13.31% of the time, yet still achieved 84.18% accuracy.
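The token savings implied by these usage rates follow from simple expected-cost arithmetic. The per-mode token budgets below (2000 for a full reasoning chain, 200 for a short answer) are assumptions for illustration, not figures from the paper; only the 13.31% GSM8K &lt;think&gt; rate comes from the results above.

```python
def expected_tokens(think_rate, think_tokens, short_tokens):
    """Expected tokens per query given how often the <think> mode fires."""
    return think_rate * think_tokens + (1 - think_rate) * short_tokens

# Assumed illustrative budgets: 2000 tokens per reasoning chain, 200 per short answer.
always_think = expected_tokens(1.0, 2000, 200)      # baseline: reason on every query
gsm8k_like = expected_tokens(0.1331, 2000, 200)     # GSM8K <think> rate from the text
print(f"savings: {1 - gsm8k_like / always_think:.1%}")  # → savings: 78.0%
```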

These outcomes highlight the model’s adaptability, effectively managing both simple and complex questions with minimal unnecessary processing.

Conclusion

The work presented by the National University of Singapore highlights an innovative solution to the inefficiencies observed in conventional reasoning practices within language models. By incorporating a mechanism to evaluate task complexity, Thinkless aligns reasoning depth with response precision, enhancing overall model performance without depending on fixed rules.

For further details, see the original research paper and the project’s GitHub page. All credit for this research goes to the team behind it.