
AbstRaL: Teaching LLMs Abstract Reasoning via Reinforcement to Boost Robustness on GSM Benchmarks

Understanding the Target Audience

The target audience for AbstRaL includes AI researchers, data scientists, and business leaders interested in making large language models (LLMs) more robust. Their key pain points are the limitations of existing LLMs on out-of-distribution (OOD) inputs and the inefficiency of current training methodologies. Their goals are to improve model performance, ensure reliability across varied contexts, and integrate advanced reasoning capabilities into practical applications. They favor findings backed by empirical research, communicated clearly and concisely with attention to technical detail and real-world implications.

Abstracting the Core Logic of LLM Reasoning Failures

Recent studies have shown that LLMs, especially smaller ones, often struggle with robust reasoning. They perform well on familiar questions, but their accuracy drops sharply when the phrasing or numbers change or when irrelevant information is introduced. This weakness, known as poor out-of-distribution (OOD) generalization, is particularly evident in logic, mathematics, and commonsense reasoning tasks. Traditional solutions such as data augmentation improve robustness but at a high computational cost. Alternative approaches, like abstraction-of-thought and chain-of-abstraction, have been explored to strengthen abstract reasoning through structured problem-solving techniques.

AbstRaL’s Symbolic Learning Method to Improve Reasoning Consistency

The AbstRaL framework, developed by researchers from Apple and EPFL, aims to teach LLMs to grasp abstract reasoning patterns rather than relying on surface details. By utilizing reinforcement learning, AbstRaL reduces the need for extensive training examples, focusing instead on the underlying structure of reasoning problems. This method has shown promise in improving the performance of LLMs on GSM benchmarks, particularly in scenarios involving input changes and distractions. Compared to models trained solely via supervised learning, AbstRaL promotes more consistent and context-independent reasoning.

Four Steps to Abstract Symbolic Reasoning via AbstRaL

AbstRaL is structured around a four-step framework, sketched in code after the list:

  • Identify key variables in a question and replace them with symbolic placeholders.
  • Utilize specially crafted data (GranulAR) to facilitate step-by-step reasoning with abstract symbols.
  • Retrieve the general reasoning structure (abstraction) from the symbolic answer.
  • Apply this abstraction with original values to compute the correct answer.
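
To make the pipeline concrete, here is a minimal Python sketch of steps 1 and 4: symbolizing a question and re-grounding the resulting abstraction. The function names, the regex-based number matching, and the example abstraction "x0 * x1" are illustrative assumptions, not the paper's implementation; steps 2 and 3 are carried out by the LLM itself over the symbolic question.

```python
import re

def abstract_question(question: str):
    """Step 1: swap each number in the question for a symbolic
    placeholder (x0, x1, ...) while recording the original values.
    Hypothetical helper, not the authors' released code."""
    values = {}

    def repl(match):
        symbol = f"x{len(values)}"
        values[symbol] = int(match.group())
        return symbol

    return re.sub(r"\d+", repl, question), values

def apply_abstraction(abstraction: str, values: dict):
    """Step 4: ground the symbolic abstraction with the original
    values. eval() is tolerable here only because the toy
    abstraction is pure arithmetic over known symbols."""
    return eval(abstraction, {"__builtins__": {}}, values)

# Steps 2-3 happen inside the model: it reasons over the symbols
# (GranulAR-style) and emits an abstraction such as "x0 * x1".
question = "A baker fills 12 trays with 6 cookies each. How many cookies?"
abstract_q, values = abstract_question(question)
print(abstract_q)                             # ... x0 trays with x1 cookies ...
print(apply_abstraction("x0 * x1", values))   # 72
```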

Reinforcement learning enhances this process by incorporating two rewards: one for correctness and another for symbolic similarity, fostering the model’s ability to generate accurate, context-independent reasoning patterns.
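
A hedged sketch of what such a two-part reward could look like is shown below. The equal weighting and the token-level Jaccard similarity are stand-in assumptions; the paper's actual symbolic-similarity measure may differ.

```python
def abstral_reward(pred_answer, gold_answer,
                   pred_abstraction: str, gold_abstraction: str,
                   alpha: float = 0.5) -> float:
    """Hypothetical shape of the two-part RL reward: a correctness
    term plus a symbolic-similarity term. The 50/50 weighting and
    the token-level Jaccard similarity are illustrative stand-ins."""
    correctness = 1.0 if pred_answer == gold_answer else 0.0
    pred_tokens = set(pred_abstraction.split())
    gold_tokens = set(gold_abstraction.split())
    union = pred_tokens | gold_tokens
    similarity = len(pred_tokens & gold_tokens) / len(union) if union else 0.0
    return alpha * correctness + (1.0 - alpha) * similarity

# Partial credit: a structurally correct abstraction is rewarded
# even when the final number is wrong, and vice versa.
print(abstral_reward(72, 72, "x0 * x1", "x0 * x1"))   # 1.0
print(abstral_reward(70, 72, "x0 * x1", "x0 * x1"))   # 0.5
```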

GSM8K Variations Reveal AbstRaL’s Robustness Across LLM Sizes

Evaluation of AbstRaL on math reasoning tasks utilized models such as Llama-3 and Qwen2, employing the GranulAR dataset to rewrite math problems into an abstract symbolic format. This approach allows models to concentrate on structural reasoning rather than surface details. By testing robustness with altered GSM8K problems—where numbers, names, and phrasing are modified—researchers found that AbstRaL demonstrated greater consistency and reduced accuracy drops compared to standard Chain-of-Thought prompting. This is particularly beneficial for smaller models, enhancing reliability across varied input formats.
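
As an illustration of the kind of perturbation involved (a made-up example, not drawn from the benchmark itself), two surface variants of a problem share the same underlying abstraction, which is exactly the structure an abstraction-trained model should exploit:

```python
import re

def numeric_skeleton(question: str) -> str:
    """Crude structural fingerprint: mask every number so that
    surface perturbations with the same structure collapse to the
    same string. Purely illustrative, not the benchmark's tooling."""
    return re.sub(r"\d+", "N", question)

original  = "Ann buys 4 packs of 8 pens. How many pens does she have?"
perturbed = "Ann buys 7 packs of 13 pens. How many pens does she have?"

# Both variants share one skeleton, so a model that has learned the
# abstraction "x0 * x1" should answer both; a model keyed to the
# surface numbers may not.
print(numeric_skeleton(original) == numeric_skeleton(perturbed))  # True
```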

Teaching LLMs Abstract Thinking through Reinforcement Yields Robust Reasoning

In summary, AbstRaL is a method designed to improve abstract reasoning in LLMs, making them more resilient to superficial changes in problems. By leveraging reinforcement learning and integrating GranulAR rationales, AbstRaL effectively helps models eliminate surface-level distractions while better connecting with symbolic tools. Tested against challenging GSM8K perturbation benchmarks, it significantly reduces performance declines under distribution shifts, particularly in smaller models, indicating that learning to abstract enhances reasoning robustness more effectively than traditional fine-tuning approaches.

Further Reading

Check out the Paper. All credit for this research goes to the researchers of this project.
