A Coding Guide to Building a Brain-Inspired Hierarchical Reasoning AI Agent with Hugging Face Models
This tutorial aims to recreate the essence of the Hierarchical Reasoning Model (HRM) using a free Hugging Face model that operates locally. We will design a lightweight yet structured reasoning agent. By breaking problems into subgoals, solving them with Python, critiquing outcomes, and synthesizing final answers, this guide demonstrates how hierarchical planning and execution can enhance reasoning performance. The process shows how a brain-inspired workflow can be implemented without relying on large model sizes or costly APIs.
Understanding the Target Audience
The primary audience for this guide includes:
- Data scientists and AI practitioners: Seeking practical applications of hierarchical reasoning using accessible tools.
- Students and researchers: Interested in understanding AI model architectures and implementation techniques.
- Business managers: Looking to leverage AI for improved decision-making processes and problem-solving capabilities.
Their pain points include:
- Lack of hands-on experience with AI tools.
- Difficulty in grasping complex AI concepts without clear examples.
- Concerns over the cost and accessibility of powerful AI models.
Goals for these readers include:
- Developing practical AI skills for real-world applications.
- Understanding how to effectively deploy AI technologies.
- Experimenting with AI without incurring high costs.
Common interests include:
- Exploring innovative use cases for AI.
- Learning about the latest advancements in AI technologies.
- Practical coding and implementation techniques.
These readers typically prefer concise technical documentation with clear examples and step-by-step walkthroughs.
Setting Up the Environment
We begin by installing the required libraries and loading the Qwen2.5-1.5B-Instruct model from Hugging Face. The data type is determined by GPU availability to ensure efficient model execution in Colab.
!pip -q install -U transformers accelerate bitsandbytes rich
import os, re, json, textwrap, traceback
from typing import Dict, Any, List
from rich import print as rprint
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
MODEL_NAME = "Qwen/Qwen2.5-1.5B-Instruct"
DTYPE = torch.bfloat16 if torch.cuda.is_available() else torch.float32
Next, we load the tokenizer and model, configure the model to run in 4-bit precision for efficiency, and wrap everything in a text-generation pipeline for easy interaction in Colab.
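A minimal sketch of this step, assuming nf4 4-bit quantization and a full-precision CPU fallback (both choices are illustrative rather than required), could look like this:
from transformers import BitsAndBytesConfig

# 4-bit quantization via bitsandbytes needs a CUDA GPU; fall back to full precision otherwise.
if torch.cuda.is_available():
    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=DTYPE,
    )
    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, quantization_config=quant_config, device_map="auto")
else:
    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=DTYPE)

tok = AutoTokenizer.from_pretrained(MODEL_NAME)
gen = pipeline("text-generation", model=model, tokenizer=tok)
The tok and gen handles defined here are what the helper functions below rely on.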
Defining Key Functions
We define several key functions:
def chat(prompt: str, system: str = "", max_new_tokens: int = 512, temperature: float = 0.3) -> str:
    msgs = []
    if system:
        msgs.append({"role": "system", "content": system})
    msgs.append({"role": "user", "content": prompt})
    inputs = tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
    out = gen(inputs, max_new_tokens=max_new_tokens, do_sample=(temperature > 0), temperature=temperature, top_p=0.9)
    text = out[0]["generated_text"]
    # The text-generation pipeline echoes the prompt by default, so drop it if present.
    return text[len(inputs):].strip() if text.startswith(inputs) else text.strip()
This function sends prompts to the model with optional system instructions and sampling controls. Additionally, we implement an extract_json function that reliably parses structured JSON outputs from the model.
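A minimal sketch of such an extract_json helper, assuming the model returns a single JSON object possibly wrapped in extra prose, might look like this:
def extract_json(text: str) -> Dict[str, Any]:
    """Best-effort parse of the first JSON object found in a model response."""
    match = re.search(r"\{.*\}", text, re.DOTALL)  # grab the outermost {...} span
    if match:
        try:
            return json.loads(match.group(0))
        except json.JSONDecodeError:
            pass
    return {}  # an empty dict lets callers fall back safely via .get()
Returning an empty dictionary on failure keeps the agent loop running even when the model produces malformed JSON.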
Implementing the Hierarchical Reasoning Model Loop
The full HRM loop involves planning subgoals, solving each by generating and executing Python code, critiquing the results, optionally refining the plan, and synthesizing a final answer.
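The loop is orchestrated by hrm_agent, which calls one helper per stage. The following is a condensed sketch of plausible implementations built on chat and extract_json; the prompts, the run_python executor, and the JSON field names are illustrative assumptions rather than exact reproductions.
def run_python(code: str, env: Dict[str, Any]) -> Dict[str, Any]:
    """Execute generated Python and capture a `result` variable, or the error."""
    scope = dict(env)
    try:
        exec(code, scope)  # the generated snippet is expected to assign `result`
        return {"ok": True, "result": scope.get("result")}
    except Exception:
        return {"ok": False, "result": None, "error": traceback.format_exc(limit=2)}

def plan(task: str) -> Dict[str, Any]:
    """High-level planner: break the task into subgoals plus an output format."""
    p = (f"Break this task into 2-4 subgoals. Reply as JSON with keys "
         f'"subgoals" (list of strings) and "final_format" (string).\nTask: {task}')
    return extract_json(chat(p, system="You are a careful planner."))

def solve_subgoal(subgoal: str, ctx: Dict[str, Any]) -> Dict[str, Any]:
    """Low-level solver: ask for a short Python snippet for one subgoal and run it."""
    known = json.dumps(ctx, default=str)
    p = (f"Write plain Python (no markdown) that solves the subgoal and stores the "
         f"answer in a variable named result.\nSubgoal: {subgoal}\nKnown context: {known}")
    code = chat(p, system="You write minimal, correct Python.")
    code = re.sub(r"^```(?:python)?\s*|\s*```$", "", code.strip())  # strip stray code fences
    return {"subgoal": subgoal, "code": code, "run": run_python(code, ctx)}

def critic(task: str, logs: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Critique the round's results and decide whether to submit or revise."""
    summary = json.dumps(
        [{"subgoal": l["subgoal"], "result": l["run"].get("result")} for l in logs],
        default=str,
    )
    p = (f"Task: {task}\nResults: {summary}\nReply as JSON with keys "
         f'"action" ("submit" or "revise") and "feedback".')
    return extract_json(chat(p, system="You are a strict evaluator."))

def refine(task: str, logs: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Propose a revised plan when the critic asks for another round."""
    errors = json.dumps([l["run"].get("error") for l in logs if not l["run"].get("ok")], default=str)
    p = (f"Task: {task}\nIssues from the last attempt: {errors}\nPropose a better plan "
         f'as JSON with keys "subgoals" and "final_format".')
    return extract_json(chat(p, system="You are a careful planner."))

def synthesize(task: str, logs: List[Dict[str, Any]], final_format: str) -> str:
    """Merge subgoal results into a final answer in the requested format."""
    results = json.dumps([l["run"].get("result") for l in logs], default=str)
    p = (f"Task: {task}\nSubgoal results: {results}\n"
         f'Write the final answer, starting with "{final_format}".')
    return chat(p, system="You give concise, correct final answers.")
With those helpers in place, the full loop looks like this: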
def hrm_agent(task: str, context: Dict[str, Any] | None = None, budget: int = 2) -> Dict[str, Any]:
    ctx = dict(context or {})
    trace, plan_json = [], plan(task)
    for round_id in range(1, budget + 1):
        # Low-level loop: solve every subgoal in the current plan and store results in context.
        logs = [solve_subgoal(sg, ctx) for sg in plan_json.get("subgoals", [])]
        for L in logs:
            ctx_key = f"g{len(trace)}_{abs(hash(L['subgoal'])) % 9999}"
            ctx[ctx_key] = L["run"].get("result")
        # High-level loop: critique the round, then either stop or refine the plan.
        verdict = critic(task, logs)
        trace.append({"round": round_id, "plan": plan_json, "logs": logs, "verdict": verdict})
        if verdict.get("action") == "submit":
            break
        plan_json = refine(task, logs) or plan_json
    final = synthesize(task, trace[-1]["logs"], plan_json.get("final_format", "Answer: "))
    return {"final": final, "trace": trace}
This implementation allows for iterative improvement of the reasoning process, culminating in a final answer that leverages the brain-inspired structure.
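As a quick illustration, the agent can be invoked like this (the task string below is just an example):
demo = hrm_agent(
    "A tank holds 1200 liters and leaks 5% of its contents per hour. How many liters remain after 6 hours?",
    budget=2,
)
rprint(demo["final"])                       # synthesized answer in the requested format
rprint(f"{len(demo['trace'])} round(s) of plan -> solve -> critique")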
Conclusion
The system built in this guide illustrates how hierarchical reasoning can significantly improve the performance of smaller models. By combining planning, solving, and critiquing, we can empower a free Hugging Face model to tackle tasks with notable robustness.
For further exploration, check out the paper and the full code, and visit our GitHub page for more tutorials, code, and notebooks.
This journey demonstrates that advanced cognitive-like workflows are accessible to anyone willing to experiment and learn.