Andrej Karpathy Releases ‘nanochat’: A Minimal, End-to-End ChatGPT-Style Pipeline You Can Train in ~4 Hours for ~$100
Understanding the Target Audience
The target audience for Andrej Karpathy’s release of nanochat primarily includes:
- AI Researchers and Developers: Professionals interested in natural language processing and machine learning. They often seek ways to train and fine-tune language models efficiently.
- Business Analysts: Individuals who are motivated to leverage AI for data analysis and decision-making processes.
- Startups and Entrepreneurs: These users aim to implement AI solutions with limited resources, looking for cost-effective tools to enhance their offerings.
- Technical Enthusiasts and Hobbyists: Individuals eager to learn and experiment with cutting-edge technology in the AI space.
Their pain points include:
- High costs and complexity associated with training large language models.
- Need for reproducible and scalable AI solutions that can be adapted to specific business needs.
- Desire for streamlined workflows that reduce setup time and resource requirements.
Goals and interests encompass:
- Building and fine-tuning AI models with minimal investment.
- Accessing straightforward documentation and resources to aid in implementation.
- Exploring innovative applications of AI in various business sectors.
Communication preferences lean toward:
- Clear, technical documentation that provides actionable insights.
- Access to community discussions and peer support.
- Updates on new features or improvements in a concise format.
Overview of nanochat
Andrej Karpathy has open-sourced nanochat, a compact, dependency-light codebase implementing a full ChatGPT-style stack—from tokenizer training to web UI inference—focused on reproducible and hackable large language model (LLM) training on a single multi-GPU node.
The repository offers a single-script “speedrun” that executes the full loop: tokenization, base pretraining, mid-training on chat/multiple-choice/tool-use data, supervised fine-tuning (SFT), optional reinforcement learning (RL) on GSM8K, evaluation, and serving (CLI + ChatGPT-like web UI). The recommended setup uses an 8×H100 node; at approximately $24/hour, the 4-hour speedrun totals around $100. A post-run report.md summarizes metrics (CORE, ARC-E/C, MMLU, GSM8K, HumanEval, ChatCORE).
Technical Highlights
The tokenizer employs a custom Rust byte pair encoding (BPE) implementation with a vocabulary of 65,536 tokens. Training uses FineWeb-EDU shards, re-packaged for easy access. The evaluation bundle includes CORE (22 autocompletion datasets such as HellaSwag, ARC, and BoolQ), stored in ~/.cache/nanochat/eval_bundle.
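To make the tokenizer stage concrete, the snippet below is a minimal byte-level BPE training loop in plain Python. It only illustrates the merge procedure; nanochat's actual tokenizer is a custom Rust implementation with a 65,536-token vocabulary trained on FineWeb-EDU text, and the tiny corpus and vocabulary size here are placeholders.

```python
from collections import Counter

def train_bpe(corpus: bytes, vocab_size: int) -> list[tuple[int, int]]:
    """Toy byte-level BPE: repeatedly merge the most frequent adjacent pair.
    Illustrative only -- not nanochat's Rust tokenizer."""
    ids = list(corpus)            # start from raw bytes (ids 0..255)
    merges = []                   # learned merge rules, in order
    next_id = 256
    while next_id < vocab_size:
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        merges.append((a, b))
        # replace every occurrence of the pair with the new token id
        out, i = [], 0
        while i < len(ids):
            if i + 1 < len(ids) and ids[i] == a and ids[i + 1] == b:
                out.append(next_id)
                i += 2
            else:
                out.append(ids[i])
                i += 1
        ids = out
        next_id += 1
    return merges

merges = train_bpe(b"the theory of the thing", vocab_size=260)
print(merges)  # first merge is (116, 104), i.e. "t" + "h" -> "th"
```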
The speedrun configuration trains a depth-20 Transformer with approximately 560 million parameters, following Chinchilla-style scaling (roughly 20 tokens per parameter), for an estimated ~4e19 FLOPs of training compute. Optimization uses Muon for the matrix (linear-layer) parameters and AdamW for embeddings/unembeddings, and loss is reported in bits per byte (bpb) so the number stays invariant to the choice of tokenizer.
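The compute figure follows from the common ~6·N·D FLOPs rule of thumb for dense Transformers. The short check below reproduces the ~4e19 estimate from the stated parameter count and the ~20-tokens-per-parameter budget, and shows the bits-per-byte conversion; the loss and bytes-per-token values are illustrative assumptions, not numbers from the repository.

```python
import math

n_params = 560e6               # depth-20 model, ~560M parameters
tokens = 20 * n_params         # Chinchilla-style budget: ~20 tokens per parameter
flops = 6 * n_params * tokens  # standard ~6*N*D estimate for dense Transformers
print(f"tokens ~ {tokens:.3g}, training FLOPs ~ {flops:.2g}")  # ~1.12e10 tokens, ~3.8e19 FLOPs

# Bits-per-byte makes loss comparable across tokenizers: convert per-token
# cross-entropy from nats to bits, then divide by average bytes per token.
loss_nats_per_token = 0.95     # hypothetical value, for illustration only
bytes_per_token = 4.8          # hypothetical average, for illustration only
bpb = loss_nats_per_token / math.log(2) / bytes_per_token
print(f"bpb ~ {bpb:.3f}")
```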
Mid-training, SFT, and Tool Use
After pretraining, mid-training adapts the base model to conversations (SmolTalk) and teaches multiple-choice behavior using 100,000 MMLU auxiliary-train questions; tool use is also introduced at this stage through Python code blocks embedded in the conversation data. The default mixture includes (a sketch of assembling such a mixture follows the list):
- SmolTalk: 460,000 rows
- MMLU auxiliary-training: 100,000 rows
- GSM8K main: 8,000 rows
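As referenced above, the mixture can be pictured as a simple concatenate-and-shuffle of the three sources. The loader below is illustrative only; the row counts mirror the list, and the placeholder strings stand in for rendered conversations and questions.

```python
import random

# Illustrative mid-training mixture; counts mirror the default mix above.
sources = {
    "smoltalk": [f"smoltalk_row_{i}" for i in range(460_000)],
    "mmlu_aux_train": [f"mmlu_row_{i}" for i in range(100_000)],
    "gsm8k_main": [f"gsm8k_row_{i}" for i in range(8_000)],
}

mixture = [row for rows in sources.values() for row in rows]
random.seed(0)
random.shuffle(mixture)  # interleave sources so each batch mixes task types
print(len(mixture))      # 568000 rows in this default mid-training mix
```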
The SFT stage fine-tunes the model on higher-quality conversations while keeping test-time formatting identical to training-time formatting to minimize mismatch (a hedged chat-rendering sketch follows the metrics below). Example metrics from the speedrun tier after SFT include:
- ARC-Easy: 0.3876
- ARC-Challenge: 0.2807
- MMLU: 0.3151
- GSM8K: 0.0455
- HumanEval: 0.0854
- ChatCORE: 0.0884
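The formatting-alignment point can be illustrated with a minimal chat renderer: SFT examples and test-time prompts pass through the same function, so the model never sees a template at inference that it was not trained on. The <|user|>/<|assistant|>/<|end|> markers below are hypothetical placeholders, not nanochat's actual special tokens.

```python
def render_chat(messages: list[dict], add_generation_prompt: bool = False) -> str:
    """Render a conversation into a single string. The special-token strings
    are placeholders; the point is that training and inference share this
    exact renderer, so there is no train/test formatting mismatch."""
    parts = [f"<|{m['role']}|>{m['content']}<|end|>" for m in messages]
    if add_generation_prompt:
        parts.append("<|assistant|>")  # the model continues from here at test time
    return "".join(parts)

# Training example: full conversation (loss typically applied to assistant spans).
print(render_chat([
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "4"},
]))

# Inference: identical formatting, ending with the assistant prefix.
print(render_chat([{"role": "user", "content": "What is 2 + 2?"}],
                  add_generation_prompt=True))
```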
Tool use is integrated end-to-end: a custom engine implements a KV cache, prefill/decode inference, and a Python interpreter sandbox used for tool-augmented training and evaluation.
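The prefill/decode split around a KV cache can be sketched for a single attention head in a few lines of NumPy: prefill processes the whole prompt once and stores its keys and values, while each decode step appends one position to the cache and attends over everything so far. This is a generic illustration of the mechanism, not nanochat's engine.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16                                    # head dimension
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))

def attend(q, K, V):
    """One query attending over cached keys/values (single head)."""
    scores = (K @ q) / np.sqrt(d)
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return V.T @ probs

def prefill(prompt_x):
    """Process the whole prompt once; return its K/V cache and last output."""
    K, V = prompt_x @ Wk.T, prompt_x @ Wv.T
    return K, V, attend(prompt_x[-1] @ Wq.T, K, V)

def decode_step(x_t, K, V):
    """Append one new position's K/V to the cache, then attend over it all."""
    K = np.vstack([K, x_t @ Wk.T])
    V = np.vstack([V, x_t @ Wv.T])
    return K, V, attend(x_t @ Wq.T, K, V)

prompt = rng.standard_normal((5, d))      # five "prompt token" embeddings
K, V, _ = prefill(prompt)
for _ in range(3):                        # three autoregressive decode steps
    K, V, out = decode_step(rng.standard_normal(d), K, V)
print(K.shape)                            # (8, 16): 5 prefill + 3 decoded positions
```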
Optional Reinforcement Learning on GSM8K
The final stage allows optional RL on GSM8K using a simplified GRPO (Group Relative Policy Optimization) loop. The walkthrough spells out which components are omitted relative to standard PPO (Proximal Policy Optimization), keeping the reinforcement-learning stage deliberately simple and practical.
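The group-relative idea can be sketched as follows: sample several completions per GSM8K problem, score each with a binary correctness reward, and use each reward's deviation from the group mean as the advantage weighting the policy-gradient update, with no learned value network. The reward values and normalization below are a schematic stand-in, not nanochat's actual training loop.

```python
import statistics

def grpo_advantages(rewards: list[float]) -> list[float]:
    """Group-relative advantages: reward minus the group mean, scaled by the
    group standard deviation. Dropping the learned critic is one of the
    simplifications relative to full PPO."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # avoid division by zero
    return [(r - mean) / std for r in rewards]

# One GSM8K prompt, eight sampled completions, binary correctness rewards.
rewards = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
print(grpo_advantages(rewards))
# Correct samples get positive advantages, incorrect ones negative; the policy
# gradient then upweights log-probabilities of the successful completions.
```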
Cost and Quality Scaling
The README outlines additional scaling options beyond the ~$100 speedrun (a quick cost sanity check follows the list):
- ~$300 tier: d=26 (~12 hours), slightly surpassing GPT-2 on CORE; requires additional pretraining shards and batch-size adjustments.
- ~$1,000 tier: ~41.6 hours, yielding significantly improved coherence and basic reasoning/coding ability.
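As referenced above, the quoted tier durations line up with the same ~$24/hour node rate assumed for the speedrun:

```python
rate = 24.0  # approximate $/hour for the 8xH100 node, as quoted above
for name, hours in [("speedrun", 4), ("~$300 tier (d=26)", 12), ("~$1,000 tier", 41.6)]:
    print(f"{name}: ~${rate * hours:,.0f}")
# speedrun: ~$96, ~$300 tier (d=26): ~$288, ~$1,000 tier: ~$998
```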
Prior experimental runs suggested that a d=30 model trained for ~24 hours could reach the 40s on MMLU, the 70s on ARC-Easy, and the 20s on GSM8K.
Evaluation Snapshot
An example report.md for the ~$100 / ≈4-hour run shows:
- CORE: 0.2219 (base)
- ARC-Easy: 0.3876
- ARC-Challenge: 0.2807
- MMLU: 0.3151
- GSM8K: 0.0455
- HumanEval: 0.0854
Conclusion
Karpathy’s nanochat strikes a balance between accessibility and functionality. It offers a single, clean, dependency-light repository that integrates tokenizer training, pretraining, mid-training, SFT, and an optional simplified RL stage into a reproducible speedrun that ends with a detailed evaluation report.
For further details, check out the official repository and community discussions on GitHub.