Sakana AI Released ShinkaEvolve: An Open-Source Framework that Evolves Programs for Scientific Discovery with Unprecedented Sample-Efficiency
Table of contents
- What problem is it actually solving?
- Does the sample-efficiency claim hold beyond toy problems?
- How does the evolutionary loop look in practice?
- What are the concrete results?
- How does this compare to AlphaEvolve and related systems?
- Summary
- FAQs — ShinkaEvolve
What problem is it actually solving?
Most “agentic” code-evolution systems explore by brute force: they mutate code, run it, score it, and repeat—consuming enormous sampling budgets. ShinkaEvolve specifically targets that waste with three interacting components:
- Adaptive parent sampling to balance exploration and exploitation. Parents are drawn from “islands” using fitness- and novelty-aware policies rather than always selecting the current best.
- Novelty-based rejection filtering to avoid re-evaluating near-duplicates. Mutable code segments are embedded, and if cosine similarity exceeds a threshold, a secondary LLM acts as a “novelty judge” before execution.
- Bandit-based LLM ensembling, where the system learns which model (e.g., GPT, Gemini, Claude, or DeepSeek families) yields the most significant relative fitness jumps and routes future mutations accordingly.
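The bandit component can be sketched with a standard UCB1 policy. The class below is a minimal illustration, not ShinkaEvolve's actual implementation: the model names, the reward definition (a relative fitness improvement), and the exploration constant are all assumptions.

```python
import math

class UCB1LLMSelector:
    """UCB1 bandit over a pool of LLM backends (illustrative sketch)."""

    def __init__(self, models):
        self.models = list(models)
        self.counts = {m: 0 for m in self.models}    # times each model was chosen
        self.values = {m: 0.0 for m in self.models}  # running mean reward per model
        self.total = 0

    def select(self):
        # Play each arm once before applying the UCB1 formula.
        for m in self.models:
            if self.counts[m] == 0:
                return m
        # UCB1 score: mean reward + exploration bonus sqrt(2 ln N / n_i).
        return max(
            self.models,
            key=lambda m: self.values[m]
            + math.sqrt(2 * math.log(self.total) / self.counts[m]),
        )

    def update(self, model, reward):
        # Incremental mean update; reward could be a relative fitness jump.
        self.counts[model] += 1
        self.total += 1
        self.values[model] += (reward - self.values[model]) / self.counts[model]

selector = UCB1LLMSelector(["gpt", "gemini", "claude", "deepseek"])
choice = selector.select()
selector.update(choice, reward=0.3)
```

Models that produce larger fitness jumps accumulate higher mean rewards and are sampled more often, while the exploration bonus keeps occasionally probing the others.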
Does the sample-efficiency claim hold beyond toy problems?
The research team evaluates ShinkaEvolve on four distinct domains, reporting consistent gains under small sampling budgets:
- Circle packing (n=26): Reaches an improved configuration in roughly 150 evaluations.
- AIME math reasoning (2024 set): Evolves agentic scaffolds that trace a Pareto frontier (accuracy vs. LLM-call budget), outperforming hand-built baselines under limited query budgets.
- Competitive programming (ALE-Bench LITE): Starting from ALE-Agent solutions, ShinkaEvolve delivers ~2.3% mean improvement across 10 tasks.
- LLM training (Mixture-of-Experts): Evolves a new load-balancing loss that improves perplexity and downstream accuracy.
How does the evolutionary loop look in practice?
ShinkaEvolve maintains an archive of evaluated programs with fitness, public metrics, and textual feedback. For each generation, the system samples an island and parent(s), constructs a mutation context with top-K and random “inspiration” programs, and proposes edits through diff edits, full rewrites, and LLM-guided crossovers, while protecting immutable code regions. Executed candidates update both the archive and the bandit statistics that steer subsequent LLM/model selection.
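One generation of that loop can be sketched as follows. The helper names (`embed`, `mutate`, `evaluate`), the greedy parent pick, and the 0.95 similarity threshold are placeholders for illustration; the real system uses fitness- and novelty-aware parent sampling and puts an LLM "novelty judge" behind the cosine-similarity pre-filter.

```python
import random

NOVELTY_THRESHOLD = 0.95  # assumed value; the real threshold is configurable

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def is_novel(candidate_emb, archive_embs, threshold=NOVELTY_THRESHOLD):
    """Pre-filter: reject candidates whose mutable-code embedding is a
    near-duplicate of an archived program (the LLM judge is omitted here)."""
    return all(cosine_similarity(candidate_emb, e) < threshold
               for e in archive_embs)

def one_generation(islands, embed, mutate, evaluate, archive):
    """One generation of the loop described above (hypothetical helpers)."""
    island = random.choice(islands)                    # sample an island
    parent = max(island, key=lambda p: p["fitness"])   # placeholder for the
    # fitness/novelty-aware parent-sampling policy
    inspirations = random.sample(island, k=min(2, len(island)))
    child_code = mutate(parent, inspirations)  # diff edit / rewrite / crossover
    child_emb = embed(child_code)
    if not is_novel(child_emb, [p["embedding"] for p in archive]):
        return None                            # skip executing near-duplicates
    child = {"code": child_code, "embedding": child_emb,
             "fitness": evaluate(child_code)}
    archive.append(child)
    island.append(child)
    return child
```

In the full system the returned fitness would also feed the bandit statistics that steer which LLM proposes the next mutation.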
What are the concrete results?
Beyond the aggregate scores, the discovered solutions themselves are interpretable:
- Circle packing: The best configuration combines structured initialization, hybrid global-local search, and escape mechanisms, all discovered by the system rather than hand-coded.
- AIME scaffolds: A three-stage expert ensemble that hits a sweet spot between accuracy and LLM-call cost.
- ALE-Bench: Targeted engineering wins that enhance scores without complete rewrites.
- MoE loss: An entropy-modulated under-use penalty that reduces mis-routing and improves perplexity/benchmarks.
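The exact loss discovered by ShinkaEvolve is defined in the paper and is not reproduced here; the sketch below only illustrates the general shape of an "entropy-modulated under-use penalty" for MoE load balancing. The functional form, `lam` coefficient, and modulation direction are all assumptions.

```python
import numpy as np

def underuse_penalty(router_probs, lam=0.01):
    """Illustrative sketch (not the paper's loss): penalize experts whose
    expected load falls below the uniform target, with the penalty
    modulated by the average routing entropy.

    router_probs: (tokens, experts) softmax routing probabilities.
    """
    tokens, experts = router_probs.shape
    load = router_probs.mean(axis=0)              # expected load per expert
    target = 1.0 / experts                        # uniform target load
    underuse = np.clip(target - load, 0.0, None)  # penalize only under-use
    # Mean per-token routing entropy, normalized to [0, 1].
    ent = -(router_probs * np.log(router_probs + 1e-9)).sum(axis=1).mean()
    ent_norm = ent / np.log(experts)
    # Modulate: push harder when routing is already confident (low entropy).
    return lam * (1.0 - ent_norm) * (underuse ** 2).sum()
```

With uniform routing the penalty vanishes; as routing grows confident and skewed toward a subset of experts, the penalty grows, nudging traffic back to under-used experts.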
How does this compare to AlphaEvolve and related systems?
While AlphaEvolve demonstrated strong results, it is closed-source and required far more evaluations. ShinkaEvolve replicates and surpasses the circle-packing result using orders of magnitude fewer samples and releases all components as open source.
Summary
ShinkaEvolve is an Apache-2.0 framework for LLM-driven program evolution that reduces evaluations from thousands to hundreds by combining fitness/novelty-aware parent sampling, embedding-plus-LLM novelty rejection, and a UCB1-style adaptive LLM ensemble. It sets a new state-of-the-art on circle packing (~150 evaluations), finds stronger AIME scaffolds under strict query budgets, and improves ALE-Bench solutions by ~2.3% mean gain.
FAQs — ShinkaEvolve
- What is ShinkaEvolve? An open-source framework that couples LLM-driven program mutations with evolutionary search to automate algorithm discovery and optimization.
- How does it achieve higher sample-efficiency than prior evolutionary systems? Through adaptive parent sampling, novelty-based rejection, and a bandit-based selector that routes mutations to the most promising LLMs.
- What evidence supports the sample-efficiency claim? It achieves state-of-the-art circle packing with ~150 evaluations and improves ALE-Bench solutions over strong baselines.
- Where can I run it and what’s the license? The GitHub repo provides a WebUI and examples; ShinkaEvolve is released under Apache-2.0.
Check out the Technical details, Paper, and GitHub Page.