Stanford Researchers Introduced Biomni: A Biomedical AI Agent for Automation Across Diverse Tasks and Data Types

Biomedical research is a rapidly evolving field aiming to advance human health through understanding disease mechanisms, identifying new therapeutic targets, and developing effective treatments. This area encompasses various disciplines, such as genetics, molecular biology, pharmacology, and clinical studies, which demand specialized tools and deep expertise. The increasing complexity of biomedical data, experiments, and literature presents both opportunities and challenges. Researchers must integrate findings from genomics, proteomics, and other data sources to generate hypotheses, design experiments, and interpret results effectively. Efficiently managing this complexity is crucial for accelerating scientific discovery and translating results into clinical applications.

The core challenges in biomedical research include managing the vast amount of data, methods, and tools necessary for meaningful results. Researchers often encounter fragmented workflows, relying on numerous specialized tools that do not integrate smoothly. This can create bottlenecks when designing experiments, processing large datasets, or interpreting multimodal biomedical information. The situation is further complicated by the limited availability of expert researchers, making it difficult to keep pace with the growing body of scientific knowledge. Consequently, significant portions of biomedical data remain underutilized, and connections between findings across different subfields are frequently overlooked. A new approach is needed: one that can scale expertise, manage data complexity, and support integrated workflows across various biomedical domains.

Existing tools for biomedical research often focus on narrow tasks, such as specific gene analysis or drug-target interactions. These tools require careful setup, domain-specific knowledge, and manual integration into broader workflows. While large language models (LLMs) have shown promise in tasks like biomedical question answering, they typically cannot interact directly with specialized tools or databases. Previous efforts to create AI agents for biomedical tasks have relied on predefined workflows or templates, limiting their flexibility. As a result, researchers have struggled to find AI systems that can adapt to diverse biomedical tasks or execute complex analyses end-to-end.

Introduction to Biomni

Researchers from Stanford University, Genentech, the Arc Institute, the University of Washington, Princeton University, and the University of California, San Francisco, introduced Biomni, a general-purpose biomedical AI agent. Biomni combines a foundational biomedical environment, Biomni-E1, with an advanced task-executing architecture, Biomni-A1. Biomni-E1 was constructed by mining tens of thousands of biomedical publications across 25 subfields, extracting 150 specialized tools, 105 software packages, and 59 databases, forming a unified biomedical action space. Biomni-A1 dynamically selects tools, formulates plans, and executes tasks by generating and running code, enabling the system to adapt to diverse biomedical problems. This integration of reasoning, code-based execution, and resource selection allows Biomni to perform a wide range of tasks autonomously, including bioinformatics analyses, hypothesis generation, and protocol design.
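
To make the idea of a unified action space more concrete, here is a minimal sketch; it is not Biomni's code. The `Resource` class, the three example entries, and the keyword-overlap scoring are all assumptions made for illustration, and the word-matching function is only a stand-in for the LLM-based selection the article describes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Resource:
    name: str         # hypothetical identifier, not a real Biomni resource
    kind: str         # "tool", "software", or "database"
    description: str

# Toy stand-in for an action space of tools, software packages, and databases.
ACTION_SPACE = [
    Resource("align_reads", "tool", "align sequencing reads to a reference genome"),
    Resource("sc_toolkit", "software", "single-cell rna-seq analysis in python"),
    Resource("gene_lookup_db", "database", "gene annotation and variant lookup"),
]

def select_resources(goal, space, top_k=2):
    """Rank resources by word overlap with the goal.

    Assumption: a simple keyword match replaces the LLM-based selector.
    """
    goal_words = set(goal.lower().split())
    scored = []
    for resource in space:
        overlap = len(goal_words & set(resource.description.lower().split()))
        if overlap:  # drop resources with nothing in common with the goal
            scored.append((overlap, resource))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [resource for _, resource in scored[:top_k]]

selected = select_resources("cluster single-cell rna-seq data", ACTION_SPACE)
print([resource.name for resource in selected])
```

Keeping tools, packages, and databases in one typed registry means a single selection step can draw on all three kinds of resource at once.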

Unlike static function-calling models, Biomni’s architecture allows it to flexibly interleave code execution, data querying, and tool invocation, creating a seamless pipeline for complex biomedical workflows.
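
The difference from static function calling can be sketched as a loop in which each action is a piece of code run in a shared namespace, so later steps can build on earlier results. This is an illustrative sketch only: `run_step`, `run_agent`, and the scripted steps are hypothetical, and in a real agent a language model would write each step after reading the previous observation.

```python
import io
from contextlib import redirect_stdout

def run_step(code, namespace):
    """Execute one code action in the shared namespace and capture its printed output."""
    buffer = io.StringIO()
    try:
        with redirect_stdout(buffer):
            exec(code, namespace)
    except Exception as exc:
        # Errors are returned as observations so a planner could react to them.
        return f"ERROR: {exc!r}"
    return buffer.getvalue().strip()

# Hypothetical scripted steps standing in for model-generated code.
SCRIPTED_STEPS = [
    "records = [{'gene': 'A', 'expr': 2.0}, {'gene': 'B', 'expr': 8.0}]",  # data querying
    "high = [r['gene'] for r in records if r['expr'] > 5]",               # analysis
    "print(high)",                                                       # reporting
]

def run_agent(steps):
    """Run each step in order, keeping state between steps in one namespace."""
    namespace = {}
    observations = []
    for code in steps:
        observations.append(run_step(code, namespace))
    return namespace, observations

namespace, observations = run_agent(SCRIPTED_STEPS)
print(observations[-1])
```

Because state persists across steps, the third step can print a value computed by the second, which is the property that lets querying, analysis, and reporting interleave freely.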

Performance and Capabilities

Biomni-A1 employs an LLM-based tool selection mechanism to identify relevant resources based on user goals. It uses code as a universal interface to compose complex workflows with procedural logic, including loops, parallelization, and conditional steps. An adaptive planning strategy lets Biomni iteratively refine its plan during task execution, so that later steps respond to intermediate results. Biomni was evaluated on multiple benchmarks. On LAB-Bench, it achieved 74.4% accuracy on DbQA and 81.9% on SeqQA, roughly matching human experts on DbQA (74.7%) and outperforming them on SeqQA (78.8%). On the HLE benchmark covering 14 subfields, Biomni scored 17.3%; the researchers report relative gains of 402.3% over base LLMs and 43.0% over coding agents. In real-world case studies, Biomni autonomously generated a 10-step pipeline that analyzed 458 wearable sensor files and identified an average postprandial temperature increase of 2.19°C across individuals, and it analyzed 227 nights of sleep data to explore mid-week peaks in sleep efficiency and circadian regularity.
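
The appeal of code as an interface is that loops, parallel execution, and conditional steps come for free from the programming language. The sketch below illustrates this with made-up data: the file names and temperature readings are hypothetical and are not taken from the wearable sensor study, and `summarize` and `run_workflow` are names invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

# Hypothetical in-memory stand-ins for sensor files: name -> temperature readings in °C.
SENSOR_FILES = {
    "subject_01.csv": [36.5, 36.7, 37.0],
    "subject_02.csv": [36.4, 36.6],
    "subject_03.csv": [],  # an empty file that the workflow should skip
}

def summarize(item):
    """Return (file name, mean reading), or (file name, None) for an empty file."""
    name, readings = item
    if not readings:  # conditional step
        return name, None
    return name, round(mean(readings), 2)

def run_workflow(files):
    # Parallel step: summarize all files concurrently.
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(summarize, files.items()))
    # Loop step: keep only the files that produced a value.
    summary = {}
    for name, value in results:
        if value is not None:
            summary[name] = value
    return summary

summary = run_workflow(SENSOR_FILES)
print(summary)
```

A fixed function-calling schema would need a separate mechanism for each of these control-flow patterns, whereas generated code expresses them directly.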

Biomni’s handling of complex multi-omics analyses is notable: it processed over 336,000 single-nucleus RNA-seq and ATAC-seq profiles from human embryonic skeletal data. Biomni constructed a 10-stage analysis pipeline to predict transcription factor-target gene links, filter results using chromatin accessibility data, and summarize findings in a structured report. The agent managed all aspects of the analysis, including code generation, error debugging, and results interpretation, producing outputs such as trajectory plots, heatmaps, and PCA biplots. These capabilities showcase Biomni’s potential in managing large-scale, multi-modal datasets, identifying biological patterns, and accelerating the transition from raw data to testable hypotheses. By executing between 6 and 24 distinct steps per task and integrating up to four specialized tools, eight software packages, and three unique data lake items, Biomni mirrors human scientists’ workflows while drastically reducing manual effort.
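
The predict, filter, and summarize stages described above can be illustrated with a toy example. Everything here is hypothetical: the link scores, gene names, accessibility set, and the 0.8 threshold are invented for illustration, and the real pipeline operates on single-nucleus profiles rather than a short list.

```python
# Hypothetical predicted links: (transcription factor, target gene, confidence score).
PREDICTED_LINKS = [
    ("TF_A", "GENE_1", 0.91),
    ("TF_A", "GENE_2", 0.40),
    ("TF_B", "GENE_3", 0.85),
    ("TF_B", "GENE_4", 0.88),
]

# Hypothetical set of target genes whose regulatory regions are accessible
# according to chromatin accessibility (ATAC-seq) data.
ACCESSIBLE_TARGETS = {"GENE_1", "GENE_2", "GENE_4"}

def filter_links(links, accessible, min_score=0.8):
    """Keep links that are both confidently predicted and supported by accessibility."""
    return [
        (tf, gene, score)
        for tf, gene, score in links
        if score >= min_score and gene in accessible
    ]

def summarize_by_tf(links):
    """Group the surviving target genes under each transcription factor."""
    report = {}
    for tf, gene, _score in links:
        report.setdefault(tf, []).append(gene)
    return report

kept = filter_links(PREDICTED_LINKS, ACCESSIBLE_TARGETS)
report = summarize_by_tf(kept)
print(report)
```

Requiring both a high score and accessibility evidence removes GENE_2 (low score) and GENE_3 (no accessibility support), leaving one supported target per factor in this toy data.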

Key Takeaways

Biomni-E1 integrates 150 specialized tools, 105 software packages, and 59 databases for biomedical research.
Relative performance gains average 402.3% over base LLMs, 43.0% over coding agents, and 20.4% over its ablated variant.
Biomni executed a 10-step pipeline analyzing 458 wearable sensor files, revealing a 2.19°C average postprandial temperature rise.
On the LAB-Bench benchmark, Biomni achieved 74.4% accuracy in DbQA and 81.9% in SeqQA, roughly matching human experts on DbQA (74.7%) and surpassing them on SeqQA (78.8%).
Biomni handled a complex multi-omics dataset of more than 336,000 profiles, generating interpretable outputs, including gene regulatory networks.
Tasks involve between 6 and 24 steps, utilizing up to four specialized tools, eight software packages, and three data lake items.
Biomni autonomously generates PCA plots, heatmaps, trajectory plots, and cluster maps, producing human-readable reports without manual intervention.
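
The benchmark figures mix two kinds of comparison, and a short calculation helps keep them apart. The sketch below uses only the LAB-Bench numbers quoted in this article to compute percentage-point gaps; the `relative_gain` helper and its example inputs (10 and 15) are hypothetical and are included only to show how a relative figure such as 402.3% is defined.

```python
# LAB-Bench accuracies (%) as quoted in this article.
LAB_BENCH = {
    "DbQA": {"biomni": 74.4, "human": 74.7},
    "SeqQA": {"biomni": 81.9, "human": 78.8},
}

def gap_points(biomni, human):
    """Absolute difference in percentage points (positive means Biomni is ahead)."""
    return round(biomni - human, 1)

def relative_gain(new, base):
    """Relative improvement of `new` over `base`, expressed in percent."""
    return (new - base) / base * 100.0

gaps = {task: gap_points(**scores) for task, scores in LAB_BENCH.items()}
print(gaps)

# Hypothetical numbers, only to illustrate the definition of a relative gain.
print(relative_gain(15.0, 10.0))
```

On these figures the DbQA gap is slightly negative and the SeqQA gap is positive, while a relative gain depends heavily on how low the baseline score is.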

In conclusion, Biomni represents a significant advancement in biomedical AI, merging reasoning, code execution, and dynamic resource integration into a single system. Researchers have demonstrated its ability to generalize across tasks, execute complex workflows without predefined templates, and produce results that rival or exceed human expertise in various areas. The system’s capability to manage large datasets, compose complex pipelines, and generate human-readable reports suggests it has the potential to significantly accelerate biomedical discovery, reduce the burden on researchers, and enable new insights.
