
Meet Kosmos: An AI Scientist that Automates Data-Driven Discovery

Kosmos, developed by Edison Scientific, is an autonomous discovery system designed to conduct extensive research campaigns focused on a single objective. Given a dataset and an open-ended natural language goal, it performs iterative cycles of data analysis, literature search, and hypothesis generation, ultimately synthesizing the findings into a fully cited scientific report. A typical run lasts up to 12 hours, involves approximately 200 agent rollouts, executes around 42,000 lines of code, and reviews about 1,500 papers.

Architecture, World Model, and Agent Roles

The core design of Kosmos features a structured world model that serves as the system’s long-term memory. This world model is a database containing entities, relationships, experimental results, and open questions, which is updated after each task. Unlike a simple context window, it is queryable and structured, ensuring that information from earlier steps remains accessible even after processing vast amounts of data.
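The idea of a structured, queryable memory can be illustrated with a small sketch. This is not Kosmos's actual schema: the class names `Finding` and `WorldModel`, their fields, and the example identifiers are all hypothetical, and relationships between entities are omitted for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    """One result written back after a task (hypothetical schema)."""
    task_id: str
    summary: str
    entities: list[str]
    source: str  # e.g. a notebook cell ID or a paper passage reference


@dataclass
class WorldModel:
    """Toy stand-in for a structured, queryable long-term memory."""
    findings: list[Finding] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)

    def update(self, finding: Finding, new_questions: tuple[str, ...] = ()) -> None:
        # Called after each task so later cycles can see earlier results.
        self.findings.append(finding)
        self.open_questions.extend(new_questions)

    def query(self, entity: str) -> list[Finding]:
        # Retrieve every finding that mentions an entity, however old it is.
        return [f for f in self.findings if entity in f.entities]


wm = WorldModel()
wm.update(
    Finding("task-001", "Placeholder summary of an analysis result",
            ["gene_A", "pathway_X"], "notebook-1:cell-4"),
    new_questions=("Does pathway_X change in a second tissue?",),
)
wm.update(Finding("task-002", "Placeholder summary of a paper passage",
                  ["gene_B"], "paper-17:section-2"))

hits = wm.query("gene_A")
print([f.task_id for f in hits])
```

The point of the sketch is the contrast with a context window: retrieval goes through an explicit query over stored records, so an early finding is as reachable as a recent one.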

Kosmos employs two primary agents: a data analysis agent and a literature search agent. In each cycle, the system proposes up to 10 specific tasks based on the research objective and the current state of the world model. Tasks may include conducting a differential abundance analysis on a metabolomics dataset or searching for pathways linking a candidate gene to a disease phenotype. The data analysis agent writes code and executes it in a notebook environment, while the literature search agent retrieves and reads papers. Both then write structured outputs and citations back into the world model.
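The cycle described above can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not Kosmos's implementation: `run_cycle`, `propose_tasks`, and the two stub agents are hypothetical names, the world model is reduced to a plain list, and tasks are dispatched sequentially for simplicity.

```python
MAX_TASKS_PER_CYCLE = 10  # "up to 10" tasks per cycle, as stated in the article


def run_cycle(objective, world_model, propose, agents):
    """Propose tasks, dispatch each to the matching agent, write results back."""
    tasks = propose(objective, world_model)[:MAX_TASKS_PER_CYCLE]
    for task in tasks:
        agent = agents[task["kind"]]          # pick the agent by task kind
        output = agent(task["prompt"])        # run the (stub) agent
        world_model.append({"task": task, "output": output})
    return len(tasks)


# Hypothetical stubs standing in for the planner and the two agents.
def propose_tasks(objective, world_model):
    kinds = ["analysis", "literature"]
    return [
        {"kind": kinds[i % 2], "prompt": f"{objective} / subtask {i}"}
        for i in range(12)  # deliberately more than the cap
    ]


def analysis_agent(prompt):
    return f"[analysis output for: {prompt}]"


def literature_agent(prompt):
    return f"[literature output for: {prompt}]"


agents = {"analysis": analysis_agent, "literature": literature_agent}
world_model = []
for _ in range(2):  # two cycles, for illustration only
    run_cycle("example objective", world_model, propose_tasks, agents)

print(len(world_model))
```

Because the planner receives the world model on every call, tasks in later cycles can depend on what earlier cycles wrote back.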

This iterative process continues for multiple cycles. At the conclusion of each run, a separate synthesis component reviews the world model and generates a report where every statement is linked to either a Jupyter notebook cell or a specific passage in the primary literature. This explicit provenance is crucial in scientific contexts, as it allows human collaborators to audit individual claims rather than treating the system as a black box.
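One way to picture statement-level provenance is a record type in which every claim carries a pointer to its evidence. The sketch below is illustrative only: `Statement`, `ALLOWED_SOURCES`, `unsourced`, and the reference strings are assumptions, not the report format Kosmos actually emits.

```python
from dataclasses import dataclass

# The two evidence types named in the article.
ALLOWED_SOURCES = {"notebook_cell", "literature"}


@dataclass(frozen=True)
class Statement:
    text: str
    source_kind: str  # "notebook_cell" or "literature"
    source_ref: str   # hypothetical reference string


def unsourced(statements):
    """Return statements a human auditor could not trace to evidence."""
    return [
        s for s in statements
        if s.source_kind not in ALLOWED_SOURCES or not s.source_ref
    ]


report = [
    Statement("Placeholder claim from an analysis", "notebook_cell", "nb-3:cell-12"),
    Statement("Placeholder claim from a paper", "literature", "paper-8:passage-2"),
    Statement("Placeholder claim with no evidence", "literature", ""),
]

flagged = unsourced(report)
print([s.text for s in flagged])
```

A check like this is what makes claim-by-claim auditing possible: a reviewer can follow each reference and can also detect any statement that lacks one.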

Accuracy and Research Time Equivalence

The quality of Kosmos reports is evaluated by sampling 102 statements from three representative reports and asking domain experts to classify each statement as supported or refuted. Overall, 79.4 percent of statements are deemed accurate. Data analysis statements are the most reliable at approximately 85.5 percent accuracy, while literature statements are correct about 82.1 percent of the time. Synthesis statements, which combine evidence, have a lower accuracy rate of around 57.9 percent.
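As a quick sanity check on these figures, the overall rate can be converted back into a statement count, and the sampling uncertainty for a sample of 102 can be estimated. The Wilson score interval used here is a standard choice made for this sketch, not something the article reports, and the function name `wilson_interval` is ours.

```python
import math

N_STATEMENTS = 102        # sample size from the article
OVERALL_ACCURACY = 0.794  # reported overall accuracy

# Implied number of statements judged accurate (0.794 * 102 is about 80.99).
supported = round(OVERALL_ACCURACY * N_STATEMENTS)
p_hat = supported / N_STATEMENTS


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% for z=1.96)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


low, high = wilson_interval(supported, N_STATEMENTS)
print(f"{supported}/{N_STATEMENTS} = {p_hat:.3f}, interval {low:.3f} to {high:.3f}")
```

With only 102 statements drawn from three reports, the interval spans roughly fifteen percentage points, so the headline rate is best read as an estimate rather than a precise figure.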

To estimate the equivalent human effort, the authors assume 2 hours per data analysis trajectory and 15 minutes per paper read, then tally the trajectories and papers processed in a run. Assuming a 40-hour work week, this yields an estimate of about 4.1 expert-months for a typical run. In a separate survey, seven collaborating scientists rated a 20-cycle Kosmos run as equivalent to approximately 6.14 months of their own work on the same objective, with perceived effort scaling roughly linearly with the number of cycles up to 20.
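The arithmetic behind the expert-month estimate can be reproduced in a few lines. The per-item times, the paper count, and the 40-hour week come from the article. The number of data analysis trajectories (166) and the weeks-per-month conversion (52/12) are assumptions made for this sketch: the article only gives about 200 rollouts across both agent types.

```python
HOURS_PER_TRAJECTORY = 2.0   # from the article
HOURS_PER_PAPER = 0.25       # 15 minutes, from the article
PAPERS_PER_RUN = 1500        # approximate figure from the article
HOURS_PER_WEEK = 40          # from the article

# ASSUMPTIONS, not stated in the article:
ANALYSIS_TRAJECTORIES = 166  # share of the ~200 rollouts that are data analysis
WEEKS_PER_MONTH = 52 / 12    # calendar-average conversion


def expert_months(trajectories, papers):
    """Convert trajectory and paper counts into equivalent expert-months."""
    hours = trajectories * HOURS_PER_TRAJECTORY + papers * HOURS_PER_PAPER
    weeks = hours / HOURS_PER_WEEK
    return weeks / WEEKS_PER_MONTH


months = expert_months(ANALYSIS_TRAJECTORIES, PAPERS_PER_RUN)
print(f"{months:.2f} expert-months")
```

Under these assumptions the result lands close to the reported 4.1 months, with a little over half of the hours coming from paper reading. Changing the assumed trajectory count or month length shifts the estimate accordingly.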

Representative Discoveries

Kosmos has been tested on seven case studies across various fields, including metabolomics, materials science, neuroscience, statistical genetics, and neurodegeneration. In three instances, it independently reproduced prior human results without accessing the original preprints during the run. In four cases, it proposed mechanisms that the authors described as novel contributions to the literature.

In the first discovery, Kosmos analyzed metabolomics data from a mouse hypothermia experiment, identifying nucleotide metabolism as the dominant altered pathway in hypothermic brains. The system concluded that nucleotide salvage pathways dominate over de novo synthesis during protective hypothermia, aligning with an independent human analysis that was unpublished at the time of the run.

In the second discovery, Kosmos examined environmental logs from a perovskite solar cell fabrication system, confirming that absolute humidity during thermal annealing is the main determinant of device efficiency and identifying a critical humidity threshold, described as a “fatal filter”, beyond which devices fail. This finding corresponds with a materials science preprint that was not accessible to Kosmos during the run.

In the third discovery, Kosmos was given neuron-level reconstructions from several species and fitted distributions to neurite length, degree, and synapse counts. It concluded that the degree and synapse distributions are better described as log-normal than as scale-free (power-law), while also recovering power-law scaling between neurite length and synapse count in most datasets. These results align with connectivity rules reported in an earlier neuroscience preprint.
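A comparison of this kind can be sketched by fitting both candidate distributions by maximum likelihood and comparing log-likelihoods. The example below is not the analysis Kosmos ran: it uses synthetic data drawn from a log-normal, treats values as continuous (real degree and synapse counts are discrete), and fixes the power-law cutoff at the sample minimum, all of which are simplifying assumptions.

```python
import math
import random


def lognormal_loglik(xs):
    """Log-likelihood of xs under a log-normal with MLE parameters."""
    logs = [math.log(x) for x in xs]
    n = len(logs)
    mu = sum(logs) / n
    var = sum((v - mu) ** 2 for v in logs) / n
    sigma = math.sqrt(var)
    return sum(
        -v - math.log(sigma) - 0.5 * math.log(2 * math.pi)
        - (v - mu) ** 2 / (2 * var)
        for v in logs
    )


def powerlaw_loglik(xs):
    """Log-likelihood under a continuous power law with xmin = min(xs)."""
    xmin = min(xs)
    log_ratios = [math.log(x / xmin) for x in xs]
    alpha = 1 + len(xs) / sum(log_ratios)  # MLE for the exponent
    return sum(
        math.log(alpha - 1) - math.log(xmin) - alpha * r for r in log_ratios
    )


# Synthetic sample for illustration only, not data from the case study.
rng = random.Random(0)
xs = [rng.lognormvariate(3.0, 0.5) for _ in range(2000)]

ll_lognormal = lognormal_loglik(xs)
ll_powerlaw = powerlaw_loglik(xs)
print("log-normal preferred:", ll_lognormal > ll_powerlaw)
```

A rigorous comparison would also select the power-law cutoff from the data and test whether the likelihood difference is statistically significant. The sketch only shows the basic shape of the model comparison.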

The remaining four discoveries are classified as novel: a Mendelian randomization analysis implicating circulating superoxide dismutase 2 as protective against myocardial fibrosis; a Mechanistic Ranking Score that integrates posterior inclusion probabilities and multiomic evidence for type 2 diabetes loci; a proteomic analysis ordering molecular events along a pseudotime axis in Alzheimer’s disease; and a large-scale single-nucleus transcriptomic analysis linking age-related loss of flippase expression and exposure of phosphatidylserine signals to entorhinal cortex neuron vulnerability.

Key Takeaways

Kosmos is an autonomous AI scientist that runs up to 12 hours per objective, executing about 42,000 lines of code and reviewing approximately 1,500 papers per run, coordinated through a structured world model.
The system utilizes parallel data analysis and literature search agents that share a central world model, enabling coherent long-horizon reasoning across about 200 agent rollouts.
Expert evaluators found 79.4 percent of sampled report statements to be accurate, with data analysis and literature statements exceeding 80 percent accuracy, while synthesis statements, which interpret combined evidence, remain less reliable at about 57.9 percent.
A 20-cycle Kosmos run is rated by collaborators as equivalent to about 6 months of expert research effort, with the number of valuable findings scaling approximately linearly with cycle count up to 20.
Across seven case studies in metabolomics, materials science, neuroscience, statistical genetics, and neurodegeneration, Kosmos both reproduces unpublished or post-cutoff results and proposes novel mechanisms, while still requiring human scientists for dataset selection and validation.

Editorial Comments

Kosmos demonstrates the potential of a structured world model and domain-agnostic Edison agents when integrated with current LLM tooling, delivering measurable gains in reasoning depth, reproducibility, and traceability. However, it still relies on scientists for data curation, objective setting, and checking synthesis statements, which remain less reliable than data analysis and literature statements. Overall, Kosmos serves as a robust template for AI-accelerated science rather than a replacement for human researchers.

