AI Scientist V2: First Fully AI‑Generated Research Paper

When AI Becomes the Scientist: Sakana AI’s Breakthrough with The AI Scientist‑v2

Human Curiosity Meets Machine Discovery

Science has always been driven by human curiosity, creativity, and a willingness to challenge conventional wisdom. But what if machines — not just tools — could perform the scientific process themselves: proposing ideas, conducting experiments, and writing papers? That future isn’t distant. It’s happening now with AI Scientist V2.

A Milestone in AI-Generated Research

In early 2025, Tokyo-based startup Sakana AI unveiled AI Scientist V2, an autonomous research agent capable of producing a fully AI-generated scientific paper. One such paper passed peer review at a workshop of ICLR 2025, a leading conference in machine learning and AI research. The system handled everything end-to-end: generating ideas, designing experiments, executing code, analyzing data, visualizing results, and composing the manuscript.

The experiment was conducted in collaboration with researchers from the University of British Columbia and the University of Oxford, with the full cooperation of ICLR leadership and the workshop organizers, and with IRB approval from UBC.

The Submission and Review Process of AI Scientist V2

Sakana submitted three fully AI-generated papers on topics related to deep learning challenges, such as compositional generalization. One paper received an average review score of 6.33, above the threshold for acceptance, and was accepted into the ICLR 2025 workshop “I Can’t Believe It’s Not Better: Challenges in Applied Deep Learning”. However, in line with the agreed experimental protocol, Sakana withdrew the accepted paper before publication to uphold transparency and respect conference ethics norms.
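As a quick check on the arithmetic, the 6.33 average is what three reviews of 6, 7, and 6 would produce. The individual scores are an assumption here, since this article gives only the average.

```python
from statistics import mean

# Assumption: the three individual reviewer scores (not stated in this article).
reviewer_scores = [6, 7, 6]

average = mean(reviewer_scores)
print(f"Average review score: {average:.2f}")
```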

What The AI Scientist V2 Brings to the Table

Fully Autonomous Research Workflow: The AI Scientist V2 autonomously generated novel hypotheses, wrote experimental code, ran experiments, visualized results, and produced a formatted manuscript with figures and references.

Agentic Tree Search Architecture: Unlike the earlier V1 system, which relied on human-written code templates, V2 uses a progressive agentic tree search guided by an experiment manager. This lets it work across broader domains and pursue more exploratory research directions.

AI Reviewer Feedback Loop: The system includes an AI-based review component that iteratively refines each paper, critiquing both the text and the figures in a way meant to approximate human reviewer feedback.
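The tree search idea can be illustrated with a minimal best-first search over experiment variants. This is only a sketch under stated assumptions, not Sakana's implementation: `Node`, `best_first_search`, `lineage`, and the toy `propose` and `evaluate` functions are hypothetical names, and the loop that picks the highest-scoring node to expand stands in, very loosely, for the experiment manager.

```python
import heapq
import itertools
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    plan: str                        # description of one experiment variant
    score: float                     # higher is better; assigned by the evaluator
    parent: Optional["Node"] = None


def best_first_search(
    root_plan: str,
    propose: Callable[[str], List[str]],
    evaluate: Callable[[str], Optional[float]],
    budget: int,
) -> Node:
    """Expand the most promising node first until `budget` evaluations are spent.

    `evaluate` returns None when a run fails (for example, the generated code
    crashes); failed children are dropped rather than expanded further.
    """
    root_score = evaluate(root_plan)
    if root_score is None:
        raise ValueError("root experiment failed")
    root = Node(root_plan, root_score)
    best = root
    tie = itertools.count()  # tie-breaker so the heap never compares Node objects
    frontier: List[Tuple[float, int, Node]] = [(-root.score, next(tie), root)]
    spent = 1

    while frontier and spent < budget:
        _, _, node = heapq.heappop(frontier)  # highest-scoring node so far
        for child_plan in propose(node.plan):
            if spent >= budget:
                break
            spent += 1
            score = evaluate(child_plan)
            if score is None:
                continue  # failed run: prune this branch
            child = Node(child_plan, score, parent=node)
            heapq.heappush(frontier, (-score, next(tie), child))
            if score > best.score:
                best = child
    return best


def lineage(node: Node) -> List[str]:
    """Return the plans from the root down to `node`."""
    path = []
    current: Optional[Node] = node
    while current is not None:
        path.append(current.plan)
        current = current.parent
    return path[::-1]


# Toy stand-ins (hypothetical): a plan is a string, each refinement appends a letter.
def toy_propose(plan: str) -> List[str]:
    return [plan + "a", plan + "b"]


def toy_evaluate(plan: str) -> Optional[float]:
    if plan.endswith("bb"):
        return None  # simulate a crashed experiment
    return plan.count("a") - 0.5 * plan.count("b")


best = best_first_search("", toy_propose, toy_evaluate, budget=7)
print(best.plan, best.score, lineage(best))
```

In a real agent, `propose` and `evaluate` would be model calls and code executions rather than string operations. The evaluation budget is what keeps the exploration bounded.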

Limits and Ethical Considerations of AI Scientist V2

Independent evaluations and post-mortems revealed notable shortcomings in the system:

High Percentage of Coding Errors: About 42% of the generated code contained errors.
Poor Novelty Assessments: The generated papers lacked the novel insights typically expected in academic research.
Hallucinated Citations & Incomplete Figures: The generated citations were sometimes fabricated, and some figures were incomplete or incorrectly formatted.
Structural Anomalies: Papers resembled undergraduate-level submissions in rigor, lacking full scientific insight.

Despite these challenges, AI Scientist V2 offers a glimpse into the future of research, though clear limitations remain in reasoning, understanding, and experimental robustness.

Debates and Expert Perspectives on AI-Generated Research

This milestone sparked intense debate in the AI research community:

Leopold Aschenbrenner, an AI researcher, predicts that machines with advanced reasoning could surpass human researchers as early as 2027.
On the other hand, Yann LeCun, Meta’s chief AI scientist, cautioned that today’s systems excel at spotting patterns but lack real understanding or creativity, both critical for true scientific innovation.

Sakana acknowledged these concerns and emphasized full transparency in their experiment. The withdrawal of the accepted paper was not due to technical failure but to maintain scientific integrity and uphold conference norms.

What the Experiment Means for Science

This experiment marks an important inflection point in scientific discovery:

It demonstrates that AI can produce research strong enough to pass workshop-level peer review, with reviewers not told which submissions were AI-generated.
It pushes the conversation forward about AI-authored science, scientific integrity, and transparency in AI contributions.
It reveals the gap between pattern-based generation and the deeper creative insights that characterize genuine scientific innovation.

Moving Toward a Hybrid Future in Scientific Discovery

The future of scientific research likely belongs to AI-human collaboration. Here’s why:

AI’s Role in Accelerating Research: AI can accelerate experimentation, scale ideation, and handle repetitive workflows at low cost and high speed. For instance, a full paper can reportedly be generated for less than $15, with minimal human oversight (AI Critique).
Humans’ Indispensable Role: Humans are still essential for intuition, domain expertise, novel thinking, critical ethics, and interpreting the broader implications beyond the data.

Rather than replacing researchers, systems like AI Scientist V2 may become powerful collaborators that speed up the scientific process while leaving ultimate judgment and oversight in human hands.

Key Takeaways from AI Scientist V2’s Milestone

Insight	Summary
Milestone	First fully AI-generated paper to pass peer review at an ICLR 2025 workshop
Autonomy	AI Scientist V2 handled every phase: from concept to paper
Limitations	Issues with novelty, reproducibility, reasoning, and structure
Ethics	Submission was withdrawn to maintain scientific trust and transparency
Looking Ahead	AI-human partnerships may reshape future scientific discovery

The Road Ahead: What to Watch

Research on Reliability: Further studies will evaluate reproducibility, error rates, and verification gaps in AI-generated science (e.g., recent critiques point to failures when the system cannot reliably implement its own ideas).
Policy and Norms: Scientific communities will define guidelines for disclosing AI involvement in research, peer-review standards, and authorship ethics.
Future Versions: As LLMs evolve, successors to AI Scientist V2 may produce work that rivals that of human scientists in novelty and impact.
Collaborative Science: Hybrid workflows could emerge, where AI accelerates hypothesis testing and data exploration, while humans guide interpretation, ethics, and vision.

For further reading, see Sakana’s papers on this topic at the links below: