Google AI Research has introduced PaperOrchestra, a multi-agent framework designed to transform raw experimental data and lab notes into submission-ready LaTeX research papers. While previous autonomous research systems were limited to internal experimental pipelines or lacked the contextual awareness to produce comprehensive literature reviews, PaperOrchestra functions as a standalone writing tool. By processing unstructured inputs, it generates manuscripts complete with figures, verified citations, and logically coherent sections formatted for academic conferences.
A Multi-Agent Pipeline for Scientific Writing
The PaperOrchestra framework uses five specialized agents to manage the writing process. An Outline Agent first converts raw materials into a structured JSON plan, which includes visualization strategies and citation hints. The plan triggers two parallel workflows: a Plotting Agent uses the PaperBanana tool to create and iteratively refine figures, while a Literature Review Agent performs a two-phase search. The Literature Review Agent uses web search and the Semantic Scholar API to verify references, so that citations are accurate and do not postdate the conference submission deadline.
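The fan-out described above, one outline step followed by two independent workflows, can be sketched as follows. This is only an illustration of the control flow, not PaperOrchestra's code: the function names, the plan keys (`figure_plan`, `citation_hints`), and the stub return values are all hypothetical, and the real agents would call a language model, PaperBanana, and the Semantic Scholar API instead of returning fixed strings.

```python
from concurrent.futures import ThreadPoolExecutor


def outline_agent(raw_notes: str) -> dict:
    """Hypothetical stand-in: turn raw notes into a structured plan."""
    return {
        "sections": ["Introduction", "Method", "Experiments"],
        "figure_plan": ["accuracy vs. epochs"],
        "citation_hints": ["multi-agent writing systems"],
    }


def plotting_agent(figure_plan: list[str]) -> list[str]:
    """Hypothetical stand-in for figure generation."""
    return [f"figure for: {item}" for item in figure_plan]


def literature_agent(citation_hints: list[str]) -> list[str]:
    """Hypothetical stand-in for reference search and verification."""
    return [f"verified reference on: {hint}" for hint in citation_hints]


def run_front_half(raw_notes: str) -> dict:
    """Run the outline step, then the two independent workflows in parallel."""
    plan = outline_agent(raw_notes)
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Neither workflow needs the other's output, so both can start
        # as soon as the plan exists.
        figures_future = pool.submit(plotting_agent, plan["figure_plan"])
        refs_future = pool.submit(literature_agent, plan["citation_hints"])
        return {
            "plan": plan,
            "figures": figures_future.result(),
            "references": refs_future.result(),
        }


result = run_front_half("raw lab notes and experiment logs")
print(result["figures"])
print(result["references"])
```

Threads are used here on the assumption that both workflows spend most of their time waiting on external services; an async or process-based design would express the same dependency structure.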
Once the foundational elements are prepared, a Section Writing Agent authors the core manuscript, including the methodology and experiments, by extracting numeric values directly from experimental logs. Finally, a Content Refinement Agent employs a simulated peer-review system called AgentReview. This agent iteratively revises the manuscript and keeps a revision only if it improves the paper's overall quality score, a step that proved critical in achieving higher simulated acceptance rates.
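The accept-only-if-better rule in the refinement step can be illustrated with a short loop. This is a minimal sketch under stated assumptions rather than the AgentReview procedure: `revise` and `score` are hypothetical callables standing in for the revision model and the simulated reviewer, `max_rounds` is an invented parameter, and the sketch simply discards a non-improving revision and tries again, which may differ from how the actual system handles a rejected revision.

```python
from typing import Callable


def refine(
    draft: str,
    revise: Callable[[str], str],
    score: Callable[[str], float],
    max_rounds: int = 3,
) -> tuple[str, float]:
    """Keep a revision only when it scores strictly higher than the best so far."""
    best, best_score = draft, score(draft)
    for _ in range(max_rounds):
        candidate = revise(best)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            # The revision improved the score, so it becomes the new baseline.
            best, best_score = candidate, candidate_score
        # Otherwise the candidate is discarded and `best` stays unchanged.
    return best, best_score


# Toy stand-ins: the "reviewer" prefers drafts of exactly 5 characters,
# and each "revision" appends one character.
def toy_score(text: str) -> float:
    return -abs(len(text) - 5)


def toy_revise(text: str) -> str:
    return text + "!"


final_draft, final_score = refine("abc", toy_revise, toy_score, max_rounds=3)
print(final_draft, final_score)
```

In the toy run the first two revisions are accepted and the third is rejected because it lowers the score, so the returned draft is never worse than the input.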
Benchmarking Performance and Quality
To evaluate the system, the research team developed PaperWritingBench, a standardized benchmark consisting of 200 accepted papers from CVPR 2025 and ICLR 2025. Testing revealed that PaperOrchestra significantly outperforms existing AI baselines in both literature review quality and overall manuscript coherence. In automated side-by-side evaluations, the system achieved win margins of 88% to 99% in literature review quality compared to other AI models.
Human evaluations further validated these results, with researchers noting that PaperOrchestra's literature synthesis achieved a 43% tie or win rate against human-written ground truth. The system also demonstrated greater citation depth, averaging 45 to 48 citations per paper, which approaches the roughly 59 citations found in human-authored works, whereas competing AI systems often struggled to move beyond basic, "must-cite" references.
Maintaining Research Integrity
Despite its high level of automation, PaperOrchestra is positioned as an assistive tool rather than a replacement for human researchers. The system is designed not to fabricate experimental results, and its refinement agent is programmed to ignore requests for data not present in the original logs. Human researchers nonetheless retain full accountability for the validity and originality of the final manuscript. The framework completes the entire writing process in a mean of 39.6 minutes, offering a scalable way for researchers to bridge the gap between finished experiments and a polished submission.
