An Agentic AI Framework to Accelerate Scientific Di...

Key Takeaways

High-throughput plant phenotyping now generates image-derived datasets far faster than scientists can analyze them.
We present an end-to-end agentic AI framework that turns the facility from a data factory into an interactive, autonomous discovery platform, where scientists partner with AI agents to accelerate time to insight.
The framework turns a days- to weeks-long analysis process into an interactive loop where agents reason over results, recommend next analyses, and respond to follow-up questions in seconds.
This paper introduces an agentic AI framework designed to transform plant phenotyping from a slow, manual process into an interactive, autonomous discovery platform.

Paper Abstract

High-throughput plant phenotyping now generates image-derived datasets far faster than scientists can analyze them. At Oak Ridge National Laboratory's Advanced Plant Phenotyping Laboratory (APPL), automated stations image hundreds of plants daily across multiple remote sensing modalities; yet, trait extraction and interpretation remain manual, expert-bound, and strictly post-hoc, making analysis, not acquisition, the binding constraint on discovery. We present an end-to-end agentic AI framework that turns the facility from a data factory into an interactive, autonomous discovery platform, where scientists partner with AI agents to accelerate time to insight. A conversational Co-Scientist Agent translates a scientist's natural-language question into a structured analysis plan, and a headless Compute Agent dispatches Vision Transformer segmentation and trait extraction on the Frontier exascale supercomputer. The two agents run in separate security and resource domains and communicate over a secure, token-authenticated streaming channel, a design that accounts for the federation, data-movement, and provenance realities that cloud-native agentic frameworks ignore, ensuring end-to-end provenance is captured for every interaction. The framework turns a days- to weeks-long analysis process into an interactive loop where agents reason over results, recommend next analyses, and respond to follow-up questions in seconds.

This paper introduces an agentic AI framework designed to transform plant phenotyping from a slow, manual process into an interactive, autonomous discovery platform. At Oak Ridge National Laboratory’s Advanced Plant Phenotyping Laboratory (APPL), automated stations generate massive amounts of image data daily, but the actual analysis—extracting meaningful biological traits—has historically been a bottleneck that takes days or weeks. By integrating AI agents with high-performance computing, this framework allows scientists to ask natural-language questions and receive actionable insights in seconds.

Turning Data Factories into Discovery Platforms

The framework replaces traditional, brittle analysis pipelines with an interactive loop. When a scientist asks a question, such as which plant genotypes grew the most over a specific period, a "Co-Scientist Agent" translates that request into a structured plan. This plan is then executed by a "Compute Agent" on the Frontier exascale supercomputer. This approach allows researchers to iterate on their hypotheses in real time, rather than waiting for post-hoc analysis after an experiment has concluded.
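
To make the question-to-plan translation described above more concrete, here is a minimal Python sketch of what a structured analysis plan might look like. The paper summary does not specify the plan schema, so every name here (`AnalysisPlan`, `AnalysisStep`, `build_growth_plan`, `rank_genotypes_by_growth`, and the step names) is a hypothetical illustration, and the plan is hand-written rather than produced by a language model.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class AnalysisStep:
    # Hypothetical step record: a name plus free-form parameters.
    name: str
    params: dict = field(default_factory=dict)


@dataclass
class AnalysisPlan:
    # Hypothetical schema; the real framework's plan format is not given.
    question: str
    trait: str
    start_date: str
    end_date: str
    steps: list = field(default_factory=list)

    def to_json(self) -> str:
        # Serialize deterministically so the plan can be logged or sent on.
        return json.dumps(asdict(self), sort_keys=True)


def build_growth_plan(question: str, start_date: str, end_date: str) -> AnalysisPlan:
    """Hand-written stand-in for the plan an LLM-backed agent might emit."""
    return AnalysisPlan(
        question=question,
        trait="height",
        start_date=start_date,
        end_date=end_date,
        steps=[
            AnalysisStep("segment", {"model": "vit"}),
            AnalysisStep("extract_trait", {"trait": "height"}),
            AnalysisStep("rank_genotypes", {"by": "growth"}),
        ],
    )


def rank_genotypes_by_growth(measurements: dict) -> list:
    """Rank genotypes by last-minus-first trait value.

    measurements maps genotype -> list of (ISO date string, value) pairs.
    """
    growth = {}
    for genotype, series in measurements.items():
        ordered = sorted(series)  # ISO dates sort chronologically as strings
        growth[genotype] = ordered[-1][1] - ordered[0][1]
    return sorted(growth, key=growth.get, reverse=True)
```

In this sketch the plan is plain data, so it could be serialized, inspected by the scientist, and handed to a separate executor without that executor needing to understand the original natural-language question.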

Scaling Analysis with Vision Transformers

To handle the terabytes of multimodal imagery produced by the facility, the system uses a Vision Transformer (ViT) model. Unlike older methods that struggle with complex plant shapes or varying growth stages, this model is fine-tuned on APPL imagery to accurately segment plants from their backgrounds. By using a sliding-window strategy, the system processes high-resolution images efficiently, converting raw sensor data into quantitative traits like height, leaf area, and physiological markers.
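
The sliding-window idea can be sketched as follows, assuming NumPy is available. This is an illustration, not the paper's implementation: the window size, stride, averaging of overlapping tiles, and the 0.5 threshold are all assumptions, and `greenness_predictor` is a trivial placeholder standing in for the fine-tuned ViT, which is not reproduced here.

```python
import numpy as np


def window_starts(size: int, window: int, stride: int) -> list:
    """Start offsets covering [0, size), with the last window flush to the edge."""
    if size <= window:
        return [0]
    starts = list(range(0, size - window + 1, stride))
    if starts[-1] != size - window:
        starts.append(size - window)
    return starts


def sliding_window_segment(image, predict_tile, window=256, stride=128):
    """Run predict_tile on overlapping tiles and average into a full-size mask.

    predict_tile takes an (h, w, 3) tile and returns an (h, w) array of
    foreground probabilities in [0, 1].
    """
    if stride > window:
        raise ValueError("stride must not exceed window, or pixels are skipped")
    h, w = image.shape[:2]
    prob_sum = np.zeros((h, w), dtype=np.float64)
    counts = np.zeros((h, w), dtype=np.float64)
    for y in window_starts(h, window, stride):
        for x in window_starts(w, window, stride):
            tile = image[y:y + window, x:x + window]
            prob = predict_tile(tile)
            th, tw = tile.shape[:2]
            prob_sum[y:y + th, x:x + tw] += prob
            counts[y:y + th, x:x + tw] += 1
    # Every pixel is covered at least once, so counts is never zero.
    return (prob_sum / counts) >= 0.5


def projected_area_mm2(mask, mm_per_pixel: float) -> float:
    """Convert a boolean plant mask into projected area, given pixel size."""
    return float(mask.sum()) * mm_per_pixel ** 2


def greenness_predictor(tile):
    """Placeholder for the segmentation model: thresholds excess green."""
    r, g, b = (tile[..., i].astype(np.float64) for i in range(3))
    return (2 * g - r - b > 40).astype(np.float64)
```

Overlapping windows are averaged here so that predictions near tile borders are smoothed by neighboring tiles. Projected area is used as a simple stand-in for a trait such as leaf area; the calibration factor `mm_per_pixel` is hypothetical and would depend on the imaging station.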

Bridging Security and Provenance

A key challenge in scientific AI is ensuring that results are reproducible and secure. The framework operates across two separate security and resource domains, communicating through a secure, token-authenticated channel. This design ensures that every step of the analysis—from the initial prompt to the final trait extraction—is recorded. By capturing full provenance, the system provides an audit trail that supports scientific trust and helps curate high-quality data for future biological foundation models.
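
One way to picture token-authenticated messaging combined with provenance capture is the sketch below, which uses only the Python standard library. The source does not describe the actual token scheme or provenance format, so this swaps in two generic techniques as assumptions: HMAC-SHA256 message signing with a shared token, and a hash-chained log in which each record commits to the one before it. All function, class, and field names are hypothetical.

```python
import hashlib
import hmac
import json
import time


def sign_message(payload: dict, token: bytes) -> dict:
    """Attach an HMAC-SHA256 tag computed with a shared token (assumed scheme)."""
    body = json.dumps(payload, sort_keys=True)
    tag = hmac.new(token, body.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"body": body, "tag": tag}


def verify_message(message: dict, token: bytes) -> dict:
    """Return the payload if the tag matches, otherwise raise PermissionError."""
    expected = hmac.new(
        token, message["body"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    if not hmac.compare_digest(expected, message["tag"]):
        raise PermissionError("message failed token authentication")
    return json.loads(message["body"])


class ProvenanceLog:
    """Append-only log where each record stores the hash of its predecessor."""

    GENESIS = "0" * 64

    def __init__(self):
        self.records = []

    @staticmethod
    def _digest(record: dict) -> str:
        content = {k: v for k, v in record.items() if k != "hash"}
        encoded = json.dumps(content, sort_keys=True).encode("utf-8")
        return hashlib.sha256(encoded).hexdigest()

    def append(self, actor: str, action: str, detail: str) -> dict:
        prev_hash = self.records[-1]["hash"] if self.records else self.GENESIS
        record = {
            "actor": actor,
            "action": action,
            "detail": detail,
            "timestamp": time.time(),
            "prev_hash": prev_hash,
        }
        record["hash"] = self._digest(record)
        self.records.append(record)
        return record

    def verify(self) -> bool:
        """Check that no record was altered or reordered after being written."""
        prev = self.GENESIS
        for record in self.records:
            if record["prev_hash"] != prev or record["hash"] != self._digest(record):
                return False
            prev = record["hash"]
        return True
```

In a usage pattern like this, the sending agent would call `sign_message` on a plan, the receiving agent would call `verify_message` before acting, and both sides would append entries such as the prompt, the dispatched plan, and the extracted traits to a `ProvenanceLog`, giving an audit trail whose integrity can be rechecked later with `verify`.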

Moving Beyond Manual Bottlenecks

The primary goal of this framework is to shift the burden of analysis from the scientist to the machine. By automating the extraction of traits and providing an interface for follow-up questions, the system allows researchers to focus on biological discovery rather than data processing. This interactive loop enables scientists to refine their research during active experiments, effectively doubling the productivity of the facility by removing the "analysis gap" that currently limits high-throughput phenotyping.