From Research Question to Scientific Workflow: Leveraging Agentic AI for Science Automation
Scientific research often involves complex computational workflows, such as processing large genomic datasets. Systems exist to manage the execution of these tasks, but translating a scientist's natural language research question into a precise, executable workflow remains manual, difficult, and error-prone. This paper introduces an agentic architecture designed to bridge this gap: it automates the translation from research intent to a reproducible workflow while keeping the process auditable and reliable.
A Three-Layer Architecture
The researchers propose a system that decomposes the translation process into three distinct layers to balance the flexibility of AI with the need for scientific rigor:
Semantic Layer: An LLM interprets the scientist's natural language request and converts it into a structured "intent." This intent captures the core parameters of the research, such as specific populations or genomic regions, without dictating the technical execution details.
Deterministic Layer: Once the intent is defined, the system uses validated, non-AI generators to create the actual workflow. Because this layer does not rely on the LLM, the system guarantees that identical intents will always produce the exact same workflow, ensuring reproducibility.
Knowledge Layer: Domain experts author "Skills," which are simple, version-controlled markdown documents. These documents act as the system's source of truth, providing the vocabulary mappings, parameter constraints, and optimization strategies that the LLM uses to interpret queries accurately.
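The split between the semantic and deterministic layers can be illustrated with a minimal Python sketch. It is not the paper's implementation: the `Intent` fields, the step names, and the fingerprint helper are all hypothetical, chosen only to show how a generator that is a pure function of the intent yields the same workflow every time.

```python
from dataclasses import dataclass
import hashlib
import json


@dataclass(frozen=True)
class Intent:
    """Structured intent produced by the semantic layer (hypothetical fields)."""
    population: str
    chromosome: str
    start: int
    end: int


def generate_workflow(intent: Intent) -> dict:
    """Deterministic, non-LLM generator: output depends only on the intent."""
    region = f"{intent.chromosome}:{intent.start}-{intent.end}"
    # Step names are illustrative placeholders, not taken from the paper.
    return {
        "steps": [
            {"name": "select_samples", "args": {"population": intent.population}},
            {"name": "extract_region", "args": {"region": region}},
            {"name": "compute_frequencies", "args": {}},
        ]
    }


def workflow_fingerprint(workflow: dict) -> str:
    """Hash a canonical JSON form so two workflows can be compared for identity."""
    canonical = json.dumps(workflow, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because no model call happens inside `generate_workflow`, two equal intents always yield equal fingerprints, which is the reproducibility property the deterministic layer is meant to provide.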
The Role of Expert-Authored Skills
A central innovation of this architecture is the use of Skills. Unlike traditional AI approaches that rely on ephemeral few-shot examples or opaque model training, Skills are human-readable documents that allow domain experts to encode their knowledge directly. For example, a geneticist can define how to map "European" to the specific population code "EUR" or how to translate a disease name into precise genomic coordinates. Because these files are version-controlled and auditable, they provide a transparent way to manage the system's logic without requiring the user to have machine learning expertise.
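As a rough illustration of how a Skill might carry a vocabulary mapping, the sketch below parses a small markdown document into a lookup table. The layout of the markdown, the `load_vocabulary` and `resolve_population` helpers, and the second vocabulary entry are assumptions made for this example; the paper's actual Skill format may differ.

```python
from typing import Dict

# Hypothetical Skill document; only the "European" -> "EUR" mapping comes
# from the text above. The second entry is added purely for illustration.
SKILL_MD = """\
# Skill: population vocabulary

## Vocabulary
- European: EUR
- African: AFR
"""


def load_vocabulary(markdown: str) -> Dict[str, str]:
    """Collect 'term: code' pairs from markdown bullet lines."""
    vocab: Dict[str, str] = {}
    for raw in markdown.splitlines():
        line = raw.strip()
        if line.startswith("- ") and ":" in line:
            term, code = line[2:].split(":", 1)
            vocab[term.strip().lower()] = code.strip()
    return vocab


def resolve_population(term: str, vocab: Dict[str, str]) -> str:
    """Map a natural-language term to its code, failing loudly if unknown."""
    key = term.strip().lower()
    if key not in vocab:
        raise KeyError(f"no vocabulary entry for {term!r}")
    return vocab[key]
```

Keeping the mapping in a plain file means a change to the vocabulary shows up as an ordinary diff in version control, which is what makes this kind of knowledge auditable.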
Improving Accuracy and Efficiency
The researchers evaluated their system using the 1000 Genomes population genetics workflow. By incorporating Skills, they significantly improved the accuracy of intent extraction, raising full-match accuracy from 44% to 83% in their testing.
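One plausible reading of "full-match accuracy" is the fraction of queries for which every field of the extracted intent equals the expected one. The sketch below implements that reading; the definition is an assumption, the function name is hypothetical, and intents are represented as plain dictionaries for simplicity.

```python
from typing import Dict, List


def full_match_accuracy(predicted: List[Dict], expected: List[Dict]) -> float:
    """Fraction of queries whose extracted intent matches the reference exactly.

    A single wrong or missing field makes the whole intent count as a miss.
    """
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    matches = sum(1 for p, e in zip(predicted, expected) if p == e)
    return matches / len(expected)
```

Under this metric, an intent with the correct population but a wrong genomic region scores zero, so it is stricter than a per-field accuracy.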
Furthermore, the system employs "deferred workflow generation." Instead of guessing resource needs upfront, the system provisions the infrastructure first, measures the actual data sizes, and then calibrates the workflow's parallelism accordingly. This approach proved highly effective, reducing data transfer by 92% compared to standard methods. The entire end-to-end pipeline is designed to be efficient, with the LLM overhead for processing a query remaining under 15 seconds at a cost of less than $0.001 per query.
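The idea behind deferred workflow generation, measuring first and sizing second, can be sketched as follows. The sizing rule (one worker per target chunk of data, capped at a maximum) and all names and parameters are assumptions for illustration; the paper does not specify its calibration formula here.

```python
from typing import Iterable


def calibrate_parallelism(
    file_sizes_bytes: Iterable[int],
    target_chunk_bytes: int,
    max_workers: int,
) -> int:
    """Choose a worker count from measured input sizes rather than a guess.

    Hypothetical rule: one worker per `target_chunk_bytes` of data,
    with at least 1 worker and at most `max_workers`.
    """
    if target_chunk_bytes <= 0 or max_workers <= 0:
        raise ValueError("target_chunk_bytes and max_workers must be positive")
    total = sum(file_sizes_bytes)
    # Integer ceiling division avoids floating-point rounding on large sizes.
    needed = -(-total // target_chunk_bytes)
    return max(1, min(max_workers, needed))
```

In this sketch the function would be called only after the infrastructure exists and the real file sizes are known, so the workflow's fan-out reflects the actual data instead of an upfront estimate.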
Key Considerations
The architecture is designed to be collaborative, meaning it keeps a "human-in-the-loop" at critical stages. The system's orchestrator, known as the Conductor, manages the interaction and requires human validation before any infrastructure is provisioned or execution begins. This ensures that while the system automates the heavy lifting of translation and configuration, the scientist retains full authority over the research process.
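A human-in-the-loop gate of this kind can be sketched as a function that refuses to provision until an approval callback returns true. This is a minimal illustration, not the Conductor's actual interface: `run_with_approval`, `ApprovalRequired`, and the callback signatures are hypothetical.

```python
from typing import Callable


class ApprovalRequired(Exception):
    """Raised when a plan is rejected or not approved by a human reviewer."""


def run_with_approval(
    plan: dict,
    approve: Callable[[dict], bool],
    provision: Callable[[dict], str],
) -> str:
    """Provision infrastructure only after explicit human approval of the plan."""
    if not approve(plan):
        # Nothing is provisioned on rejection; the caller decides what to do next.
        raise ApprovalRequired("plan was not approved; nothing was provisioned")
    return provision(plan)
```

Passing the approval step in as a callback keeps the gate independent of how the scientist is asked, whether through a command-line prompt, a web form, or a review queue.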