Cisco AI has introduced FAPO, a Fully Automated Prompt Optimization system designed to streamline the development of reliable large language model (LLM) applications. By leveraging Claude Code orchestration, FAPO autonomously optimizes multi-step LLM pipelines, moving beyond simple prompt adjustments to address complex structural failures. The project is now available as an open-source tool under the Apache 2.0 license.
A Multi-Level Optimization Framework
FAPO functions as a multi-tenant framework in which each optimization project is isolated within its own directory. A domain-agnostic core engine named hephaestus manages chain execution, scoring, and evaluation. To begin, users provide a dataset of paired inputs and expected outputs, which FAPO splits into training, validation, and held-out test sets. From a basic task description, the system can scaffold the initial prompt, the chain, and the necessary scoring logic.
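The splitting step can be pictured with a short sketch. This is not FAPO's code: the function name split_dataset, the 60/20/20 proportions, and the fixed seed are all assumptions made for illustration, and the article does not state the ratios FAPO actually uses.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(
    examples: Sequence[T],
    val_frac: float = 0.2,   # assumed proportion, not taken from FAPO
    test_frac: float = 0.2,  # assumed proportion, not taken from FAPO
    seed: int = 0,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle deterministically, then cut into train/validation/test."""
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("val_frac and test_frac must be >= 0 and sum to < 1")
    items = list(examples)
    random.Random(seed).shuffle(items)  # same seed gives the same split
    n_test = int(len(items) * test_frac)
    n_val = int(len(items) * val_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test


# Paired inputs and expected outputs, as the article describes.
pairs = [(f"question {i}", f"answer {i}") for i in range(10)]
train, val, test = split_dataset(pairs)
```

Fixing the seed keeps the held-out test set stable across runs, so scores from different optimization runs stay comparable.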
The optimization process operates through three escalating levels: prompt edits, parameter adjustments, and structural changes. By using step-level failure attribution, FAPO identifies whether a failure stems from retrieval issues, cascading errors, format constraints, or reasoning gaps. It exhausts lower-cost interventions, such as prompt refinement, before escalating to more complex structural modifications like adding self-reflection nodes or altering chain topology.
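The escalation policy described above can be sketched as a loop that exhausts each level before moving to the next. Everything here is hypothetical: the Level enum, the optimize function, the candidate names, and the scores are invented for illustration and do not reflect FAPO's internal API or measured results.

```python
from enum import IntEnum
from typing import Callable, Dict, List, Optional, Tuple


class Level(IntEnum):
    """Intervention levels, ordered from cheapest to most invasive."""
    PROMPT = 1
    PARAMETER = 2
    STRUCTURE = 3


def optimize(
    baseline: float,
    candidates: Dict[Level, List[str]],
    evaluate: Callable[[str], float],
    target: float,
) -> Tuple[str, float, Optional[Level]]:
    """Try every candidate at a level; escalate only if the target is unmet."""
    best_name, best_score = "baseline", baseline
    last_level: Optional[Level] = None
    for level in sorted(candidates):
        last_level = level
        for name in candidates[level]:
            score = evaluate(name)
            if score > best_score:
                best_name, best_score = name, score
        if best_score >= target:
            break  # cheaper level was enough, no escalation needed
    return best_name, best_score, last_level


# Invented scores standing in for validation-set evaluation.
fake_scores = {
    "reword instructions": 0.55,
    "lower temperature": 0.58,
    "add self-reflection node": 0.81,
}
plan = {
    Level.PROMPT: ["reword instructions"],
    Level.PARAMETER: ["lower temperature"],
    Level.STRUCTURE: ["add self-reflection node"],
}
result = optimize(0.50, plan, fake_scores.__getitem__, target=0.75)
```

In this toy run neither the prompt edit nor the parameter change reaches the target, so the loop escalates to the structural change. A real system would also use the failure attribution to decide which candidates to propose at each level, which this sketch leaves out.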
Evaluation and Performance
In performance evaluations conducted by Cisco, FAPO was compared against GEPA (Genetic-Pareto), a state-of-the-art prompt optimizer. Across 18 model-benchmark comparisons, FAPO outperformed GEPA in 15, with a mean gain of 14.1 percentage points. On HoVer and IFBench, the benchmarks where FAPO escalated to structural pipeline changes, it won all six model-benchmark pairs with a mean gain of 33.8 percentage points.
To ensure reliability and prevent overfitting, FAPO incorporates several guardrails. The system inspects only the training split of the provided data, while validation and test sets are reserved for aggregate scoring. Furthermore, every proposed variant is saved as an immutable file rather than being edited in place, and an independent reviewer agent validates each proposal for scope compliance and potential data leakage before it is executed.
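One way to get the "never edited in place" property is to write each variant to a new file and refuse to overwrite existing ones. The sketch below is an assumption about how this could be done, not FAPO's implementation: the save_variant function, the hash-based file naming, and the read-only permission bits are all illustrative choices.

```python
import hashlib
import stat
import tempfile
from pathlib import Path


def save_variant(directory: Path, prompt: str) -> Path:
    """Store a prompt variant as a new file; never modify an existing one."""
    directory.mkdir(parents=True, exist_ok=True)
    # Name the file after its content so different text gets a different file.
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:12]
    path = directory / f"variant_{digest}.txt"
    if path.exists():
        return path  # identical text is already stored
    # Mode "x" fails if the file appears in the meantime, so nothing is overwritten.
    with open(path, "x", encoding="utf-8") as handle:
        handle.write(prompt)
    # Drop write permission as a second line of defence.
    path.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return path


variants_dir = Path(tempfile.mkdtemp()) / "variants"
first = save_variant(variants_dir, "Answer using only the retrieved evidence.")
second = save_variant(variants_dir, "Answer using only the retrieved evidence. Cite it.")
```

Because every variant survives on disk, a per-variant comparison after the run only needs to read the directory rather than reconstruct an edit history.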
Practical Applications
FAPO is specifically designed for multi-step pipelines rather than single-prompt tasks. Its capabilities are well-suited for complex workflows such as multi-hop question answering, where the system can retrieve documents, extract facts, and reason over evidence. It also supports instruction-following tasks, classification projects, and the optimization of ReAct agents through trajectory and LLM-as-Judge scoring.
The system supports multiple providers, including OpenAI, Baseten, and SageMaker, and allows for the use of either Claude Code or Codex as the optimization agent. Once an optimization run is complete, users can review the results, including every prompt variant and per-variant analysis, through a local read-only interface known as FAPO Explorer.
