Multi-hop question answering (QA) systems often struggle with efficiency. To answer complex questions that require connecting multiple pieces of information, many systems trigger expensive, multi-step retrieval for every single query. This is often wasteful, because many questions can be answered correctly after a single, simple retrieval step. This paper introduces RASER, a "Recoverability-Aware Selective Escalation Router" that decides when a question actually needs extra retrieval, substantially cutting token costs while keeping accuracy competitive.
The Problem with "Always-On" Retrieval
Current multi-hop QA systems frequently rely on strategies like iterative retrieval or question decomposition to find answers. These methods are effective at surfacing evidence that a single retrieval misses, but they are computationally expensive and require multiple calls to a Large Language Model (LLM). The authors performed a "recoverability analysis" and found that for a large portion of questions the extra effort provides no measurable benefit: either the initial one-shot answer is already correct, or the question is too difficult for any of the tested retrieval methods to solve. Running these expensive processes on every question therefore incurs unnecessary token costs.
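One way to picture the recoverability analysis is to sort each question into three buckets by comparing its one-shot result with the results of the escalation strategies. The sketch below is a minimal illustration under stated assumptions: the bucket names, the F1 cut-off of 0.5, and the function names are hypothetical and do not come from the paper.

```python
from collections import Counter

LABELS = ("already_correct", "bridgeable", "unrecoverable")

def recoverability_label(one_shot_f1, escalated_f1s, threshold=0.5):
    """Bucket one question by whether extra retrieval could change its outcome.

    one_shot_f1:   F1 of the answer produced after a single retrieval.
    escalated_f1s: F1 of the answers produced by each escalation strategy tried.
    threshold:     hypothetical F1 cut-off for calling an answer "correct".
    """
    if one_shot_f1 >= threshold:
        return "already_correct"   # escalation cannot improve on a correct answer
    if any(f1 >= threshold for f1 in escalated_f1s):
        return "bridgeable"        # some strategy recovers what one-shot missed
    return "unrecoverable"         # no tested strategy recovers the answer

def recoverability_report(records, threshold=0.5):
    """Return the fraction of questions in each bucket.

    records: iterable of (one_shot_f1, [escalated_f1, ...]) pairs.
    """
    counts = Counter(recoverability_label(s, e, threshold) for s, e in records)
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label in LABELS}
    return {label: counts[label] / total for label in LABELS}
```

Only questions in the "bridgeable" bucket can gain anything from escalation; spending extra retrieval on the other two buckets costs tokens without changing the outcome.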
How RASER Makes Decisions
RASER functions as a lightweight decision layer that sits after an initial "one-shot" retrieval attempt. Instead of using an LLM to decide whether to escalate, it uses a fast, non-LLM-based classifier (a Gradient Boosting Machine) to analyze six simple features, such as the confidence of the initial answer, the length of the draft answer, and the similarity scores of the retrieved text chunks.
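The article names only three of the router's six features. As a hedged illustration, the sketch below builds a feature vector from a one-shot result. The `OneShotResult` type, its field names, and the extra similarity statistics (mean similarity and the gap between the top two scores) are hypothetical stand-ins, not the paper's actual feature set.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean

@dataclass
class OneShotResult:
    draft_answer: str                 # answer drafted after the single retrieval
    answer_confidence: float          # confidence score for the draft (assumed given)
    chunk_similarities: list[float]   # retriever similarity scores of the chunks

def router_features(r: OneShotResult) -> list[float]:
    """Hypothetical cheap features; computing them needs no LLM call."""
    # Sort scores high-to-low; fall back to a single 0.0 if nothing was retrieved.
    sims = sorted(r.chunk_similarities, reverse=True) or [0.0]
    top2_gap = sims[0] - sims[1] if len(sims) > 1 else 0.0
    return [
        r.answer_confidence,               # confidence of the initial answer
        float(len(r.draft_answer.split())),  # draft answer length in words
        sims[0],                           # best chunk similarity
        mean(sims),                        # average chunk similarity
        top2_gap,                          # how clearly the top chunk stands out
    ]
```

A vector like this could then be fed to an off-the-shelf gradient-boosting classifier (for example, scikit-learn's `GradientBoostingClassifier`) trained to predict whether escalation helps, so routing itself stays cheap.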
The system comes in two versions:
RASER-2: A binary router that decides whether to stop after the first attempt or "escalate" to a single bridge-retrieval step (PRUNE).
RASER-3: A more advanced, cost-aware router that chooses between three options: stopping, performing a single bridge step, or engaging in a more intensive iterative retrieval process (IRCoT). It uses a cost-penalty formula to determine which option provides the best balance between expected accuracy and the number of tokens spent.
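The two routing policies can be sketched as follows. The article does not give the exact cost-penalty formula, so this sketch assumes a simple form: score each action a as p̂(a) − λ·c(a), where p̂(a) is the predicted accuracy of action a, c(a) is its extra token cost, and λ is a tuning weight, then pick the highest-scoring action. The action names, the 0.5 threshold, and the function names are illustrative assumptions.

```python
from __future__ import annotations

# Hypothetical action labels: stop after one shot, one PRUNE bridge step, or IRCoT.
ACTIONS = ("stop", "prune", "ircot")

def raser2_route(p_escalation_helps: float, threshold: float = 0.5) -> str:
    """Binary routing: escalate to the single bridge step only when the
    classifier's probability that escalation helps clears the threshold."""
    return "prune" if p_escalation_helps >= threshold else "stop"

def raser3_route(expected_acc: dict[str, float],
                 token_cost: dict[str, float],
                 lam: float) -> str:
    """Cost-aware routing: pick the action maximising predicted accuracy
    minus lam times its extra token cost (an assumed cost-penalty form)."""
    return max(ACTIONS, key=lambda a: expected_acc[a] - lam * token_cost[a])
```

For instance, with predicted accuracies of 0.40, 0.55, and 0.60 and extra costs of 0, 1000, and 4000 tokens for stop, prune, and ircot, λ = 0 selects "ircot", λ = 1e-4 selects "prune", and λ = 1e-3 selects "stop". Raising λ pushes the router toward cheaper actions.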
Efficiency and Performance
By using these routers, the system avoids the "one-size-fits-all" approach to retrieval. Across six different LLMs and three major QA benchmarks, RASER maintained competitive accuracy (F1 scores) compared to state-of-the-art baselines. Most notably, it achieved these results while using only 41–49% of the tokens required by systems that perform extra retrieval on every question.
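The token savings follow from simple arithmetic: every question pays for the one-shot attempt, and only the escalated fraction pays for extra retrieval. The sketch below computes the average cost per question under a routing policy; the function name and every number in the usage example are hypothetical and are not the paper's measurements.

```python
def expected_tokens(route_fractions, extra_cost, one_shot_cost):
    """Average tokens per question under a routing policy.

    route_fractions: share of questions sent to each action (must sum to 1).
    extra_cost:      tokens each action spends beyond the one-shot attempt.
    one_shot_cost:   tokens spent on the initial retrieval and draft answer.
    """
    if abs(sum(route_fractions.values()) - 1.0) > 1e-9:
        raise ValueError("route fractions must sum to 1")
    # Every question pays the one-shot cost; escalated ones pay extra on top.
    return one_shot_cost + sum(
        share * extra_cost[action] for action, share in route_fractions.items()
    )
```

With made-up costs of 1000 tokens for the one-shot step and 1500 or 4000 extra tokens for PRUNE or IRCoT, routing 60% of questions to stop, 30% to PRUNE, and 10% to IRCoT averages 1850 tokens per question, versus 5000 for running IRCoT on everything, a ratio of 0.37.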
Key Takeaways
The core contribution of this research is a shift in perspective: multi-hop QA should be treated as a "recoverability-aware" problem rather than a simple retrieval problem. By identifying which questions are actually "bridgeable" (that is, questions whose answers can be improved by extra retrieval steps), the system can focus its budget on the cases where it matters most. Because the router itself requires no extra LLM calls, it offers an efficient way to adapt existing RAG (Retrieval-Augmented Generation) pipelines to tighter budgets.