RASER: Recoverability-Aware Selective Escalation Ro... | AI Research

Paper Abstract

Multi-hop question-answering systems often use expensive retrieval on every question. They may decompose the question, run several retrieval rounds, or search through bridge entities before answering. All of these strategies rely on repeated LLM calls to rewrite or decompose the question, which increases token cost and is a poor fit when the LLM budget is tight. However, our analysis shows that many multi-hop questions are already answered correctly by a single one-shot RAG pass, so running extra retrieval on every question wastes the budget. We introduce RASER (Recoverability-Aware Selective Escalation Router), a family of cheap routers built on one-shot RAG and six features derived from it. RASER-2 decides whether to stop or escalate to the extra-retrieval action PRUNE. RASER-3 chooses among one-shot RAG, PRUNE, and the iterative retrieval method IRCoT, using the same features but adding an explicit cost-accuracy trade-off. Neither router makes an extra LLM call to decide. Across six LLMs and three multi-hop QA benchmarks, both routers stay competitive in F1 with the other state-of-the-art (SOTA) baselines while spending only 41–49% of always-prune's tokens, and fewer tokens than the iterative and decomposition retrieval baselines.

Multi-hop question answering (QA) systems often struggle with efficiency. To answer complex questions that require connecting multiple pieces of information, many systems automatically trigger expensive, multi-step retrieval processes for every single query. This approach is often wasteful, as many questions can be answered correctly with a single, simple retrieval step. This paper introduces RASER, a "Recoverability-Aware Selective Escalation Router" designed to intelligently decide when a complex question actually requires extra retrieval, saving significant computational costs without sacrificing accuracy.

The Problem with "Always-On" Retrieval

Current multi-hop QA systems frequently rely on strategies like iterative retrieval or question decomposition to find answers. While these methods are effective at finding hidden information, they are computationally expensive and require multiple calls to a Large Language Model (LLM). The authors performed a "recoverability analysis" and discovered that for a large portion of questions, the extra effort provides no measurable benefit—either the initial, simple retrieval is already correct, or the question is too difficult for any of the tested retrieval methods to solve. Consequently, running these expensive processes on every question leads to unnecessary token costs and wasted resources.
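
The bucketing behind such a recoverability analysis can be sketched as follows. This is a minimal illustration, not the authors' code: the `Outcome` record, the bucket names, and the use of boolean correctness flags (rather than, say, an F1 threshold) are all assumptions made for the example.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass

BUCKETS = ("already_correct", "recoverable", "unrecoverable")


@dataclass
class Outcome:
    """Per-question correctness flags (hypothetical record layout)."""
    question_id: str
    one_shot_correct: bool
    # Correctness of each tested extra-retrieval method, keyed by method name.
    escalated_correct: dict[str, bool]


def recoverability_bucket(outcome: Outcome) -> str:
    """Assign one question to one of three buckets."""
    if outcome.one_shot_correct:
        return "already_correct"  # extra retrieval has nothing to fix
    if any(outcome.escalated_correct.values()):
        return "recoverable"      # at least one extra method fixes it
    return "unrecoverable"        # no tested method solves it


def summarize(outcomes: list[Outcome]) -> dict[str, float]:
    """Return the share of questions that falls into each bucket."""
    if not outcomes:
        raise ValueError("need at least one outcome")
    counts = Counter(recoverability_bucket(o) for o in outcomes)
    return {name: counts[name] / len(outcomes) for name in BUCKETS}
```

Only the "recoverable" share can benefit from escalation, so that share bounds how much an always-on strategy can gain over one-shot retrieval on the same questions.
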

How RASER Makes Decisions

RASER functions as a lightweight decision layer that sits after an initial "one-shot" retrieval attempt. Instead of using an LLM to decide whether to escalate, it uses a fast, non-LLM-based classifier (a Gradient Boosting Machine) to analyze six simple features, such as the confidence of the initial answer, the length of the draft answer, and the similarity scores of the retrieved text chunks.
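
A rough sketch of this decision layer is shown below. It is illustrative only: the function names, the 0.5 threshold, and the concrete statistics (whitespace token count, max and mean similarity) are assumptions, and only the three features named above are modelled because the remaining features are not listed here. The `score_fn` argument stands in for the trained gradient-boosting classifier; `toy_score` is a placeholder, not a learned model.

```python
from __future__ import annotations

from statistics import mean
from typing import Callable, Sequence


def extract_features(answer_confidence: float,
                     draft_answer: str,
                     chunk_similarities: Sequence[float]) -> list[float]:
    """Build a feature vector from the outputs of the one-shot attempt."""
    if not chunk_similarities:
        raise ValueError("need at least one retrieved chunk")
    return [
        answer_confidence,                 # confidence of the initial answer
        float(len(draft_answer.split())),  # draft length in whitespace tokens
        max(chunk_similarities),           # best retrieved-chunk similarity
        mean(chunk_similarities),          # average retrieved-chunk similarity
    ]


def should_escalate(features: Sequence[float],
                    score_fn: Callable[[Sequence[float]], float],
                    threshold: float = 0.5) -> bool:
    """Escalate when the scorer's estimated benefit reaches the threshold.

    No LLM call is made here: score_fn is any cheap callable that maps
    a feature vector to a probability-like score in [0, 1].
    """
    return score_fn(features) >= threshold


def toy_score(features: Sequence[float]) -> float:
    """Placeholder scorer (assumption): low confidence means escalate."""
    return 1.0 - features[0]


confident = extract_features(0.9, "Paris France", [0.8, 0.6])
unsure = extract_features(0.2, "unknown", [0.3])
decisions = (should_escalate(confident, toy_score),
             should_escalate(unsure, toy_score))
```

Passing the scorer in as a callable keeps the routing logic independent of the model library, so a fitted classifier's probability output could be plugged in without changing the decision code.
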
The system comes in two versions:

  • RASER-2: A binary router that decides whether to stop after the first attempt or "escalate" to a single bridge-retrieval step (PRUNE).

  • RASER-3: A more advanced, cost-aware router that chooses between three options: stopping, performing a single bridge step, or engaging in a more intensive iterative retrieval process (IRCoT). It uses a cost-penalty formula to determine which option provides the best balance between expected accuracy and the number of tokens spent.
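
The three-way, cost-aware choice can be illustrated with a short sketch. The exact penalty formula is not given in this summary, so the linear form below (expected F1 minus a penalty weight times token cost) is an assumption, as are the action labels, the `penalty` parameter, and all numbers in the usage example.

```python
from __future__ import annotations

# Actions ordered from cheapest to most expensive (labels are hypothetical).
ACTIONS = ("one_shot", "prune", "ircot")


def choose_action(expected_f1: dict[str, float],
                  token_cost: dict[str, float],
                  penalty: float) -> str:
    """Pick the action with the highest cost-penalized utility.

    utility(a) = expected_f1[a] - penalty * token_cost[a]
    Ties resolve to the earlier (cheaper) action in ACTIONS.
    """
    if penalty < 0:
        raise ValueError("penalty must be non-negative")

    def utility(action: str) -> float:
        return expected_f1[action] - penalty * token_cost[action]

    return max(ACTIONS, key=utility)


# Made-up per-question estimates, for illustration only.
f1_estimates = {"one_shot": 0.50, "prune": 0.70, "ircot": 0.75}
token_costs = {"one_shot": 1000.0, "prune": 2500.0, "ircot": 6000.0}

accuracy_only = choose_action(f1_estimates, token_costs, penalty=0.0)
balanced = choose_action(f1_estimates, token_costs, penalty=1e-4)
frugal = choose_action(f1_estimates, token_costs, penalty=2e-4)
```

With these made-up numbers, raising the penalty moves the choice from the iterative option to the single bridge step and finally to stopping, which is the kind of trade-off a cost-aware router is meant to expose.
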

Efficiency and Performance

By using these routers, the system avoids the "one-size-fits-all" approach to retrieval. Across six different LLMs and three major QA benchmarks, RASER maintained competitive accuracy (F1 scores) compared to state-of-the-art baselines. Most notably, it achieved these results while using only 41–49% of the tokens required by systems that perform extra retrieval on every question.
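
To see how selective escalation translates into token savings, the back-of-the-envelope calculation below may help. It assumes a simplified cost model in which every question pays for the one-shot attempt and only escalated questions pay for the extra retrieval; the function name and the token counts are hypothetical and are not taken from the paper.

```python
def relative_token_cost(one_shot_tokens: float,
                        extra_tokens: float,
                        escalation_rate: float) -> float:
    """Tokens used by a selective router relative to always escalating.

    escalation_rate is the fraction of questions sent to extra retrieval.
    The router itself is assumed to add no tokens (it makes no LLM call).
    """
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be in [0, 1]")
    selective = one_shot_tokens + escalation_rate * extra_tokens
    always = one_shot_tokens + extra_tokens
    return selective / always


# Hypothetical costs: 1000 tokens for one-shot, 3000 more when escalating.
ratio = relative_token_cost(1000.0, 3000.0, escalation_rate=0.25)
```

Under this model the ratio depends only on the escalation rate and on how expensive the extra step is relative to the one-shot attempt, so the savings grow as more questions are resolved without escalation.
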

Key Takeaways

The core contribution of this research is the shift in perspective: multi-hop QA should be treated as a "recoverability-aware" problem rather than a simple retrieval problem. By identifying which questions are actually "bridgeable"—meaning they can be improved by extra steps—the system can focus its budget on the cases where it matters most. Because the router itself requires no extra LLM calls, it provides a highly efficient way to optimize existing RAG (Retrieval-Augmented Generation) pipelines for tighter budgets.
