Uncertainty-Aware Hybrid Retrieval for Long-Documen...

Key Takeaways

  • Uncertainty-Aware Hybrid Retrieval for Long-Document RAG addresses a fundamental challenge in Retrieval Augmented Generation (RAG): the trade-off between ret...
  • Retrieval augmented generation (RAG) depends critically on the quality and granularity of retrieved evidence.
  • Large retrieval units preserve context but often introduce irrelevant content, which can dilute answer bearing evidence and worsen long context utilization.
  • Fine-grained units are more compact, but they may be difficult to retrieve reliably because short chunks can lack semantic, lexical, or bridging cues needed to match the query.
  • We propose Uncertainty-aware Multi-Granularity RAG (UMG-RAG), a training-free hybrid retrieval framework that treats chunk granularity as query-specific reliability estimation.

Paper Abstract

Retrieval augmented generation (RAG) depends critically on the quality and granularity of retrieved evidence. Large retrieval units preserve context but often introduce irrelevant content, which can dilute answer bearing evidence and worsen long context utilization. Fine-grained units are more compact, but they may be difficult to retrieve reliably because short chunks can lack semantic, lexical, or bridging cues needed to match the query. We propose Uncertainty-aware Multi-Granularity RAG (UMG-RAG), a training-free hybrid retrieval framework that treats chunk granularity as query-specific reliability estimation. Instead of training a new retriever or modifying the generator, UMG-RAG uses existing dense and sparse retrievers as complementary experts across multiple chunk granularities. For each query, it converts each expert-granularity score list into an evidence distribution, estimates reliability from distribution entropy, and fuses candidates according to query-specific semantic, lexical, and granularity confidence. We further introduce UMGP-RAG, a parent promotion variant that uses fine-grained hits to locate relevant evidence while returning broader non-redundant parent chunks for local coherence. Experiments on question answering benchmarks show that uncertainty-aware fusion and parent promotion improve generation quality while maintaining a lightweight, plug-and-play retrieval pipeline.

Uncertainty-Aware Hybrid Retrieval for Long-Document RAG addresses a fundamental challenge in Retrieval-Augmented Generation (RAG): the trade-off between retrieval granularity and context quality. Retrieval units that are too large often include irrelevant information that distracts the language model. Units that are too small may lack the context the model needs to interpret the evidence, and they can be harder to retrieve in the first place. The paper introduces a framework that balances these needs for each query, without additional training and without modifying the underlying language model.

The Granularity Trade-off

The core problem is that neither coarse nor fine-grained retrieval is universally effective. Coarse units preserve document-level context but introduce noise, which can trigger the "lost-in-the-middle" phenomenon, where models struggle to use evidence buried in long prompts. Fine-grained units are compact and precise, but they are often hard to retrieve reliably because a short chunk may lack the semantic, lexical, or bridging cues needed to match a specific query. The authors argue that an effective system should use both: fine-grained units for precision and broader units for coherence.

How the Framework Works

The proposed solution, UMG-RAG, treats chunk granularity as a query-specific reliability estimation problem. It uses existing dense and sparse retrievers as "experts" and retrieves candidates at multiple chunk granularities, so each expert-granularity pair produces its own scored candidate list. Instead of combining these lists with fixed weights, the system converts each list into an evidence distribution and measures how "sharp" that distribution is. A peaked distribution indicates that the expert is confident about its top results for this query; a flat distribution indicates high uncertainty. The framework quantifies this with the entropy of each distribution and gives more weight to the lower-entropy, more reliable sources when fusing candidates.
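
The entropy-based weighting described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the softmax normalization, the temperature parameter, the "one minus normalized entropy" confidence, and the names `softmax`, `confidence`, and `fuse` are all assumptions made for the example, and the expert labels are hypothetical.

```python
import math


def softmax(scores, temperature=1.0):
    """Turn raw retrieval scores into a probability distribution (assumed normalization)."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def confidence(probs):
    """Return 1 - normalized entropy: near 1.0 when peaked, 0.0 when uniform."""
    n = len(probs)
    if n < 2:
        return 1.0
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return 1.0 - entropy / math.log(n)


def fuse(expert_scores):
    """Fuse candidates from several expert-granularity lists.

    expert_scores maps a label (hypothetical, e.g. "dense@256") to
    {chunk_id: raw_score}. Returns (ranked candidates, normalized weights).
    """
    dists, weights = {}, {}
    for name, scored in expert_scores.items():
        ids = list(scored)
        probs = softmax([scored[i] for i in ids])
        dists[name] = dict(zip(ids, probs))
        weights[name] = confidence(probs)

    total = sum(weights.values())
    if total == 0:
        # Every expert is maximally uncertain: fall back to equal weights.
        weights = {name: 1.0 / len(weights) for name in weights}
    else:
        weights = {name: w / total for name, w in weights.items()}

    fused = {}
    for name, dist in dists.items():
        for chunk_id, p in dist.items():
            fused[chunk_id] = fused.get(chunk_id, 0.0) + weights[name] * p
    ranked = sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
    return ranked, weights


if __name__ == "__main__":
    ranked, weights = fuse({
        "dense@256": {"a": 9.0, "b": 1.0, "c": 1.0},   # peaked: confident
        "bm25@1024": {"a": 1.0, "b": 1.0, "c": 1.1},   # nearly flat: uncertain
    })
    print(weights)
    print(ranked)
```

In this toy run the peaked dense list dominates the fusion, while the nearly flat sparse list contributes very little. A real pipeline would also need to decide how to calibrate scores across retrievers, which this sketch sidesteps with a single temperature.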

Parent Promotion for Coherence

To further improve performance, the authors introduce UMGP-RAG, a variant that adds "parent promotion." The system uses fine-grained chunks to locate relevant information precisely, then returns the broader "parent" chunk that contains each hit so that the generator sees locally coherent context. A deduplication step removes redundant or overlapping chunks, keeping the final context compact and informative.
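
A rough sketch of parent promotion follows, under stated assumptions: the child-to-parent mapping `parent_of`, the function name `promote_parents`, the chunk identifiers, and the rule that a parent inherits the score of its best child are all hypothetical choices for illustration. The sketch only removes duplicates that arise when several fine-grained hits share a parent; it does not handle overlap between distinct parents, which the paper's deduplication may treat differently.

```python
def promote_parents(child_hits, parent_of, top_k=3):
    """Map fine-grained hits to their parent chunks, keeping each parent once.

    child_hits: iterable of (child_id, score) pairs.
    parent_of:  dict mapping child_id -> parent_id (hypothetical index).
    Returns up to top_k (parent_id, score) pairs, best first.
    """
    seen = set()
    promoted = []
    # Visit hits from best to worst so each parent keeps its best child's score.
    for child_id, score in sorted(child_hits, key=lambda hit: hit[1], reverse=True):
        # A chunk with no known parent is returned as-is.
        parent_id = parent_of.get(child_id, child_id)
        if parent_id in seen:
            continue  # a better-scoring sibling already promoted this parent
        seen.add(parent_id)
        promoted.append((parent_id, score))
        if len(promoted) == top_k:
            break
    return promoted


if __name__ == "__main__":
    hits = [("d1#s3", 0.9), ("d1#s4", 0.8), ("d2#s1", 0.5)]
    parent_of = {"d1#s3": "d1#p1", "d1#s4": "d1#p1", "d2#s1": "d2#p0"}
    print(promote_parents(hits, parent_of))
```

Here two sentence-level hits from the same passage collapse into a single parent, so the generator receives one coherent passage instead of two overlapping fragments.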

Performance and Flexibility

Experiments on question-answering benchmarks show that uncertainty-aware fusion and parent promotion improve generation quality across various retrievers and language models. Because the framework is training-free and plug-and-play, it can be added to an existing RAG pipeline without retraining any model or building additional data structures such as knowledge graphs. This makes it a lightweight, adaptable option for improving long-document RAG.
