
Key Takeaways

  • RADD (Retrieval-Augmented Discrete Diffusion) targets multi-modal knowledge graph completion (MMKGC): predicting missing entities in a graph from structural, visual, and textual information.
  • Most multi-modal knowledge graph completion (MMKGC) models use one embedding scorer to do both retrieval over the full entity set and final decision making.
  • We argue that this coupling is a core bottleneck: global high-recall search and local fine-grained disambiguation require different inductive biases.
  • Therefore, we propose a Retrieval-Augmented Discrete Diffusion (RADD) framework to decouple retrieval and reranking for MMKGC.
  • A relation-aware multimodal KGE retriever serves as both global retriever and distillation teacher, while a conditional discrete denoiser performs shortlist-level entity-identity generation for reranking.
Paper Abstract

Most multi-modal knowledge graph completion (MMKGC) models use one embedding scorer to do both retrieval over the full entity set and final decision making. We argue that this coupling is a core bottleneck: global high-recall search and local fine-grained disambiguation require different inductive biases. Therefore, we propose a Retrieval-Augmented Discrete Diffusion (RADD) framework to decouple retrieval and reranking for MMKGC. A relation-aware multimodal KGE retriever serves as both global retriever and distillation teacher, while a conditional discrete denoiser performs shortlist-level entity-identity generation for reranking. Training combines KGE supervision, denoising cross-entropy, and temperature-scaled distillation from the retriever to the denoiser. At inference, the designed Diff-Rerank first forms a top-$K$ shortlist with the retriever and then reranks it with the denoiser, ensuring that recall is a strict prerequisite for precision. Experiments on three MMKGC benchmarks show that RADD achieves the best performance and consistent gains over strong unimodal, multimodal, and LLM-based baselines, while ablations further verify the contribution of each component.

RADD: Retrieval-Augmented Discrete Diffusion for Multi-Modal Knowledge Graph Completion
Multi-modal knowledge graph completion (MMKGC) aims to predict missing entities in a graph by using structural, visual, and textual information. A common problem in existing models is that they rely on a single "scorer" to perform two different tasks: searching through the entire set of possible entities and making a final, precise decision. The authors argue that these two tasks require different approaches: global search needs broad, high-recall efficiency, while final decision-making requires fine-grained, sharp discrimination. Forcing a single model to do both compromises performance on each. To solve this, the authors propose the RADD framework, which separates these tasks into two specialized modules.

Decoupling Search and Precision

The RADD framework splits the workload into two distinct stages. First, a relation-aware multimodal retriever scans the entire entity set to identify a small, high-quality "shortlist" of candidates. This retriever is designed for speed and broad coverage. Once this shortlist is formed, a second module—a conditional discrete denoiser—takes over. This denoiser focuses exclusively on the shortlist, performing fine-grained analysis to select the correct entity. By ensuring that the retriever handles the "search" and the denoiser handles the "precision," the model avoids the bottlenecks found in traditional single-scorer systems.
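The two-stage flow can be sketched in a few lines. This is an illustrative toy, not the authors' implementation: `retriever_scores`, `denoiser_scores_fn`, and the candidate counts are all hypothetical stand-ins for the paper's retriever and denoiser.

```python
import numpy as np

def retrieve_then_rerank(retriever_scores, denoiser_scores_fn, k=3):
    """Two-stage inference: a cheap global retriever proposes a top-k
    shortlist; a more expensive reranker decides only among those
    candidates. Illustrative sketch, not the paper's actual code."""
    # Stage 1: global, high-recall search over all entities.
    shortlist = np.argsort(-retriever_scores)[:k]
    # Stage 2: fine-grained scoring restricted to the shortlist.
    rerank = denoiser_scores_fn(shortlist)
    return int(shortlist[int(np.argmax(rerank))])

# Toy example with 6 entities: the retriever slightly prefers entity 4,
# but the reranker overturns that decision within the shortlist.
retriever = np.array([0.1, 0.9, 0.2, 0.7, 0.95, 0.05])
reranker = lambda idx: np.array([0.2 if i == 4 else 0.8 for i in idx])
print(retrieve_then_rerank(retriever, reranker, k=3))  # selects entity 1
```

The division of labor mirrors classic retrieve-then-rerank pipelines in information retrieval: the reranker's cost is paid only on k candidates, never on the full entity set.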

How the Components Work

The retriever uses a "relation gate" to combine structural, visual, and textual data, allowing the model to emphasize the most relevant information for any given relationship. The denoiser, meanwhile, operates as a discrete diffusion model. Instead of working with continuous vectors that require complex conversion, it predicts clean entity identities directly as discrete indices. During training, the two modules are linked through a process called teacher-student distillation, where the retriever’s knowledge is transferred to the denoiser. This ensures that the denoiser begins its work with a probability distribution already focused on the most plausible candidates.
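Two of these ingredients, the relation gate and the temperature-scaled distillation, can be illustrated with a minimal numpy sketch. The function names, the gate parameterization, and the temperature value are assumptions for illustration; the paper's exact formulation may differ.

```python
import numpy as np

def softmax(x, t=1.0):
    z = np.exp((x - x.max()) / t)
    return z / z.sum()

def relation_gate(struct, vis, txt, gate_logits):
    """Hypothetical relation-aware gate: per-relation weights decide how
    much each modality contributes to the fused representation."""
    w = softmax(np.asarray(gate_logits, dtype=float))  # one weight per modality
    return w[0] * struct + w[1] * vis + w[2] * txt

def distill_loss(teacher_logits, student_logits, T=2.0):
    """Temperature-scaled KL distillation from the retriever (teacher)
    to the denoiser (student), as in standard knowledge distillation."""
    p = softmax(teacher_logits, T)  # softened teacher distribution
    q = softmax(student_logits, T)
    return float(np.sum(p * (np.log(p) - np.log(q))))

# A relation about appearance might up-weight the visual modality:
fused = relation_gate(np.ones(4), 2 * np.ones(4), 3 * np.ones(4),
                      gate_logits=[0.0, 2.0, 0.0])
```

The distillation term is what "focuses" the denoiser before reranking: minimizing it pulls the student's distribution toward the teacher's softened ranking over candidates, so the denoiser starts from the retriever's sense of plausibility rather than from scratch.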

Performance and Results

The authors tested RADD against 27 different baselines, including traditional KGC models, multimodal approaches, and LLM-augmented systems. Across three standard benchmarks, RADD consistently achieved the best performance. The experiments also highlighted the importance of the "Diff-Rerank" inference mechanism, which enforces a strict rule: the denoiser can only rank entities that the retriever has already included in its shortlist. This ensures that the final output is always constrained by the high-recall search performed in the first stage.
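The shortlist constraint can be expressed as a hard mask over the denoiser's output distribution. This is a sketch of the idea, assuming the denoiser emits one logit per entity; the paper's Diff-Rerank may realize the constraint differently.

```python
import numpy as np

def diff_rerank(denoiser_logits, shortlist):
    """Enforce 'recall is a prerequisite for precision': entities outside
    the retriever's shortlist receive -inf, so the denoiser can never
    output a candidate the retriever failed to recall."""
    masked = np.full_like(denoiser_logits, -np.inf)
    masked[shortlist] = denoiser_logits[shortlist]
    return int(np.argmax(masked))

logits = np.array([5.0, 1.0, 4.0, 0.5])       # denoiser prefers entity 0...
print(diff_rerank(logits, shortlist=[1, 2]))  # ...but 0 was not recalled
```

The mask makes the stage ordering strict: if the retriever misses the gold entity, no amount of denoiser confidence can recover it, which is exactly why the first stage is tuned for recall.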

Key Takeaways

The research demonstrates that the "dual-objective coupling" of search and precision is a major structural bottleneck in current MMKGC models. By treating the task as a retrieval-augmented discrete diffusion problem, the authors provide a principled way to allocate these objectives to the modules best suited for them. The study also notes that head and tail prediction in knowledge graphs are inherently asymmetric—head prediction is often more difficult—and the RADD framework addresses this by using an asymmetric loss function that provides extra training focus on the harder task.
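An asymmetric objective of this kind can be sketched as an up-weighted cross-entropy on the harder direction. The weighting scheme and the value `alpha=1.5` are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def cross_entropy(logits, target):
    """Standard cross-entropy for a single prediction (log-sum-exp form)."""
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -float(log_probs[target])

def asymmetric_loss(head_logits, head_tgt, tail_logits, tail_tgt, alpha=1.5):
    """Sketch of an asymmetric objective: head prediction (typically the
    harder direction) is up-weighted by alpha > 1. alpha is illustrative."""
    return alpha * cross_entropy(head_logits, head_tgt) \
        + cross_entropy(tail_logits, tail_tgt)
```

With `alpha > 1`, gradient updates push harder on head-prediction errors, shifting training capacity toward the direction where the model is weakest.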
