RADD: Retrieval-Augmented Discrete Diffusion for Multi-Modal Knowledge Graph Completion
Multi-modal knowledge graph completion (MMKGC) aims to predict missing entities in a graph by using structural, visual, and textual information. A common problem in existing models is that they rely on a single "scorer" to perform two different tasks: searching through the entire set of possible entities and making a final, precise decision. The authors argue that these two tasks call for different strengths: global search needs broad, high-recall coverage, while the final decision requires fine-grained, sharp discrimination. Forcing a single model to do both compromises performance on each. To solve this, the authors propose the RADD framework, which separates these tasks into two specialized modules.
Decoupling Search and Precision
The RADD framework splits the workload into two distinct stages. First, a relation-aware multimodal retriever scans the entire entity set to identify a small, high-quality "shortlist" of candidates. This retriever is designed for speed and broad coverage. Once this shortlist is formed, a second module, a conditional discrete denoiser, takes over. This denoiser focuses exclusively on the shortlist, performing fine-grained analysis to select the correct entity. By ensuring that the retriever handles the "search" and the denoiser handles the "precision," the model avoids the bottleneck found in traditional single-scorer systems.
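The two-stage flow can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the embeddings are random, the shortlist size `SHORTLIST_K` is a hypothetical hyperparameter, and the "denoiser" stage is stood in for by a sharper rescoring over the candidates. What the sketch does show is the structural point: stage one does a cheap pass over all entities, and stage two only ever scores the shortlist.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_ENTITIES = 1000   # toy size of the full entity set (assumption)
SHORTLIST_K = 32      # hypothetical shortlist size
DIM = 16

entity_emb = rng.normal(size=(NUM_ENTITIES, DIM))

def retrieve(query_emb, k=SHORTLIST_K):
    """Stage 1: broad, high-recall search over the full entity set."""
    scores = entity_emb @ query_emb      # one cheap dot product per entity
    return np.argsort(-scores)[:k]       # indices of the top-k candidates

def denoise(query_emb, shortlist):
    """Stage 2: fine-grained scoring restricted to the shortlist.
    A stand-in for the conditional discrete denoiser: here just a
    temperature-sharpened rescoring of the candidates."""
    cand_logits = (entity_emb[shortlist] @ query_emb) / 0.1
    return shortlist[int(np.argmax(cand_logits))]

query = rng.normal(size=DIM)
shortlist = retrieve(query)
prediction = denoise(query, shortlist)
```

The key invariant is that `prediction` is always drawn from `shortlist`, so the final answer is constrained by the high-recall first stage.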
How the Components Work
The retriever uses a "relation gate" to combine structural, visual, and textual data, allowing the model to emphasize the most relevant information for any given relationship. The denoiser, meanwhile, operates as a discrete diffusion model. Instead of working with continuous vectors that require complex conversion, it predicts clean entity identities directly as discrete indices. During training, the two modules are linked through a process called teacher-student distillation, where the retriever’s knowledge is transferred to the denoiser. This ensures that the denoiser begins its work with a probability distribution already focused on the most plausible candidates.
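The two mechanisms described above can be sketched as follows. This is a hedged illustration under simple assumptions: the relation gate is modeled as a relation-conditioned softmax producing one weight per modality (the paper's exact gate may differ), and the distillation term is written as a standard KL divergence pulling the denoiser's distribution toward the retriever's. The matrix `W` and all dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8

def relation_gate(struct_e, visual_e, text_e, relation_e, W):
    """Fuse the three modality embeddings with a relation-conditioned
    softmax gate: one weight per modality, summing to 1."""
    logits = W @ relation_e                # W: (3, DIM) -> three gate logits
    g = np.exp(logits - logits.max())
    g /= g.sum()                           # softmax over modalities
    return g[0] * struct_e + g[1] * visual_e + g[2] * text_e, g

def distill_loss(teacher_probs, student_logits):
    """KL(teacher || student): teacher-student distillation that pulls
    the denoiser's (student) distribution toward the retriever's."""
    log_student = student_logits - np.log(np.sum(np.exp(student_logits)))
    return float(np.sum(teacher_probs * (np.log(teacher_probs) - log_student)))

W = rng.normal(size=(3, DIM))
fused, gate = relation_gate(rng.normal(size=DIM), rng.normal(size=DIM),
                            rng.normal(size=DIM), rng.normal(size=DIM), W)
```

With this loss, the denoiser starts inference from a distribution already concentrated on the retriever's most plausible candidates, which is the effect the distillation step is after.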
Performance and Results
The authors tested RADD against 27 different baselines, including traditional KGC models, multimodal approaches, and LLM-augmented systems. Across three standard benchmarks, RADD consistently achieved the best performance. The experiments also highlighted the importance of the "Diff-Rerank" inference mechanism, which enforces a strict rule: the denoiser can only rank entities that the retriever has already included in its shortlist. This ensures that the final output is always constrained by the high-recall search performed in the first stage.
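The strict rule behind Diff-Rerank is easy to state in code: mask every entity the retriever did not shortlist before ranking. The sketch below is an assumed formulation, not the authors' code; the function name and toy numbers are illustrative.

```python
import numpy as np

def diff_rerank(denoiser_logits, shortlist, num_entities):
    """Rank entities by denoiser score, but only within the retriever's
    shortlist; everything else is masked to -inf and sinks to the bottom."""
    masked = np.full(num_entities, -np.inf)
    masked[shortlist] = denoiser_logits[shortlist]
    return np.argsort(-masked)

# Toy example: 6 entities, shortlist of 3.
logits = np.array([0.1, 2.0, -1.0, 0.5, 3.0, 0.0])
shortlist = np.array([1, 3, 5])
ranking = diff_rerank(logits, shortlist, 6)
```

Note that entity 4 has the highest raw denoiser score but is outside the shortlist, so it cannot appear in the top positions: the first three ranks are exactly the shortlisted entities.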
Key Takeaways
The research demonstrates that the "dual-objective coupling" of search and precision is a major structural bottleneck in current MMKGC models. By treating the task as a retrieval-augmented discrete diffusion problem, the authors provide a principled way to allocate these objectives to the modules best suited for them. The study also notes that head and tail prediction in knowledge graphs are inherently asymmetric (head prediction is often more difficult), and the RADD framework addresses this with an asymmetric loss function that puts extra training weight on the harder direction.
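One simple way to realize such an asymmetric loss is a convex combination that up-weights the harder direction. This is a minimal sketch of the idea only; the weighting scheme and the value of `alpha` are assumptions, not the paper's formulation.

```python
def asymmetric_kgc_loss(head_loss, tail_loss, alpha=0.7):
    """Combine per-direction losses, up-weighting head prediction
    (the harder direction). alpha is a hypothetical hyperparameter;
    alpha > 0.5 shifts training focus toward head prediction."""
    return alpha * head_loss + (1.0 - alpha) * tail_loss
```

For example, with `alpha=0.7`, a batch where head prediction is struggling (higher loss) dominates the gradient signal, giving the harder task the extra training focus described above.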