GS-Quant: Granular Semantic and Generative Structur...

Paper Abstract

Large Language Models (LLMs) have shown immense potential in Knowledge Graph Completion (KGC), yet bridging the modality gap between continuous graph embeddings and discrete LLM tokens remains a critical challenge. While recent quantization-based approaches attempt to align these modalities, they typically treat quantization as flat numerical compression, resulting in semantically entangled codes that fail to mirror the hierarchical nature of human reasoning. In this paper, we propose GS-Quant, a novel framework that generates semantically coherent and structurally stratified discrete codes for KG entities. Unlike prior methods, GS-Quant is grounded in the insight that entity representations should follow a linguistic coarse-to-fine logic. We introduce a Granular Semantic Enhancement module that injects hierarchical knowledge into the codebook, ensuring that earlier codes capture global semantic categories while later codes refine specific attributes. Furthermore, a Generative Structural Reconstruction module imposes causal dependencies on the code sequence, transforming independent discrete units into structured semantic descriptors. By expanding the LLM vocabulary with these learned codes, we enable the model to reason over graph structures isomorphically to natural language generation. Experimental results demonstrate that GS-Quant significantly outperforms existing text-based and embedding-based baselines. Our code is publicly available.

Large Language Models (LLMs) have shown great promise in Knowledge Graph Completion (KGC), the task of inferring missing links in structured data, but they struggle to process graph information effectively. LLMs operate on discrete text tokens, whereas knowledge graph entities are typically represented as continuous, dense embeddings, and this "modality gap" hinders reasoning. GS-Quant is a framework designed to bridge the gap by converting graph entities into structured, hierarchical discrete codes that the LLM can interpret much as it interprets language.

Moving Beyond Flat Compression

Existing methods that attempt to turn graph embeddings into discrete codes often treat the process as simple numerical compression. This results in "entangled" codes that lack logical structure, making it difficult for an LLM to perform the hierarchical reasoning—moving from broad categories to specific details—that is central to human thought. GS-Quant addresses this by ensuring that the discrete codes generated for an entity follow a "coarse-to-fine" linguistic logic, where early codes represent global categories and later codes refine specific attributes.

How GS-Quant Works

The framework introduces two primary innovations to create these structured codes:

Granular Semantic Enhancement: This module uses hierarchical clustering to inject structural knowledge into the codebook. By forcing the quantization process to align with a hierarchy tree, the model ensures that the resulting codes are semantically organized rather than just mathematically compressed.
Generative Structural Reconstruction: Instead of treating codes as independent units, this module uses a small Transformer decoder to reconstruct the entity and its hierarchical ancestors from the code sequence. This forces the codes to form a coherent, "sentence-like" structure that captures complex contextual interactions.
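
To make the coarse-to-fine idea concrete, here is a minimal Python sketch. It is not the paper's implementation: the tree below is hand-written and hypothetical (the framework is described as deriving its hierarchy through hierarchical clustering and learning a codebook), and the function names are illustrative. The sketch only shows the property the two modules aim for: an entity's code sequence follows its path through a hierarchy, so related entities share a code prefix, and the ancestors along that path are the kind of targets a reconstruction decoder could be asked to recover.

```python
from collections import defaultdict

# Hypothetical toy hierarchy (child -> parent). In the described framework the
# tree would come from hierarchical clustering, not from a hand-written dict.
PARENT = {
    "person": "root", "place": "root",
    "scientist": "person", "artist": "person",
    "city": "place",
    "Marie Curie": "scientist", "Alan Turing": "scientist",
    "Frida Kahlo": "artist",
    "Paris": "city",
}

def build_children(parent):
    """Map each node to its alphabetically sorted list of children."""
    children = defaultdict(list)
    for child, par in parent.items():
        children[par].append(child)
    for kids in children.values():
        kids.sort()
    return children

def path_from_root(node, parent):
    """Return [top-level ancestor, ..., node], excluding the root itself."""
    path = []
    while node != "root":
        path.append(node)
        node = parent[node]
    return path[::-1]

def hierarchical_code(entity, parent, children):
    """One code per level: the node's index among its siblings at that level."""
    return [children[parent[node]].index(node)
            for node in path_from_root(entity, parent)]

def shared_prefix_len(a, b):
    """Number of leading codes two sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

CHILDREN = build_children(PARENT)
CODES = {e: hierarchical_code(e, PARENT, CHILDREN)
         for e in ("Marie Curie", "Alan Turing", "Frida Kahlo", "Paris")}

for name, code in CODES.items():
    print(name, code, path_from_root(name, PARENT))
```

In this toy setting the two scientists agree on their first two codes, the scientist and the artist agree only on the first, and the person and the city share nothing, which is the stratification a flat compression scheme would not guarantee.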

Enhanced Reasoning Capabilities

By expanding the LLM's vocabulary to include these learned codes, GS-Quant allows the model to reason over graph structures in a way that is isomorphic to natural language generation: an entity becomes a short sequence of tokens that the model reads and generates like any other text. The LLM can therefore apply its generative strengths to link prediction directly. According to the reported experiments, this approach significantly outperforms both traditional embedding-based models and existing text-based LLM approaches.
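
The vocabulary expansion step can be illustrated with a small, self-contained Python sketch. Everything here is an assumption made for illustration: the token format `<kg_l{level}_{index}>`, the toy vocabulary, the codebook sizes, and the prompt template are hypothetical and not taken from the paper. With Hugging Face Transformers, the analogous step is usually done with `tokenizer.add_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))`; the sketch below uses a plain dictionary so that it runs without any dependencies.

```python
def code_token(level, index):
    """Hypothetical naming scheme for one discrete code at one level."""
    return f"<kg_l{level}_{index}>"

def expand_vocab(vocab, num_levels, codebook_size):
    """Add one token per (level, code index) pair; return the tokens added."""
    new_tokens = []
    for level in range(num_levels):
        for index in range(codebook_size):
            tok = code_token(level, index)
            if tok not in vocab:
                vocab[tok] = len(vocab)  # next free id
                new_tokens.append(tok)
    return new_tokens

def entity_to_tokens(codes):
    """Render a coarse-to-fine code sequence as a sequence of new tokens."""
    return [code_token(level, index) for level, index in enumerate(codes)]

# Toy vocabulary standing in for the LLM's original token table.
vocab = {"<pad>": 0, "the": 1, "capital": 2, "of": 3}
added = expand_vocab(vocab, num_levels=3, codebook_size=4)

# A made-up code sequence for some head entity.
head_tokens = entity_to_tokens([1, 0, 2])
prompt = "Head: " + "".join(head_tokens) + " Relation: capital of Tail:"
print(len(added), "tokens added")
print(prompt)
```

Because each level has its own block of tokens, the position of a token in the sequence tells the model whether it is reading a broad category or a fine-grained attribute.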

Implementation and Integration

To implement this, the framework first encodes entities using both relational and textual data. After the quantization modules are trained, the learned codes are integrated into the LLM by freezing the model's original parameters and fine-tuning with Low-Rank Adaptation (LoRA), which trains only a small set of low-rank adapter weights. This allows the model to incorporate domain-specific graph knowledge while preserving its general language capabilities.
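
For readers unfamiliar with LoRA, the following dependency-free Python sketch shows the general mechanism on a single linear layer. It is a generic illustration, not GS-Quant's training code: the class name, matrix sizes, rank, and scaling value are assumptions chosen for the example. The frozen weight `W` is left untouched, and the output is `W x + (alpha / rank) * B (A x)`, where only the small matrices `A` and `B` would be trained.

```python
import random

def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

class LoRALinear:
    """A frozen linear layer plus a trainable low-rank update (illustrative)."""

    def __init__(self, weight, rank, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen: never modified during fine-tuning
        # Standard LoRA initialisation: A is small random, B is zero, so the
        # adapted layer starts out identical to the frozen one.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

# An 8x8 identity matrix stands in for one frozen weight of the LLM.
W = [[1.0 if i == j else 0.0 for j in range(8)] for i in range(8)]
x = [float(i) for i in range(1, 9)]
layer = LoRALinear(W, rank=1, alpha=2)

y_before = layer.forward(x)   # equals the frozen layer's output
layer.B[0][0] = 1.0           # pretend one training step changed B
y_after = layer.forward(x)    # only the first output coordinate moves

print(layer.trainable_params(), "trainable vs", 8 * 8, "frozen parameters")
```

Even in this tiny case the adapter has a quarter as many parameters as the frozen weight, and the ratio shrinks further as the layer grows, which is why the base model's general language ability can be preserved at low cost.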