Key Takeaways

  • Knowledge Graphs (KGs) are extensively used across different domains and in several applications.
  • Often, these KGs are very large in size.
  • Such KGs become unwieldy for tasks such as question answering and visualization.
  • Summarization of KGs offers a viable alternative in such cases.

Paper Abstract

Knowledge Graphs (KGs) are extensively used across different domains and in several applications. Often, these KGs are very large in size. Such KGs become unwieldy for tasks such as question answering and visualization. Summarization of KGs offers a viable alternative in such cases. Furthermore, personalized KG summarization is crucial in the current data-driven world as it captures the specific requirements of users based on their query patterns. Since it only maintains relevant information, the personalized summaries of KG are small, resulting in significantly smaller storage requirements and query runtime. In this work, we adapt the coreset theory to create personalized KG summaries. For a given dataset and a user-specific query workload, we present an approach that samples a relevant subset of triples using sensitivity-based importance sampling. We ensure that the subset approximates the characteristics of the full dataset with bounded approximation error. We define sensitivity scores that measure the importance of a triple with respect to a user's query workload, which are then used by our coreset construction algorithm. We explicitly focus on personalized knowledge graph summarization by constructing summaries independently for each user based on their query behaviour. Our evaluation on Freebase, WikiData, and DBpedia shows that COREKG delivers higher query-answering accuracy and structural coverage than the state-of-the-art methods, such as GLIMPSE, PPR, iSummary, PEGASUS and APEX², while requiring only a tiny fraction of the original graph.

COREKG: Coreset-Guided Personalized Summarization of Knowledge Graphs
Knowledge Graphs (KGs) are essential for modern applications like search engines and question-answering systems, but their massive size often makes them difficult to manage, slow to query, and cluttered to visualize. While many existing summarization methods attempt to shrink these graphs, they often focus on the global structure, failing to account for the specific needs of individual users. COREKG addresses this by introducing a personalized summarization framework that creates compact, user-specific graph summaries. By adapting coreset theory, the approach ensures that these small summaries remain highly accurate for a user’s unique query patterns, providing a balance between data footprint reduction and query performance.

How the Approach Works

The COREKG framework operates by identifying the specific interests of a user and building a summary tailored to them. First, it processes a user’s query history to extract "seed nodes," the entities that represent the user's primary interests. It then filters the knowledge graph so that the remaining steps focus only on queries related to these seeds.
The core of the method relies on "sensitivity-based importance sampling." The system assigns a sensitivity score to every triple (a fact within the graph) based on how much it contributes to the user’s specific query workload. Triples that are highly relevant to a user’s frequent queries receive higher scores. These scores are then used to sample a small, weighted subset of the graph. Each sampled triple carries a weight that offsets its sampling probability, so the summary approximates the characteristics of the full graph with a bounded approximation error, and query results remain reliable despite the significant reduction in size.
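
The sampling idea can be sketched in Python under stated assumptions. The scoring function below is a hypothetical stand-in (the fraction of workload queries that mention a triple's subject or object, plus a small floor), not the sensitivity definition from the paper. The sampling step is generic importance sampling: draw `m` triples with probability proportional to their score and give each draw the weight `1 / (m * p)`, so that weighted sums over the sample are unbiased estimates of sums over the full graph.

```python
import random


def sensitivity_scores(triples, workload, floor=0.01):
    """Hypothetical proxy score: share of queries touching the triple.

    The floor keeps every triple's sampling probability above zero.
    """
    scores = []
    for subj, _, obj in triples:
        hits = sum(1 for query in workload if subj in query or obj in query)
        scores.append(floor + hits / len(workload))
    return scores


def build_coreset(triples, scores, m, rng):
    """Sample m triples with replacement, proportional to their scores.

    Returns a dict mapping each sampled triple to its accumulated weight.
    """
    total = sum(scores)
    probs = [score / total for score in scores]
    picks = rng.choices(range(len(triples)), weights=probs, k=m)
    coreset = {}
    for i in picks:
        # Inverse-probability weight; repeated draws accumulate.
        coreset[triples[i]] = coreset.get(triples[i], 0.0) + 1.0 / (m * probs[i])
    return coreset


triples = [
    ("Berlin", "capitalOf", "Germany"),
    ("Germany", "currency", "Euro"),
    ("Tokyo", "capitalOf", "Japan"),
    ("Japan", "currency", "Yen"),
]
workload = [["Berlin"], ["Berlin", "Germany"], ["Germany"]]

scores = sensitivity_scores(triples, workload)
coreset = build_coreset(triples, scores, m=3, rng=random.Random(42))
for triple, weight in coreset.items():
    print(triple, round(weight, 3))
```

Triples about Berlin and Germany dominate the sample because the toy workload only asks about them. A rarely sampled triple receives a large weight when it is drawn, which is how the weighted summary stays representative of the full graph in expectation.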

Why Personalization Matters

Traditional summarization techniques often produce generalized results that may not contain the specific information a particular user needs. By contrast, COREKG constructs summaries independently for each user. Because the summary is built only from triples relevant to that user's query behavior, it requires significantly less storage and results in faster query runtimes. This personalized approach ensures that the "budget" of the summary—the amount of data kept—is spent on information that actually matters to the individual, rather than on irrelevant global data.

Performance and Results

The researchers evaluated COREKG against several state-of-the-art methods, including GLIMPSE, PPR, iSummary, PEGASUS, and APEX². Across the large-scale datasets Freebase, WikiData, and DBpedia, COREKG achieved higher query-answering accuracy and better structural coverage than these existing methods. It did so while keeping only a tiny fraction of the original graph, which indicates that the coreset-based approach is both efficient and effective for summarizing large, complex knowledge graphs in a personalized way.
