COREKG: Coreset-Guided Personalized Summarization of Knowledge Graphs
Knowledge Graphs (KGs) are essential for modern applications like search engines and question-answering systems, but their massive size often makes them difficult to manage, slow to query, and cluttered to visualize. While many existing summarization methods attempt to shrink these graphs, they often focus on the global structure, failing to account for the specific needs of individual users. COREKG addresses this by introducing a personalized summarization framework that creates compact, user-specific graph summaries. By adapting coreset theory, the approach ensures that these small summaries remain highly accurate for a user’s unique query patterns, providing a balance between data footprint reduction and query performance.
How the Approach Works
The COREKG framework operates by identifying the specific interests of a user and building a summary tailored to them. First, it processes the user's query history to extract "seed nodes", which are entities that represent the user's primary interests. It then narrows the vast knowledge graph down by focusing only on the queries related to these seeds.
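The seed-extraction step can be illustrated with a minimal sketch. The summary above does not say how COREKG selects seeds, so this example assumes a simple frequency rule: each query is represented as a list of entity identifiers, and the k entities that appear in the most queries become the seeds. The function names `extract_seed_nodes` and `queries_about_seeds` are hypothetical and exist only for illustration.

```python
from collections import Counter

def extract_seed_nodes(query_history, k):
    """Return the k entities that occur in the most queries.

    Assumption: frequency in the query history stands in for "primary
    interest"; the actual COREKG selection rule may differ.
    """
    # Count each entity at most once per query.
    counts = Counter(entity for query in query_history for entity in set(query))
    return [entity for entity, _ in counts.most_common(k)]

def queries_about_seeds(query_history, seeds):
    """Keep only the queries that mention at least one seed node."""
    seed_set = set(seeds)
    return [query for query in query_history if seed_set.intersection(query)]

history = [
    ["Paris", "France"],
    ["Paris", "Louvre"],
    ["Berlin"],
]
seeds = extract_seed_nodes(history, k=1)
relevant = queries_about_seeds(history, seeds)
```

Here "Paris" occurs in two of the three queries, so it is the only seed for k=1, and the "Berlin" query is dropped from the workload that drives the summary.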
The core of the method relies on "sensitivity-based importance sampling." The system assigns a sensitivity score to every triple (a fact within the graph) based on how much it contributes to the user’s specific query workload. Triples that are highly relevant to a user’s frequent queries receive higher scores. These scores are then used to sample a small, weighted subset of the graph. Because these samples are weighted, the system can mathematically guarantee that the resulting summary approximates the original graph's behavior with a bounded error, ensuring that query results remain reliable despite the significant reduction in size.
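A minimal sketch of sensitivity-based importance sampling may make this concrete. It uses the standard coreset recipe rather than COREKG's exact formulas, which this summary does not give. The assumptions are: a query is a set of entities, a triple "matches" a query if its subject or object is in that set, and a triple's sensitivity is its largest share of any single query's matches. All function names are hypothetical.

```python
import random
from collections import defaultdict

def matches(triple, query):
    """Assumed matching rule: subject or object is one of the query's entities."""
    subject, _, obj = triple
    return subject in query or obj in query

def sensitivities(triples, queries):
    """Sensitivity of a triple = its largest share of any one query's matches."""
    scores = [0.0] * len(triples)
    for query in queries:
        hits = [i for i, t in enumerate(triples) if matches(t, query)]
        for i in hits:
            scores[i] = max(scores[i], 1.0 / len(hits))
    return scores

def sample_coreset(triples, scores, m, seed=0):
    """Draw m triples with probability proportional to sensitivity.

    Each draw adds weight 1 / (m * p), which keeps weighted match counts
    unbiased estimates of the counts on the full graph.
    """
    total = sum(scores)
    if total == 0:
        raise ValueError("no triple is relevant to the query workload")
    probs = [s / total for s in scores]
    rng = random.Random(seed)
    picks = rng.choices(range(len(triples)), weights=probs, k=m)
    weights = defaultdict(float)
    for i in picks:
        weights[triples[i]] += 1.0 / (m * probs[i])
    return dict(weights)

def estimate_count(coreset, query):
    """Weighted number of matching triples, estimated from the summary."""
    return sum(w for t, w in coreset.items() if matches(t, query))

graph = [
    ("Paris", "capitalOf", "France"),
    ("Paris", "hasMuseum", "Louvre"),
    ("Berlin", "capitalOf", "Germany"),
    ("Tokyo", "capitalOf", "Japan"),
]
workload = [{"Paris"}]
scores = sensitivities(graph, workload)
summary = sample_coreset(graph, scores, m=5)
estimate = estimate_count(summary, {"Paris"})
```

Triples that no query touches get a sensitivity of zero and are never sampled, which is how the summary's budget ends up concentrated on the user's workload. The inverse-probability weights are what allow a sampled subset to stand in for the full graph when answering aggregate queries.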
Why Personalization Matters
Traditional summarization techniques often produce generalized results that may not contain the specific information a particular user needs. By contrast, COREKG constructs summaries independently for each user. Because the summary is built only from triples relevant to that user's query behavior, it requires significantly less storage and results in faster query runtimes. This personalized approach ensures that the "budget" of the summary—the amount of data kept—is spent on information that actually matters to the individual, rather than on irrelevant global data.
Performance and Results
The researchers evaluated COREKG against several state-of-the-art methods, including GLIMPSE, PPR, iSummary, PEGASUS, and APEX². Testing across large-scale datasets such as Freebase, WikiData, and DBpedia showed that COREKG consistently achieves higher query-answering accuracy and better structural coverage than these existing methods. Even when using only a tiny fraction of the original graph, the framework maintains strong performance, indicating that its coreset-based approach is both efficient and effective for personalized summarization of large, complex knowledge graphs.