Hierarchical Multi-Persona Induction from User Behavioral Logs: Learning Evidence-Grounded and Truthful Personas
This paper introduces a new framework designed to turn messy, fragmented user activity logs into clear, reliable, and human-readable "personas." While many systems currently use AI to summarize user behavior, these summaries often lack transparency or fail to accurately reflect the underlying data. The authors propose a hierarchical method that organizes raw actions into meaningful "intent memories," which are then grouped into distinct personas. By training models to prioritize specific quality standards, the researchers aim to create user profiles that are not only useful for predicting future behavior but are also grounded in actual evidence and free from hallucinations.
From Raw Logs to Structured Memories
The process begins by taking a user’s daily behavioral logs—such as search queries and clicks—and using an AI model to summarize them into "intent memories." These memories represent specific goals or interests. Instead of treating a user’s history as a single, jumbled stream, the framework clusters these memories into multiple personas. Each persona acts as a distinct profile of one facet of the user’s behavior, and crucially, each one is linked to an explicit set of supporting evidence (the specific memories used to create it). This ensures that every part of a persona can be traced back to the user's actual actions.
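The clustering step above can be sketched as follows. This is an illustrative stand-in, not the paper's actual algorithm: it greedily groups unit-normalized memory embeddings by cosine similarity, and—matching the evidence-grounding idea—each persona records the indices of the memories that support it. The function name and the similarity threshold are assumptions for the sake of the example.

```python
import numpy as np

def cluster_memories(embeddings, threshold=0.8):
    """Greedily group intent-memory embeddings into personas.

    Each persona keeps the indices of its supporting memories, so
    every persona can be traced back to explicit evidence.
    (Illustrative sketch, not the paper's clustering method.)
    """
    # Normalize so dot products are cosine similarities.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)

    personas = []  # each: {"centroid": vector, "evidence": [memory indices]}
    for i, vec in enumerate(unit):
        best, best_sim = None, threshold
        for p in personas:
            sim = float(vec @ p["centroid"])
            if sim >= best_sim:
                best, best_sim = p, sim
        if best is None:
            # No persona is similar enough: start a new one.
            personas.append({"centroid": vec.copy(), "evidence": [i]})
        else:
            best["evidence"].append(i)
            # Recompute the centroid as the renormalized member mean.
            mean = unit[best["evidence"]].mean(axis=0)
            best["centroid"] = mean / np.linalg.norm(mean)
    return personas
```

With two memories pointing one way and two another, the sketch yields two personas whose evidence lists partition the four memory indices.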
Defining and Measuring Persona Quality
A central challenge in user modeling is determining what makes a "good" persona. The authors define persona quality through three core criteria:
Cohesion: The memories grouped into a single persona must be semantically consistent with one another and distinct from the memories assigned to other personas.
Alignment: The persona’s label and description must accurately reflect the shared traits of the assigned memories.
Truthfulness: The persona must be supported by the evidence and avoid overgeneralizing or inventing information that isn't present in the logs.
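To make the three criteria concrete, here is a minimal sketch of how each could be turned into a numeric score. These heuristics are assumptions for illustration (the paper's actual reward metrics are not reproduced here): cohesion as intra-persona minus inter-persona similarity, alignment as description-to-evidence similarity, and truthfulness as the fraction of description terms attested in the evidence.

```python
import numpy as np

def persona_quality(evidence_vecs, other_vecs, desc_vec, desc_tokens, evidence_tokens):
    """Heuristic stand-ins for the three persona-quality criteria.

    cohesion:     evidence memories agree with their own centroid more
                  than memories assigned to other personas do
    alignment:    the persona description matches the evidence centroid
    truthfulness: description terms are attested in the evidence text
    All vectors are assumed to be unit-normalized embeddings.
    """
    centroid = evidence_vecs.mean(axis=0)
    centroid /= np.linalg.norm(centroid)

    intra = float((evidence_vecs @ centroid).mean())
    inter = float((other_vecs @ centroid).mean()) if len(other_vecs) else 0.0
    cohesion = intra - inter

    alignment = float(desc_vec @ centroid)

    attested = [t for t in desc_tokens if t in evidence_tokens]
    truthfulness = len(attested) / max(len(desc_tokens), 1)

    return {"cohesion": cohesion, "alignment": alignment,
            "truthfulness": truthfulness}
```

A persona whose description only uses terms found in its evidence scores 1.0 on truthfulness; invented terms pull the score toward zero, which is the overgeneralization the authors penalize.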
To enforce these standards, the researchers developed a reward-based training system. They use a technique called Direct Preference Optimization (DPO), which trains the model to prefer outputs that score highly on these quality metrics. This allows the model to learn how to generate personas that are both accurate and trustworthy.
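The DPO objective itself has a standard closed form, shown below for a single preference pair. The "chosen" persona would be the candidate scoring higher on the quality metrics and the "rejected" one a lower-scoring candidate; how the paper constructs its pairs and sets beta is not specified here, so those details are assumptions.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    The margin compares how much the policy has shifted toward the
    chosen output versus the rejected one, relative to a frozen
    reference model; minimizing -log sigmoid(beta * margin) pushes
    the policy to prefer the higher-quality persona.
    """
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

When the policy matches the reference on both outputs the margin is zero and the loss is log 2; as the policy assigns relatively more probability to the chosen persona, the loss falls toward zero.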
Improving Performance and Reliability
The experimental results demonstrate that this quality-focused approach does more than just create better-looking profiles; it also improves the model's ability to predict future user interactions. By ensuring that personas are grounded in evidence, the system becomes more effective at understanding a user's long-term preferences. The researchers tested their method across different datasets, including search and shopping logs, and found that their approach consistently outperformed traditional clustering-based methods and general-purpose large language models. This suggests that explicitly optimizing for persona quality is a highly effective strategy for building more transparent and useful personalization systems.