TailorMind: Towards Preference-Aligned Multimodal Content Generation

Key Takeaways

  • TailorMind: Towards Preference-Aligned Multimodal Content Generation addresses the challenge of creating personalized content for users without relying on existing item pools or waiting for new user-generated content (UGC) to be published.
  • Personalized content systems depend on available UGC and struggle when suitable content is absent, delayed, or costly to create.
  • Although multimodal generators can synthesize content on demand, how to translate behavioral traces into generation-ready preferences remains underexplored.
  • The authors study personalized multimodal content generation: creating user-tailored multimodal content without existing item pools or waiting for matching UGC.
  • The authors propose TailorMind, which links collaborative preference modeling with controllable multimodal generation.
Paper Abstract

Personalized content systems depend on available UGC and struggle when suitable content is absent, delayed, or costly to create. Although multimodal generators can synthesize content on demand, how to translate behavioral traces into generation-ready preferences remains underexplored. We study personalized multimodal content generation: creating user-tailored multimodal content without existing item pools or waiting for matching UGC. We propose TailorMind, linking collaborative preference modeling with controllable multimodal generation. TailorMind enriches sparse user histories via hypergraph collaborative filtering and optimizes textual profiles with ranking-error feedback and textual gradient descent. Retrieval-augmented style control grounds outputs in authentic UGC patterns, while cross-modal cohesion reflection reduces semantic drift. We construct TailorBench, a benchmark from three mainstream platforms evaluated along five dimensions: coherence, novelty, aesthetic, hallucination, profiling. Experiments show that TailorMind achieves competitive or stronger coherence, improves novelty and aesthetic quality over representative generation baselines and ground-truth UGC, demonstrating advantages over retrieving available content or comparable UGC, while achieving up to 29% Recall gains in reranking. Our code is released at: this https URL.

TailorMind: Towards Preference-Aligned Multimodal Content Generation addresses the challenge of creating personalized content for users without relying on existing item pools or waiting for new user-generated content (UGC) to be published. Current generative models can create images or videos on demand, but how to translate a user's behavioral history into content that reflects that user's specific tastes remains underexplored. This research introduces a framework that links collaborative preference modeling with controllable multimodal generation, aiming to produce synthesized content that is both high quality and closely aligned with individual user preferences.

Bridging Preferences and Generation

The core of TailorMind is turning sparse, noisy user interaction histories into actionable natural-language profiles. It first uses hypergraph collaborative filtering to enrich a user's history, drawing on connections between that user's interests and those of the broader community. It then refines the initial profile with "textual gradient descent": the profile is treated as a piece of text and iteratively revised using ranking-error feedback, that is, how well the profile ranks the items the user actually interacted with. In effect, the profile is "trained" through feedback to become more accurate.
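
The two steps in this paragraph can be sketched in a few lines of Python. This is a toy illustration under loudly stated assumptions, not the authors' implementation: each user's history is treated as one hyperedge over items, enrichment is a simple two-hop propagation, the "profile" is a set of tags standing in for natural-language text, tag overlap stands in for an LLM relevance judgment, and the "textual gradient" is a rule-based critique built from ranking errors rather than one written by an LLM. All names (`enrich_history`, `textual_gradient`, `refine_profile`, and the toy data) are hypothetical.

```python
from collections import Counter

# Hypothetical toy data: tags describing each item, and one hyperedge per user
# that connects every item that user has interacted with.
ITEM_TAGS = {
    "i1": {"hiking", "mountains"},
    "i2": {"camping", "mountains"},
    "i3": {"street-food", "travel"},
    "i4": {"hiking", "gear"},
    "i5": {"makeup", "review"},
}
HYPEREDGES = {
    "alice": {"i1"},              # sparse history of the target user
    "bob": {"i1", "i2", "i4"},
    "carol": {"i3", "i5"},
}


def enrich_history(user, k=2):
    """Two-hop propagation: the user's items -> hyperedges sharing them -> co-member items."""
    own = HYPEREDGES[user]
    scores = Counter()
    for other, edge in HYPEREDGES.items():
        overlap = len(own & edge)
        if other == user or overlap == 0:
            continue
        for item in edge - own:
            scores[item] += overlap / len(edge)  # down-weight large hyperedges
    return own | {item for item, _ in scores.most_common(k)}


def score(profile, item):
    # Stand-in for an LLM judging how well a profile fits an item: shared tags.
    return len(profile & ITEM_TAGS[item])


def rank(profile, candidates):
    # Highest score first; ties broken by item id for determinism.
    return sorted(candidates, key=lambda i: (-score(profile, i), i))


def textual_gradient(profile, positives, ranked, k):
    """Rule-based critique from ranking errors (a stand-in for an LLM-written critique)."""
    top = set(ranked[:k])
    missed = positives - top        # items the user interacted with but the profile ranked low
    false_hits = top - positives    # items the profile ranked high by mistake
    add = set().union(*(ITEM_TAGS[i] for i in missed)) - profile
    drop = set().union(*(ITEM_TAGS[i] for i in false_hits)) & profile
    drop -= set().union(*(ITEM_TAGS[i] for i in positives))  # keep tags the positives share
    return add, drop


def refine_profile(profile, positives, candidates, k, steps=3):
    for _ in range(steps):
        add, drop = textual_gradient(profile, positives, rank(profile, candidates), k)
        if not add and not drop:  # the critique suggests no further edits
            break
        profile = (profile | add) - drop  # the "descent" step: apply the critique
    return profile


if __name__ == "__main__":
    history = enrich_history("alice")
    profile = refine_profile({"travel"}, history, sorted(ITEM_TAGS), k=3)
    print(sorted(history), sorted(profile))  # ['i1', 'i2', 'i4'] ['gear', 'hiking']
```

Here the misleading initial profile `{"travel"}` is corrected in one step to `{"hiking", "gear"}`, after which the top-3 ranking matches the enriched history. A real system would replace `score` and `textual_gradient` with model calls, which is why the loop caps the number of steps.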

Ensuring Style and Cohesion

To keep the generated content from drifting away from the user's intended style, TailorMind employs two safeguards. First, retrieval-augmented style control pulls examples from the user's own past interactions to ground the new content in authentic, familiar UGC patterns. Second, a cross-modal cohesion reflection mechanism acts as a quality check: it monitors consistency between the generated text and the visual elements (such as images or videos) to keep them semantically aligned and reduce "semantic drift," in which the output loses its original focus.
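
Both safeguards can be pictured as a retrieve-then-check loop. The Python sketch below is a hypothetical illustration, not TailorMind's implementation: bag-of-words cosine similarity stands in for learned text and image embeddings, a plain-text `image_desc` stands in for the generated visual, and `append_visual_cues` is a made-up reviser in place of a multimodal model. The similarity threshold and round limit are arbitrary assumptions.

```python
import math
from collections import Counter


def embed(text):
    """Bag-of-words vector; a stand-in for a real text or image embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_style_examples(query, past_posts, k=2):
    """Style grounding: return the k past posts most similar to the generation query."""
    q = embed(query)
    return sorted(past_posts, key=lambda p: cosine(q, embed(p)), reverse=True)[:k]


def cohesion_reflect(caption, image_desc, revise, threshold=0.3, max_rounds=3):
    """Revise the caption until it agrees with the visual description (or rounds run out)."""
    for _ in range(max_rounds):
        sim = cosine(embed(caption), embed(image_desc))
        if sim >= threshold:
            return caption, sim
        caption = revise(caption, image_desc)
    return caption, cosine(embed(caption), embed(image_desc))


def append_visual_cues(caption, image_desc):
    # Hypothetical reviser; a real system would ask a multimodal model to rewrite.
    return f"{caption} on a {image_desc}"


if __name__ == "__main__":
    past = [
        "sunrise hike on misty mountains",
        "my ramen shop ranking",
        "new trail gear review for mountains",
    ]
    print(retrieve_style_examples("weekend hike in the mountains", past))
    caption, sim = cohesion_reflect("a cozy day out", "misty mountain trail at sunrise",
                                    revise=append_visual_cues)
    print(caption, round(sim, 2))
```

Running it retrieves the two hiking-related posts as style exemplars, and the off-topic caption "a cozy day out" (similarity 0) is revised once to "a cozy day out on a misty mountain trail at sunrise" (similarity about 0.62), which clears the threshold.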

Benchmarking Performance

To evaluate these capabilities, the researchers built TailorBench, a new benchmark derived from real-world data from three mainstream platforms: Rednote, Bilibili, and Hupu. Outputs are evaluated along five dimensions: coherence, novelty, aesthetic quality, hallucination (whether the output contains fabricated information), and profiling accuracy. In the reported experiments, TailorMind achieves competitive or stronger cross-modal coherence and improves novelty and aesthetic quality over representative generation baselines and even over ground-truth UGC, which the authors present as evidence that generating tailored content has advantages over retrieving available content or comparable UGC. The system also achieves up to 29% Recall gains in reranking.
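
Since the reranking results are reported in terms of Recall, here is the standard Recall@K definition as a short Python sketch with made-up rankings. The paper's reranking protocol, its cutoff K, and whether "up to 29%" is a relative or an absolute gain are not given in this summary, so `relative_gain` shows only one possible reading of a percentage improvement.

```python
def recall_at_k(ranked, relevant, k):
    """Fraction of a user's relevant items that appear in the top-k of a ranking."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def mean_recall_at_k(rankings, relevant_sets, k):
    """Average Recall@K over users (one ranking and one relevant set per user)."""
    scores = [recall_at_k(r, rel, k) for r, rel in zip(rankings, relevant_sets)]
    return sum(scores) / len(scores)


def relative_gain(new, old):
    # Relative improvement; an absolute gain would simply be new - old.
    return (new - old) / old


if __name__ == "__main__":
    relevant = [{"c", "d"}, {"y"}]                          # toy ground truth per user
    baseline = [["a", "b", "c", "d"], ["x", "y", "z", "w"]]
    reranked = [["c", "a", "d", "b"], ["y", "x", "z", "w"]]
    before = mean_recall_at_k(baseline, relevant, k=2)
    after = mean_recall_at_k(reranked, relevant, k=2)
    print(before, after, relative_gain(after, before))  # 0.5 0.75 0.5
```

With K = 2, the toy reranking raises mean Recall@2 from 0.5 to 0.75: a 50% relative gain but only 0.25 in absolute terms, which is why the two readings of a percentage gain can differ considerably.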
