EPIG: Emotion-Based Prompting for Personalised Image Generation

Key Takeaways

  • EPIG: Emotion-Based Prompting for Personalised Image Generation introduces a new way to control the emotional tone of images created by AI.
  • Text-to-image diffusion models have achieved impressive results in synthesizing high-quality images from natural language prompts.
  • However, commonly used prompting strategies remain relatively generic, limiting the model's ability to accurately express emotional intent and nuanced affective attributes.
  • This work proposes EPIG, a method that enhances emotional expressiveness at the prompt level prior to image generation.
  • The resulting emotion-aware prompts guide the generative process toward more emotionally coherent visual outputs, with particular effectiveness in controlling arousal.

Paper Abstract

Text-to-image diffusion models have achieved impressive results in synthesizing high-quality images from natural language prompts. However, commonly used prompting strategies remain relatively generic, limiting the model's ability to accurately express emotional intent and nuanced affective attributes. This work proposes EPIG, a method that enhances emotional expressiveness at the prompt level prior to image generation. Grounded in psychologically informed emotion representations (valence-arousal) and leveraging structured, role-aware prompt enrichment, EPIG enriches emotion-related components of prompts without modifying or retraining the image generation backbone. The resulting emotion-aware prompts guide the generative process toward more emotionally coherent visual outputs, with particular effectiveness in controlling arousal. EPIG is lightweight, training-free, and well suited for resource-constrained and personalized image generation scenarios. Experimental results on a benchmark of 10 diverse prompts show that EPIG reduces mean arousal error compared to strong baselines, including naive insertion and LLM-based prompt expansion, with reductions of 14% and 12%, respectively. These improvements are statistically significant. EPIG also preserves valence alignment and semantic consistency, as measured by CLIPScore and supported by ablation studies. The effect is more pronounced on prompts containing explicit subjects such as humans, children, or animals, where the reduction reaches 17%, highlighting the subject-sensitive behavior of the proposed method.

EPIG: Emotion-Based Prompting for Personalised Image Generation introduces a new way to control the emotional tone of images created by AI. While current text-to-image models are excellent at generating high-quality visuals, they often struggle to capture specific emotional nuances, leading to inconsistent results. This research provides a lightweight, training-free method to guide these models toward more emotionally accurate outputs by enriching user prompts with psychologically grounded descriptors before the image generation process begins.

How EPIG Works

Instead of relying on generic prompts or retraining the image model, EPIG acts as a pre-processing layer. It uses a "role-aware" strategy that breaks a user's prompt into three distinct parts: the subject (who is experiencing the emotion), the stimulus (what is causing the emotion), and the context (the surrounding environment).

The system then uses the NRC Valence-Arousal-Dominance (VAD) lexicon, which rates English words along those three dimensions, to select descriptive words that match the user's desired emotional state. It computes the distance between each candidate word's lexicon scores and the target emotion, selects the closest terms, and assigns them to the appropriate part of the prompt. This ensures that an emotional descriptor, such as "joyful," is applied to the subject rather than accidentally changing the colors of the background.
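
The selection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the lexicon entries are invented placeholder scores rather than real NRC VAD values, Euclidean distance in the valence-arousal plane stands in for the unspecified distance measure, and the names `RoleAwarePrompt`, `nearest_words`, and `enrich` are hypothetical.

```python
import math
from dataclasses import dataclass

# Placeholder (valence, arousal) scores on a 0-1 scale. These values are
# invented for illustration and are NOT taken from the NRC VAD lexicon.
TOY_LEXICON = {
    "joyful": (0.95, 0.75),
    "serene": (0.85, 0.15),
    "furious": (0.10, 0.95),
    "gloomy": (0.15, 0.30),
    "excited": (0.90, 0.90),
}


@dataclass
class RoleAwarePrompt:
    subject: str   # who is experiencing the emotion
    stimulus: str  # what is causing the emotion
    context: str   # the surrounding environment


def nearest_words(target, lexicon, k=1):
    """Return the k words whose (valence, arousal) scores lie closest to target.

    Euclidean distance is an assumption; the summary does not name the metric.
    """
    ranked = sorted(lexicon, key=lambda word: math.dist(lexicon[word], target))
    return ranked[:k]


def enrich(prompt, target, lexicon=TOY_LEXICON):
    """Attach the best-matching descriptor to the subject role only."""
    descriptor = nearest_words(target, lexicon, k=1)[0]
    article = "an" if descriptor[0] in "aeiou" else "a"
    # Stimulus and context are passed through unchanged, so the emotional
    # descriptor cannot leak into the background description.
    return f"{article} {descriptor} {prompt.subject} {prompt.stimulus}, {prompt.context}"


prompt = RoleAwarePrompt("child", "opening a gift", "in a living room")
print(enrich(prompt, target=(0.9, 0.85)))
```

Keeping the three roles as separate fields is what allows the descriptor to be attached to the subject alone; a flat string would give no way to tell which part of the prompt an adjective should modify.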

Key Advantages

Because EPIG operates entirely at the prompt level, it does not require any modifications to the underlying image generation model. This makes it highly efficient and suitable for users with limited computing resources. It also provides a high level of transparency and reproducibility, as the process is rule-based rather than random. By separating the emotional roles within a scene, the method effectively prevents "semantic bleeding," where emotional traits intended for one part of an image unintentionally leak into others.

Performance and Results

The researchers tested EPIG against two baselines: naive keyword insertion and LLM-based prompt expansion. On a benchmark of 10 diverse prompts, EPIG improved control over "arousal" (the intensity of the emotion), reducing the mean arousal error by 14% compared to naive insertion and 12% compared to LLM-based expansion. The paper reports these improvements as statistically significant.

The method proved particularly effective when the prompt included an explicit subject, such as a person, a child, or an animal, where the error reduction reached 17%. The study also reports that these gains did not come at the cost of valence alignment or semantic consistency, as measured by CLIPScore and supported by ablation studies.
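
To make the percentage figures concrete, the sketch below shows one way a mean arousal error and its relative reduction could be computed. It assumes the error is the mean absolute difference between the target arousal and the arousal estimated for each generated image, which the summary does not confirm. All numbers are toy placeholders unrelated to the paper's results, and the function names are hypothetical.

```python
def mean_abs_error(targets, predictions):
    """Mean absolute difference between target and estimated arousal."""
    if not targets or len(targets) != len(predictions):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(targets, predictions)) / len(targets)


def relative_reduction(baseline_error, method_error):
    """Fraction by which the method lowers the baseline error."""
    return (baseline_error - method_error) / baseline_error


# Toy placeholder values, NOT results from the paper.
target_arousal = [0.8, 0.2, 0.6, 0.9]
baseline_estimates = [0.6, 0.4, 0.5, 0.6]  # e.g. images from naive insertion
method_estimates = [0.7, 0.3, 0.5, 0.7]    # e.g. images from enriched prompts

baseline_error = mean_abs_error(target_arousal, baseline_estimates)
method_error = mean_abs_error(target_arousal, method_estimates)
print(f"relative reduction: {relative_reduction(baseline_error, method_error):.1%}")
```

A relative reduction is always stated against a specific baseline, which is why the summary quotes separate figures for naive insertion and LLM-based expansion.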

Important Considerations

EPIG is designed to be a flexible tool for applications where emotional accuracy is critical, such as psychological research, therapeutic visualization, and personalized content creation. Its reported gains are strongest for arousal, with valence alignment preserved, and it is intended to complement existing diffusion models rather than replace them. Because it relies on a fixed linguistic pipeline and lexicon-based mapping, its behavior is consistent and predictable, which suits users who need to maintain a specific affective tone across different generations.
