EPIG: Emotion-Based Prompting for Personalised Image Generation introduces a new way to control the emotional tone of images created by AI. While current text-to-image models are excellent at generating high-quality visuals, they often struggle to capture specific emotional nuances, leading to inconsistent results. This research provides a lightweight, training-free method to guide these models toward more emotionally accurate outputs by enriching user prompts with psychologically grounded descriptors before the image generation process begins.
How EPIG Works
Instead of relying on generic prompts or retraining the AI model, EPIG acts as a smart pre-processing layer. It uses a "role-aware" strategy that breaks a user's prompt into three distinct parts: the subject (who is experiencing the emotion), the stimulus (what is causing the emotion), and the context (the surrounding environment).
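The three-role split can be pictured with a small sketch. The `RolePrompt` class and the `compose` function below are hypothetical names introduced only for illustration. The roles are filled in by hand here, because this summary does not describe how EPIG extracts them from a user's prompt.

```python
from dataclasses import dataclass


@dataclass
class RolePrompt:
    """Hypothetical container for the three roles described above."""
    subject: str   # who is experiencing the emotion
    stimulus: str  # what is causing the emotion
    context: str   # the surrounding environment


def compose(parts: RolePrompt, descriptors: dict) -> str:
    """Attach each descriptor only to the role it was chosen for."""
    def attach(role: str) -> str:
        text = getattr(parts, role)
        word = descriptors.get(role)
        # Roles without a descriptor are passed through unchanged.
        return f"{word} {text}" if word else text

    return f"{attach('subject')} {attach('stimulus')}, {attach('context')}"


parts = RolePrompt(
    subject="child",
    stimulus="watching fireworks",
    context="in a city park at night",
)
enriched = compose(parts, {"subject": "joyful"})
print(enriched)
```

Because the descriptor is keyed by role, "joyful" lands next to the subject, while the stimulus and context text stay as the user wrote them.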
The system then uses the NRC Valence-Arousal-Dominance (VAD) lexicon, a word list in which each entry carries human-rated valence, arousal, and dominance scores, to select descriptive words that match the user's desired emotional state. By calculating the distance in VAD space between each candidate word and the target emotion, the system selects the closest terms and assigns each one to the appropriate part of the prompt. This ensures that an emotional descriptor such as "joyful" is applied to the subject rather than accidentally changing the colors of the background.
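A minimal sketch of distance-based descriptor selection follows. It assumes Euclidean distance, since the summary does not say which distance measure EPIG uses. The scores in `TOY_LEXICON` are made-up values on a 0 to 1 scale, not entries copied from the NRC VAD lexicon, and `nearest_descriptors` is a hypothetical helper name.

```python
import math

# Illustrative (valence, arousal, dominance) triples.
# These are invented for the example and are NOT real NRC VAD scores.
TOY_LEXICON = {
    "joyful": (0.95, 0.75, 0.70),
    "serene": (0.85, 0.20, 0.60),
    "furious": (0.10, 0.95, 0.65),
    "gloomy": (0.15, 0.30, 0.25),
}


def nearest_descriptors(target, lexicon, k=1):
    """Return the k words whose VAD triples lie closest to the target.

    Assumption: closeness is Euclidean distance in VAD space.
    """
    ranked = sorted(lexicon, key=lambda word: math.dist(lexicon[word], target))
    return ranked[:k]


# A target of high valence and high arousal.
excited_target = (0.90, 0.80, 0.70)
print(nearest_descriptors(excited_target, TOY_LEXICON))

# A target of high valence and low arousal.
calm_target = (0.80, 0.20, 0.60)
print(nearest_descriptors(calm_target, TOY_LEXICON))
```

Under these toy values, the two targets share a similar valence but differ in arousal, so they resolve to different descriptors. That is the kind of distinction a single generic keyword cannot express.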
Key Advantages
Because EPIG operates entirely at the prompt level, it does not require any modifications to the underlying image generation model. This makes it highly efficient and suitable for users with limited computing resources. It also provides a high level of transparency and reproducibility, as the process is rule-based rather than random. By separating the emotional roles within a scene, the method effectively prevents "semantic bleeding," where emotional traits intended for one part of an image unintentionally leak into others.
Performance and Results
The researchers tested EPIG against standard prompting methods, such as naive keyword insertion and LLM-based prompt expansion. The results showed that EPIG significantly improved the model's control of arousal (the intensity of the emotion), reducing the mean error by 14% compared to naive insertion and by 12% compared to LLM-based expansion.
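To make the percentages concrete, the sketch below shows one way such a figure could be computed. It rests on two assumptions: that "mean error" is the mean absolute difference between target arousal and the arousal estimated from the generated image, and that the reduction is relative to the baseline. All arousal values are invented for the example and are not results from the study, and the function names are hypothetical.

```python
def mean_absolute_error(targets, estimates):
    """Average absolute gap between target and estimated arousal."""
    if len(targets) != len(estimates):
        raise ValueError("targets and estimates must have the same length")
    return sum(abs(t - e) for t, e in zip(targets, estimates)) / len(targets)


def relative_reduction(baseline_error, method_error):
    """Fraction by which the method lowers the baseline error."""
    return (baseline_error - method_error) / baseline_error


# Invented numbers for illustration only (not data from the paper).
target_arousal = [0.8, 0.2, 0.6]
baseline_arousal = [0.6, 0.4, 0.5]  # e.g. images from a baseline prompt
method_arousal = [0.7, 0.3, 0.5]    # e.g. images from an enriched prompt

baseline_mae = mean_absolute_error(target_arousal, baseline_arousal)
method_mae = mean_absolute_error(target_arousal, method_arousal)
reduction = relative_reduction(baseline_mae, method_mae)
print(f"error reduction: {reduction:.0%}")
```

Read this way, a 14% reduction means the enriched prompts leave 86% of the baseline's arousal error, not that the error falls by 14 percentage points.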
The method proved particularly effective when the prompt included a clear subject, such as a person or an animal, where the error reduction reached 17%. Furthermore, the study confirmed that these improvements in emotional control did not come at the cost of image quality or semantic consistency, as verified by standard metrics like CLIPScore.
Important Considerations
EPIG is designed to be a flexible tool for applications where emotional accuracy is critical, such as psychological research, therapeutic visualization, and personalized content creation. While it excels at controlling emotional dimensions like valence and arousal, it is intended to complement existing diffusion models rather than replace them. Because it relies on a fixed linguistic pipeline and lexicon-based mapping, its performance is consistent and predictable, making it a reliable choice for users who need to maintain specific affective tones across different generations.