Co-Director: Agentic Generative Video Storytelling | AI Research

Key Takeaways

  • Co-Director: Agentic Generative Video Storytelling is a new framework designed to solve the problem of creating coherent, high-quality video narratives using AI.
  • While diffusion models generate high-fidelity video clips, transforming them into coherent storytelling engines remains challenging.
  • Current agentic pipelines automate this via chained modules but suffer from semantic drift and cascading failures due to independent, handcrafted prompting.
  • We present Co-Director, a hierarchical multi-agent framework formalizing video storytelling as a global optimization problem.
  • The framework balances the exploration of novel narrative strategies with the exploitation of effective creative configurations.
Paper Abstract

While diffusion models generate high-fidelity video clips, transforming them into coherent storytelling engines remains challenging. Current agentic pipelines automate this via chained modules but suffer from semantic drift and cascading failures due to independent, handcrafted prompting. We present Co-Director, a hierarchical multi-agent framework formalizing video storytelling as a global optimization problem. To ensure semantic coherence, we introduce hierarchical parameterization: a multi-armed bandit globally identifies promising creative directions, while a local multimodal self-refinement loop mitigates identity drift and ensures sequence-level consistency. This balances the exploration of novel narrative strategies with the exploitation of effective creative configurations. For evaluation, we introduce GenAD-Bench, a 400-scenario dataset of fictional products for personalized advertising. Experiments demonstrate that Co-Director significantly outperforms state-of-the-art baselines, offering a principled approach that seamlessly generalizes to broader cinematic narratives. Project Page: this https URL

Co-Director: Agentic Generative Video Storytelling is a new framework designed to solve the problem of creating coherent, high-quality video narratives using AI. While current generative video models can create impressive clips, they often struggle to maintain a consistent story or visual style over longer sequences. Co-Director addresses this by moving away from simple, linear automation toward a hierarchical, multi-agent system that treats storytelling as a global optimization problem, ensuring that every part of the video—from the script to the final edit—works together toward a unified creative vision.

A New Way to Direct AI

Traditional agentic pipelines often suffer from "cascading failures," where a small error early in the process—such as a slight change in a character's appearance—compounds until the final video feels disjointed. Co-Director replaces these rigid, "waterfall" chains with a more flexible, intelligent architecture. It uses an Orchestrator Agent to manage the entire process, supported by specialized sub-agents for pre-production (scripting and storyboarding), production (generating images, video, and audio), and post-production (assembling the final film). By coordinating these agents through a central creative plan, the system prevents the semantic drift that typically plagues automated video generation.
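The hierarchical layout described above can be sketched in code. This is a minimal illustration, not the paper's implementation: the agent names, stages, and interfaces are assumptions, and each agent's `run` method stands in for a call to a generative model.

```python
# Hypothetical sketch of an orchestrator coordinating stage-specific
# sub-agents around one shared creative plan (names are illustrative).
class SubAgent:
    def __init__(self, name, stage):
        self.name, self.stage = name, stage

    def run(self, plan):
        # Placeholder: a real agent would invoke a generative model here.
        return f"{self.stage}:{self.name} done for '{plan['concept']}'"

class Orchestrator:
    """Runs pre-production, production, and post-production agents
    against a single central plan, rather than a linear chain."""

    def __init__(self):
        self.stages = {
            "pre_production": [SubAgent("script", "pre_production"),
                               SubAgent("storyboard", "pre_production")],
            "production": [SubAgent("image", "production"),
                           SubAgent("video", "production"),
                           SubAgent("audio", "production")],
            "post_production": [SubAgent("editor", "post_production")],
        }

    def direct(self, concept):
        plan = {"concept": concept, "artifacts": []}
        for stage in ("pre_production", "production", "post_production"):
            for agent in self.stages[stage]:
                # Every agent reads and appends to the same plan, which is
                # what keeps the stages semantically aligned.
                plan["artifacts"].append(agent.run(plan))
        return plan

result = Orchestrator().direct("sparkling water ad")
```

The key design point is that the plan object is shared state: each sub-agent contributes to it instead of passing output blindly to the next module, which is what the article contrasts with rigid "waterfall" chains.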

Balancing Exploration and Strategy

A key innovation in Co-Director is the use of a Multi-Armed Bandit (MAB) algorithm to steer the creative process. Instead of relying on fixed, handcrafted prompts, the system treats creative choices—such as the narrative style, the tone of the ad, and the visual aesthetic—as variables to be optimized. The MAB allows the system to balance "exploration" (trying new, unconventional narrative strategies) with "exploitation" (refining configurations that have proven successful). This allows the AI to learn which creative directions work best for a specific product or audience, effectively mimicking the decision-making process of a human director.
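The exploration/exploitation trade-off can be made concrete with a standard bandit rule. The sketch below uses UCB1 over a set of narrative styles; the arm names, reward model, and round count are illustrative assumptions, not details from the paper.

```python
import math
import random

# Hypothetical creative "arms" a director-agent could choose between.
ARMS = ["emotional_story", "product_demo", "humorous_skit"]

counts = {a: 0 for a in ARMS}   # how often each arm was tried
values = {a: 0.0 for a in ARMS}  # running mean reward per arm

def select_arm(t):
    # Try every arm once, then pick by UCB1: mean reward plus an
    # exploration bonus that shrinks as an arm is sampled more.
    for a in ARMS:
        if counts[a] == 0:
            return a
    return max(ARMS,
               key=lambda a: values[a] + math.sqrt(2 * math.log(t) / counts[a]))

def update(arm, reward):
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]

# Simulated runs: pretend "emotional_story" tends to score higher
# (e.g. on downstream video-quality evaluation).
random.seed(0)
for t in range(1, 201):
    arm = select_arm(t)
    reward = random.gauss(0.8 if arm == "emotional_story" else 0.5, 0.1)
    update(arm, reward)

best = max(ARMS, key=lambda a: values[a])
```

Early rounds spread samples across all styles (exploration); as evidence accumulates, the bandit concentrates on the configuration with the highest observed reward (exploitation), mirroring how a human director converges on what works for a given product.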

Rigorous Evaluation with GenAD-Bench

To measure how well these AI systems perform, the authors introduced GenAD-Bench, a new dataset containing 400 scenarios for fictional products. By using fictional brands, the researchers ensure that the AI cannot rely on memorized training data, forcing it to demonstrate genuine reasoning and creative flexibility. The benchmark evaluates videos on four axes: how well they preserve the product's identity, how accurately they align with specific demographic targets, their overall marketing appeal, and the technical quality of the visuals.
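A simple way to aggregate such multi-axis judgments is a weighted mean. The axis names below follow the text; the weights and sample scores are assumptions for illustration, not values from GenAD-Bench.

```python
# Illustrative aggregation over the four evaluation axes described above.
AXES = ("identity_preservation", "demographic_alignment",
        "marketing_appeal", "visual_quality")

def overall_score(scores, weights=None):
    """Weighted mean over the four axes; equal weights by default."""
    weights = weights or {a: 1.0 for a in AXES}
    total = sum(weights[a] for a in AXES)
    return sum(scores[a] * weights[a] for a in AXES) / total

# Hypothetical per-axis scores for one generated ad (0-1 scale assumed).
sample = {"identity_preservation": 0.9, "demographic_alignment": 0.7,
          "marketing_appeal": 0.8, "visual_quality": 0.6}
score = overall_score(sample)
```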

Improving Consistency Through Feedback

Co-Director incorporates a self-refinement loop that acts as a quality control mechanism. Before a full video is produced, the system evaluates intermediate steps, such as the storyline and the initial keyframes. If a component falls below a certain quality threshold, the system provides specific, actionable feedback to the relevant sub-agent, allowing it to regenerate the content while keeping the rest of the project intact. This iterative process ensures that the final output is not just a collection of random clips, but a cohesive narrative that meets the user's original goals.
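The evaluate-then-regenerate loop described above can be sketched as follows. The scoring function, quality threshold, attempt cap, and feedback string are all illustrative assumptions; in the real system a multimodal judge would score the artifact and a sub-agent would regenerate it conditioned on the feedback.

```python
# Hedged sketch of a self-refinement loop with a quality gate.
QUALITY_THRESHOLD = 0.7  # assumed cutoff for accepting an artifact
MAX_ATTEMPTS = 3         # assumed cap on regeneration rounds

def evaluate(artifact):
    # Stand-in for a multimodal quality judge.
    return artifact["quality"]

def regenerate(artifact, feedback):
    # A real sub-agent would condition on the feedback text; this stub
    # simply simulates quality improving with each revision.
    return {"content": artifact["content"] + " (revised)",
            "quality": min(1.0, artifact["quality"] + 0.2)}

def refine(artifact):
    for _ in range(MAX_ATTEMPTS):
        if evaluate(artifact) >= QUALITY_THRESHOLD:
            return artifact  # passes the gate; leave the rest of the project intact
        artifact = regenerate(artifact, feedback="raise visual consistency")
    return artifact  # best effort after the attempt cap

keyframe = {"content": "keyframe v1", "quality": 0.4}
final = refine(keyframe)
```

Only the failing component is regenerated, so an off-model keyframe can be fixed without re-running the script or the other shots, which is what keeps the iterative process cheap and the final narrative cohesive.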
