Creating AI-generated videos has become more straightforward than ever, thanks to the integration of Tencent's HunyuanVideo model with ComfyUI. This comprehensive guide walks you through the installation process and provides insights into maximizing the use of this advanced tool for your video generation projects.
What is HunyuanVideo?

The HunyuanVideo model, developed by Tencent, is a cutting-edge AI video generation model that produces high-quality, coherent videos from simple text prompts. Its standout features include:

- Unified Image & Video Generation: Employs a "dual-stream to single-stream" hybrid model design for consistent video generation.
- MLLM Text Encoder: Uses a multimodal large language model (MLLM) for text encoding, enabling exceptional instruction-following capabilities and text alignment.
- 3D VAE: Compresses pixel-space videos into a compact latent space, reducing memory usage significantly.
- Prompt Rewrite: Offers Normal and Master modes for automatically rewriting prompts to align better with user intent.
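To make the 3D VAE's memory savings concrete, here is a small sketch of the latent-space arithmetic. The compression factors used below (4x temporal, 8x spatial, 16 latent channels) are my assumption based on the published model description; check the model card for the exact values of your checkpoint.

```python
# Assumed compression factors for HunyuanVideo's 3D VAE:
# 4x temporal, 8x spatial, 16 latent channels (verify against the model card).
T_FACTOR, S_FACTOR, LATENT_CH = 4, 8, 16

def latent_shape(frames: int, height: int, width: int) -> tuple:
    """Map a pixel-space video (frames, 3, height, width) to its latent shape."""
    return (
        1 + (frames - 1) // T_FACTOR,  # compressed frame count
        LATENT_CH,
        height // S_FACTOR,
        width // S_FACTOR,
    )

# Example: a 129-frame 720x1280 clip.
shape = latent_shape(129, 720, 1280)
pixel_values = 129 * 3 * 720 * 1280
latent_values = shape[0] * shape[1] * shape[2] * shape[3]
print(shape, round(pixel_values / latent_values, 1))
```

Under these assumed factors, the latent tensor holds roughly 47x fewer values than the raw pixel video, which is why the model can denoise long clips in a fraction of the memory.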
Why Choose ComfyUI?

ComfyUI is an open-source graphical interface for Stable Diffusion and related models, providing a flexible, node-based workflow for precise control over the generation process. Running HunyuanVideo inside ComfyUI gives you that same fine-grained control over each stage of video generation.

Basic Installation Guide for Text-to-Video in ComfyUI

1. Update ComfyUI: Ensure you have the latest version of ComfyUI installed. If not, download it from the ComfyUI GitHub page.
2. Download the HunyuanVideo model: Download the hunyuan_video_t2v_720p_bf16.safetensors file from Hugging Face (link) and place it in the ComfyUI/models/diffusion_models directory.
3. Download the text encoders: Download the following files and save them in the ComfyUI/models/text_encoders directory:
   - clip_l.safetensors
   - llava_llama3_fp8_scaled.safetensors
4. Download the VAE: Download the hunyuan_video_vae_bf16.safetensors file from Hugging Face (link) and place it in the ComfyUI/models/vae directory.
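Once the files are downloaded, a short script can sanity-check that everything landed in the right folder. The ComfyUI path below is a placeholder for wherever your checkout lives; the filenames and folders match the steps above.

```python
from pathlib import Path

# Assumed location of your ComfyUI checkout; adjust to your install.
COMFYUI = Path("ComfyUI")

# Expected destination for each downloaded file, per the steps above.
DESTINATIONS = {
    "hunyuan_video_t2v_720p_bf16.safetensors": COMFYUI / "models/diffusion_models",
    "clip_l.safetensors": COMFYUI / "models/text_encoders",
    "llava_llama3_fp8_scaled.safetensors": COMFYUI / "models/text_encoders",
    "hunyuan_video_vae_bf16.safetensors": COMFYUI / "models/vae",
}

# Create the folder tree if it does not exist yet.
for folder in set(DESTINATIONS.values()):
    folder.mkdir(parents=True, exist_ok=True)

def missing_models() -> list:
    """Return the filenames that are not yet in place."""
    return [name for name, folder in DESTINATIONS.items()
            if not (folder / name).is_file()]

print("still missing:", missing_models())
```

Run it after each download; when it prints an empty list, the model files are where ComfyUI expects them.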
Loading Example Workflows

To simplify your setup, download example workflows from ComfyUI Examples. These include text-to-video and image-to-video workflows to kickstart your projects.

Low VRAM Optimization Methods

The HunyuanVideo model requires significant VRAM for optimal performance, and its public version operates in BF16 precision by default.
For systems with limited VRAM (e.g., an RTX 4080 with 16 GB), consider these solutions:

Option 1: Use the officially provided FP8 version

The official FP8 model is specifically optimized for low VRAM and can be used directly with the existing ComfyUI workflow. Download link: Hugging Face FP8 Model.

Option 2: Use the GGUF format by City96

The GGUF format is better suited for low-VRAM setups; in my testing, the Q8_0 GGUF on an RTX 4080 took only 6 minutes to generate a video. This requires the ComfyUI GGUF workflow:
- City96 GGUF model download: Hugging Face GGUF Model
- City96 ComfyUI GGUF plugin: GitHub Plugin
- ComfyUI GGUF workflow by vmirnv: Civitai Workflow

If you're unsure how to choose a GGUF model, one of my previous articles may help clarify your questions.
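To see why these formats help, here is a rough weights-only size estimate. The ~13B parameter count is the size commonly reported for HunyuanVideo, and the bytes-per-weight figures are typical for each format (Q8_0 stores 8-bit weights plus a per-block scale, roughly 8.5 bits per weight); real VRAM usage is higher once activations and the text encoders are loaded.

```python
# Rough weights-only size estimate; assumes ~13B parameters and typical
# storage costs per format. Actual VRAM usage will be higher at runtime.
PARAMS = 13e9
BYTES_PER_WEIGHT = {
    "BF16": 2.0,           # 16-bit brain float (the default public release)
    "FP8": 1.0,            # 8-bit float (official low-VRAM release)
    "GGUF Q8_0": 8.5 / 8,  # 8-bit quant + per-block scale (~8.5 bits/weight)
}

def size_gb(fmt: str) -> float:
    """Approximate on-disk / in-memory size of the weights in GB."""
    return PARAMS * BYTES_PER_WEIGHT[fmt] / 1e9

for fmt in BYTES_PER_WEIGHT:
    print(f"{fmt}: ~{size_gb(fmt):.1f} GB")
```

The arithmetic makes the trade-off clear: BF16 weights alone already approach the capacity of a 16 GB card, while the FP8 and Q8_0 variants halve that footprint.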
Advanced Usage: HunyuanVideoWrapper for ComfyUI

The HunyuanVideoWrapper extension simplifies the integration of HunyuanVideo into ComfyUI's node-based workflow, adding advanced functionalities:

- Text-to-Video (T2V): Generate videos directly from text prompts.
- Image-to-Video (I2V): Start with an image and generate a video sequence based on it.
- Image Prompting-to-Video (IP2V): Use an image as part of the prompt to guide the video's concept and style.
- Video-to-Video (V2V): Transform or stylize an input video.

Wrapper link: https://github.com/kijai/ComfyUI-HunyuanVideoWrapper

Conclusion

By integrating Tencent's HunyuanVideo model into ComfyUI, video generation becomes more accessible and powerful.